
A security researcher has criticized AI firm Anthropic after catching a second major bypass in Claude’s network sandbox, claiming the company quietly fixed the issue without properly informing users.
While Anthropic claims that its bug-hunting AI, Mythos, is capable of uncovering hundreds of previously undiscovered vulnerabilities, the self-styled moral steward of frontier LLMs is somewhat shyer about admitting flaws in its own systems.
This was demonstrated recently by Aonan Guan, who leads cloud and AI security at Wyze Labs and has revealed how he discovered a major bypass in Claude Code’s network sandbox (the security cage around AI-generated code).
The discovery is the second sandbox flaw that Guan has reported over a six-month period.
Both times, Guan adds, Anthropic fixed the bypasses without disclosure, with nothing flagged to users by way of public messaging.
How the bypass works
The report claims the vulnerability worked by exploiting differences in how Claude Code’s sandbox and the operating system interpreted host names.
According to Guan, the flaw would have allowed code running inside the sandbox to bypass outbound network restrictions and connect to an attacker-controlled server.
This could potentially have allowed criminals to exfiltrate information from inside the sandbox, including credentials, source code and internal data.
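The article does not spell out the exact mismatch Guan exploited, so the following is only a generic illustration of this class of bug, not his exploit and not Claude Code's actual logic. It is a minimal Python sketch, assuming a hypothetical filter that compares raw hostname strings against a made-up blocked entry, while a resolver would treat differently spelled names (mixed case, a trailing root dot) as the same host.

```python
# Hypothetical illustration only: this is not Claude Code's sandbox logic
# and not the bypass from Guan's report. All names here are made up.

BLOCKED_HOSTS = {"metadata.internal.example"}  # invented denylist entry


def canonical_host(host: str) -> str:
    """Reduce a hostname to one comparable form.

    DNS names are case-insensitive, and a single trailing dot only marks
    the name as fully qualified, so both differences are normalized away.
    """
    host = host.lower()
    if host.endswith("."):
        host = host[:-1]
    return host


def naive_is_blocked(host: str) -> bool:
    # Compares the raw string, so equivalent spellings slip through.
    return host in BLOCKED_HOSTS


def strict_is_blocked(host: str) -> bool:
    # Compares the canonical form, closer to how a resolver treats the name.
    return canonical_host(host) in BLOCKED_HOSTS


if __name__ == "__main__":
    for candidate in (
        "metadata.internal.example",
        "Metadata.Internal.Example",
        "metadata.internal.example.",
    ):
        print(candidate, naive_is_blocked(candidate), strict_is_blocked(candidate))
```

The point of the sketch is the design choice rather than the specific checks: a policy decision and the eventual connection should both work from the same canonical form of the name, otherwise the two components can disagree about which host is being contacted.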
He adds that the bypass is most dangerous when paired with prompt injection, which he detailed in an earlier report, Comment and Control.
“A hidden instruction in a GitHub issue comment, a README, or a doc page Claude Code reads is enough to get it to run attacker-controlled code in-sandbox,” he adds.
This would allow crooks to snaffle cloud and GitHub credentials, the GitHub token Claude authenticated with, cloud metadata and internal APIs.
The report says that the vulnerability affected every Claude Code release from 2.0.24 through 2.1.89.
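For teams that want to check their own installs against that range, the comparison is simple to express. The following is a minimal sketch, assuming the installed version is already available as a plain `major.minor.patch` string; the function names are illustrative and not part of any Anthropic tooling.

```python
# Affected range as stated in the report: 2.0.24 through 2.1.89, inclusive.
FIRST_AFFECTED = (2, 0, 24)
LAST_AFFECTED = (2, 1, 89)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '2.1.89' into a tuple of ints.

    Assumes a purely numeric major.minor.patch string; anything else
    raises ValueError.
    """
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version: str) -> bool:
    """Return True if the version falls inside the reported affected range."""
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED


if __name__ == "__main__":
    for v in ("2.0.23", "2.0.24", "2.1.0", "2.1.89", "2.1.90"):
        print(v, "affected" if is_affected(v) else "not in reported range")
```

Comparing tuples of integers avoids the trap of comparing version strings alphabetically, where "2.0.9" would sort after "2.0.24".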
Painful disclosure experience
Guan disclosed the bypass to Anthropic via bug bounty platform HackerOne on April 3. [HackerOne: Report #3646509 (Anthropic VDP)]
In its next-day reply, the AI maker said it had already caught and patched the flaw, and so closed the report as a “duplicate of an internal finding”.
When the researcher asked about a public disclosure, the firm said it had “not yet decided” whether a CVE would be published for this issue and could not share a timeline on that decision either.
Guan argues that this created a situation where organizations may not have realised they were running a vulnerable version of Claude Code.
While the fix exists in committed code, “the acknowledgement does not exist anywhere a user could find it.”
To date, Guan says, there has been, “No advisory. No CVE. No Changelog note” as well as no user outreach by way of emails, posts or warnings.
A history of silence
The report also links the latest issue to a previous Claude sandbox vulnerability (CVE-2025-66479) that Guan discovered, which he says was similarly fixed without a formal advisory.
At a time when many organizations are experimenting with AI, frontier AI companies should be held to high transparency standards, Guan argues. Otherwise, people are led into a false sense of safety.
“Shipping a sandbox with a hole is worse than not shipping one,” he says.
“The user with no sandbox knows they have no boundary. The user with a broken sandbox thinks they do.”
It’s not the first time that Guan has taken issue with the way AI vendors handle vulnerabilities.
In an earlier piece of work, Guan, working with researchers at Johns Hopkins, hijacked three popular AI agents that integrate with GitHub.
While he received modest bug bounties from Anthropic, Google, and Microsoft, none of the vendors assigned CVEs or published public advisories, a practice that means the burden of securing AI agents and other systems gets pushed to the end users.