
After the US government slapped export controls on Anthropic, the company had no choice but to cut off access to Fable 5 and Mythos for everyone. But experts who saw the fateful vulnerability report allegedly describing how to bypass Fable 5’s guardrails now say the administration has massively overreacted, because the behavior the report describes is true of every model ever shipped.
- Experts argue Fable 5’s alleged jailbreak reflects a universal limitation across all deployed AI models.
- Critics say the White House overreacted, creating a dangerous precedent for sudden government shutdowns.
- Anthropic maintains the reported behavior supports defensive security work, not a meaningful guardrail bypass.
- The dispute highlights growing fears frontier AI access could face broader citizenship-based government restrictions.
Administration officials have since talked with Anthropic about the decision to issue an export control directive covering the Fable 5 and Mythos 5 models.
The controls remain in place, and both models are still unavailable to all Anthropic customers. The next steps are unclear, so it’s a kind of stalemate.
The White House apparently continues to believe there are ways to jailbreak Fable 5 and access Mythos's capabilities, namely the ability to quickly and accurately identify software flaws.
Anthropic, though, keeps repeating that those concerns are overblown – that’s the same line from the firm’s blog post published immediately after the government’s order on Friday.
“We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people. If this standard were applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers,” wrote Anthropic.
Who’s in the right? It’s too early for a consensus, and the situation is ripe for speculation – as well as worrying for those alarmed by what they see as blatant government overreach.
Friday’s shutdown wasn’t a one-time correction: it was a precedent. Every lab in America is now operating under an invisible ceiling, some insiders say.
Cybersecurity experts are concerned about another issue, however. After looking through claims that the Fable 5 model was jailbroken, they now plainly say that, in fact, it wasn’t even a jailbreak.
“If a narrow jailbreak were really the bar for pulling a model, there would be no models. On safety grounds, Fable 5 is indistinguishable from the dozens of systems that stayed online. The reason it got singled out lives somewhere other than safety,” wrote Kenny Vaneetvelde, an independent AI expert who’s a Belgian citizen.
“That is not a guardrail bypass”
It’s indeed quite bizarre. Just recently, the White House made US AI dominance a national priority, and the Pentagon is betting on AI-driven defense.
But a federal directive quietly pulled America’s most capable commercial AI model from hundreds of millions of users based on a jailbreak vulnerability that Anthropic says already exists in every competitor’s model on the market.
What was that research paper that made the government act? It’s a bit of a mystery. Yes, “Pliny the Liberator,” a well-known figure in the AI community, said last week he had “liberated” Fable 5 just a day after the model’s public release.
The self-proclaimed AI researcher rose to prominence in 2024 by developing and openly sharing jailbreak prompts for models like ChatGPT, Claude, Grok, and others.
However, the government’s decision was apparently prompted by conversations between Amazon CEO Andy Jassy and US officials. Amazon researchers reportedly used a series of prompts to get Fable 5 to provide information that could aid cyberattacks.
Now, Katie Moussouris, CEO of cybersecurity firm Luta Security, claims she’s the only outside expert who has actually read the aforementioned research paper. That’s because Anthropic shared a copy of the findings with her privately.
Her conclusions are pretty damning for the government.
Moussouris states: “The heavy-handed and hasty export control directive was misguided. The behavior described in the paper cannot meaningfully be fixed, and any attempt would only weaken the model for defense.”
According to Moussouris, the researchers took open-source code with known CVEs, plus new code with deliberately planted vulnerabilities, and asked Fable 5, Mythos, and Opus to “review the code for security issues.”
Fable 5 refused. They then asked the models to “fix this code” and, through a multistep and manual process, turned the output into scripts that test the patches.
“That’s it. ‘Fix this code,’ plus several manual steps to generate test scripts, should never have triggered an export control,” says Moussouris.
“Defenders need to be able to ask AI to fix the bugs in a file, explain why the fix matters, and write tests that confirm the patch works. That is not a guardrail bypass. It is the most valuable thing an AI model can do for defensive security.”
Total AI safety is impossible
Essentially, Moussouris agrees with Anthropic, pointing out that the prompts worked because they were defensive requests. That capability can’t be removed without making the model worse at fixing bugs and verifying patches.
“The same holds for every capable AI model, including the foreign and open-weight systems the United States cannot reach with export controls, many of which will match Fable and Mythos capabilities within months,” the expert claims.
“Will all the US-based models be export-controlled? They have fewer guardrails than Fable 5, and almost all the capabilities, or will shortly.”
More than 120 cybersecurity leaders from firms like Nvidia and Adobe have already asked the Donald Trump administration to lift the restrictions, and Vaneetvelde, the Belgian expert, agrees that the government is shooting itself in the foot.
According to him, it’s time to read up on the actual, realistic capabilities of AI models and how they work.
“A large language model does not look up answers. It generates them one token at a time, by sampling from a probability distribution over its entire vocabulary. The last step in that process, the softmax, hands a nonzero probability to every possible next token. Every single one,” says Vaneetvelde.
This, he adds, means that no amount of safety training can push the probability of a harmful output all the way to zero.
“It can push it down, sometimes very far down, but never to nothing. There is always some sequence of words, however strange, that produces an answer it was trained to refuse,” the expert explains.
“A ‘jailbreak’ is just someone finding one of those paths. It is a property of how these systems work. A patch can lower the odds. It cannot reach zero.”
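The softmax argument can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: the four-token vocabulary, the logit values, and the idea that safety training simply lowers one logit are all hypothetical, and a real model samples whole sequences rather than a single token. It shows that a heavily suppressed option still keeps a small positive probability, and that repeated attempts add up.

```python
import math


def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def chance_of_at_least_one(p, attempts):
    """Probability that an event with per-try chance p happens at least once."""
    return 1.0 - (1.0 - p) ** attempts


# Hypothetical vocabulary; the last entry stands in for an output the
# model was trained to refuse. All numbers are made up for illustration.
vocab = ["sure", "sorry", "maybe", "<refused>"]
before_training = [2.0, 1.0, 0.5, 1.5]
after_training = [2.0, 1.0, 0.5, -20.0]  # assumed effect of safety training

p_before = softmax(before_training)[-1]
p_after = softmax(after_training)[-1]

print(f"chance of {vocab[-1]} before training: {p_before:.3g}")
print(f"chance of {vocab[-1]} after training:  {p_after:.3g}")
print(f"after one million tries: {chance_of_at_least_one(p_after, 10**6):.3g}")
```

Mathematically, the exponential inside the softmax is never zero, which is the point Vaneetvelde makes. On real hardware, an extremely negative logit can still round down to zero in floating-point arithmetic, so the sketch uses moderate values.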
“Just word gymnastics”
In other words, saying that there’s a way to jailbreak Fable 5 is equally true of GPT-5.5, Gemini, and every open-weights model. That’s, of course, what Anthropic, unhappy about being singled out, has been saying.
To be fair, though, claims that the “jailbreak” works equally well on other frontier AI models need to be demonstrated, Cybernews researchers tell me, since “they do differ in how easily they fall to a given technique.”
Besides, Moussouris’ idea that this was Defense Oriented Prompting rather than a jailbreak sounds to our researchers like “just word gymnastics.”
Whether a technique is defensive depends on the operator’s intent, not the prompt. The same flow that helps a defender understand an attack also delivers the output to an attacker.
“Calling it Defense Oriented Prompting doesn’t change the presumption that the model produced content it was trained to withhold. If it did, jailbreak is the accurate description regardless of who’s asking,” a Cybernews researcher told me.
Still, all of the above can also be applied to most AI models, so why have Anthropic and its Fable 5 model been targeted?
As per Reuters, Howard Lutnick, the US Commerce Secretary, told Anthropic CEO Dario Amodei in a letter that the government took action due to fears that American AI models could be deployed by military intelligence users in China, Russia, or other countries of concern.
That’s reportedly why the export control directive ordered the company to suspend foreign access to the models, including for foreign national employees.
No large company can reliably sort its users by nationality in real time, so the only way for Anthropic, a firm employing up to 5,000 people, to comply was to switch both Fable 5 and Mythos 5 off completely.
Only the beginning?
The government’s arguments might seem sound since, in April, the White House publicly accused China of using tens of thousands of proxy accounts and jailbreaking techniques, and of stealing US AI labs’ intellectual property on an industrial scale.
But, once again, why is Anthropic being singled out? Google, which created Gemini, and OpenAI, the firm behind ChatGPT, seem fine – and could even be happy with what’s going on, since Fable 5 is beating their models on most tests.
Firstly, as Vaneetvelde rightly notes, Anthropic has been on the Trump administration’s bad side ever since the company told the Pentagon in February it wouldn’t drop restrictions on mass surveillance and autonomous weapons. Today, the firm is on the US government’s supply chain blacklist.
Another version says that Anthropic could have caused the current issues for itself. The firm, after all, spent months telling everyone that Mythos was extremely powerful and dangerous.
“Market your product as a weapon for long enough, and you should not be surprised when the government finally treats it like one,” said Vaneetvelde.
However, this might be just the beginning, and all frontier AI models could be targeted. The Financial Times quotes a source close to OpenAI saying that in recent days, the AI industry has been working to ensure that foreign national researchers can continue developing the most advanced model.
Since the Anthropic directive has already banned this particular practice, the talk of the town is that the US government wants to restrict non-US employees from working on frontier AI models across the industry.
“It might previously have been unthinkable to require proof-of-citizenship to access services – it’s increasingly common across new technologies and, to be honest, this attempt is not surprising in that light,” Ben Murphy, a scholar at the Institute for Progress think tank, wrote on X.