Cloudflare lets Mythos loose on live code, says AI is too powerful for public release

Cloudflare’s CISO has been evaluating Mythos Preview and appears to have come to broadly the same conclusions as Anthropic about the advanced AI model’s capabilities.
The company has a direct stake in the development of advanced cyber-focused AI because it operates internet infrastructure and security services used by millions of websites and enterprises.
Any kind of outage would be keenly felt, as was proven last November when a software crash caused thousands of major websites to go offline.
Detailing his Mythos findings in a blog post published on Cloudflare’s website on Monday, the cloud giant’s tech boss Grant Bourzikas warned that the system is capable of combining several small software flaws into a serious attack with a working exploit.
Bourzikas added that this was something that earlier AI models were not as capable of, and warned that the model may need stronger safety protections before it is released publicly.
However, despite Mythos’ advanced capabilities, he said that human researchers still performed better at picking up longer, more complex investigations.
Cloudflare tested Mythos across more than 50 repositories
Cloudflare says Mythos Preview was tested on more than 50 production repositories, including infrastructure, networking systems, internal platforms, and open-source software.
Bourzikas noted the most significant difference between Mythos and other frontier AI models was its ability to link together low-severity bugs that would otherwise be invisible into a single, more severe exploit.
“Mythos Preview can take several of these primitives and reason about how to combine them into a working proof,” he said.
“The reasoning it shows along the way looks like the work of a senior researcher rather than the output of an automated scanner.”
Bourzikas pointed out that this could be helpful when triaging potential exploits.
“It means fewer hedge findings and less time spent asking, ‘Is this even real?’ A finding that arrives with a PoC is a finding you can act on.”
However, he warned that defenders’ time to prepare for AI-generated attacks was shrinking: “Attacker timelines are shortening, but defenders need more than speed,” he said.
Jailbreaking: Mythos guardrails are “inconsistent”
Cloudflare also warned that Mythos’ safety controls were unreliable and could sometimes be bypassed with prompt changes – a practice commonly referred to as “jailbreaking.”
In one case, the model refused to conduct vulnerability research, then agreed to do the same research on the same code after Cloudflare researchers deleted the hidden .git folder – even though nothing about the underlying code had changed.
The company said the same request could also produce different results across runs due to the model’s probabilistic nature.
According to Bourzikas, these inconsistencies were the reason why Cloudflare has concluded that any future public release would require “additional safeguards” layered on top.
Human researchers are important for deeper investigations
Despite Mythos Preview's capabilities, Cloudflare said human security researchers still perform better when it comes to deep investigations across large codebases.
According to the company, human researchers are able to focus on one feature, attack path, or vulnerability class at a time and investigate it thoroughly.
“That one thing might be a single complex feature, transitions across security boundaries, or a specific vulnerability class like common injections,” the company said.
It added that Mythos worked best as an assistant for researchers who already had a lead, rather than as a fully autonomous security analyst.
Cloudflare's findings come after the limited release of Mythos Preview by Anthropic in April.
The maker of popular enterprise AI Claude claimed that its new security-focused model had autonomously found thousands of high-severity vulnerabilities across every major OS and web browser.
Deeming it too dangerous to release publicly, Anthropic granted access to 40 organizations to use it defensively via Project Glasswing. The project then widened to include other key tech and security firms, including Cloudflare.