Cloudflare warns Mythos too powerful for public release

Cloudflare’s CISO has been evaluating Mythos Preview and appears to have reached broadly the same conclusions as Anthropic about the advanced AI model’s capabilities.

The company has a direct stake in the development of advanced cyber-focused AI because it operates internet infrastructure and security services used by millions of websites and enterprises.

Any outage would be keenly felt, as became clear last November when a software crash knocked thousands of major websites offline.

Cloudflare's November outage demonstrated how keenly an attack on its supply chain would be felt. Image by Smith Collection/Gado/Getty Images

Detailing his Mythos findings in a blog post published on Cloudflare’s website on Monday, the company’s security chief Grant Bourzikas warned that the system can combine several small software flaws into a serious attack with a working exploit.

Bourzikas added that earlier AI models were less capable of this, and warned that the model may need stronger safety protections before it is released publicly.

However, despite Mythos’ advanced capabilities, he said that human researchers still performed better at carrying out longer, more complex investigations.

Cloudflare tested Mythos across 50 internal repositories

Cloudflare says Mythos Preview was tested on more than 50 production repositories, including infrastructure, networking systems, internal platforms, and open-source software.

Bourzikas noted that the most significant difference between Mythos and other frontier AI models was its ability to chain low-severity bugs, which might otherwise go unnoticed, into a single, more severe exploit.

“Mythos Preview can take several of these primitives and reason about how to combine them into a working proof,” he said.

“The reasoning it shows along the way looks like the work of a senior researcher rather than the output of an automated scanner.”

Bourzikas pointed out that this could be helpful when triaging potential exploits.

“It means fewer hedge findings and less time spent asking, ‘Is this even real?’ A finding that arrives with a PoC is a finding you can act on.”

However, he warned that defenders have less and less time to prepare for AI-generated attacks. “Attacker timelines are shortening, but defenders need more than speed,” he said.

Jailbreaking: Mythos guardrails are “inconsistent”

Cloudflare also warned that Mythos’ safety controls were unreliable and could sometimes be bypassed by changing the prompt, a practice commonly referred to as “jailbreaking.”

In one case, the model refused to conduct vulnerability research, then agreed to do the same research on the same code after Cloudflare researchers deleted the repository’s hidden .git folder, even though nothing about the underlying code had changed.

More AI safeguards are needed before Mythos is released publicly, Cloudflare warns. Image by Cybernews.

The company said the same request could also produce different results across runs due to the model’s probabilistic nature.

According to Bourzikas, these inconsistencies are why Cloudflare concluded that any future public release would require “additional safeguards” layered on top.

Human researchers are important for deeper investigations

Despite Mythos Preview's capabilities, Cloudflare said human security researchers still perform better at deep investigations.

According to the company, human researchers are able to focus on one feature, attack path, or vulnerability class at a time and investigate it thoroughly across large codebases.

“That one thing might be a single complex feature, transitions across security boundaries, or a specific vulnerability class like common injections,” the company said.

It added that Mythos worked best as an assistant for researchers who already had a lead, rather than as a fully autonomous security analyst.

Cloudflare's findings come after the limited release of Mythos Preview by Anthropic in April.

Anthropic released Mythos Preview to a select few organizations in April, deeming the model “too dangerous” for public use. Image by Koshiro K | Shutterstock

Anthropic, the maker of the popular enterprise AI model Claude, claimed that its new security-focused model had autonomously found thousands of high-severity vulnerabilities across every major operating system and web browser.

Deeming the model too dangerous to release publicly, Anthropic granted 40 organizations access to use it defensively through Project Glasswing. The project later widened to include other key tech and security firms, including Cloudflare.
