Cloudflare lets Mythos loose on live code, says AI is too powerful for public release


Cloudflare’s CISO has been evaluating Mythos Preview and appears to have reached broadly the same conclusions as Anthropic about the advanced AI model’s capabilities.

The company has a direct stake in the development of advanced cyber-focused AI because it operates internet infrastructure and security services used by millions of websites and enterprises.

Any kind of outage would be keenly felt, as was proven last November when a software crash caused thousands of major websites to go offline.

Cloudflare's November outage demonstrates how keenly felt an attack on its supply chain would be. Smith Collection/Gado/Getty Images

Detailing his Mythos findings in a blog post published on Cloudflare’s website on Monday, the cloud giant’s tech boss Grant Bourzikas warned that the system is capable of combining several small software flaws into a serious attack with a working exploit.

Bourzikas added that this was something that earlier AI models were not as capable of, and warned that the model may need stronger safety protections before it is released publicly.

However, despite Mythos’ advanced capabilities, he said that human researchers still performed better on longer, more complex investigations.

Cloudflare tested Mythos across more than 50 repositories

Cloudflare says Mythos Preview was tested on more than 50 production repositories, including infrastructure, networking systems, internal platforms, and open-source software.

Bourzikas noted the most significant difference between Mythos and other frontier AI models was its ability to link together low-severity bugs that would otherwise be invisible into a single, more severe exploit.

“Mythos Preview can take several of these primitives and reason about how to combine them into a working proof,” he said.

“The reasoning it shows along the way looks like the work of a senior researcher rather than the output of an automated scanner.”

Bourzikas pointed out that this could be helpful when triaging potential exploits.

“It means fewer hedge findings and less time spent asking, ‘Is this even real?’ A finding that arrives with a PoC is a finding you can act on.”

However, he warned that defenders have less time to prepare for AI-generated attacks: “Attacker timelines are shortening, but defenders need more than speed.”

Jailbreaking: Mythos guardrails are “inconsistent”

Cloudflare also warned that Mythos’ safety controls were unreliable and could sometimes be bypassed with prompt changes, a practice known as “jailbreaking.”

In one case, the model refused to conduct vulnerability research, then agreed to do the same research on the same code after Cloudflare researchers deleted the hidden .git folder – even though nothing about the underlying code had changed.

More AI safeguards needed before Mythos released publicly, Cloudflare warns. Image by Cybernews.

The company said the same request could also produce different results across runs due to the model’s probabilistic nature.

According to Bourzikas, these inconsistencies are why Cloudflare concluded that any future public release would require “additional safeguards” layered on top.

Human researchers are important for deeper investigations

Despite Mythos Preview's capabilities, Cloudflare said human security researchers still perform better when it comes to deep investigations across large codebases.

According to the company, human researchers are able to focus on one feature, attack path, or vulnerability class at a time and investigate it thoroughly across large codebases.

“That one thing might be a single complex feature, transitions across security boundaries, or a specific vulnerability class like common injections,” the company said.

It added that Mythos worked best as an assistant for researchers who already had a lead, rather than as a fully autonomous security analyst.

Cloudflare's findings come after the limited release of Mythos Preview by Anthropic in April.

Anthropic released Mythos Preview to a select few in April, deeming its model "too dangerous" for public use. Image by Koshiro K | Shutterstock

The maker of popular enterprise AI Claude claimed that its new security-focused model had autonomously found thousands of high-severity vulnerabilities across every major OS and web browser.

Deeming it too dangerous to release publicly, Anthropic granted access to 40 organizations to use it defensively via Project Glasswing. The project then widened to include other key tech and security firms, including Cloudflare.
