
The race to develop artificial general intelligence (AGI) still has a long way to run. In a new study, Apple researchers say they found that leading AI models still have trouble reasoning and, in fact, collapse completely when faced with increasingly complex problems.
In a paper titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” Apple says that AI models geared towards reasoning – large reasoning models (LRMs) – have clear gaps in the quality of their reasoning and fail to develop general problem-solving capabilities.
The researchers reached that conclusion after testing LRMs such as OpenAI’s o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking on increasingly complex problems that deviate from standard AI benchmarks.
Apple’s verdict on state-of-the-art LRMs – which are built into the latest large language models and are characterized by their “thinking” mechanisms – is a harsh one.
“They still fail to develop generalizable problem-solving capabilities, with accuracy ultimately collapsing to zero beyond certain complexities across different environments,” Apple researchers wrote.
“Frontier LRMs face a complete accuracy collapse beyond certain complexities,” they add before devastatingly pointing out that the models simply mimic reasoning patterns without truly internalizing or generalizing them.
The conclusions laid out in the paper contrast radically with expectations – voiced by OpenAI CEO Sam Altman, for instance – that we’ll reach AGI, the holy grail of AI development, within the next few years.
In January, Altman said OpenAI was closer to building AGI than ever before, writing in a blog post: “We are now confident we know how to build AGI as we have traditionally understood it.”

In November, Anthropic CEO Dario Amodei said that AGI would exceed human capabilities in the next year or two: “If you just eyeball the rate at which these capabilities are increasing, it does make you think that we’ll get there by 2026 or 2027.”
The expert community has been blunt in its reaction to Apple’s paper. To quote Josh Wolfe, a well-respected venture capitalist at Lux Capital, the tech giant has essentially concluded that LRMs are simply “super expensive pattern matchers that break as soon as we step outside their training distribution.”
“6/ Apple's take is these models ARE NOT reasoning. they're super expensive pattern matchers that break as soon as we step outside their training distribution...”

Josh Wolfe (@wolfejosh), June 7, 2025
According to Gary Marcus, a US scientist known for his research on AI, even LRM advocates are already conceding that the paper lands a blow.
“I think the Apple paper on the limits of reasoning models in particular tests is useful & important, but the ‘LLMs are hitting a wall’ narrative on X around it feels premature at best. Reminds me of the buzz over model collapse - limitations that were overcome quickly in practice”

Ethan Mollick (@emollick), June 7, 2025
Marcus himself says on his blog that “what the Apple paper shows, most fundamentally, regardless of how you define AGI, is that LLMs are no substitute for good, well-specified conventional algorithms.”
“Anybody who thinks LLMs are a direct route to the sort of AGI that could fundamentally transform society for the good is kidding themselves. This does not mean that the field of neural networks is dead, or that deep learning is dead. LLMs are just one form of deep learning,” said Marcus.