AI reasoning models are not that capable, actually, Apple says


The race to develop artificial general intelligence (AGI) still has a long way to run. In a new study, Apple researchers say they found that leading AI models still have trouble reasoning and, in fact, collapse completely when faced with increasingly complex problems.

In a paper titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” Apple says that AI models geared toward reasoning – large reasoning models (LRMs) – show clear gaps in the quality of their reasoning and fail to develop general problem-solving capabilities.

They reached this conclusion after testing LRMs such as OpenAI’s o1 and o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking on increasingly complex problems that deviate from standard AI benchmarks.
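For context, the paper measured complexity using controllable puzzle environments – the Tower of Hanoi among them – where difficulty is dialed up by a single parameter and every answer can be checked exactly against a conventional solver. Here is a minimal sketch of such a solver, offered as an illustration of the setup rather than code from the paper:

```python
# Tower of Hanoi: one of the controllable puzzles Apple used to scale
# problem complexity. A textbook recursive algorithm solves it exactly
# at any size, so a model's move sequence can be verified step by step.

def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the optimal move list for n disks (always 2**n - 1 moves)."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # clear the n-1 smaller disks
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # restack the smaller disks
    return moves

# Difficulty grows exponentially with a single knob, n:
for n in (3, 7, 10):
    print(n, "disks ->", len(hanoi(n)), "moves")  # 7, 127, 1023
```

A solver like this always produces the correct answer no matter how large n gets – which is precisely the standard the tested models reportedly fail to meet once the disk count grows past a modest threshold.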

Apple actually hits state-of-the-art LRMs – which are built into the latest large language models and are characterized by their “thinking” mechanisms – pretty hard.

“They still fail to develop generalizable problem-solving capabilities, with accuracy ultimately collapsing to zero beyond certain complexities across different environments,” Apple researchers wrote.

“Frontier LRMs face a complete accuracy collapse beyond certain complexities,” they add, before delivering the devastating observation that the models simply mimic reasoning patterns without truly internalizing or generalizing them.

Now, the conclusions laid out in the paper contrast sharply with the expectations – voiced by OpenAI CEO Sam Altman, for instance – that we’ll reach AGI, the holy grail of AI development, within the next few years.

In January, Altman said OpenAI was closer to building AGI than ever before, writing in a blog post: “We are now confident we know how to build AGI as we have traditionally understood it.”

In November, Anthropic CEO Dario Amodei predicted that AI would exceed human capabilities within the next year or two: “If you just eyeball the rate at which these capabilities are increasing, it does make you think that we’ll get there by 2026 or 2027.”

Meanwhile, the expert community has been ruthless in its reaction to Apple’s paper. To quote Josh Wolfe, a well-respected venture capitalist at Lux Capital, the tech giant has essentially concluded that LRMs are simply “super expensive pattern matchers that break as soon as we step outside their training distribution.”

According to Gary Marcus, a US scientist known for his research on AI, even LRM advocates are already conceding the point.

Marcus himself says on his blog that “what the Apple paper shows, most fundamentally, regardless of how you define AGI, is that LLMs are no substitute for good, well-specified conventional algorithms.”

“Anybody who thinks LLMs are a direct route to the sort of AGI that could fundamentally transform society for the good is kidding themselves. This does not mean that the field of neural networks is dead, or that deep learning is dead. LLMs are just one form of deep learning,” said Marcus.