AI still can’t replace corporate workers, but Gemini 3 Flash performs best


Concerns that artificial intelligence (AI) will soon replace corporate workers seem to be overblown – at least for now.

Watching the growing adoption of generative AI may be a source of great anxiety for white-collar workers, the backbone of the modern economy. However, despite bleak predictions, data suggests that AI is nowhere near replacing them.

A new paper published on the preprint server arXiv by the training-data company Mercor introduces the AI Productivity Index for Agents (APEX–Agents).

The benchmark assesses how eight AI agents perform realistic, challenging, and diverse tasks. In the study, the agents collectively performed 480 tasks using all the data and software that a human would use.

Unlike similar studies, the tasks were created by real experts – investment banking analysts, management consultants, and corporate lawyers.

Gemini 3 Flash performed best, yet scored only 24% accuracy, followed by GPT-5.2 at 23%.

Claude Opus 4.5 and Gemini 3 Pro were each 18.4% accurate, ahead of Grok 4 at 15.2%.

The poorest performance was demonstrated by two open-source models, GPT-OSS-120B and Kimi K2 Thinking, which achieved 4.7% and 4% accuracy, respectively.

Overall, the models were best at performing management consulting tasks, followed by investment banking analysts' tasks. Corporate lawyers seem least likely to be replaced, as agents completed their tasks with the lowest accuracy.

In the study, the agents failed in various ways – from running out of steps to failing to meet any criteria. The authors conclude that while the agents are capable of performing complex professional services work, they often do so inconsistently.

The new paper illustrates what AI critics have long been saying: the technology isn’t yet capable of fully replacing human employees. Sometimes, it can make their work even more complicated.

A recent survey suggests that employees who use AI the most later spend a significant amount of time redoing its work. For every 10 hours saved by AI, it takes nearly four hours to fix its mistakes.

AI job replacement was also a hot topic at the World Economic Forum in Davos, where industry leaders portrayed a grim future for knowledge workers.

For instance, Alex Karp, the CEO of Palantir, the controversial data broker, emphasized that those with vocational training will have “more than enough jobs,” sending a warning to those in the humanities.

