Kimi K2 Thinking AI model closes on GPT-5

Chinese AI startup Moonshot has claimed its model sets “new records across benchmarks that assess reasoning, coding, and agent capabilities.” If true, this would mark the first time an open-source AI model has reached parity with, or even surpassed, the best proprietary offerings from OpenAI, Google, or Anthropic.

China may be taking the lead in the AI race, with open-weight models leapfrogging proprietary ones. A new disruptor, Kimi K2 Thinking, surpasses GPT-5, Gemini 2.5, Claude Sonnet 4.5, and others in select major benchmarks, while also being cheaper to run.

The new model is the reasoning variant of the earlier Kimi K2 model. The behemoth has a total of 1 trillion parameters. However, Kimi K2 Thinking uses a Mixture-of-Experts (MoE) architecture and activates only 32 billion parameters at a time. This makes the model less resource-intensive.
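To put the two parameter counts in proportion, here is a minimal Python sketch. It uses only the figures quoted above (1 trillion total, 32 billion active); the constant and function names are illustrative and do not come from Moonshot.

```python
# Figures quoted in the article; names are illustrative only.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion parameters in total
ACTIVE_PARAMS = 32_000_000_000     # 32 billion parameters active at a time


def active_fraction(active: int, total: int) -> float:
    """Return the share of parameters that are active at a time."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share of parameters: {fraction:.1%}")  # prints 3.2%
```

In other words, only about one parameter in thirty takes part in the computation at any given moment, although all of the weights still have to be stored.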

Some independent testing has already placed the model at the top of select benchmarks. Kimi K2 Thinking achieved the number one spot in the Tau2 Bench Telecom agentic benchmark, which measures how well an AI model acts as a customer service agent and uses tools.

“This is the highest score we have independently measured. Tool use in long-horizon agentic contexts was a strength of Kimi K2 Instruct, and it appears this new Thinking variant makes substantial gains,” Artificial Analysis said in a post.

The firm acknowledges that the model is likely currently the new leading open weights model. However, Artificial Analysis hasn’t yet updated the overall leaderboard, which incorporates 10 evaluations and is currently dominated by the GPT-5 variants from OpenAI.

On that overall leaderboard, the non-thinking Kimi K2 variant previously scored 48%, which is significantly lower than GPT-5’s top result of 68%.

Moonshot AI also claims that the reasoning model achieves new records on the Humanity’s Last Exam benchmark, a frontier-level “Google-proof” benchmark comprising 2,500 expert-vetted questions across mathematics, sciences, and humanities, designed to be the final closed-ended academic evaluation. The model achieved the highest score while using a diverse set of tools.

Other benchmarks where the model supposedly beats GPT-5, Claude Sonnet 4.5, and Grok 4 include BrowseComp, Seal-0, IMO-AnswerBench, Frames, and SciCode.

In many other tests, the new AI model is at parity with major proprietary competitors. In GPQA-Diamond, another challenging scientific expertise benchmark, Kimi K2 Thinking scores 84.5%, a bit behind GPT-5 and Grok 4.

“Kimi K2 Thinking can execute up to 200-300 sequential tool calls without human interference, reasoning coherently across hundreds of steps to solve complex problems,” the startup said in an announcement.

Cheap inference

The model has been trained using low-bit quantization, which, combined with the MoE architecture, makes it very cost-effective to run.

Despite the 1 trillion parameter size, enthusiasts on X have already shared that they are able to run Kimi K2 Thinking on two maxed-out Mac Studio (M3 Ultra) systems, generating 3,500 tokens at a speed of 15 tokens per second.

Moonshot AI said that to overcome performance drops caused by low-bit quantization, it adopted Quantization-Aware Training (QAT) during the post-training phase, applying INT4 weight-only quantization to the MoE components.
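For readers unfamiliar with the term, the Python sketch below illustrates what weight-only INT4 quantization means in general. It is not Moonshot's implementation: the article does not describe the company's actual scheme, so the symmetric rounding, the single scale per weight list, and all function names here are assumptions chosen for simplicity.

```python
# Illustrative sketch only: symmetric INT4 rounding with one scale per
# weight list is an assumption, not Moonshot's documented method.

def quantize_int4(weights):
    """Map floats to integers in the signed 4-bit range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int4(quantized, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [q * scale for q in quantized]


def fake_quantize(weights):
    """Quantize, then dequantize. Quantization-aware training typically
    runs the forward pass on weights rounded like this, so the model
    learns to tolerate the rounding error."""
    quantized, scale = quantize_int4(weights)
    return dequantize_int4(quantized, scale)


weights = [0.70, -0.30, 0.12, -0.02]
quantized, scale = quantize_int4(weights)
restored = dequantize_int4(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight is stored as one of only 16 possible values, and the rounding error per weight stays within half of the scale. Training with that rounding in the loop is what lets a model keep its accuracy at such low precision.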

“It allows K2 Thinking to support native INT4 inference with a roughly 2x generation speed improvement while achieving state-of-the-art performance. All benchmark results are reported under INT4 precision,” the firm said.

Priced at $2.50 per million output tokens, it costs a quarter as much as GPT-5, which is priced at $10 per million output tokens.
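The price gap is easy to work out. The short Python sketch below uses only the two output-token prices quoted above; the 50-million-token workload is a made-up example, and input-token pricing is not covered because the article does not mention it.

```python
# Output-token prices quoted in the article (USD per million tokens).
KIMI_K2_THINKING_USD_PER_M = 2.50
GPT5_USD_PER_M = 10.00


def output_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost in USD of generating the given number of output tokens."""
    return tokens / 1_000_000 * usd_per_million


price_ratio = GPT5_USD_PER_M / KIMI_K2_THINKING_USD_PER_M

# Hypothetical workload: 50 million output tokens.
example_tokens = 50_000_000
kimi_cost = output_cost_usd(example_tokens, KIMI_K2_THINKING_USD_PER_M)
gpt5_cost = output_cost_usd(example_tokens, GPT5_USD_PER_M)
print(price_ratio, kimi_cost, gpt5_cost)  # 4.0 125.0 500.0
```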

The new model has generated considerable enthusiasm on social media. Participants in a Hacker News thread point out that cutting-edge performance still depends on huge compute and cost, and many expect smaller but capable models.

“The ultimate competition between models will eventually become a competition over energy. China’s open-source models have major advantages in energy consumption, and China itself has a huge advantage in energy resources. They may not necessarily outperform the US, but they probably won’t fall too far behind either,” one user noted.

While US tech firms pour massive capital into AI infrastructure to secure AI leadership and reshape the entire US economy, the rise of cheap-to-run open-source models from China could undercut that advantage and put investments at significant risk.

Enterprises and developers are no longer locked into proprietary APIs, and access to AI no longer requires huge budgets.

The new Kimi K2 Thinking model can be accessed on Moonshot’s platform, kimi.com. The code is also available on Hugging Face.

However, Google is expected to release Gemini Pro 3.0, which might once again tip the scales. Google CEO Sundar Pichai, during the Q3 2025 earnings call, confirmed the model will be released “later this year.”

Some developers claim that a preview version (preview-11-2025) is already available to testers on the tech giant’s AI platform, Vertex AI.
