Alibaba’s tiny AI model QwQ-32B is a big threat to Silicon Valley’s AI profits


Alibaba Cloud’s Qwen Team has released its open-source QwQ-32B AI model, which is small enough to run on consumer-grade hardware. The Chinese company claims it outperforms the roughly 20-times-larger DeepSeek R1, as well as OpenAI’s o1-mini, on several critical benchmarks.

QwQ-32B has only 32 billion parameters, roughly 21 times fewer than the 671 billion of DeepSeek R1, the highly capable reasoning model from a Chinese startup that recently made waves in Silicon Valley.

This means that QwQ-32B can comfortably run on a powerful desktop computer, and the open-source release is available for free.


The QwQ-32B reasoning model integrates agent-related capabilities, enabling it to think critically, use tools, and adapt its reasoning based on environmental feedback.

Alibaba is confident that its small model can stand up to much larger competitors, rivaling top-tier models such as OpenAI’s o1-mini.

The company published five benchmark results, and QwQ-32B leads or is on par with its rivals in all of them.

“QwQ-32B has achieved a qualitative leap in mathematics, code, and general capabilities, and its overall performance is comparable to DeepSeek-R1,” the company said.

“While maintaining strong performance, QwQ-32B also significantly reduces deployment costs and enables on-premises deployment on consumer graphics cards. This time, Alibaba Cloud has adopted the permissive Apache 2.0 license.”

If Alibaba is to be believed, QwQ-32B scores 73.1 out of 100 on LiveBench, which tests multiple capabilities such as reasoning, coding, mathematics, and data analysis. This score would put the model near the top, behind only the latest Claude 3.7 Sonnet Thinking and OpenAI’s o3-mini and o1 models.

Alibaba also claims that QwQ-32B leads the Berkeley Function-Calling Leaderboard and demonstrates strong performance on the mathematics benchmark AIME24, coding tasks, and instruction following. Few third-party benchmark results are available yet.


To achieve this, the developers relied on scaling reinforcement learning (RL), a type of machine learning in which a model learns to make decisions by receiving feedback in the form of rewards or penalties.

“Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning,” the Qwen Team explains.
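The reward-and-penalty feedback loop described above can be illustrated with a deliberately simple toy, far removed from how QwQ-32B was actually trained: an epsilon-greedy multi-armed bandit, where the agent's only learning signal is the reward it receives after each action. Everything here (the action success rates, the +1/-1 reward scheme) is an illustrative assumption, not anything from Alibaba's training recipe.

```python
import random

def train_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Toy reward-feedback loop: estimate the value of each action
    purely from reward/penalty signals, the core idea behind RL."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)
    counts = [0] * len(true_rewards)
    for _ in range(steps):
        # Explore a random action occasionally; otherwise exploit
        # the action with the best current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_rewards))
        else:
            action = max(range(len(true_rewards)), key=lambda a: estimates[a])
        # Noisy feedback: reward (+1) with the action's success
        # probability, otherwise a penalty (-1).
        reward = 1.0 if rng.random() < true_rewards[action] else -1.0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# With hypothetical success rates of 20%, 50%, and 80%, the agent
# should learn that the third action is the most rewarding.
values = train_bandit([0.2, 0.5, 0.8])
best = max(range(3), key=lambda a: values[a])
```

Real RL training of a language model replaces the three arms with entire generated responses and the coin-flip reward with scored outcomes (e.g., whether a math answer is correct), but the feedback principle is the same.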

It’s unclear whether the team used responses from other, more powerful AI models during training.

QwQ-32B is already available on multiple platforms, including Hugging Face, ModelScope, Ollama, and elsewhere.
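For readers who want to try the model locally, the Ollama listing makes that straightforward. A minimal sketch, assuming the model is published under the `qwq` tag in the Ollama library and that your machine has enough memory for a quantized 32-billion-parameter model (roughly 20 GB of download; a high-VRAM GPU or generous system RAM is advisable):

```shell
# Download the quantized QwQ-32B weights from the Ollama library.
ollama pull qwq

# Ask the model a one-off question (or omit the prompt for an
# interactive chat session).
ollama run qwq "How many r's are in the word strawberry?"
```

Running via Hugging Face instead requires downloading the full-precision weights, which is only practical on server-class hardware.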

CNN reports that Alibaba’s stock price surged by 8% after the announcement.

Open-source and cheap-to-deploy AI models from China are competing with closed AI models from companies such as OpenAI, Anthropic, and Google. DeepSeek previously forced the market to reduce prices and threatened the returns on billions of dollars invested in AI training.