Alibaba claims its new 80B-parameter Qwen AI model beats Gemini 2.5 Flash

Alibaba has released its latest free artificial intelligence (AI) model. The company claims that the model is extremely fast and efficient, very cheap to train, runs well on consumer hardware, and beats Gemini 2.5 Flash across multiple benchmarks.
Alibaba has introduced its “next-generation” foundational models called Qwen3-Next. Its first installment is the 80-billion-parameter Qwen3-Next-80B-A3B model, which comes in two flavors: thinking and instruct (providing direct responses).
The Chinese firm claims this model is comparable to its previous flagship, Qwen3-235B, which is roughly three times larger, and that it outperforms “the proprietary model Gemini-2.5-Flash-Thinking across multiple benchmarks.”
Alibaba takes a different approach to improving chatbots. Instead of scaling towards a larger parameter count, it claims to bring a better, more efficient architecture, “optimized for long-context understanding, large parameter scale, and unprecedented computational efficiency.”
The new model is divided into 512 specialized “expert” modules, but activates only 10 of them at a time. This means that the accelerator (GPU) needs to process only about 3 billion of the full 80 billion parameters for a given task.
Alibaba Cloud said this “achieves an extreme low activation ratio in Mixture-of-Experts (MoE) layers, drastically reducing FLOPs per token while preserving model capacity.”
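To make the sparse-activation idea concrete, here is a minimal Python sketch of generic top-k expert routing, followed by the arithmetic behind the activation ratio. It is not Alibaba's implementation: the toy experts, the random router scores, and the function names `route_token` and `moe_layer` are all hypothetical, and only the figures 512, 10, 80 billion, and 3 billion come from the article.

```python
import math
import random

NUM_EXPERTS = 512      # total experts, per the article
TOP_K = 10             # experts activated at a time, per the article
TOTAL_PARAMS_B = 80    # total parameters, in billions
ACTIVE_PARAMS_B = 3    # parameters processed per task, in billions


def route_token(router_scores, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    max_score = max(router_scores[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_scores[i] - max_score) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(token, experts, router_scores, k=TOP_K):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(w * experts[i](token) for i, w in route_token(router_scores, k))


rng = random.Random(0)
# Hypothetical toy experts: each one scales a scalar input by a fixed factor.
experts = [lambda x, s=rng.uniform(0.5, 1.5): s * x for _ in range(NUM_EXPERTS)]
# Hypothetical router scores; a real router computes these from the token.
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(scores)
output = moe_layer(1.0, experts, scores)

expert_ratio = TOP_K / NUM_EXPERTS              # share of experts that run
param_ratio = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # share of parameters that run
print(f"experts used: {len(selected)} of {NUM_EXPERTS} ({expert_ratio:.2%})")
print(f"parameters used: {param_ratio:.2%}")
```

With the article's figures, about 1.95% of the experts and 3.75% of the parameters are active. The parameter share is presumably higher than the expert share because non-expert components, such as attention layers and embeddings, run for every token regardless of routing.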
“Qwen3-Next is optimized for efficient deployment and operation on consumer-grade hardware.”
The chatbot natively supports a context window of 256 thousand tokens, extendable to 1 million tokens.
Despite being larger, Qwen3-Next-80B processes long-context prompts more than 10 times faster than Qwen3-32B.
“During inference, it delivers more than 10x higher throughput than Qwen3-32B when handling context lengths exceeding 32K tokens, achieving supreme efficiency in both training and inference,” the blog post reads.
The model was trained on a 15 trillion-token subset of Qwen3’s 36 trillion-token pre-training dataset.
According to Alibaba’s own testing, the thinking variant of the model achieves 60.8% accuracy on the SuperGPQA benchmark, which measures graduate-level knowledge across 285 disciplines. It allegedly beats Gemini-2.5-Flash-Thinking in all five company-chosen benchmarks. Gemini 2.5 Flash is Google’s best model in terms of price and performance, offering well-rounded capabilities.
The new Qwen models are already available for free on major platforms such as Hugging Face, Kaggle, and Alibaba Cloud’s ModelScope.
This week, Alibaba also released its “cutting-edge” AI speech transcription tool, Qwen3-ASR-Flash, which delivers “remarkable accuracy and robustness across 11 major languages.” The company claims that it surpasses other leading automatic speech recognition (ASR) models.
Alibaba also released a preview of a massive model with over 1 trillion parameters. Called Qwen3-Max, it currently ranks number six on LMArena.