Alibaba unveils new Qwen model, claims it outperforms GPT-4o


Qwen2.5-VL has agentic capabilities that allow it to control a PC or a smartphone.

Chinese company Alibaba has released a new version of its Qwen models, Qwen2.5-VL, with improved image and video understanding.

The models, available in sizes of 3 billion, 7 billion, and 72 billion parameters, can analyze text, charts, icons, graphics, and layouts within images.

Qwen2.5-VL also includes agent functionality for direct computer and phone use. OpenAI offers similar features in its Operator agent, which is available to Pro subscribers at $200 a month.

In its blog post, Qwen says that the updated models have significantly enhanced general image recognition capabilities and can now recognize intellectual properties from TV, film, and a variety of products.

In addition, Qwen2.5-VL can comprehend videos more than an hour long and can capture events by pinpointing the relevant video segments. The company also says that Qwen2.5-VL has a significant advantage in understanding documents and diagrams.

The Qwen team’s own tests show that the 72-billion-parameter version of Qwen2.5-VL outperforms Gemini 2.0 Flash, GPT-4o, and Claude 3.5 Sonnet in several tasks, including understanding documents, diagrams, and videos.

[Image: Qwen model comparison. Image by Alibaba.]

In the near future, Qwen will be enhanced with reasoning capabilities.

Qwen2.5-VL can be tried out using the Qwen chat app and downloaded via the AI developer platform Hugging Face.

The Qwen2.5-VL model, which adheres to Chinese regulations, is biased on some topics.

When I asked the Qwen chatbot to generate an image of Xi Jinping, I got an error message. However, it did the same with Joe Biden and Donald Trump.

TechCrunch reports that the chatbot displayed an error message after a prompt about Xi Jinping’s mistakes.

Last week, another Chinese company, DeepSeek, released its own model, R1, showing that it is possible to create powerful AI with far fewer resources, and fewer costly Nvidia chips, than Western companies have invested.

DeepSeek’s R1 topped the download charts in the App Store, surpassing OpenAI’s ChatGPT.