
Cloud computing company Alibaba Cloud, a subsidiary of the Chinese conglomerate Alibaba Group, has released Qwen2.5-Omni-7B, a new model for AI-agent development.
The open-source Qwen Omni model is multimodal, meaning it accepts various inputs including text, images, audio, and video, and can reportedly deliver real-time responses.
The model targets developers who want to build cost-effective AI agents, especially intelligent voice applications.
For example, it could enable developers to offer step-by-step cooking guidance by analyzing the ingredients shown in a video, or power customer service dialogue.
To achieve real-time responses, Alibaba uses a new Thinker-Talker architecture, which separates text generation from speech synthesis to minimize interference between modalities and preserve output quality.
Since Qwen Omni has only seven billion parameters, it can run on smartphones as well as laptops.
Alibaba claims that Omni performs well against similarly sized single-modality models, including its own Qwen2.5-VL-7B and Qwen2-Audio, as well as against Google’s Gemini 1.5 Pro.
In OmniBench, a benchmark that assesses models’ ability to recognize, interpret, and reason across visual, acoustic, and textual inputs, Omni scores 56.1. For comparison, Google’s Gemini 1.5 Pro scores 42.5.
“Meet Qwen2.5-Omni-7B, the unified end-to-end multimodal AI model! With 7B parameters, it handles text, images, audio, and video seamlessly, delivering real-time, natural responses right on your phone or laptop. 💻📱” — Alibaba Group (@AlibabaGroup), March 27, 2025
On Wednesday, Google also unveiled Gemini 2.5 Pro, a model that tops a wide range of benchmarks and debuted at No. 1 on LMArena, a platform for crowdsourced AI benchmarking.
“Today we’re releasing an experimental version of Gemini 2.5 Pro. 💡2.5 Pro shows strong reasoning and improved code capabilities, with state-of-the-art performance across a range of benchmarks. 📈It’s topped @lmarena_ai's leaderboard by a huge margin.” — Google (@Google), March 25, 2025
Qwen Omni is now open-sourced on Hugging Face and GitHub, with additional access via Qwen Chat and Alibaba Cloud’s open-source community, ModelScope.
Alibaba released its updated Qwen2.5-VL models in various sizes just days after DeepSeek launched its reasoning model, R1. Alibaba claimed the models were on par with DeepSeek’s R1 and V3, as well as with some models created by Western competitors.
At the beginning of March, the company unveiled a smaller, 32-billion-parameter model, QwQ-32B, also claiming performance comparable to the much larger R1 as well as to OpenAI’s o3-mini and o1.