Beyond DeepSeek: local Chinese models to watch out for


We look at models created by Chinese companies Alibaba, Baidu, Tencent, and ByteDance.

Until Chinese startup DeepSeek released its large language model (LLM) R1 last week, Western artificial intelligence (AI) developers, including OpenAI, Google, and Anthropic, were seen as the dominant force.

It appeared that Chinese companies, which can’t get Nvidia chips to train their AI due to US restrictions, would have to do a lot of work to catch up with their Western counterparts.

However, everything changed over the course of just one week. While the December release of DeepSeek-V3, which achieves comparable performance to Western competitors in terms of visual capabilities, went unnoticed by the masses, DeepSeek’s R1 model sent shock waves across the industry, wiping out billions of dollars in market value.

The creators state that the DeepSeek LLM's training costs amounted to only $5.6 million. Some disputed the number, pointing out that the overall expenses were higher. Even so, the amount is estimated to be much lower than the billions poured into developing LLMs by Western companies.

While DeepSeek was in the spotlight, its local competitors didn’t stand still. Alibaba has released updated open-source models in its Qwen family, claiming superior performance against DeepSeek. Meanwhile, ByteDance unveiled Doubao-1.5-pro, which excels at math and coding.

There are a number of other Chinese companies working in the field that already are or may be deploying the same cost-efficient techniques.

Here is a look at some of the Chinese LLMs and their developers that may be worth watching.

Baidu’s Ernie

Months before the release of the latest DeepSeek models, Baidu’s Ernie bot was seen as a Chinese alternative to ChatGPT.

According to a test by China’s Tsinghua University, Ernie Bot 4.0, built on the company’s Ernie LLM, topped the list of Chinese LLMs last year.

However, the researchers found that earlier models released by US companies, including OpenAI’s GPT-4 and Anthropic’s Claude-3, led in multiple capabilities, such as semantic comprehension and coding.

This June, Baidu launched the updated bot Ernie 4.0 Turbo, which has faster responses and boosted reasoning capabilities, along with the deep learning framework PaddlePaddle 3.0.

A few months after the release, the company claimed that its Ernie LLM-based chatbot had over 300 million users.

Although Baidu hasn’t released new models this year, there is little doubt that it is further improving its Ernie LLMs, though we don’t know whether they will be as advanced as those of its Chinese competitors.

ByteDance’s Doubao

ByteDance, the owner of the TikTok social network, has recently debuted Doubao-1.5-pro. The company claims that its closed-source model outperforms OpenAI’s o1 in AIME tests, which measure the advanced multi-step mathematical reasoning of AI models.

The Chinese tech giant also released UI-TARS, its native agent model, which is capable of reasoning and performing computer interactions.

Another notable feature of the chatbot, aside from its performance, is its price. Like DeepSeek, ByteDance uses the ‘Mixture of Experts’ (MoE) architecture, in which only a fraction of the model’s parameters is activated for any given input. According to the company, this requires fewer activated parameters during training while providing the power of a dense model seven times its size.
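For readers curious about what "activating only some parameters" means in practice, here is a minimal sketch of top-k expert routing, the general idea behind MoE layers. It is an illustration only and not ByteDance's or DeepSeek's implementation: the toy experts, the router weights, the number of experts, and the `top_k` value are all hypothetical.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    # Hypothetical toy "expert": stands in for a feed-forward sub-network.
    return lambda x: [scale * v for v in x]

def moe_forward(x, router_weights, experts, top_k=2):
    # The router scores every expert with a dot product against the input.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(scores)
    # Only the top_k highest-scoring experts are run; the rest stay idle.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        gate = probs[i] / norm  # renormalised weight of this expert
        y = experts[i](x)
        output = [o + gate * v for o, v in zip(output, y)]
    return output, chosen

random.seed(0)
num_experts, dim = 8, 4  # assumed sizes, chosen only for the demo
experts = [make_expert(i + 1) for i in range(num_experts)]
router_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
x = [0.5, -1.0, 0.25, 2.0]

output, chosen = moe_forward(x, router_weights, experts, top_k=2)
print(f"experts used: {len(chosen)} of {num_experts}")
```

In this toy setup, each input passes through two of the eight experts, so most of the layer's parameters are never touched for that input. Production models add further details, such as load balancing between experts, that are omitted here.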

The model is reportedly five times cheaper than DeepSeek’s and around 200 times cheaper than OpenAI’s o1.

Alibaba’s Qwen

Another major player in the Chinese market, Alibaba, has recently released updated Qwen models, Qwen2.5-VL and Qwen2.5-Max.

The company says that Qwen2.5-Max outperforms DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond while also demonstrating competitive results in other assessments.

Meanwhile, the Qwen2.5-VL family of models has significantly enhanced general image recognition capabilities and can now recognize intellectual properties from TV, film, and a variety of products.

The creators highlight that Qwen2.5-VL achieves significant advantages in understanding documents and diagrams.

Qwen2.5-VL also includes agent functionality for direct computer and phone use.

Tencent’s Hunyuan

China’s gaming and social media giant Tencent released three of its Hunyuan family models last November.

At the time of release, the company said that its open-source Hunyuan-MoE-A52B model was the largest open-source Transformer-based MoE model in the industry, featuring 389 billion parameters and 52 billion active parameters.
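The two parameter counts can be put in perspective with a quick calculation. The sketch below uses only the figures quoted above. Treating the share of active parameters as a rough proxy for per-token compute is a simplifying assumption, not a claim made by Tencent.

```python
# Figures quoted by Tencent for Hunyuan-MoE-A52B.
total_params = 389e9   # all parameters stored in the model
active_params = 52e9   # parameters activated for a given input

# Share of the model that is actually used per input.
active_fraction = active_params / total_params
print(f"active share: {active_fraction:.1%}")  # prints "active share: 13.4%"
```

In other words, roughly one-eighth of the model's parameters is active for any single input, which is where the cost savings of MoE models are generally expected to come from.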

The company also claimed that the model outperformed the previous version of DeepSeek and Meta’s Llama3.1-405B in benchmarks covering commonsense understanding, reasoning, and classical NLP tasks such as question answering and reading comprehension.

Last week, Tencent released Hunyuan3D 2.0, its updated synthesis system, which turns 2D objects into 3D.

“Hunyuan3D 2.0 outperforms previous state-of-the-art models, including the open-source models and closed-source models in geometry details, condition alignment, and texture quality,” the company said in its technical report.