The total cost of DeepSeek’s AI models exceeded $1.5 billion, report estimates


A new report analyzes the cost of DeepSeek’s large language models and compares them to the ones created by OpenAI.

Chinese startup DeepSeek, which recently launched its V3 and R1 models, rivaling some of its Western competitors, has sparked discussions about whether the Western approach to spending billions of dollars on AI is effective.

On GitHub, DeepSeek claims that the cost of training the V3 released in December amounted to only $5.6 million. However, some mistakenly assumed that this was the overall cost of the model.


According to a report by Semianalysis, $5.6 million is only the pre-training cost and excludes expenses such as R&D, maintenance, operation, and hardware.

Semianalysis claims that the High-Flyer hedge fund, from which DeepSeek was spun off in 2023 and with which the startup still shares computing and human resources, has invested more than $500 million in Nvidia graphics processing units overall.

“Our analysis shows that the total server CapEx for DeepSeek is ~$1.6B, with a considerable cost of $944M associated with operating such clusters,” the report adds.

DeepSeek’s advantage

Semianalysis highlights that DeepSeek hires talent only from Chinese universities and allegedly offers yearly salaries of over $1.3 million for promising candidates, surpassing most Chinese tech companies.

DeepSeek’s advantage over Western and some of its local Chinese competitors is that the startup, with an estimated 150 employees, can move much more quickly on ideas.

In addition, DeepSeek, like Google, runs its own data centers without relying on an external party or provider. This reportedly opens up further ground for experimentation, allowing it to innovate across the stack.


DeepSeek vs OpenAI

Semianalysis also discusses how DeepSeek’s models compare to Western ones. While the V3 model outperforms OpenAI’s GPT-4o in some aspects, the report highlights that OpenAI’s model was released in May 2024. Since then, algorithms have improved significantly.

“We are not surprised to see less compute to achieve comparable or stronger capabilities after a given amount of time. Inference cost collapsing is a hallmark of AI improvement,” the report reads.

On the other hand, DeepSeek’s R1 model can reason and achieves results comparable to OpenAI’s o1, which was released in September 2024.

Semianalysis explains that the new paradigm, which adds reasoning capabilities to an existing model through synthetic data generation and reinforcement learning in post-training, allows for quicker gains at a lower price.

“Comparing R1 to o1 is tricky because R1 specifically doesn’t mention benchmarks that they are not leading in. And while R1 matches in reasoning performance, it’s not a clear winner in every metric and in many cases, it is worse than o1,” the report adds.

It also highlights that the latest OpenAI model, o3, is much more advanced in reasoning and beats R1.