OpenAI says that DeepSeek illegally used its data to train the R1 model


Microsoft and OpenAI are investigating whether DeepSeek may have breached OpenAI’s terms of service.

The capabilities of R1, an open-source model released by Chinese startup DeepSeek, have left Western companies scratching their heads.

By employing resource-efficient techniques, the company was able to create a model comparable to OpenAI’s GPT-4o with far fewer computational resources. DeepSeek reportedly created the model with an investment of only $6 million, while the expenses of OpenAI and other Western companies stretched to hundreds of billions of dollars.

However, it appears that DeepSeek’s cost-efficient techniques could have involved obtaining data illegally.

According to the Financial Times, OpenAI has evidence that DeepSeek breached its terms of service. An unnamed spokesperson at the company said that it had seen some evidence of distillation, a technique in which a smaller model is trained on the outputs of a larger one to improve its performance.

Meanwhile, Microsoft security researchers claim that in the fall, they observed individuals allegedly linked to DeepSeek extracting large amounts of data using OpenAI’s API.

Bloomberg notes that software developers can pay for a license to integrate proprietary OpenAI models. However, DeepSeek’s activity could indicate that the group acted to circumvent OpenAI’s restrictions on how much data it could obtain.

Benchmarks show that DeepSeek’s R1 model, which topped the App Store downloads chart this week, surpassing ChatGPT, achieves performance similar to that of the best OpenAI, Anthropic, and Gemini models.

This week, another Chinese company, Alibaba, released its new Qwen models. These models can analyze images as well as the text, charts, icons, graphics, and layouts within them.

The company claims that its Qwen2.5-VL model with 72 billion parameters outperforms Gemini 2.0 Flash, GPT-4o, and Claude 3.5 Sonnet in several tasks, including understanding documents, diagrams, images, and videos.
