
Most AI companies, wherever they’re based, train their models on content they vacuum up on the web. That data doesn’t exactly belong to the firms. But Anthropic has still just accused three prominent Chinese startups of lifting large amounts of data from its chatbot, Claude.
The San Francisco-based AI startup says the three Chinese companies – DeepSeek, Moonshot, and MiniMax – improperly harvested data from its AI technologies in an effort to accelerate the development of their own systems.
According to Anthropic, DeepSeek, Moonshot, and MiniMax used about 24,000 fraudulent accounts to generate over 16 million conversations with Claude that could be used to teach skills to their own chatbots.
To be fair, using one AI system’s data to train another – it’s a process called distillation – is pretty common in the industry. After all, that data is still being taken from somewhere.
Generally, AI firms such as Anthropic or OpenAI act under the assumption that “publicly available data” on the internet is fair game. But this doesn’t mean they legally own the underlying content.
However, Anthropic’s terms of service forbid anyone from surreptitiously harvesting data for distillation and don’t allow its technologies to be used in China.
“Distillation is a widely used and legitimate training method. For example, frontier AI labs routinely distill their own models to create smaller, cheaper versions for their customers,” Anthropic said in a blog post.
“But distillation can also be used for illicit purposes: competitors can use it to acquire powerful capabilities from other labs in a fraction of the time, and at a fraction of the cost that it would take to develop them independently.”
According to the firm behind the chatbot Claude, this is a national security risk because this type of theft of intellectual property could allow China to build AI technologies to create bioweapons or tools for mass surveillance.
Anthropic’s rival OpenAI recently also warned US lawmakers that DeepSeek was targeting the ChatGPT maker and other US AI firms to replicate their models and use them for its own training, essentially “free-riding on the capabilities” developed by the Americans.
That’s fair. Through state-sponsored initiatives, cyber espionage, and forced technology transfers, China indeed targets trade secrets and proprietary information to bolster domestic industries, impacting sectors from aerospace to tech.
But in this particular case, a ray of hypocrisy shines through. That’s because Anthropic, now valued at $380 billion, is itself facing multiple lawsuits accusing the startup of illegally using copyrighted internet data to train its systems.
More to the point, the firm was accused last year of using pirated books from “Library Genesis” to train its Claude models. Anthropic agreed to a landmark $1.5 billion settlement.
“I can't believe someone would just steal from Anthropic like this. The millions of man-hours Anthropic spent hand-writing code, text, art, books, etc., to generate enough data for training must be taken into consideration here. Where is the respect for IP?” one coder ironically asked on X.
The timing of all this might not be an accident. It’s hard to forget early 2025 when Hangzhou-based DeepSeek shook markets with a set of AI models that rivaled some of the best offerings from the US.
Now, the US AI industry is trembling again: DeepSeek is reportedly preparing to launch its new V4 model, and the AI-heavy Nasdaq could be in for a bumpy ride. In 2025, shares of the chip giant Nvidia plummeted 17%, wiping out $600 billion in market value in a flash.