
DeepSeek, China’s answer to ChatGPT, is under scrutiny. Politicians, regulators, competitors, and an array of publicity-seeking experts are questioning its claims, motives, and possible implications for the world economy.
In an ironic turn of events, OpenAI has accused DeepSeek of illegally using its data to train the R1 model. Italians can no longer download the app from the Google Play Store or Apple App Store, and cybersecurity firms have rushed to give their first impressions of its security.
One of the most common worries we’ve noticed after combing through countless posts, reports, and pitches by credible cybersecurity experts is that DeepSeek is trained on open-source data from Wikipedia and GitHub.
This opens up another opportunity for attack – crooks can inject malicious content into Wikipedia.
“While it’s community-controlled, you can still inject malicious ‘training content’ with enough effort. Alternatively, you can inject ‘biases’ into the algorithm,” Aleksandr Yampolskiy, CEO and co-founder of security company SecurityScorecard, wrote on LinkedIn.
Just after launch, DeepSeek was hit by a large-scale malicious attack that forced the company to limit new user registrations. The attack, however, seemingly had nothing to do with the open-source training data, as many other AI models, including OpenAI’s ChatGPT, are trained the same way.
“In general, training on open-source data could lead to the model outputting factually incorrect responses (since the open-source data may not be validated) and also opens up the possibility for attacks on the model's parser (since attackers can plant malicious data publicly, which will be picked up as training data for the model),” Shachar Menashe, VP of JFrog Security Research, told Cybernews.
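Neither expert is describing DeepSeek’s actual pipeline, but the poisoning vector they point to is easy to see in a toy ingestion step. The sketch below is purely illustrative – the allow-list, the document shape, and the function name are all invented for this example:

```python
# Hypothetical illustration of a scraped-data ingestion step; nothing here
# is DeepSeek's (or anyone's) real pipeline.
from urllib.parse import urlparse
import hashlib

TRUSTED_HOSTS = {"en.wikipedia.org", "github.com"}  # assumed allow-list

def ingest(documents):
    """Collect raw text for a training corpus from {"url": ..., "text": ...} dicts.

    A real pipeline would also strip markup, drop near-duplicates, and score
    quality. The point of the sketch is that anything publicly editable on a
    trusted host passes straight through, which is the poisoning vector the
    researchers describe.
    """
    corpus, seen = [], set()
    for doc in documents:
        if urlparse(doc["url"]).netloc not in TRUSTED_HOSTS:
            continue  # allow-listing filters untrusted hosts...
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # ...and exact duplicates, but neither check catches a
                      # malicious edit planted on a trusted, editable page
        seen.add(digest)
        corpus.append(doc["text"])
    return corpus
```

The takeaway: source allow-listing and deduplication, the cheapest defenses, do nothing against content injected into a host the pipeline already trusts.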
As Menashe noted, it is widely suspected that the attack was a DDoS (distributed denial-of-service) attack.
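The company’s stopgap – throttling new sign-ups – is a common emergency response to that kind of flood. A minimal token-bucket sketch of the idea (a generic illustration only; the capacity and refill numbers are arbitrary, and this is not DeepSeek’s code):

```python
# Generic token-bucket limiter for new-account creation; an assumed
# illustration of "limit new user registrations," not DeepSeek's code.
import time

class RegistrationLimiter:
    def __init__(self, capacity=100, refill_per_sec=5.0):
        self.capacity = capacity        # max sign-ups allowed in one burst
        self.tokens = float(capacity)
        self.refill = refill_per_sec    # steady-state sign-ups per second
        self.last = time.monotonic()

    def allow_signup(self) -> bool:
        now = time.monotonic()
        # replenish tokens for the time elapsed, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # under a flood, excess registrations are simply rejected
```

Under normal traffic every sign-up passes; during a flood the bucket drains and the service degrades gracefully instead of falling over.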
“It appears that not only did DeepSeek skimp on the number of Graphics Processing Units (GPUs), but also failed to design with security in mind,” Kevin Kirkwood, CISO at Exabeam, commented on the cyberattack.
DeepSeek sent shockwaves through the stock market by claiming that it trained its AI model for a fraction of what competitors spend. Whether that is true remains to be seen. Meanwhile, security experts are picking further at the data they can access on DeepSeek to determine whether it’s safe to use.
Yes, at the moment, it can be jailbroken – researchers have already had their share of fun testing exploit techniques. That’s not much different from when ChatGPT first arrived: hallucinations, biases, propaganda, and attempts to break it were all part of the public discourse then, too.
The fact that DeepSeek is trained on open-source data is more problematic from an ethical point of view.
“This is the main issue of controversy with AI, since the data used by the models is provided by users/platforms, which can be negatively affected by AI using them. For example, many code-generative models are based on answers from Stack Overflow, which has recently experienced a huge decrease in user interaction,” Menashe said, adding that DeepSeek is no different from any existing large-scale AI solution in this regard.
Steve Povolny, another expert from Exabeam, also doesn’t think that training DeepSeek on GitHub or Wikipedia represents a critical threat.
Despite Wikipedia’s size, it’s just a tiny part of the vast corpus used to train the model.
“Given the format of editor community review for Wikipedia, it is unlikely that intentionally introduced bias would be widespread enough to result in a significant inherent bias from the pre-trained models themselves,” Povolny said.
It’s mostly bias and misinformation that we should be on the lookout for. At the very least, don’t ask it about China – DeepSeek allegedly refuses to answer 85% of prompts on topics deemed sensitive by Beijing.