Hugging Face and GitHub may be essential platforms for developing AI technologies, but they are also leaving top-level organization accounts from Google, Meta, Microsoft, and VMware exposed to threat actors.
That’s the shocking reveal from researchers at Lasso Security, who insisted that “the gravity of the situation cannot be overstated.”
Launching its investigation in November, Lasso Security inspected hundreds of application programming interfaces (APIs) on both platforms to reach its startling conclusion. APIs allow applications to ‘talk’ to each other and are therefore a cornerstone of IT development.
Facebook owner Meta was found to be particularly vulnerable, with its Llama large language model (LLM) exposed in many cases. But it was by no means the only LLM research group found to be critically vulnerable.
“Notably, our investigation led to the revelation of a significant breach in the supply chain infrastructure, exposing high-profile accounts of Meta,” researchers said. “The ramifications of this breach are far-reaching, as we successfully attained full access, both read and write permissions, to Meta Llama2, BigScience Workshop, and EleutherAI.”
Between them, the compromised parties own models with millions of downloads – leaving them all “susceptible to potential exploitation by malicious actors.”
“The gravity of the situation cannot be overstated,” said the researchers. “With control over an organization boasting millions of downloads, we now possess the capability to manipulate existing models, potentially turning them into malicious entities.”
In what it claims adds up to “a dire threat,” the report says the injection of these corrupted models with malware “could affect millions of users who rely on these foundational models for their applications.”
A key vulnerability in the Hugging Face API setup lies in the access it gives developers who need to integrate models: the ability to read, create, modify, or delete repositories and files as needed. Unfortunately, it appears that this access can be exploited by a malicious hacker.
“These HuggingFace API tokens are highly significant for organizations and exploiting them could lead to major negative outcomes such as data breaches, malicious models spreading, and more,” researchers said.
Lasso Security had hoped that its investigation of Hugging Face and GitHub would shed light on the security measures developers need to take to protect LLMs against potential threats. It now looks as though those measures will need to be stringent indeed.
“The implications extend beyond mere model manipulation,” said Lasso Security. “Our research also granted us access to 14 datasets with tens of thousands of downloads. Alarming as it is, this opens the door to a malicious technique known as training data poisoning.”
What that means is that by tampering with trusted datasets, attackers could compromise the integrity of machine learning models, “leading to widespread consequences.”
The researchers say they reached out to all concerned parties, who promptly responded to the alert. Hugging Face, Meta, Google, Microsoft, and VMware followed their advice by revoking or deleting the exposed API tokens, in many cases on the same day the disclosure was made.
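For readers wondering how exposed tokens might be spotted before an attacker finds them, here is a minimal Python sketch that scans text for token-shaped strings. It is not Lasso Security's method. The pattern (an `hf_` prefix followed by a long alphanumeric run) is an assumption about the token format, and the function names are hypothetical.

```python
import re

# Assumption: a token looks like "hf_" followed by 30 or more
# alphanumeric characters. Adjust the pattern to the real format in use.
TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")


def redact(token: str) -> str:
    """Keep only the first six and last two characters of a token."""
    return token[:6] + "..." + token[-2:]


def find_candidate_tokens(text: str) -> list[tuple[int, str]]:
    """Return (line number, redacted token) for each token-shaped string."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            hits.append((line_no, redact(match.group())))
    return hits


if __name__ == "__main__":
    sample = 'model = "demo"\ntoken = "hf_' + "a" * 34 + '"\n'
    for line_no, redacted in find_candidate_tokens(sample):
        print(f"line {line_no}: possible token {redacted}")
```

A match is only a candidate. Any real token found this way should be revoked, not just deleted from the file, because it may already have been copied.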
To prevent further such occurrences and forestall data poisoning or theft, Lasso Security recommends that tokens used in LLM development are strictly classified from now on, with cybersecurity solutions put in place that are tailored specifically to these models.
“In addition to classification, frequent monitoring and verifying the integrity of datasets, ensuring that they remain untampered,” Lasso Security added. “By promptly addressing these issues, organizations can fortify their defenses and avert the looming threats posed by these security vulnerabilities.”
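One common way to verify that a dataset remains untampered is to compare each file's cryptographic hash against a trusted record. The Python sketch below illustrates the idea under stated assumptions: the manifest format (file name mapped to SHA-256 hex digest) and the function names are hypothetical, and the report does not prescribe this method.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_tampered(manifest: dict[str, str], root: Path) -> list[str]:
    """Return names of files that are missing or whose hash differs.

    `manifest` is a hypothetical trusted record mapping each file name
    (relative to `root`) to its expected SHA-256 hex digest.
    """
    tampered = []
    for relative_name, expected in manifest.items():
        candidate = root / relative_name
        if not candidate.is_file() or sha256_of_file(candidate) != expected:
            tampered.append(relative_name)
    return sorted(tampered)
```

The check is only as trustworthy as the manifest, so the recorded hashes would need to be stored somewhere an attacker with repository write access cannot also change.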