Microsoft AI research team allegedly leaks 38TB of private data


Disk backups of two employees’ workstations were also included in the leaked data, researchers at cloud security company Wiz said.


Microsoft’s AI research team accidentally exposed sensitive private data while publishing open-source training data on GitHub.

Microsoft’s GitHub repository named robust-models-transfer was allegedly configured to grant permissions on the entire storage account, meaning not only open-source models but also a treasure trove of private data was accessible to third parties.

According to Wiz, the additional 38TB of data that shouldn’t have been exposed included Microsoft employees’ personal computer backups with passwords to Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees.

Attackers could have viewed all the files in the storage account and deleted or overwritten them, the researchers said.

“An attacker could have injected malicious code into all the AI models in this storage account, and every user who trusts Microsoft’s GitHub repository would’ve been infected by it.”

Wiz also noted that the storage account wasn’t directly exposed to the public. “The Microsoft developers used an Azure mechanism called ‘SAS tokens,’ which allows you to create a shareable link granting access to an Azure Storage account’s data – while upon inspection, the storage account would still seem completely private,” they explained.

A Shared Access Signature (SAS) is a token that grants access to Azure Storage data, and the creator can customize the access level, scope, and expiry time.
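For context, SAS tokens are typically generated with the Azure Storage SDK. The sketch below (Python, using the azure-storage-blob package, with hypothetical account, container, and blob names) shows a narrowly scoped, short-lived token – the opposite of the broad, account-wide access Wiz describes.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Hypothetical names and key, for illustration only.
ACCOUNT_NAME = "examplestorageaccount"
ACCOUNT_KEY = "<storage-account-key>"

# A narrowly scoped SAS: read-only access to a single blob, expiring in one hour.
sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="models",
    blob_name="resnet50.pt",
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),  # read-only
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# The shareable URL embeds the token; anyone holding the link gets the granted access.
url = f"https://{ACCOUNT_NAME}.blob.core.windows.net/models/resnet50.pt?{sas_token}"
print(url)
```

By contrast, a token scoped to the whole account with full permissions and a far-future expiry date would expose everything stored under that account to anyone who obtains the link, which is the kind of over-permissive configuration Wiz describes.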

“Besides the risk of accidental exposure, the service’s pitfalls make it an effective tool for attackers seeking to maintain persistency on compromised storage accounts,” Wiz said.


Microsoft invalidated the token in July.