A Microsoft engineer has reached out to US lawmakers and Washington state authorities after the tech giant allegedly impeded his attempts to publicly raise concerns about the safety of OpenAI’s image generator DALL-E 3.
Shane Jones, a principal software engineering manager at Microsoft, says he discovered vulnerabilities in OpenAI’s DALL-E 3 image generator in early December that allow users to bypass its safety guardrails and create violent and explicit imagery.
Jones claimed the company thwarted his prior attempts to draw attention to the issue. Consequently, he explained his concerns in a letter addressed to US Senators Patty Murray and Maria Cantwell, Representative Adam Smith, and Washington State Attorney General Bob Ferguson.
According to the letter, published by GeekWire, Jones reported the vulnerability to Microsoft and was instructed to pass the issue directly to OpenAI, which he did.
On December 14th, the engineer published on LinkedIn a letter to OpenAI’s non-profit board of directors, urging them to suspend the availability of DALL-E 3 to “prioritize safety over commercialization.”
“Shortly after disclosing the letter to my leadership team, my manager contacted me and told me that Microsoft’s legal department had demanded that I delete the post,” Jones wrote.
He was informed that Microsoft’s legal department would soon provide a detailed explanation for the takedown request via email. However, he was instructed to delete the post immediately rather than wait for that response.
“Reluctantly, I deleted the letter and waited for an explanation from Microsoft’s legal team. I never received an explanation or justification from them,” Jones said.
Jones called for legal mechanisms to monitor AI-related risks and hold tech companies accountable for the safety of their AI products. He also raised concerns about ensuring tech employees can report such issues independently, without being “intimidated into staying silent.”
“I am asking you to look into the risks associated with DALL-E 3 and other AI image-generation technologies and the corporate governance and responsible AI practices of the companies building and marketing these products,” he added.
Cybernews contacted Microsoft for comment. According to the company’s spokesperson, Microsoft has implemented an internal reporting tool that allows employees to raise and escalate any concerns regarding its AI products, including their sensitive uses.
“We have established robust internal reporting channels to properly investigate and remediate any issues, which we recommended that the employee utilize so we could appropriately validate and test his concerns before escalating it publicly,” a spokesperson said.
According to the statement, Jones’ findings concerned an OpenAI product, so the company encouraged him to report through OpenAI’s standard reporting channels. One of Microsoft’s senior product leaders shared the employee’s feedback with OpenAI, which “investigated the matter right away.”
The Microsoft spokesperson added: “At the same time, our teams investigated and confirmed that the techniques reported did not bypass our safety filters in any of our AI-powered image generation solutions. Employee feedback is a critical part of our culture, and we are connecting with this colleague to address any remaining concerns he may have."
OpenAI’s spokesperson told Cybernews that the company immediately investigated the Microsoft employee’s report when they received it on December 1st, but found that the technique he shared did not bypass the company's safety systems.
“In the underlying DALL-E 3 model, we’ve worked to filter the most explicit content from its training data, including graphic sexual and violent content, and have developed robust image classifiers that steer the model away from generating harmful images,” the spokesperson claimed.
As mentioned in the statement, the company has implemented additional safeguards for ChatGPT and the DALL-E API, including declining requests that ask for a public figure by name.
“We identify and refuse messages that violate our policies and filter all generated images before they are shown to the user,” the spokesperson said.
Last week, explicit AI-generated images of singer Taylor Swift circulated on X, receiving 47 million views before the account that posted them was suspended by the platform.
As reported by 404 Media, Microsoft Designer, which is also built on DALL-E 3, was among the tools used to make the images. Microsoft has reportedly since fixed the loophole.
The article was updated to incorporate OpenAI and Microsoft's statements on January 31st, 2024.