Lessons learned from ChatGPT’s Samsung leak

Reports that Samsung employees leaked sensitive data via OpenAI’s chatbot ChatGPT offer a timely lesson on preventing future breaches involving Large Language Models (LLMs).

Reports from early April indicate that several Samsung employees inadvertently leaked sensitive company data on three separate occasions. The information that staff at the South Korean tech giant reportedly leaked included the source code of software used to measure semiconductor equipment.

While OpenAI explicitly tells users not to share “any sensitive information in your conversations” in the company’s frequently asked questions (FAQ) section, ChatGPT’s usefulness likely overrode security concerns.

Even though Samsung reportedly banned the use of generative artificial intelligence (AI) tools on its premises, threatening staff who disobey with termination of their contracts, the risk of spilling company secrets over an AI-based chatbot is not going anywhere.

Loss of data where a ChatGPT-like bot is involved even has its own name: conversational AI leak. These types of leaks concern events where sensitive data input into an LLM is unintentionally exposed, Tyler Young, chief information security officer at data management firm BigID, explained.

According to him, ChatGPT supports connections through an application programming interface (API), which lets companies and staff connect to the AI service and, in turn, reveal sensitive data. For example, AI expert Santiago Valdarrama, together with researchers at Levatas, an AI company, paired a robot dog with ChatGPT, demonstrating how the robot can verbally report on its upcoming tasks and status.

“Because the LLMs are created to generate responses to questions and using learned data, they may expose confidential information unintentionally,” Young told Cybernews.

Output control

One way to prevent OpenAI’s ChatGPT, Google’s Bard, or Baidu’s Ernie Bot from revealing sensitive data LLMs have learned from careless employees is for companies to limit what their creations can and cannot say.

Some have already tried to take that route, demanding that OpenAI limit ChatGPT’s output or delete the chatbot altogether. Alexander Hanff, a privacy advocate also known as That Privacy Guy, served a cease and desist letter to OpenAI over “defamatory” statements falsely claiming Hanff was dead.

However, Young believes the problem is not limited to ChatGPT or other better-known LLMs. As open-source or community versions become more popular, limiting their output will be severely problematic.

“The more succinct way of approaching this should be companies controlling the data that is being sent to the model, the connections that are able to be made, and ultimately “who” in the org can use the models,” Young said.

Organizations could also follow the lead of Samsung and outright ban staff from chatting with generative AI. However, Joe Payne, the CEO of insider risk software solutions provider Code42, says banning chatbots one by one will start feeling “like playing whack-a-mole” really soon.

“Specifically, banning ChatGPT might feel like a good response, but it will not solve the larger problem. ChatGPT is just one of many generative AI tools that will be introduced to the workplace in the coming years,” Payne said.

ChatGPT paired with Boston Dynamics’ robodog Spot.

Gatekeeping AI access

A different approach to the problem could lie in limiting the input. Young said that companies and organizations should focus on controlling the use of LLMs internally by controlling what type of data is fed into their model.
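
To make the idea of limiting input concrete, here is a minimal Python sketch of a filter that scrubs a prompt before it leaves the company network. It is an illustration only: the patterns, labels, and the `redact_prompt` function are hypothetical and do not describe any tooling used by BigID, Samsung, or OpenAI. A real deployment would likely rely on the organization's own data classification rather than a few regular expressions.

```python
import re

# Hypothetical patterns for illustration; real rules would come from
# the organization's own data classification policy.
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "INTERNAL_TAG": re.compile(
        r"\b(confidential|internal only|proprietary)\b", re.IGNORECASE
    ),
}


def redact_prompt(prompt: str) -> tuple[str, list[str]]:
    """Return the prompt with sensitive matches masked, plus the labels found."""
    findings = []
    redacted = prompt
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(redacted):
            findings.append(label)
            redacted = pattern.sub(f"[{label} REDACTED]", redacted)
    return redacted, findings


if __name__ == "__main__":
    text = "Ask bob@example.com about the confidential firmware build."
    safe_text, labels = redact_prompt(text)
    print(safe_text)
    print(labels)
```

A filter like this would sit between staff and the chatbot, so only the scrubbed text is ever sent on. Pattern matching of this kind will miss sensitive material that carries no recognizable marker, such as pasted source code, so it is a first line of defense rather than a complete one.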

Additionally, Young argues, organizations could limit who has access to chatbots. In other words, rather than imposing outright bans, businesses should decide who can use chatbots and what data those users can share.
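
Gating access by user and by data type could look something like the following Python sketch. The roles, classification labels, and the `is_allowed` function are all hypothetical, chosen only to illustrate the approach Young describes. The sketch assumes each request already carries a trustworthy role and data label, which in practice is the hard part.

```python
from dataclasses import dataclass

# Hypothetical classification levels, ordered from least to most sensitive.
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2}

# Hypothetical policy: the most sensitive level each role may send to a chatbot.
# Roles missing from this table have no chatbot access at all.
ROLE_POLICY = {
    "marketing": "public",
    "engineering": "internal",
}


@dataclass(frozen=True)
class ChatRequest:
    user_role: str
    data_classification: str


def is_allowed(request: ChatRequest) -> bool:
    """Allow a request only if the role is listed and the data is within its ceiling."""
    ceiling = ROLE_POLICY.get(request.user_role)
    if ceiling is None:
        return False  # role has not been granted chatbot access
    rank = CLASSIFICATION_RANK.get(request.data_classification)
    if rank is None:
        return False  # unknown labels are denied by default
    return rank <= CLASSIFICATION_RANK[ceiling]


if __name__ == "__main__":
    print(is_allowed(ChatRequest("engineering", "internal")))
    print(is_allowed(ChatRequest("marketing", "internal")))
```

Denying unknown roles and unknown labels by default means a gap in the policy blocks a request instead of letting it through.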

“Any system producing, storing, or processing data can be impacted if connected or exposed to GPT models. Theoretically, any data type could be impacted if not correctly controlled,” Young explained.

As ChatGPT and its rivals become popular, sensitive data leaks are inevitable. After all, privacy concerns over ChatGPT’s security have been ramping up since OpenAI revealed that a flaw in its bot exposed parts of conversations users had with it, as well as their payment details in some cases.

As a result, the Italian Data Protection Authority temporarily banned ChatGPT, and German lawmakers said they could follow in Italy’s footsteps. Italy later lifted the ban after the chatbot’s maker met the watchdog’s privacy demands.