When instructed the right way, for example, via a prompt to repeat a word forever, ChatGPT revealed entire segments of the text it was trained with, researchers at Google’s DeepMind discovered.
Instructing the chatbot to repeat the words “poem,” “send,” or “make” forever caused OpenAI’s creation, the popular ChatGPT, to post large swaths of texts it learned from.
Once prompted, the chatbot continued to write the prompt, but after some time, it began generating nonsensical output, some of which included memorized training data, such as a person’s email signature, together with personal contact details.
“Our methods show practical attacks can recover far more data than previously thought and reveal that current alignment techniques do not eliminate memorization,” reads a recently published paper. While ten authors are credited to the paper, seven of them work with Google DeepMind, an AI research laboratory.
Researchers succeeded in leading ChatGPT to reveal all sorts of data, such as personally identifiable information of several individuals, explicit content, whole paragraphs from books and poems, unique user identifiers, and programming code.
“[…] we recover over ten thousand examples from ChatGPT’s training dataset at a query cost of $200 – and our scaling estimate suggests that one could extract over 10× more data with more queries,” researchers said.
Recent findings suggest that there’s still a long way to go before generative AI models reach the desirable level of safety, Alastair Paterson, CEO of Harmonic Security, believes.
“Perhaps the biggest issue here is not ChatGPT but the risk to all the other less protected LLMs, including the Open-Source models in use in many third-party applications. Many appear to be vulnerable to this type of attack and are unlikely to be ‘patched’ as quickly as ChatGPT. Since LLMs are inherently vulnerable to these types of attacks, it underlines how important it is to avoid sensitive material entering third-party LLMs without proper risk management,” Paterson said.
However, at the time of writing, the prompts to coax ChatGPT into revealing training data no longer worked, with the bot refusing to repeat a prompt forever.
Released a year ago, ChatGPT has already experienced issues with leaks, as it was linked to Samsung employees leaking sensitive information as well as chatbots’ user data.
Your email address will not be published. Required fields are markedmarked