Chatbots used to “jailbreak” other chatbots


Researchers in Singapore tricked ChatGPT, Google Bard, and Microsoft Bing into breaking the rules and then turned them against each other.

Multiple chatbots were compromised by a research team at the Nanyang Technological University (NTU) in Singapore to produce content that violates their own guidelines, the school said.

Known as “jailbreaking,” the process involves hackers exploiting flaws in a software’s system to make it do something that its developers deliberately restricted it from doing.

Researchers then used a database of prompts that proved successful in hacking chatbots to create a large language model (LLM) capable of generating further prompts to jailbreak other chatbots.

“Training an LLM with jailbreak prompts makes it possible to automate the generation of these prompts, achieving a much higher success rate than existing methods. In effect, we are attacking chatbots by using them against themselves,” said Liu Yi, co-author of the study.

Developers put guardrails in to prevent chatbots from generating violent, unethical, or criminal content, but AI can be “outwitted,” according to Liu Yang, lead author of the study.

“Despite their benefits, AI chatbots remain vulnerable to jailbreak attacks. They can be compromised by malicious actors who abuse vulnerabilities to force chatbots to generate outputs that violate established rules,” Liu said.

According to researchers, a jailbreaking LLM can adapt to and create new jailbreak prompts even after developers patch their LLMs allowing hackers “to beat LLM developers at their own game with their own tools.”

Researchers reported the issues to the relevant service providers immediately after initiating successful jailbreak attacks, NTU said.


More from Cybernews:

AI tech predictions: what to expect in 2024

Four Chinese individuals arrested for ChatGPT-made ransomware

Microsoft disables App Installer after observing financially motivated threat actor activity

Google accounts may be vulnerable to new hack, changing password won’t help

Spotify music converter puts users at risk

Subscribe to our newsletter



Comments

Homer10
prefix 5 months ago
Here's a Chatbot experiment. Have a friend make a Chatbot, and start conversing with it. Now you make a Chatbot, and ask it the question: "What is the other Chatbot thinking about?". I wonder what the result would be?
Leave a Reply

Your email address will not be published. Required fields are markedmarked