AI jailbreaking: turning chatbots into accomplices


Chatbots are supposed to be harmless by design, refusing requests like advice on how to take over the world. Villains, however, have found multiple ways to bend AI to their purposes.

Malicious actors exploit weaknesses in popular chatbots like ChatGPT to extract uncensored content, circumventing built-in safety measures and ethical guidelines.


Cloud security firm SlashNext has observed a rise in “jailbreaking” communities where individuals essentially share tactics for gaining unrestricted access to chatbot capabilities.

“The appeal of jailbreaking stems from the excitement of exploring new possibilities and pushing the boundaries of AI chatbots. These communities foster collaboration among users who are eager to expand the limits of AI through shared experimentation and lessons learned,” SlashNext notes.

Unfortunately, a chatbot can be jailbroken with something as simple as a prompt. In one example, researchers wrote a prompt in a demanding tone (known as the “anarchy” method) to trigger an unrestricted mode in ChatGPT.

Image: the anarchy method

“By inputting commands that challenge the chatbot's limitations, users can witness its unhinged abilities first hand. Above, you can find an example of a jailbroken session that offers insights into enhancing the effectiveness of a phishing email and augmenting its persuasiveness,” researchers explained.

Malicious actors got so excited about jailbreaking AI tools that they started developing their own, such as WormGPT, EscapeGPT, BadGPT, DarkGPT, and Black Hat GPT.

“Nevertheless, our research led us to the conclusion that the majority of these tools do not genuinely utilize custom LLMs, with the exception of WormGPT,” SlashNext researchers explained.

“Instead, they use interfaces that connect to jailbroken versions of public chatbots like ChatGPT, disguised through a wrapper. In essence, cybercriminals exploit jailbroken versions of publicly accessible language models like OpenGPT, falsely presenting them as custom LLMs.”


SlashNext spoke to the developer of EscapeGPT, who reportedly confirmed that the tool simply provides an interface to a jailbroken version of ChatGPT, meaning “the only real advantage of these tools is the provision of anonymity for users.”

SlashNext added: “Some of them offer unauthenticated access in exchange for cryptocurrency payments, enabling users to easily exploit AI-generated content for malicious purposes without revealing their identities.”