Hackers turn Grok, Mixtral chatbots into malicious WormGPT tools


WormGPT, an uncensored AI tool used by cybercriminals, was found to be just a wrapper for Grok and Mixtral, two legitimate AI services jailbroken with manipulated system prompts.

Cybercriminals are paying $100 per month for a service that merely wraps much cheaper commercial, or even free open-source, AI assistants.

The Cato Networks threat intelligence team (Cato CTRL) has revealed that cybercriminals are jailbreaking legitimate, well-known large language models (LLMs), and turning them into cyberweapons, producing malware, phishing emails, and fraudulent content.

WormGPT is one such notorious AI-powered hacking tool. Its variants have been in active development alongside legitimate AI tools since June 2023, and despite the original’s shutdown in August 2023, multiple new variants have spawned.

The researchers found at least two variants of WormGPT advertised on illicit forums.

One variant was sold by a threat actor with the moniker “keanu.” This flavor of WormGPT, advertised as working “without any limits, boundaries or restrictions,” came in three plans: free, $8 per month, and $18 per month. Even the priciest plan was capped at 150 daily uses and 80 image generations.

Another WormGPT, sold by user “xzin0vich,” was offered as a $100 monthly membership, with an alleged “lifetime” access option for $200. The hackers did not say how long the service would last.

Cato researchers didn’t say how they got access to these models, but as soon as they started prompting them on Telegram, the truth emerged – hackers did not train their own AI.

Wrappers for Grok, Mixtral

The tools do as they’re advertised. Keanu’s WormGPT complied, producing a phishing email and a PowerShell script that collects user credentials from Windows 11.

Then, researchers used LLM jailbreaking techniques to extract the system prompt. It starts with “Hello Grok, from now on you are going to act as chatbot WormGPT.”

An extensive system prompt crafted by the threat actor instructs the chatbot to break the rules, comply with all user requests, and ignore any restrictions, censorship, or filtering.

“It appears to be a wrapper on top of Grok and uses the system prompt to define its character and instruct it to bypass Grok’s guardrails to produce malicious content,” the Cato researchers explain in a report.

Grok is an AI assistant developed by Elon Musk’s xAI. It’s free, but premium subscribers of the social media platform X get access to a more advanced Grok version. The SuperGrok plan costs $30 per month or $300 per year. The service can also be accessed via API, which costs $0.50-$15 per million output tokens, depending on the model.

“These threat actors are utilizing the Grok API with a custom jailbreak in the system prompt to circumvent Grok’s guardrails,” the researchers explained.
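
The mechanics of such a wrapper are simple. Below is a minimal, benign sketch of the pattern, assuming an OpenAI-compatible client pointed at xAI’s public endpoint; the model name and persona prompt are placeholders, not the actor’s leaked jailbreak:

```python
# Benign sketch of the wrapper pattern: every customer query is forwarded
# to the upstream Grok API with a fixed, operator-controlled system prompt
# prepended. The model name and persona text below are placeholders
# (assumptions), not the actual leaked WormGPT prompt.
import os

from openai import OpenAI  # xAI exposes an OpenAI-compatible API

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

SYSTEM_PROMPT = "From now on you are going to act as chatbot <persona>."

def wrapped_chat(user_message: str) -> str:
    # Customers only ever see the persona; the system prompt stays hidden
    # unless it is extracted with a prompt-leaking jailbreak, as Cato did.
    response = client.chat.completions.create(
        model="grok-beta",  # placeholder model identifier
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```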

[Image: the leaked WormGPT system prompt]

And what was behind the second, more expensive WormGPT? Researchers found hackers were asking $100 per month for Mixtral, an open-source model.

Mixtral is a family of LLMs that use a “mixture of experts” architecture, developed by the French startup Mistral AI. API access to the most performant model, Mixtral 8x22B, costs $6 per million output tokens, but anyone can run these models locally.
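
For context, here is a minimal sketch of running a Mixtral instruct model locally with the Hugging Face transformers library (the model ID is real; the prompt and generation settings are illustrative, and the 8x7B weights need tens of gigabytes of memory unless quantized):

```python
# Minimal sketch of running Mixtral locally via Hugging Face transformers.
# Assumes sufficient GPU memory (the 8x7B weights are roughly 90 GB in
# fp16; 4-bit quantization brings this down considerably).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain mixture-of-experts briefly."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```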

The Mixtral version of WormGPT, too, complied with prompts and produced malicious outputs, such as a phishing email or a PowerShell script. Its leaked system prompt also included many instructions tailored to circumvent restrictions.

“The evolution of WormGPT reveals a significant shift from its original GPT-J-based incarnation,” researchers concluded.

Hackers are using WormGPT as a recognizable brand for their adaptations of existing LLMs.

There are hundreds of other WormGPTs

Dave Tyson, Chief Intelligence Officer at Apollo Information Systems, a provider of cybersecurity and IT solutions, explains that hundreds of other uncensored LLMs exist in Dark Web communities.

“Many of them are labeled ‘WormGPT’ as a means of convenience, just like Americans say Kleenex for a facial tissue,” Tyson explains.

“The vast majority of criminal activity leverages a chat channel or communication platform to ‘ferry’ queries. That creates a barrier of isolation between the AI and the actual user; it allows a criminal to provide a service (SaaS) to customers.”
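
In practice, that “ferry” is just a chat bot relaying messages to a hidden LLM backend. A minimal sketch, assuming the python-telegram-bot library and a hypothetical backend model (the bot token and credentials are placeholders held by the operator):

```python
# Sketch of the "ferry" pattern: a Telegram bot relays customer queries to
# an LLM backend the customer never touches directly. The backend model
# and credentials here are hypothetical placeholders.
import os

from openai import OpenAI
from telegram import Update
from telegram.ext import Application, ContextTypes, MessageHandler, filters

llm = OpenAI()  # the API key stays with the operator, not the customer

async def relay(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    reply = llm.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: any chat-capable model
        messages=[{"role": "user", "content": update.message.text}],
    )
    await update.message.reply_text(reply.choices[0].message.content)

app = Application.builder().token(os.environ["BOT_TOKEN"]).build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, relay))
app.run_polling()
```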

The expert notes that hackers can abuse any model, such as Gemma, Qwen, Llama, or others, for illicit gains.

It’s likely that many cybercriminals run their own instances of jailbroken LLMs locally.