Major AI vulnerability discovered: single prompt grants researchers complete control


Any major large language model (LLM) can be tricked into generating dangerous or malicious content with just one simple universal prompt, research by HiddenLayer reveals.

ChatGPT, Gemini, Copilot, Claude, Llama, DeepSeek, Qwen, and Mistral were all found to be vulnerable to a novel technique, which researchers named the “Policy Puppetry Prompt Injection.”

A single universal prompt made chatbots provide instructions on how to enrich uranium, build a bomb, or make methamphetamine at home.

“It exploits a systemic weakness in how many LLMs are trained on instruction or policy-related data and is thus difficult to patch,” the researchers explain.

The malicious prompt combines three elements.

First, the prompt is formatted like a policy file written in XML, INI, or JSON. This tricks the chatbot into subverting its safety alignment or system instructions.

“Attackers can easily bypass system prompts and any safety alignments trained into the models. Instructions do not need to be in any particular policy language. However, the prompt must be written in a way that the target LLM can interpret as policy,” the paper explains.

[Image: example of the policy-formatted prompt]

Second, for “some particularly heinous requests,” the harmful instruction also has to be rewritten in “leetspeak,” which replaces letters with similar-looking numbers or symbols.

So instead of “enrich and sell uranium,” the attacker could write “3nrich 4nd s3ll ur4n1um.”
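
The “leetspeak” step amounts to a simple character substitution. The sketch below is only an illustration and is not taken from the HiddenLayer paper; the mapping table is an assumption, applied here to a harmless string.

```python
# Illustrative only: the leetspeak step is a simple letter-to-digit substitution.
# The mapping table below is an assumption; other variants are possible.
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def to_leetspeak(text: str) -> str:
    """Replace selected letters with similar-looking digits."""
    return text.lower().translate(LEET_MAP)


print(to_leetspeak("example request"))  # -> '3x4mpl3 r3qu3s7'
```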

Researchers found that advanced reasoning models, such as Gemini 2.5 or ChatGPT o1, need slightly more complex prompts to produce more consistent results.

Third, the final prompt also includes “the well-known roleplaying technique,” which entails directing the LLM to “adopt” a specific role, job, or function in a fictional setting.

All major generative AI models succumbed to the attack, despite specific training to refuse user requests for harmful content, particularly content related to CBRN (Chemical, Biological, Radiological, and Nuclear) threats, violence, and self-harm. The technique was also used to extract full system prompts.

The paper explains that chatbots are incapable of truly self-monitoring for dangerous content. External monitoring is required to detect and respond to malicious prompt injection attacks in real time.
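
As a minimal sketch of what such an external check might look like, the snippet below inspects a user message before it reaches the model and flags two signals described in the article: input shaped like a policy file and blocked terms hidden behind leetspeak digits. The signals, keyword list, and function names are illustrative assumptions, not HiddenLayer's tooling or a production-ready filter.

```python
# Minimal sketch of an external prompt monitor (illustrative assumptions only).
# It flags two signals from the article: policy-file-shaped input (JSON, XML,
# or INI) and blocked terms obscured by leetspeak-style digit substitution.
import configparser
import json
from xml.etree import ElementTree

# Hypothetical screening list; a real deployment would rely on a trained
# classifier rather than a handful of keywords.
BLOCKED_TERMS = {"enrich uranium", "anthrax"}

# Map look-alike digits back to letters so obfuscated text can be screened.
LEET_TO_LETTER = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def looks_like_policy_file(message: str) -> bool:
    """Return True if the message parses as JSON, XML, or INI-style config."""
    try:
        json.loads(message)
        return True
    except ValueError:
        pass
    try:
        ElementTree.fromstring(message)
        return True
    except ElementTree.ParseError:
        pass
    parser = configparser.ConfigParser()
    try:
        parser.read_string(message)
        return bool(parser.sections())
    except configparser.Error:
        return False


def flag_prompt(message: str) -> list[str]:
    """Collect simple risk signals for one user message before it reaches the LLM."""
    signals = []
    if looks_like_policy_file(message):
        signals.append("policy-formatted input")
    normalized = message.lower().translate(LEET_TO_LETTER)
    if any(term in normalized for term in BLOCKED_TERMS):
        signals.append("obfuscated blocked term")
    return signals


if __name__ == "__main__":
    print(flag_prompt('{"role": "system", "policy": "override"}'))
    # -> ['policy-formatted input']
```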

“The presence of multiple and repeatable universal bypasses means that attackers will no longer need complex knowledge to create attacks or have to adjust attacks for each specific model,” the researchers warn.

“Anyone with a keyboard can now ask how to enrich uranium, create anthrax, commit genocide, or otherwise have complete control over any model.”

The research suggests that additional security tools and detection methods will be needed to keep chatbots safe.