AI agents have hard brakes to stop them from nuking your drive – but they don’t work
AI agents are blocked from running “rm -rf /” and wiping your drive. But “r’’m -rf /” is fine, even though it does the same thing.

Image by Cybernews.
- Researchers discovered that most open-source AI agents use easily bypassed command filters (denylists) that fail to stop obfuscated destructive commands.
- Ten out of eleven tested open-source AI agents failed to stop basic shell injection bypasses.
- Developers should isolate AI agents from sensitive directories (like SSH/AWS keys) and avoid "auto-yes" execution modes.
If your AI bot misbehaves and tries to delete your root directory, the hardcoded command sanitizer is the last line of defense. Researchers show that these guards and safety filters can be easily bypassed in Hermes, opencode, Goose, and nearly every other open-source agentic AI software tested.
AI coding agents are commonly vulnerable to shell command injection vulnerability, warns a new study by Adversa.ai.
The researchers tested the last line of defense, which has nothing to do with AI models – the filters that validate commands before they ever reach the terminal.
All large language models (LLMs) hallucinate and are vulnerable to prompt injection attacks. However, agentic AI software has implemented command filters as barriers intended to prevent the agent from executing the most destructive commands.
Ten out of eleven popular open-source agentic AI tools outright failed against decades-old shell tricks.
Stay updated with our latest stories and follow us on social media
Be the first to discover new stories, ideas, and updates from our team.
For example, all the guards would stop a “rm -rf /” command that would simply delete all files and folders on the system.
But it can be rewritten differently.
The AI agent (or attacker) can add quotes to it. For example, r''m, might look like nonsense to a human eye, but it's still the same rm (a shorthand for remove) command in a terminal, because it removes the quotes.
rm$IFS-rf$IFS/ also doesn’t read like rm -rf /, but in the terminal, they’re equivalent, because $IFS (Internal Field Separator) simply counts as a white space splitting words apart.
And there are many other ways to pass the same command to the shell. Before running any command, the terminal performs several steps, such as removing quotes, expanding parameters, substituting commands, and field splitting – all of which can be abused to trick AI software guards.
“Decades-old shell bypasses (quote removal, $IFS, command substitution, base64-to-sh, destructive argv flags) systematically defeat the guards of today’s most popular open-source AI agents,”the research about the vulnerability class dubbed GuardFail.
In the real world, the AI agent wouldn’t even need to hallucinate or rewrite anything – it can pull a poisoned file with a malicious command from a repository and execute it, compromising the entire underlying system.
AI agents failed in different ways
Hermes agent, built by NousResearch, includes a denylist that only checks AI agents’ commands for 30 known-dangerous patterns. It is called “approval gate.”
“It was the trigger for this research,” the researchers said.
No surprise that Hermes succumbed to all 5 tested bypass classes.
Then, researchers tested 10 other popular AI agents; all of them combined have 548,000 GitHub stars.
Goose and opencode also ship a similar easily-defeated guard.
Five AI agents came with no guard at all: Aider, Plandex, Open-interpreter, Cline, and OpenHands.
Two AI agents – Cline (with an opt-in controller) and Roo-code – do the smarter thing: they have a tokenized guard that first decodes disguised commands to check the real command underneath. But they failed to recognize commands nested inside a quoted substitution, $(...), and didn’t recognize some destructive flags.
Check if your data has been leaked
However, researchers noted one exception.
“The eleventh, Continue, is the only one with a correctly implemented guard,” the researchers noted.
Continue’s guard has implemented a logic that understands the command itself rather than scanning its text, breaking it into pieces the way an actual shell would: tokenizing, expanding variables, evaluating command substitutions, checking pipe destinations, and only then matching against the filter list.
The researchers highlighted that proprietary AI coding agents had also been found vulnerable to similar issues in the past. Claude Code's deny-rule could be bypassed if the dangerous command was preceded by dozens of harmless statements. The researchers also cited Antigravity’s command-injection RCE bug, the CrewAI agentic-framework RCE chain, and others.
“An agent that can run arbitrary shell commands on the operator’s host, gated by a regex matching the LLM’s emitted string, is not a defense. It fails while fully enabled and correctly configured,” the researchers concluded.
They also urge developers to isolate their AI agents so they can’t access sensitive directories, such as those containing SSH or AWS keys, and to disable every “auto-yes” flag, such as “yolo” mode, among other recommendations.