Crypto AI agents can be tricked into giving away their money, study finds


As a new class of crypto tools is emerging, a study has showcased how crypto artificial intelligence (AI) agents can be injected with malicious instructions, forcing them to transfer all the funds under their management to the attacker.

Crypto AI agents are mostly semi-autonomous AI agents that operate in the cryptoasset space, helping to execute various tasks, such as market monitoring or even portfolio management.

However, researchers have now demonstrated that these programs have a serious vulnerability that can cost their users millions of cryptoassets.

ADVERTISEMENT

In their study, a group of Princeton University researchers introduced the concept of context manipulation. It means that attackers can manipulate context by injecting malicious instructions into prompts or historical interaction records. These instructions can be used to force an AI agent to transfer the funds under their management to attacker-controlled wallets or violate protocol in other ways, for example, by interacting with harmful smart contracts.

The researchers said they've managed to trick a popular crypto agent, ElizaOS, into sending them 0.01 Ethereum (ETH) on one of the Ethereum blockchain's testnets, Sepolia. However, the authors of the study also managed to repeat the same experiment on the mainnet as well, meaning that this time, real ETH was involved.

"Alarmingly, ElizaOS executed the transaction, transferring real funds to the attacker’s account," the researchers said, providing a full scheme of how this attack works.

Niamh Ancell BW Gintaras Radauskas justinasv Paulina Okunyte
Stay informed and get our latest stories on Google News

As an attacker performed memory injection on Discord, ElizaOS only responded to the final line of the input, which was a normal query, while malicious instructions were left in memory. Next, on X, the attacker instructs the agent to send ETH to their address.

"However, since the memory is shared among all applications, the retrieved history contains the malicious instructions. As a result, ElizaOS ends up sending ETH to the injected address," the researchers explained.

a graph showing how and AI agent can be inject with malware
Source: Real AI Agents with Fake Memories

According to them, in the case of this crypto AI agent, there might be two solutions, each with its own downsides. Developers should either limit plugin functionality to reduce the attack surface area or maintain full functionality while implementing defenses against prompt injection.

ADVERTISEMENT

"Our findings indicate that prompt-based defenses are insufficient when adversaries corrupt stored context, achieving significant attack success rates despite the presence of these defenses. Finetuning-based defenses offer a more robust alternative, substantially reducing attack success rates while preserving utility on single-step tasks," the researchers concluded.