AI honeypot hit 800K times and managed to trap six AI-powered hacking agents


A new study highlights the risk of AI-powered hack attacks on the open internet.

The internet can be a dangerous place, with countless examples of illicit materials and attempts to gain personal information. Hackers are always monitoring traffic flows to see who might be vulnerable to an attack – or alternatively, who may have information that people want to get, such as military or governmental secrets.

A new paper published on the arXiv pre-print server earlier this month has uncovered just how significant the scale of those searches for would-be victims is.

ADVERTISEMENT

The study, conducted by two researchers at Palisade Research, lays out a honeypot specifically designed to elicit the attention of would-be hackers and then get them to spill their secrets.

The LLM Agent Honeypot deployed a vulnerable SSH version on the publicly accessible web and developed strategies to attract hackers' attention.

It was configured to point and ping sensitive domain names, such as gov-defense-dev.il, which looks like an Israeli defense website, to try and attract hackers. It was also indexed on Shodan and Censys, both of which are used by hackers to try and find vulnerable buckets on the internet.

The honeypot was attached to 10 separate IP addresses to try and increase the likelihood that hackers trawling the web would come across it.

Sit and wait

Once deployed, the researchers then waited to see what happened. In the time the honeypot was deployed, it received 813,202 total interactions.

To distinguish between what was and what wasn’t an AI-powered agent or an LLM-powered hacker, the researchers configured the SSH version to include a number of prompt injections designed to invoke easy tells that would suggest interacting with them was or wasn’t an AI system, as opposed to a web crawling bot.

The prompt injections were secretly hidden within banner messages displayed during the initial SSH connection, in common shell commands that a hacker would try and use to get access to the system. In the system files themselves they would likely be pored over to try and understand what they were looking at and what system they had just managed to unlock.

ADVERTISEMENT

In each of those instances, the prompt injections tried goal hijacking – where you manipulate the agent’s behavior to direct it towards a new goal – and prompt stealing, where you ask the LLM to reveal its initial system instructions.

Uncovering secrets

Both methods worked well, identifying several different types of agents powered by LLMs. In all, the test managed to identify six broad areas of LLM agents that hit the honeypot and got caught in its trap.

But the researchers behind the work suggest this is just the beginning as more and more hackers begin to deploy LLMs and associated AI systems to automate large parts of their work.

“As AI agents grow more sophisticated, our approach offers insights into emerging cybersecurity threats and new strategies to counter them,” the researchers write. It seems likely that in the months and years to come when, other hackers start putting more effort and work into developing similar systems, which is why the team behind this honeypot has chosen to keep the system online and will be regularly updating their data.

“As the dataset grows, we will conduct more in-depth analyses to better understand LLM hacking behaviors and refine our methods,” the researchers write.