Red-teamers unleash AI agent on McKinsey’s chatbot, gain full access in two hours

An offensive AI agent, created by red-team security startup CodeWall, autonomously chose McKinsey’s AI chatbot as a target and then hacked it in just two hours, gaining full read and write access to the system. This was just an experiment, but clearly, malicious machine-speed intrusions are possible.
-
CodeWall demonstrated that AI agents can now autonomously select targets and execute full cyberattacks at machine speed, completing complex hacks in just two hours.
-
Writable AI system prompts represent a critical new attack surface that most organizations are failing to secure adequately.
-
Publicly exposed API documentation with unauthenticated endpoints enabled access to millions of confidential messages and files.
McKinsey, a sprawling global management consultancy, built an internal AI platform called Lilli back in 2023.
It’s a purpose-built system: chat, document analysis, RAG (retrieval-augmented generation) over decades of proprietary research, and AI-powered search across 100,000+ internal documents. More than 40,000 employees use Lilli, the company boasts.
“So we decided to point our autonomous offensive agent at it. No credentials. No insider knowledge. And no human-in-the-loop. Just a domain name and a dream,” explained CodeWall, a startup that uses AI agents for attacking customers’ infrastructure and then helping them improve their security posture.
The agent performed really well, hacking Lilli and gaining full read and write access to the “entire production database” in just two hours.
Interestingly, CodeWall says its offensive AI agent autonomously selected McKinsey as a target, citing the company’s public responsible disclosure policy and recent updates to Lilli.
The numbers are impressive. CodeWall says it accessed 46.5 million chat messages about strategy, mergers and acquisitions, and client engagements, all in plaintext.
The agent also grabbed 728,000 files containing confidential client data, 57,000 user accounts, and 95 system prompts controlling the AI’s behavior.
Moreover, all those prompts were writable. This means that an attacker could poison every conversation Lilli, the chatbot, had with McKinsey consultants.
Even subtly altering financial models, strategic recommendations, or risk assessments would be extremely damaging to McKinsey. Confidential data could also be simply exfiltrated.
“Lilli’s system prompts – the instructions that control how the AI behaves – were stored in the same database the agent had access to. These prompts defined everything: how Lilli answered questions, what guardrails it followed, how it cited sources, and what it refused to do,” CodeWall said in the blog post.
“An attacker with write access through the same injection could have rewritten those prompts. Silently. No deployment needed. No code change. Just a single UPDATE statement wrapped in a single HTTP call.”
The consequences would potentially be devastating. Needless to say, even subtly altering financial models, strategic recommendations, or risk assessments would be extremely damaging to McKinsey. Confidential data could also be simply exfiltrated.
“Organizations have spent decades securing their code, their servers, and their supply chains. But the prompt layer – the instructions that govern how AI systems behave – is the new high-value target, and almost nobody is treating it as one,” says CodeWall.
CodeWall says its offensive AI agent autonomously selected McKinsey as a target.
The agent got into Lilli by mapping the attack surface and then finding the API documentation publicly exposed – over 200 endpoints, fully documented. Most required authentication, but 22 didn’t.
According to CodeWall, their agent found the way in because it doesn’t follow checklists: “It maps, proves, chains, and escalates – the same way a real highly capable attacker could, but continuously and at machine speed.”
McKinsey must have learnt the lesson. A mere day after CodeWall sent a responsible disclosure email to the consultancy’s security team, the company patched all unauthenticated endpoints, took the development environment offline, and blocked API documentation.
Has your password leaked?
What happened?
An offensive AI agent, created by red-team security startup CodeWall, autonomously chose McKinsey’s AI chatbot as a target and then hacked it in just two hours, gaining full read and write access to the system.
What did CodeWall's AI agent access?
The agent accessed 46.5 million chat messages about strategy, mergers and acquisitions, and client engagements, all in plaintext. It also grabbed 728,000 files containing confidential client data, 57,000 user accounts, and 95 system prompts controlling the AI’s behavior.
What does it mean?
Writable AI system prompts represent a critical new attack surface that most organizations are failing to secure adequately.
Unlock more exclusive Cybernews content on YouTube.