
Hackers cracked Google’s AI and got rewarded.
As companies rush to deploy AI assistants, classifiers, and a myriad of other LLM-powered tools, a critical question remains: are we actually building this stuff securely?
Spoiler alert – we aren’t.
Lupin & Holmes researchers got early access to a preview of the next Gemini update and tried to hack it. The team of ethical hackers had previously competed in Google’s own AI security challenge, LLM bugSWAT 2024, and walked away with $50,000 after cracking the source code and finding serious vulnerabilities in Google’s flagship AI model. Guess what – they’ve done it again.
Cracking Google’s AI box
Joseph “rez0” Thacker, Justin “Rhynorater” Gardner, and Roni “Lupin” Carta went to work testing the limits of Gemini’s coding environment. They quickly realized they could list files inside the sandbox – a big no-no.
That led them to a massive 579MB binary file that shouldn’t have been accessible. But stealing a huge file from a locked-down system? That takes finesse.
Using a mix of Python scripting and Caido, a security testing tool, they exfiltrated the binary in chunks. What they found inside was wild:
- Internal Google Source Code: fragments of Google3, the company’s private code repository.
- Gemini’s Internal APIs: internal tooling that allows the AI to interact with services like Google Flights or YouTube.
- Security Protos: basically, Google’s internal classification and data protection blueprints.
At first glance, they might seem harmless, but leaking these files can give a pretty detailed peek into Google’s internal architecture.
“We reported these proto leaks because we know that Google treats them as highly confidential information that should never be exposed,” said the researchers.
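The researchers haven’t published their exact tooling, but the general shape of a chunked exfiltration is simple enough to sketch. The Python snippet below is a hypothetical illustration only – the file path, chunk size, and output channel are assumptions, not details from the write-up – showing how a large binary could be split into base64-encoded pieces small enough to slip through whatever output channel a sandbox exposes.

```python
import base64
from pathlib import Path

# Hypothetical illustration only: the path and chunk size are assumptions,
# not details taken from the researchers' report.
TARGET = Path("/sandbox/some_internal_binary")  # the large file to pull out
CHUNK_SIZE = 64 * 1024                          # keep each piece small enough for the channel


def emit_chunks(path: Path, chunk_size: int):
    """Yield (index, base64 text) pairs covering the whole file, in order."""
    with path.open("rb") as f:
        index = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield index, base64.b64encode(chunk).decode("ascii")
            index += 1


if __name__ == "__main__":
    # In a locked-down environment, "sending" might just mean printing each
    # chunk into the model's response and capturing the traffic with a proxy
    # such as Caido, then reassembling the pieces in order on the outside.
    for index, payload in emit_chunks(TARGET, CHUNK_SIZE):
        print(f"CHUNK {index}: {payload}")
```

On the receiving end, decoding the captured chunks and concatenating them in order would reconstruct the original binary – tedious for a 579MB file, but entirely doable.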
The hackers also managed to weaponize the AI’s own reasoning process against itself. Inspired by a research paper called ReAct, they manipulated Gemini’s chain-of-thought execution, trying to get the AI to give them even more access.
“With the help of the Google Security Team, we tested this idea and observed that, depending on factors like the generation seed and temperature, we could occasionally access what appeared to be a more privileged sandbox,” said the researchers.
They didn’t quite bust out of the sandbox entirely, but they got close enough to make Google sweat.
“Even though our tests were limited, the core idea still has some real potential if we push it further,” they said.
So, how screwed are we?
The good news? Google patched the flaws, and the researchers got paid. The bad news? This is just the tip of the iceberg for AI security.
These models aren’t just generating text – they’re accessing internal systems, pulling sensitive data, and making real-world decisions. When they get it wrong – or when hackers figure out how to exploit them – the consequences could be devastating.