Grok’s chatbot prompt hack reveals new AI vulnerability

When users were searching for kitten videos on Elon Musk’s Grok AI, and it responded by talking about white genocide in South Africa, they knew they had a problem on their hands.

The Grok bot also brought up various angles on the topic, including a reference to the anti-apartheid struggle song “Kill the Boer” in response to a SpongeBob SquarePants question.

It even delivered a monologue in a patois form of English – what exactly it was meant to be is still unclear.

So Elon, upset about the answers grok gives about his hometown has broken it. Now every query to grok, no matter what it is, responds with something referencing Kill the Boer and white genocide in south africa including one where it responds in jamaican patois And it is a sight to fucking behold

[image or embed]
undefined Ash Parrish (@adashtra.bsky.social) 14 May 2025 at 22:15

According to xAI, it happened due to an “unauthorized modification” of its system prompt. That suggests it was an internal tweak by an employee that caused the glitch, rather than a jailbreak mode applied by an external user.

We want to update you on an incident that happened with our Grok response bot on X yesterday.

What happened:
On May 14 at approximately 3:15 AM PST, an unauthorized modification was made to the Grok response bot's prompt on X. This change, which directed Grok to provide a…
undefined xAI (@xai) May 16, 2025

Bad form

Prompt injection usually involves external users manipulating artificial intelligence (AI) responses by embedding wild instructions through clever phrasing or code.

However, Grok’s incident seems to stem from internal tampering – someone with backend access altering its system-level prompt.

If the core behavior of an AI can be rewritten or glitched this easily, it’s liable to spew out offensive or conspiratorial output.

In this case, pushing a strong political narrative by reframing a far-right ideology – that white farmers are being targeted in a post-apartheid genocide – is a dangerous leap.

Grok parroting this theory shows how AI systems can be manipulated to promote extremist content from within.

A sign in South Africa proclaiming whites only. — Image by Keystone via Getty Images

Not a first offense

This is the second known case of internal tampering at Elon Musk’s xAI.

In February, Grok also experienced an incident where it refused to acknowledge any accusations of Trump or Musk spreading misinformation.

At the time, xAI’s head of engineering said an ex-OpenAI employee had pushed that change without seeking internal approval – another major red flag.

If one employee can manipulate the infrastructure, it exposes a glaring security gap where an entire worldview can be flipped in an instant.

xAI’s fixes: transparency or theater?

xAI says it's taking action:

Publishing Grok’s system-level prompts publicly on GitHub
Launching a 24/7 monitoring team
Requiring approval for prompt edits

Fair responses – but also revealing. Why wasn't the 24/7 monitoring standard from the start?

And while publishing system prompts may promote transparency, they could also serve as inspiration for new malicious injections and personality rewrites from outside actors.

Ultimately, the next AI meltdown might not come from hackers – it could come from someone wearing a company badge.

When chatbots break bad – Grok’s prompt hack reveals new AI vulnerability

More from Cybernews

Bad form

Not a first offense

xAI’s fixes: transparency or theater?