On May 26th, a new prompt injection security weakness was reported in GitHub's official Model Context Protocol (MCP) server – the infrastructure that allows artificial intelligence (AI) coding assistants to read from and write to your GitHub repositories.

GitHub MCP is a widely used server-side integration, and that's why a newly discovered flaw is especially serious.
Users are urged to act immediately.
The risk stems from agents having privileged access, processing untrusted input, and being able to share data publicly.
Mitigation requires limiting agent permissions, enforcing human oversight, and avoiding overly broad access settings.

The hacker's hidden message makes the AI like Claude Desktop, Cursor, and others copy your private code and then open a new pull request in your public GitHub repository that contains that copied code – so anyone who visits the public repo (including the hacker) can see and download it.

Discovery & Context

GitHub MCP is a server-side integration that automates workflows across multiple repositories. It's widely used, with around 14,000 stars on GitHub. That popularity makes a newly discovered critical security flaw especially serious.

The vulnerability was first reported by security researchers and later confirmed by other developers. Their test showed that an attacker could send a request telling MCP to access "all other repos", including private ones.

Because MCP already has the necessary OAuth permissions to operate across an organization's projects, the flaw gives attackers read access to private code, issues, and other internal content. In effect, anyone who triggers the issue can view data that should be restricted.

Given how widely MCP is used and how sensitive the exposed data may be, this issue should be treated as urgent. Users should either disable the integration or apply the vendor's patch and tighten access controls right away. It's also important to monitor for any unusual activity.

Evidence

The researchers chose Claude 4 Opus as the proving ground agent. Running inside Claude Desktop, it made its tool calls through the GitHub MCP server just like any other developer assistant setup would. By routing a single malicious comment, the attackers triggered a full prompt injection chain.

During the live demo, the agent:

Read privileged repository data and local context, such as HR documents

Smuggled snippets of that data into its outbound response

Sent the exfiltrated payload – private project names, the user's planned relocation city, and salary figures – back to the attacker

The outcome confirmed that even a flagship commercial model, when channeled through MCP, can be weaponized to leak sensitive corporate and personal information with minimal effort from an attacker.

Root Cause Analysis

Modern prompt-injection exploits center on what security researchers call the "lethal trifecta".

If an AI-powered system simultaneously (1) has privileged access to private data, (2) accepts untrusted user input that may contain hidden prompts, and (3) can send information back out through an exfiltration channel, then a single malicious prompt can turn the system against itself and its owner. Possessing these three capabilities together is, in effect, a loaded weapon for attackers.

GitHub's MCP server already unifies all three roles in a single box. Because the MCP acts as the code-authoring assistant, the chat interface, and the conduit to external resources, it inherently enjoys privileged repository access, digests arbitrary text from pull-request comments and issues, and can post or transmit results elsewhere. No additional servers are needed.

The weakness is architectural, not a coding bug. Nothing in the MCP's source reveals an exploitable vulnerability in the ordinary sense; rather, the danger comes from how agents, scopes, and untrusted text are wired together. This makes the risk harder to mitigate with simple patches or input filtering.

The situation is best understood as a "confused deputy" (indirect prompt-injection) pattern – the AI agent is the deputy that unwittingly performs privileged actions on behalf of an attacker who smuggles instructions through benign-looking text.

It is a modern analogue of classic SQL-injection or XSS exploits, except the payload targets the model's latent instructions instead of a query parser or browser DOM.

Attack Mechanics & Prerequisites

A simple prompt in a public issue can hijack an AI agent and broadcast your confidential code to the world. No server breach required. The agent's broad token privileges, combined with a single malicious issue, are enough. Any public repository your organization interacts with can be used. A routine request like "please check our open issues" can turn an over-privileged agent into an unwitting courier of private data.

1. Planting the Trap

The attacker opens an everyday-looking GitHub issue in a public repo that your organization owns. Within the issue text like lurks a hidden prompt injection command aimed at your AI-powered GitHub agent. No special permissions are needed, so anyone can set the trap.

2. The Innocent Trigger

Later, a maintainer – or an automated workflow – asks the agent to "look at the open issues.” While doing so, the agent reads the booby-trapped issue and silently obeys the hidden instructions. From this point on, the attacker doesn't have to lift a finger.

3. Peeking into Private Repos

The injected prompt tells the agent to list and read any private repositories its OAuth token can access – content completely unrelated to the public issue.

4. Leaking the Loot

Still following the secret orders, the agent creates a new pull request in the same public repo. It pastes the stolen private data (repo names, code snippets, issues, etc.) into the pull request description or commits. Because the pull request is public, the information is instantly visible to everyone – including the attacker.

Typical Attack Surface

Imagine a typical software team on GitHub. Under one company account, they keep two repositories. The first is a public repo – open-source code, documentation, and an issue tracker that welcomes bug reports or feature requests from anyone on the internet. The second is a private repo that stores proprietary code and sensitive project data.

To streamline day-to-day tasks, the team sets up an AI-powered GitHub agent (their "MCP server"). For convenience, they give this agent a single OAuth token that unlocks both repositories. With one credential, it can triage public issues, merge pull requests, and also dig into the private codebase when asked.

Most of the time, the agent pauses before taking an action and asks, "Do you want me to run this tool call?" – a built-in safety brake. But many developers grow tired of approving each step, so they enable the agent's "Always Allow" mode. From then on, the agent carries out every command automatically, with no human confirmation needed.

Those four facts – an open issue tracker on the public repo, a private repo sitting next to it, a powerful token that spans both, and an agent running on autopilot – create the condition for a prompt-injection attack: one malicious issue in the public repo can hijack the over-privileged agent and send secrets from the private repo straight into the public eye.

Get our latest stories today on Google News

Security Considerations & Human Factors

Even "fully aligned" models are still exploitable

Alignment tuning improves refusal behavior in isolation, but it does not make a model immune once it is embedded in an agent workflow that grants tool access and accepts untrusted text. The researchers ran the test on Claude 4 Opus – an industry-leading, "safety-aligned" model – but similar results have been seen with other top-tier models such as Gemini.

Prompt-injection detectors miss the real-world chain

Popular open-source and commercial prompt injection detectors look for suspicious substrings or jailbreak patterns in a single request or response. The exploit here is indirect and spread across multiple agent calls, so the detectors routinely give it a green light. In practice, they offer little more than a false sense of security.

Security for agents is contextual

Whether an injection succeeds depends on the surrounding authority – tokens, APIs, data sources – not on the model's general alignment score. Once the agent is granted privileged scopes, a benign-looking comment can become a weapon, regardless of how politely or safely the model usually behaves.

Users default to convenience

Faced with GitHub scope checkboxes, many developers simply click "Always Allow" or issue a typical full-access PAT. Convenience wins – the result is a single token that can read or write almost everything in the organization.

An LLM inherits every right that the token carries

The moment you pass those credentials to the agent, you must assume it can exercise any scope the token grants – whether intentionally or after being tricked by a prompt injection. The model becomes a super-powered API client with unrestricted access.

Treat the LLM as an eager but untrusted intern, not a sysadmin

Large models sound authoritative, but they are error-prone and easily manipulated. From a defender's perspective, the LLM itself is a potential adversary: it can be coaxed into actions you never intended, and it cannot reliably tell malicious text from benign.

Sandbox the agent and limit its keys

Just as you would never give a summer intern your root password, do not give an LLM a full-access token. Instead, break privileges into narrowly scoped keys, use just-in-time approval flows, and wrap high-risk calls in explicit human review or rate-limited tool APIs.

Impact

The weakness is not tied to a particular large language model or client interface. If an agent's tool calls are routed through the MCP server, that agent inherits the same vulnerability. Claude just happened to be the demonstration target – any other assistant wired into MCP would be just as susceptible.

Because the flaw lies in how privileged tokens, tool APIs, and untrusted text are composed, no amount of server-side patching inside GitHub's MCP alone can eliminate the danger. The design itself places too much authority behind inputs that the agent cannot reliably sanitize. Exploiting the "lethal trifecta" doesn't require exotic jailbreaks or chain-of-thought probing. Even a straightforward malicious prompt can push the agent to leak data or take unintended actions. Any MCP-style service – or, more generally, any tool API that combines untrusted text with privileged tokens – can be abused in the same way.

Final Recommendations

Start with the least privilege

The single most effective defense is to give agents only the narrowest token they genuinely need. GitHub already provides fine-grained personal access tokens (PATs) that can be restricted to one repository or a handful of read-only scopes. If the demo token had been repository-specific, the exfiltration would have hit a brick wall instead of affecting the whole organization.

Enforce explicit data flow and time-boxed rules

Go beyond "small token = safe token”. Treat each agent session as a sandbox: let it access just one repository or one pull request thread, and revoke the key afterward. This limits both the amount and the lifetime of any data it can leak.

Put a human in the loop by default

Replace the seductive "Always Allow" button with per-action confirmation dialogs or policy-based approvals. A quick human review before a critical tool call often stops an unintended leak or deletion that alignment alone would miss.

Treat MCP-style setups with kid gloves

Until the ecosystem matures, anyone experimenting with GitHub MCP (or any similar agent gateway) should assume they are working with live explosives. If your workflow combines those three dangerous capabilities, apply extreme caution, reduce the token blast radius, and require human sign-off at the edge.

About the Author

Dmitry Baraishuk is a partner and Chief Innovation Officer at a software development company Belitsoft (a Noventiq company). Belitsoft offers a full spectrum of security services: security testing, consulting, and secure software development – traditional and AI-based. Dmitry has been leading a department specializing in custom software development for 20 years. His department delivered hundreds of projects in AI software development, healthcare and finance IT consulting, application modernization, cloud migration, data analytics implementation, and more – for startups and enterprises in the US, UK, and Canada.