Anthropic AI coding assistant could be tricked into revealing secrets, Microsoft warns


Key takeaways:

Microsoft researchers have discovered a vulnerability in Anthropic's Claude Code GitHub Action that could expose CI/CD workflow secrets, potentially allowing attackers to steal sensitive credentials through prompt injection attacks.

Microsoft Threat Intelligence started the research after observing prompt injection attempts in public repositories using AI-assisted GitHub workflows.

ADVERTISEMENT

Prompt injection is an AI vulnerability in which attackers hide deceptive instructions in content processed by a model to manipulate its behavior. Normally, large language models (LLMs) are designed to follow instructions from developers and answer users. But attackers can try to trick the model into ignoring its intended instructions.

Researchers say that in one example, the injection was placed inside an HTML comment (), making it invisible in the rendered GitHub issue but visible to the AI model reading the raw markdown. The target repository used a GitHub Actions workflow to automate issue resolution.

The attacker could disguise malicious instructions as a normal feature request, meaning they didn’t need direct permission to change the project. They only needed to submit a GitHub issue to trick the AI bot into making the change for them.

Has your password leaked?

Enter your password to check if it has leaked. Having a leaked password creates the risk of identity theft, financial damages, and worse!
35,607,543,468
Exposed Passwords
Ad
Protect your personal information from cybercriminals and get 50% off the top-rated password manager
link_title link_title

Microsoft showed that a similar prompt-injection technique could be used against Anthropic's Claude Code GitHub Action. Anthropic had already sandboxed some tools, such as Bash, which lets Claude run commands on the system.

However, Microsoft found that Claude's Read tool, which is used to read files, was not subject to the same security restrictions.

The researchers constructed a prompt injection payload to test whether the vulnerability could be exploited. In testing, they injected a malicious prompt that defeated two layers of defense and successfully convinced the AI assistant to access a system file containing API keys and other credentials.

Microsoft reported the vulnerability to Anthropic on April 29th, and the company mitigated the issue in Claude Code version 2.1.128 on May 5th by blocking access to sensitive files in /proc/ to protect them from exfiltration.

ADVERTISEMENT
jurgita justinasv Izabelė Pukėnaitė vilius Ernestas Naprys Gintaras Radauskas
Don't miss our latest stories on Google News

Unlock more exclusive Cybernews content on YouTube.