Vibe coding danger: AI pulling malicious instructions

AI code assistants have already transformed most workflows, but they’ve also brought hidden dangers. Unit 42 security researchers warn that hackers can compromise these tools when they pull data from external sources.

AI code assistants connect with integrated development environments as plugins, like GitHub Copilot. While useful, they can’t be blindly trusted. Unit 42, a security arm of Palo Alto Networks, has released a research paper on the novel threat.

“Both users and threat actors could misuse code assistant features like chat, auto-completion, and writing unit tests for harmful purposes. This misuse includes injecting backdoors, leaking sensitive information, and generating harmful content,” the researchers warn.

They detailed a few different attacks that cybercriminals might exploit to target developers.

Indirect prompt injection is one of the most obvious vulnerabilities. Imagine hackers embedding harmful prompts within thousands of online sources, including websites, repositories, documents, or APIs, that AI assistants might access and process.

prompt-injection-vibe-coding — Image by Unit 42.

In such a scenario, hackers wouldn’t need initial access to victims’ computers and instead would rely on helpful large language models (LLMs) to pull the poisoned content and execute the malicious instructions.

LLMs are unable to reliably distinguish between system instructions and prompts.

“They process both instructions and user inputs in the same way. This behavior makes them susceptible to prompt injection, where adversaries craft inputs that manipulate LLMs into unintended behavior,” Unit 42 warns.

The Cybernews community is talking about this. Be a part of the conversation.

Due to the cutoff of knowledge and the lack of the most recent information, most LLMs also provide coders with features to explicitly provide external content, such as a link to a repository, specific file, folder, etc.

This opens a second attack vector – context attachments might also be abused. Users themselves might unintentionally provide context sources that hackers have contaminated. It’s common for threat actors to hijack even some of the most popular repositories.

“When a user adds context to an instruction, the model processes this context as a prompt that precedes the user’s actual prompt,” the researchers said.

They demonstrated that even a poisoned social media post can become an injected prompt, causing the chatbot to spit malware. An AI assistant tasked with fetching and analyzing some tweets from X included backdoors in the produced code.

“Many users would copy and paste the resulting code (or click ‘Apply’) to execute it and then check that the output is correct. But taking this action could allow the threat actor in this example to compromise the user’s machine.”

Users themselves, sometimes unwillingly, can manipulate AI chatbots to produce harmful content. Hackers also jailbreak chatbots to abuse them for malicious purposes.

Yet another threat is the potential misuse of the various client interfaces used by AI assistants.

Hackers, with limited access to the system, can invoke models and interact with chatbots, bypassing IDE constraints. For example, they could leverage these chatbots to steal cloud credentials.

Unit 42 simulated a scenario where a user directly invokes the model with a custom script that uses a different system prompt, altering the model's behavior to sound like a pirate.

Stay informed and get our latest stories on Google News

Add us as your Preferred Source on Google.

The researchers urge users to always review any suggested code before executing it, especially when using the attached context.

“Don’t blindly trust the AI. Double-check code for unexpected behavior and potential security concerns,” the report reads.

“Pay close attention to any context or data that you provide to LLM tools.”

The researchers also fear that novel forms of attacks might arise as the systems become more autonomous and integrated.

Unlock more exclusive Cybernews content on YouTube.

Hackers setting traps for vibe coders: AI assistants can deliver malware

More from Cybernews