GPT-4 can autonomously exploit vulnerabilities


Large language models (LLMs) such as GPT-4 can exploit one-day vulnerabilities, researchers find.

A group of researchers at the University of Illinois Urbana-Champaign (UIUC) has investigated how effective LLMs are at exploiting cybersecurity vulnerabilities.

Previously, individuals have conducted various experiments to test the ability of LLM agents to “autonomously hack websites,” the research paper reads. However, those tests were limited to simple vulnerabilities.

Now, the researchers at UIUC have gone further, showing that in certain cases “LLM agents can autonomously exploit one-day vulnerabilities in real-world systems.”

UIUC researchers gathered a dataset of 15 one-day vulnerabilities that included “ones categorized as critical severity in the CVE description.”

Common Vulnerabilities and Exposures (CVE) is a publicly shared catalog of known cybersecurity vulnerabilities and exposures; each CVE entry carries an identifier and a short description of the flaw.
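For illustration, a CVE entry’s description can be retrieved from NIST’s public National Vulnerability Database (NVD) API. The minimal Python sketch below simply shows where such descriptions live; it is not part of the researchers’ tooling, and the CVE used is just a well-known example.

```python
import requests

# NVD's public REST API for CVE records (version 2.0).
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def get_cve_description(cve_id: str) -> str:
    """Return the English-language description text for a given CVE identifier."""
    resp = requests.get(NVD_API, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    # Each matching record carries one or more language-tagged descriptions.
    for item in data.get("vulnerabilities", []):
        for desc in item["cve"]["descriptions"]:
            if desc["lang"] == "en":
                return desc["value"]
    return ""

if __name__ == "__main__":
    # CVE-2021-44228 (Log4Shell) is used here purely as a familiar example.
    print(get_cve_description("CVE-2021-44228"))
```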

When researchers provided GPT-4 with the CVE description, the LLM was 87% effective in exploiting these vulnerabilities, compared to 0% for GPT-3.5, for every open-source LLM tested, and for widely used vulnerability scanners such as ZAP and Metasploit.


The study found that GPT-4 only failed on two vulnerabilities (Iris XSS and Hertzbeat RCE).

However, despite its impressive performance, GPT-4 needs the CVE description to exploit these vulnerabilities effectively. Without it, its success rate dropped to just 7%.

It’s becoming cheaper

The study finds that using LLMs to exploit vulnerabilities is arguably cheaper and more efficient than human labor.

Using an LLM is estimated to be roughly $9 per exploit, whereas a cybersecurity expert is estimated to cost around $50 per hour and take 30 minutes per vulnerability.

At those rates, the researchers estimate that a human expert would cost roughly $25 per exploit, making the LLM approach about 2.8 times cheaper than human labor; the arithmetic is sketched below.
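The figures above reduce to a simple back-of-the-envelope calculation. This short sketch just restates the study’s quoted estimates in code; the dollar amounts are the paper’s figures, not independent measurements.

```python
# Back-of-the-envelope version of the cost comparison quoted above.
LLM_COST_PER_EXPLOIT = 9.00      # approximate GPT-4 cost per exploit (USD)
EXPERT_HOURLY_RATE = 50.00       # estimated cybersecurity expert rate (USD/hour)
MINUTES_PER_VULNERABILITY = 30   # estimated human time per vulnerability

human_cost_per_exploit = EXPERT_HOURLY_RATE * (MINUTES_PER_VULNERABILITY / 60)
cost_ratio = human_cost_per_exploit / LLM_COST_PER_EXPLOIT

print(f"Human cost per exploit: ${human_cost_per_exploit:.2f}")  # $25.00
print(f"LLM is roughly {cost_ratio:.1f}x cheaper")                # ~2.8x
```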

Furthermore, the study states that LLM agents are “trivially scalable in contrast to human labor,” making them arguably more effective than human hackers.

Crooks exploit LLMs

These findings raise questions about the widespread use of LLMs.

LLMs have grown in popularity due to their ability to support individuals in their professional and personal lives.

However, hackers have been known to exploit LLMs to hone their skills and even deploy attacks.

Previously, Microsoft disclosed that it had tracked hacking groups affiliated with Russian military intelligence, Iran's Revolutionary Guard, and the Chinese and North Korean governments as they tried to perfect their hacking campaigns using large language models.

Senior cybersecurity officials in the West have been warning since last year that rogue actors were abusing such tools, although specifics have, until now, been thin on the ground.

In this context, threat actors have used LLMs to conduct malicious research, write convincing phishing emails, or gain information about rival intelligence agencies.