OpenAI releases another AI agent, with a focus on software security


The new agent is currently available only to select partners in private beta.

The company has introduced Aardvark, a “security researcher” that’s powered by GPT-5.

The aim of the new agent is to help identify potential vulnerabilities and, in doing so, enhance software security for companies.

OpenAI calls Aardvark “an autonomous agent” that can aid security teams and developers in finding and fixing security vulnerabilities.

For now, Aardvark is available in private beta.

How does the AI agent for software security work?

According to OpenAI, Aardvark looks for errors in the same way that an actual security researcher would: by reading the code, analyzing it, using tools, and running tests.

To do this, Aardvark uses “LLM-powered reasoning and tool-use” to identify vulnerabilities and devise fixes.

The agent starts by monitoring source code repositories and analyzing commits. It then scans for vulnerabilities and prioritizes those that pose the greatest risk.

Once the agent identifies a possible vulnerability, it will test it in a “sandboxed environment” to assess its exploitability. After this, Aardvark provides fixes through Codex, OpenAI’s AI coding agent for developers, which helps write, review, and ship code.
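The stages described above (scan commits, rank findings by risk, validate them in a sandbox) can be sketched as a small pipeline. This is only an illustration under stated assumptions, not OpenAI's implementation: the `Finding` class, the pattern table, the risk scores, and the threshold check standing in for sandbox testing are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    commit: str
    description: str
    risk: int                 # hypothetical 1-10 severity score
    exploitable: bool = False


# Hypothetical stand-in for the LLM analysis step: a table of risky patterns.
RISKY_PATTERNS = {"eval(": 9, "md5(": 4}


def scan_commit(commit: dict) -> list[Finding]:
    """Flag a commit whose diff contains any of the risky patterns."""
    return [
        Finding(commit["id"], f"use of {pattern}", score)
        for pattern, score in RISKY_PATTERNS.items()
        if pattern in commit["diff"]
    ]


def validate_in_sandbox(finding: Finding) -> bool:
    """Stand-in for sandboxed exploit testing: a fixed risk threshold."""
    return finding.risk >= 5


def triage(commits: list[dict]) -> list[Finding]:
    """Scan every commit, sort findings by risk, then mark validated ones."""
    findings = [f for c in commits for f in scan_commit(c)]
    findings.sort(key=lambda f: f.risk, reverse=True)
    for f in findings:
        f.exploitable = validate_in_sandbox(f)
    return findings


commits = [
    {"id": "a1", "diff": "h = md5(password)"},
    {"id": "b2", "diff": "result = eval(user_input)"},
]
report = triage(commits)
for f in report:
    print(f.commit, f.description, f.risk, f.exploitable)
```

In a real system, the pattern lookup would be replaced by model-driven code analysis and the threshold by an actual attempt to trigger the flaw in an isolated environment.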

The company shares that Aardvark can also find bugs “such as logic flaws, incomplete fixes, and privacy issues.”

Aardvark is already in use

OpenAI has been using the agent to strengthen its own systems. It also received positive feedback from the company’s external alpha partners.

According to the company, Aardvark identified 92% of known and synthetically introduced vulnerabilities during benchmark testing, demonstrating “high recall and real-world effectiveness.”
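For readers unfamiliar with the term, recall is the share of vulnerabilities actually present that a tool manages to find. The short example below shows the arithmetic behind a 92% figure. The counts (100 planted vulnerabilities, 92 found) are hypothetical and chosen only to match the percentage; OpenAI has not published the underlying numbers here.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = found vulnerabilities / all vulnerabilities actually present."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined when there are no real vulnerabilities")
    return true_positives / total


planted = 100   # hypothetical number of vulnerabilities in the test repositories
found = 92      # hypothetical number the agent detected
missed = planted - found

print(f"recall = {recall(found, missed):.0%}")
```

Recall says nothing about false positives: a tool can find most real flaws and still report many that do not exist, which is measured separately as precision.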

The agent has already been used on open-source projects, where it identified numerous vulnerabilities, ten of which have been assigned Common Vulnerabilities and Exposures (CVE) identifiers.

In its statement, the company also says it’s updating its outbound coordinated disclosure policy, which explains how it reports security vulnerabilities found in third-party software or systems to the affected developers and vendors. The update makes the policy more lenient towards developers.

Aardvark: scope vs accuracy

Although OpenAI’s new agent is still in beta, it may be made publicly available in the future. More companies could place their trust in Aardvark “because the productivity gains are too great to ignore,” says Philippe Dourassov, co-founder and CEO of Haicker.

“An agentic security researcher can analyze millions of lines of code that no human team could ever fully review. Human experts excel at depth, not at constant, exhaustive coverage,” explains Dourassov.

“AI agents can operate continuously, scale across entire codebases, and identify issues in minutes that would take humans months to uncover. This level of scalability will make adoption inevitable as early adopters prove clear improvements in both speed and security outcomes.”

While it might be easy to imagine Aardvark opening the gates to potential security perils, Dourassov notes that its biggest challenge isn’t malicious intent but accuracy.

“Identifying real vulnerabilities from code alone is extremely complex,” says the expert. As an example, he points to Google’s internal security team, which reported finding 20 security vulnerabilities with the help of AI but still needed human specialists to check for hallucinations and false positives.

He also notes that while OpenAI says Aardvark uses a sandbox to test vulnerabilities, this is hard to do in practice.

“Most companies don’t have easily reproducible test environments that fully mirror their production setups. Applications often depend on complex chains of microservices, databases, and configurations that are impossible to emulate in isolation,” says Dourassov.

