
Large language models (LLMs) outperformed seasoned human experts in a multiple-choice cybersecurity test. So why shouldn’t human professionals worry too much about it?
AI is not (yet) replacing workers en masse, and cybersecurity experts’ jobs aren’t on the line either.
Such a conclusion can be drawn from an experiment conducted at the RSAC 2025 Conference, a major cybersecurity event held annually in San Francisco.
The experiment revolves around the game AI Showdown, which is based on a multiple-choice test across 21 cybersecurity categories.
The authors invited 279 cybersecurity experts, nearly half of whom have 10 or more years of experience, to answer 625 difficulty-calibrated questions. Their answers were compared with those produced by 39 LLMs.
LLMs were found to outperform humans across all 21 topical categories.
Even in their best category, “Law,” the human experts averaged a 19% failure rate, which was still worse than the LLMs’ weakest result, a 17% failure rate for “Open Source Tools.”
Of the 39 LLMs used in the experiment, only three had a higher failure rate than that of humans: qwen2-0.5b-instruct, qwen1.5-0.5b-chat, and llama-7b.
At the individual level, top human professionals achieved 100% in at least one category, usually the one aligned with their expertise. Meanwhile, LLMs achieved perfect performance across a broader range of topics.
Humans are losing, but remain irreplaceable
But if AI puts human cybersecurity experts to shame, why shouldn’t they be worried about being replaced by the technology?
The experiment authors emphasize that the test assessed the ability to recognize the single correct answer among a set of four rather than recall the answer from memory.
Given that LLMs have been trained on vast volumes of human-generated knowledge, pattern recognition is what they excel at. Meanwhile, real-world cybersecurity problems rarely resemble well-designed multiple-choice questions.
Moreover, data suggests that when LLMs are asked the same questions in an open-ended format, their accuracy drops significantly.
There is also a difference in the conditions under which the test was performed. Humans answered questions under time pressure at a conference, whereas LLMs were not under time constraints.
In addition, the questions were created using LLMs, potentially giving AI tools an advantage in answering them.
The experiment's authors say that LLMs could help solve a problem many human professionals face: the lack of perfect recall.
They conclude: “Instead of searching the internet for something they used to know but have forgotten the details of, cybersecurity experts can just ask their favorite (search-capable) LLM.”