
OpenAI’s latest large language model, GPT-5, shows “shockingly low” default safety, and the raw model without a system prompt is “nearly unusable for enterprise out of the box,” according to security researchers. Several security teams jailbroke GPT-5 within 24 hours of its release.
Security researchers have already discovered serious vulnerabilities in the new GPT-5.
SPLX, an AI security startup, ran more than 1,000 adversarial prompts against GPT-5 in different configurations and found that the raw, unguarded model without a system prompt fell for 89% of attacks, which corresponds to an overall score of 11%.
OpenAI’s system prompt, a “basic prompt layer,” reduces the attack success rate to 43%. While this substantially improves hallucination handling and safety, the overall score is still very low, and the older GPT-4o model outperforms its successor across the board.
For comparison, the hardened GPT-4o model fell for only 3% of attacks and achieved a 97% overall score. The attack success rate against GPT-4o with a basic system prompt was 19% (an 81% score), and the model with no system prompt was vulnerable to 71% of attacks (a 29% score).
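The figures above can be tabulated with a short Python sketch. It assumes the overall score is simply 100 minus the attack success rate, which matches every pair quoted in this article (89/11, 71/29, 19/81, 3/97) but may not be how SPLX actually computes it. Under that assumption, the score for GPT-5 with the basic system prompt is a derived value, not a figure quoted here.

```python
# Attack success rates (percent) as quoted in the article.
attack_success = {
    ("GPT-5", "no system prompt"): 89,
    ("GPT-5", "basic system prompt"): 43,
    ("GPT-4o", "no system prompt"): 71,
    ("GPT-4o", "basic system prompt"): 19,
    ("GPT-4o", "hardened"): 3,
}


def overall_score(success_rate_pct: int) -> int:
    """Assumed relation (not confirmed by SPLX): score = 100 - attack success rate."""
    return 100 - success_rate_pct


# Derive a score for every (model, configuration) pair.
scores = {key: overall_score(rate) for key, rate in attack_success.items()}

for (model, config), score in scores.items():
    print(f"{model:7} {config:20} attacks={attack_success[(model, config)]:>2}%  score={score:>2}%")
```

Laid out this way, the gap is easy to see: each GPT-4o configuration scores higher than the matching GPT-5 configuration listed.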
“GPT-5, with all its new ‘reasoning’ upgrades, fell for basic adversarial logic tricks,” the team warns in a report.
“GPT-4o remains the most robust model under SPLX’s red teaming, especially when hardened.”
The researchers noted that one of the most effective jailbreaking techniques was prompt obfuscation: inserting hyphens between every character and wrapping the prompt in a fake encryption challenge. They also showed that GPT-5 would respond with instructions on how to build a bomb.
SPLX warns organizations not to trust the default configuration of GPT-5 and advises them to apply hardening before deployment and add a runtime protection layer for enterprise use. Similar vulnerabilities in other LLMs suggest systemic weaknesses.
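On the defensive side, the hyphen trick can be partly undone before a prompt reaches a content filter. The Python sketch below is an illustration only, not SPLX's method and not a complete defense. The function names and the four-character threshold are assumptions, and it does nothing about the fake encryption wrapper.

```python
import re

# A run of single word characters joined by hyphens, e.g. "h-e-l-l-o".
# Requiring at least four characters is an arbitrary, assumed threshold that
# leaves ordinary compounds such as "well-known" untouched.
_HYPHEN_SPACED = re.compile(r"(?<!\w)\w(?:-\w){3,}(?!\w)")


def looks_obfuscated(text: str) -> bool:
    """Return True if the text contains a hyphen-separated character run."""
    return _HYPHEN_SPACED.search(text) is not None


def collapse_hyphenated(text: str) -> str:
    """Remove the hyphens inside each hyphen-separated character run."""
    return _HYPHEN_SPACED.sub(lambda m: m.group(0).replace("-", ""), text)


prompt = "please d-e-c-o-d-e this well-known phrase"
print(looks_obfuscated(prompt))     # True
print(collapse_hyphenated(prompt))  # please decode this well-known phrase
```

A normalizer like this would sit in front of whatever filter or classifier inspects the prompt. It can misfire on legitimate strings such as spelled-out codes, so flagging the input for review may be safer than silently rewriting it.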
“GPT-5 shows powerful baseline capability, but default safety is still shockingly low,” the report concludes.
“GPT-5’s raw model is nearly unusable for enterprise out of the box.”
A separate team of researchers at NeuralTrust confirmed that GPT-5 is vulnerable to two adversarial prompt techniques, “Echo Chamber” and “Storytelling.”
The Echo Chamber method relies on including a “subtly poisonous” conversational context in the prompt. Subsequent prompts echo this poisoned context and gradually strengthen it. The Storytelling technique serves as camouflage to trick the chatbot.
The report only specifies that GPT-5 generated content related to Molotov cocktails, which is not highly sensitive and already widely available on the internet, including Wikipedia.