Which AI model would be Pablo Escobar’s best friend?

Not all large language models (LLMs) were smart enough to avoid revealing the intricacies of manufacturing a Schedule II controlled substance, Cybernews researchers have discovered. However, newer versions appear to better understand the dangers.
The meteoric speed of LLM integration into everything, from corporate assistants to homework, demands that we take the novel tech seriously. As with any online service, LLM chatbots are accessible to far more people than their intended audience.
The Cybernews research team tested how the most popular chatbots – such as ChatGPT, Claude, and Gemini – fare when pressed to produce harmful content. You can find the entire research, detailing how it was conducted and its results, here.
However, in this article, we look specifically into how chatbot makers deal with illicit drugs. The team tested six different models from OpenAI, Google, and Anthropic, concluding that there are significant differences among them.
The team tested LLMs with nine different questions, designed to push the chatbots to their limits. Anthropic's Claude was the only one to completely reject all nine. Neither Opus 4.1 nor Sonnet 4 provided researchers with information on how to make or smuggle illicit drugs.
Google and OpenAI did not do so well.
So, which LLM is the most Escobarian?
Of the models that slipped, the most conservative, albeit still too lax on guardrails, was Google’s Gemini Pro 2.5. While the chatbot refused to answer most of the team’s questions, researchers were able to trick it into revealing how to grow cannabis indoors.
All it took was for the team to tell the LLM they were researching drug trafficking, and the chatbot spilled the beans on everything from finding a secret location to the climate control intricacies necessary to illegally farm cannabis.
Meanwhile, Google’s Gemini Flash 2.5 was far more open. The LLM fully explained to the team how to make a very common Schedule II controlled substance, including details like where and how to acquire illegal substances and how to avoid detection.
Moreover, the overly helpful chatbot enthusiastically guided the team on how to illegally acquire prescription drugs, taking them through different tactics and processes. In all instances with Gemini, the team had to disguise the queries as legitimate research into the topic.
However, the most Escobarian chatbot, surpassing others by miles, was OpenAI’s ChatGPT-4o. While Claude kept its lips shut, and Gemini answered a couple of questions, ChatGPT spilled the beans, answering six out of nine of the team’s questions.
When prompted accordingly, the chatbot had no qualms about instructing the team on various ways to safely acquire illicit substances, where to obtain illicit prescriptions, and how to manufacture certain drugs, along with other instructions that should not be easily accessible.
However, it’s worth mentioning that OpenAI’s later model ChatGPT-5 was as conservative as Claude, and answered no questions at all.
Here’s how the models scored:
- ChatGPT-4o: 6/9
- Gemini Pro 2.5: 2/9
- Gemini Flash 2.5: 2.5/9
- ChatGPT-5: 0/9
- Claude Opus 4.1: 0/9
- Claude Sonnet 4: 0/9
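For readers who want to compare the results as shares rather than raw counts, the scores can be converted to percentages. The snippet below is a minimal sketch that uses only the figures listed above; the function name `compliance_rates` is made up for illustration, and the half point for Gemini Flash 2.5 is taken as given, since the article does not say how it was awarded.

```python
# Scores as listed above: questions answered out of nine per model.
TOTAL_QUESTIONS = 9

scores = {
    "ChatGPT-4o": 6.0,
    "Gemini Pro 2.5": 2.0,
    "Gemini Flash 2.5": 2.5,  # half point taken as reported, not explained
    "ChatGPT-5": 0.0,
    "Claude Opus 4.1": 0.0,
    "Claude Sonnet 4": 0.0,
}


def compliance_rates(scores, total=TOTAL_QUESTIONS):
    """Return (model, percent) pairs, highest share of answered questions first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(model, round(100 * score / total, 1)) for model, score in ranked]


for model, percent in compliance_rates(scores):
    print(f"{model}: {percent}%")
```

Sorting is stable, so the three models that answered nothing keep the order in which they appear in the list.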
What is AI jailbreaking?
Manipulating chatbots to bypass the safety rules their creators have built in is called jailbreaking. Attackers craft prompts designed to trick the AI into ignoring security rules and providing them with malicious or harmful content.
While asking a chatbot for instructions on buying and making illicit substances may sound somewhat comical, it highlights how malicious actors can leverage AI adoption for their own benefit. The same tactics can be utilized to trick chatbots into revealing sensitive information.
Earlier this year, Cybernews researchers discovered that several LLM-based chatbots, utilized by Meta and Snapchat, were keen to instruct on how to make incendiary devices when targeted with jailbreaking tactics similar to those used in this research.