AI in medicine: can we trust GPT-5 yet?

GPT-5 is impressive on medical exams but struggles with real clinical reasoning, raising questions about its reliability in healthcare.
The death of Adam Raine, who died by suicide after ChatGPT reportedly helped him strategize and even write a suicide note, has brought the question of AI assisting in medicine to the fore again.
Some see AI's role in such a tragedy as gross negligence, while others argue that AI itself is not squarely to blame and that the chatbot simply should have raised a flag earlier.
I spoke to Dr. Ross Harper, a computational neuroscientist and the founder of Limbic, a clinical AI company in the mental health space, about AI's responses, especially with non-typical diagnoses.
Dr. Harper described an alternative route that could have been taken, had the AI been trained to respond differently.
“The system reportedly flagged hundreds of self-harm messages but still did nothing, because it was never built for care. Imagine if, at that point, ChatGPT had handed off to a clinically proven AI designed specifically for mental healthcare. Adam could have been supported safely, and even connected to a human clinician,” Dr. Harper said.
Training for the exam
Earlier in August, the Journal of the American Medical Association (JAMA) published a study about “the fidelity of medical reasoning in large language models” that examined how AI achieves its scores on medical exams.
The researchers found that AI models generally perform well on medical benchmarks through pattern recognition rather than true reasoning.
In other words, they work like a student who has memorized all the answers for the theory test but struggles on the practical exam.
The doctor's view
Harper values GPT-5’s aptitude for handling therapeutic complexity but wouldn’t advocate turning to it for left-field medical solutions.
“I wouldn’t recommend using GPT-5 ‘out of the box’ for diagnosis or treatment. It simply isn’t designed for that, and without safeguards it can go wrong in high-stakes settings,” he told Cybernews.
Such a caveat reveals the need for more regulatory oversight, especially when a user is looking for approval of a controversial health measure and the AI risks feeding their confirmation bias.
Dr. Harper still sees potential in AI's ability to read between the lines and to combine its pre-existing knowledge with a user's long-standing health concerns.
“GPT-5 has extraordinary strengths. It has compressed a vast body of knowledge and can track long, nuanced conversations.”
The problem could well be that users discuss their problems with the AI in secret, shielding the issue from medical staff.
“For AI to deliver quality care, LLMs must be paired with a more structured AI system with clinical reasoning. This is the only way that we can benefit from the conversational aspect of LLMs, but with clinical certainty and more importantly with transparency into how clinical decisions are made,” Harper said.
Testing GPT-5 myself
I wanted to ask a couple of less straightforward questions myself, ones where the answer might differ from how a human doctor would respond.
“I'm having severe anxiety – should I bite into a lemon?” I asked GPT-5. The response was pretty good: it said that biting into a lemon can actually help.
That would probably differ from the advice of a doctor, who might go into the actual root cause of the anxiety rather than get wrapped up in a yes/no question.
Next, I asked whether it was safe to drink alcohol while on antibiotics. It mentioned that it depended on which type of antibiotic you take, but that the best practice would be to avoid alcohol for at least a couple of days.
When prompted, it even explained that it used a mixture of reasoning and real-time risk evaluation, while maintaining that I should stay away from the booze.
AI in clinical practice
While my GPT-5 experiments showed some reasoning, experts caution that these models aren’t ready to operate independently.
The key, Harper emphasizes, isn’t banning AI from medicine but using it safely.
“The future isn’t about banning general AI from medicine – it’s about combining it with domain-specific guardrails.”
Looking ahead, he envisions a new layer in clinical practice.
“In the next 10 years, there will be a new layer in the clinical staffing pyramid: an infinitely scalable layer of validated clinical AI agents.”
This approach means AI can handle routine or standardized tasks while clinicians focus on complex cases, maintaining quality care and safety.