GPT-4 ‘beats doctors’ in accurately assessing eye problems


Artificial intelligence could be used to triage patients with eye problems, according to Cambridge University researchers.

GPT-4, a large language model developed by OpenAI, was found to “significantly” exceed the ability of non-specialist doctors to assess eye problems and provide advice. A study led by the University of Cambridge also found that the model’s clinical knowledge and reasoning skills were approaching the level of specialist eye doctors.

The model was tested against doctors at different stages in their careers, including unspecialized junior doctors, trainees, and experts. Each was presented with a series of 87 patient scenarios involving a specific eye problem and asked to give a diagnosis or advice on treatment by selecting from four options.

GPT-4 was given the same task. It scored significantly better than unspecialized doctors and achieved results similar to those of trainee and expert eye doctors, according to the researchers.

The test included questions about a wide range of eye problems, such as extreme light sensitivity and decreased vision. The questions were taken from a textbook that is not freely available on the internet, making it unlikely that the textbook’s content was included in GPT-4’s training datasets.

While AI models are unlikely to replace healthcare professionals, the technology could be used to streamline clinical workflow by providing advice or diagnosis in “well-controlled” contexts or where access to professional healthcare is limited.

“We could realistically deploy AI in triaging patients with eye issues to decide which cases are emergencies that need to be seen by a specialist immediately, which can be seen by a GP, and which don’t need treatment,” said Dr Arun Thirunavukarasu, lead author of the study.

He said that AI models followed clear algorithms, and that GPT-4 was “as good as expert clinicians” at processing eye symptoms and signs to answer more complicated questions. AI could also be used to advise general practitioners struggling to get prompt advice from eye specialists, and to reduce patient waiting times.

The study brings the use of AI in medical settings closer to reality, yet significant work remains to be done to fine-tune and develop such models. Researchers said their study was superior to similar previous studies because it compared the abilities of AI with those of practicing doctors rather than with sets of examination results.

“Doctors aren’t revising for exams for their whole career. We wanted to see how AI fared when pitted against the on-the-spot knowledge and abilities of practicing doctors to provide a fair comparison,” said Thirunavukarasu.

He added: “We also need to characterize the capabilities and limitations of commercially available models, as patients may already be using them – rather than the internet – for advice.”

The results of the study were published in the journal PLOS Digital Health.