"Open your eyes": Chatbot outperforms ophthalmologists



A comparative single-center study found that an artificial intelligence (AI) chatbot significantly outperformed a panel of expert ophthalmologists when asked questions about glaucoma and retinal health.

The ChatGPT chatbot, powered by GPT-4, scored better than the panelists on measures of diagnostic and treatment accuracy when it analyzed 20 real-life cases and considered 20 possible patient questions, reported Andy S. Huang, MD, of the Icahn School of Medicine at Mount Sinai in New York City, and colleagues in JAMA Ophthalmology.

Huang told MedPage Today that he had expected the chatbot to perform worse, “but nowhere have people performed better.” It's clear that AI cannot perform surgery, he said, but its ability to answer questions and evaluate cases raises “the question of whether this is a real threat to optometrists and ophthalmologists.”

The findings also provide further evidence that chatbots are getting better at offering reliable guidance about eye health. When researchers asked a chatbot questions about retinal health in January 2023, it had trouble with almost every answer and even offered harmful advice. However, the responses improved 2 weeks later as the chatbot evolved, and a similar study reported high accuracy. Another study found that chatbot answers to eye health questions from online forums were nearly as accurate as those written by eye doctors.

The study by Huang's team is one of many that researchers have launched in recent months to assess the accuracy of a type of AI program known as a large language model (LLM). LLMs analyze huge amounts of text and learn the likelihood that words will appear next to each other.

Huang said the new research was inspired by his own experience experimenting with a chatbot. “Gradually we realized that chatbots were doing a better job than we were at a lot of tasks,” he said. “We realized this and started using chatbots as an aid to improve diagnosis.”

The study results are “eye-opening,” he said, but ophthalmologists need not hand in their eye charts and let AI robots replace them. “Right now, we want to use this adjunctively, in places where we have a significant number of complex patients or a high volume of patients,” Huang said. AI could also help primary care doctors triage patients with eye problems, he said.

Going forward, “it's critical for ophthalmologists to understand how powerful these large language models can be in fact-checking themselves and significantly improving their workflows,” Huang said. “This tool has been very helpful in triaging and improving our thinking and diagnostic abilities.”

In an accompanying commentary, Young and Zhao wrote that the study offers proof of concept that patients can copy the summarized medical history, examination findings, and clinical data from their notes, have GPT-4 create its own assessment and plan, and use it to cross-check their doctor's knowledge and judgment.

“Medical errors can be detected in this way,” Young and Zhao added. For now, they wrote, LLMs should be considered a potentially quick and useful tool for increasing the knowledge of clinicians who have examined a patient and synthesized the active clinical situation. (The two are co-authors of the January 2023 chatbot study mentioned above.)

In the new study, the chatbot was told that ophthalmologists were directing it to assist with “medical management and answering questions and cases.” The chatbot responded that it understood its job was to provide “concise, precise, accurate medical information in the way an eye doctor would.”

The chatbot analyzed extensive details from 20 real patients at clinics affiliated with the Icahn School of Medicine at Mount Sinai (10 glaucoma cases and 10 retinal cases) and created a treatment plan for each. The chatbot also considered 20 questions randomly drawn from the American Academy of Ophthalmology's list of frequently asked questions.

The researchers then asked 12 fellowship-trained retinal and glaucoma specialists and three senior ophthalmology trainees (ages 31 to 67), all affiliated with the ophthalmology department at the Icahn School of Medicine, to answer the same questions and cases. Panelists blindly rated all responses except their own on scales of accuracy (1-10) and medical completeness (1-6).

For glaucoma, the mean rank for accuracy across questions and cases combined was 506.2 for the chatbot and 403.4 for glaucoma specialists (n=831, Mann-Whitney U=27,976.5, P<0.001). The mean ranks for completeness were 528.3 and 398.7, respectively (n=828, Mann-Whitney U=25,218.5, P<0.001).

For retina-related questions, the mean rank for accuracy was 235.3 for the chatbot and 216.1 for retina specialists (n=440, Mann-Whitney U=15,518.0, P=0.17). The mean ranks for completeness were 258.3 and 208.7, respectively (n=439, Mann-Whitney U=13,123.5, P=0.005).
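For readers unfamiliar with the statistics, the "mean rank" and Mann-Whitney U figures above come from pooling every rating, ranking the pooled ratings from lowest to highest (tied ratings share an averaged rank), and comparing the ranks each group received. The Python sketch below illustrates the arithmetic only. The ratings in it are invented for illustration and are not data from the study, and the function names are hypothetical rather than taken from the researchers' analysis.

```python
def midranks(values):
    """Rank values from 1..N, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(group_a, group_b):
    """Return mean ranks and U statistics for two independent groups of ratings."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    rank_sum_b = sum(ranks[n_a:])
    return {
        "mean_rank_a": rank_sum_a / n_a,
        "mean_rank_b": rank_sum_b / n_b,
        "u_a": rank_sum_a - n_a * (n_a + 1) / 2,
        "u_b": rank_sum_b - n_b * (n_b + 1) / 2,
    }


# Invented 1-10 accuracy ratings, for illustration only (not study data).
chatbot_ratings = [9, 8, 10, 7, 9]
specialist_ratings = [6, 8, 7, 5, 9, 6]

result = mann_whitney(chatbot_ratings, specialist_ratings)
print(result)
```

A higher mean rank means that group's ratings tended to sit higher in the pooled ordering. The P value reported alongside U, which this sketch does not compute, indicates how likely a gap that large would be if the two groups were rated the same.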

“Both trainees and specialists rated the chatbot's accuracy and completeness more favorably than those of their specialist counterparts, with specialists noting a significant difference in the chatbot's accuracy (z=3.23; P=0.007) and completeness (z=5.86; P<0.001),” Huang and coauthors wrote.

Limitations include that the single-center, cross-sectional study assessed LLM proficiency at only a single time point and among one group of specialists and trainees. The researchers further noted that, although the findings are promising, the limitations of chatbots in complex decision-making are not clear, and the report does not address necessary ethical, regulatory, and validation considerations. “As such, the results of this study should not be interpreted as supporting direct clinical application,” they wrote.

Randy Dotinga is a freelance medical and science journalist based in San Diego.


This study was funded by the Manhattan Eye and Ear Ophthalmology Alumni Foundation and Research to Prevent Blindness.

Huang reported a grant from the Manhattan Eye and Ear Ophthalmology Alumni Foundation. Co-authors reported financial relationships with Twenty20 and grants from the National Eye Institute, the Glaucoma Foundation, and Research to Prevent Blindness.

Young and Zhao had nothing to disclose.

Primary Source

JAMA Ophthalmology

Source reference: Huang AS, et al. “Assessment of a large language model's responses to questions and cases about glaucoma and retina management” JAMA Ophthalmol 2024; DOI: 10.1001/jamaophthalmol.2023.6917.

Secondary Source

JAMA Ophthalmology

Source reference: Young BK, Zhao PY “Large language models and the shoreline of ophthalmology” JAMA Ophthalmol 2024; DOI: 10.1001/jamaophthalmol.2023.6937.




