



Two artificial intelligence (AI) programs, including ChatGPT, have passed the U.S. Medical Licensing Examination (USMLE), according to two recent papers.

The papers highlighted different approaches to using large language models to take the USMLE, which consists of three exams: Step 1, Step 2 CK, and Step 3.

ChatGPT is an AI chatbot that produces long-form writing in response to prompts from human users. It was developed by OpenAI and rose to popularity after several social media posts showed potential uses for the tool in clinical practice, often with mixed results.

The first article, published on medRxiv in December, investigated how ChatGPT performed on the USMLE without any special training or reinforcement before the exams. According to Victor Tseng, MD, of Ansible Health in Mountain View, Calif., and his colleagues, the results showed “new and startling evidence” that this AI tool was up to the challenge.

Tseng and his team noted that ChatGPT performed with more than 50% accuracy on all of the exams, and even reached 60% in most of their analyses. Although the USMLE passing threshold varies from year to year, the authors said it is approximately 60% in most years.

“ChatGPT met or approached the passing threshold for all three exams without any specialized training or reinforcement,” they wrote, noting that the tool was able to demonstrate “a high level of concordance and insight in its explanations.”

“These results suggest that large language models may have the potential to aid in medical education and potentially in clinical decision-making,” they concluded.

The second paper, published on arXiv, also in December, evaluated the performance of another large language model, Flan-PaLM, on the USMLE. The main difference between the two models was that Flan-PaLM was heavily modified to prepare for the exam, using a collection of medical question-answering datasets called MultiMedQA, AI researcher Vivek Natarajan and his colleagues explained.

Flan-PaLM achieved 67.6% accuracy in answering the USMLE questions, about 17 percentage points better than the previous best performance achieved using PubMed GPT.

Natarajan and his team concluded that large language models “present a significant opportunity to rethink the development of medical AI and make its use easier, safer, and fairer.”

ChatGPT, along with other AI programs, has emerged as the subject, and sometimes the co-author, of new research papers that test the usefulness of the technology in medicine.

Medical professionals have also expressed concern over these developments, especially when ChatGPT is listed as an author of research papers. A recent Nature article highlighted the unease among the technology's would-be colleagues and co-authors.

One objection to the use of AI programs in research concerned their actual ability to make meaningful scientific contributions to a paper, while another pointed out that AI tools cannot consent to being a co-author in the first place.

The editor of one of the journals that listed ChatGPT as an author said it was an error that would be corrected, according to the Nature article. Yet researchers have now published several papers touting these AI programs as useful tools in medical education, research, and even clinical decision-making.

Natarajan and his colleagues concluded in their paper that large language models could become a beneficial tool in medicine, but their primary hope was that their findings would “spur new conversations and collaborations between patients, consumers, AI researchers, clinicians, social scientists, ethicists, policy makers, and other interested individuals to responsibly translate these early research findings to improve health care.”

Michael DePeau-Wilson is a reporter on the business and investigative team at MedPage Today. He covers psychiatry, long COVID, and infectious diseases, among other relevant U.S. clinical news.

