Doctor ChatGPT? AI-bot almost passes the US Medical Licensing Exam
Increasingly, it seems there is little that ChatGPT cannot do, from advising judges on cases to boosting research.
Now, the AI chatbot has been found to score at or near the roughly 60 percent passing threshold for the United States Medical Licensing Exam (USMLE), “with responses that make coherent, internal sense and contain frequent insights.”
This is according to a study published on Thursday in the open-access journal PLOS Digital Health by Tiffany Kung, Victor Tseng, and colleagues at AnsibleHealth.
A test for medical disciplines
The USMLE is a highly standardized and regulated series of three exams required for medical licensure in the United States. Taken by medical students and physicians-in-training, the USMLE assesses knowledge spanning most medical disciplines, ranging from biochemistry, to diagnostic reasoning, to bioethics.
To see how the language model would perform on this very complex exam, Kung and colleagues tested ChatGPT on it. After removing image-based questions, they asked ChatGPT 350 of the 376 public questions available from the June 2022 USMLE release.
ChatGPT scored between 52.4 percent and 75.0 percent across the three USMLE exams. These scores are notable because the passing threshold each year is approximately 60 percent.
ChatGPT also demonstrated 94.6 percent concordance across all its responses and produced at least one significant insight for 88.9 percent of its responses.
ChatGPT also outperformed PubMedGPT, a counterpart model trained exclusively on biomedical domain literature, which scored only 50.8 percent on an older dataset of USMLE-style questions.
The authors concluded that ChatGPT has the potential to enhance medical education, and eventually, clinical practice. In fact, clinicians at AnsibleHealth already use ChatGPT to rewrite jargon-heavy reports for easier patient comprehension.
A milestone in clinical AI
“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation,” noted the authors.
“ChatGPT contributed substantially to the writing of [our] manuscript... We interacted with ChatGPT much like a colleague, asking it to synthesize, simplify, and offer counterpoints to drafts in progress... All of the co-authors valued ChatGPT’s input,” concluded Kung in a press release.
ChatGPT is a new artificial intelligence system known as a large language model, which produces human-like writing by predicting upcoming word sequences. ChatGPT does not search the internet; instead, it generates text using word relationships predicted by its internal processes.