ChatGPT Is Great at Taking Medical Licensing Exams. But Can It Replace Doctors?


Leon Neal / Getty Images

Key Takeaways

  • ChatGPT, a large language model developed by OpenAI, went viral after its release in November 2022.
  • This AI tool is great at taking medical licensing exams, according to a new preprint study.
  • Experts say this tool could have many promising uses in health care, but for now, it’s prone to errors and biases.

ChatGPT took the internet by storm within weeks of its release. This AI-powered chatbot could potentially save clinicians time by drafting diagnoses, summaries of test results, and authorization letters.

Some scientists have already tested ChatGPT in medical education. In a preprint study, ChatGPT scored above 50% accuracy on the United States Medical Licensing Exam (USMLE).

ChatGPT was trained only on data available through 2021, yet the researchers used exam questions released in 2022 and the model still delivered an impressive performance.

“Random guessing would be 20%. A well-trained non-medical professional would not be able to exceed 40% to 50%. But ChatGPT was getting consistently in the high 50s to mid 60s, and even 70%,” said Victor Tseng, MD, a co-author of the study and medical director of AnsibleHealth based in Atlanta.

Despite its stellar exam performance, ChatGPT is far from ready to be deployed in the medical field.

ChatGPT is a large language model that predicts how sentences fit together based on the text data it has been fed, but that doesn’t mean the tool has “good judgment or common sense,” according to Ignacio Fuentes, executive director of the Jameel Clinic at MIT, an initiative focused on AI and machine learning at the intersection of health care and life sciences.

Léonard Boussioux, a final-year PhD student in operations research at MIT, said large language models are simply threading information together, sometimes producing absurd mistakes delivered with absolute confidence.

“They make it look like they really know what they’re saying when, in fact, no, it’s mostly based on correlations,” Boussioux told Verywell.

How Can ChatGPT Help Improve Health Care?

While ChatGPT is still learning and improving through its interactions with users, experts say this tool can already be implemented in health care in useful ways.

For example, Tseng and his team use ChatGPT to write appeal letters to insurance agencies and to translate complex “medicalese” into a format that is easier for patients to understand.

“It is these small, incremental things that are already changing practice in many ways,” Tseng said.

ChatGPT can also help researchers brainstorm ideas and expedite workflows. In the future, Boussioux said, a more advanced version of ChatGPT might be able to accurately diagnose certain medical conditions.

It might also be able to help administrative workers and clinicians select the correct codes when billing insurance companies for care, an otherwise time-consuming and tedious task.

We gave ChatGPT some vague symptoms, and it gave us a possible diagnosis of a cold and advised us to rest and see a doctor when necessary.


Why Is ChatGPT Still Problematic?

ChatGPT is prone to factual errors, a limitation OpenAI states clearly on the tool’s homepage. Not only is ChatGPT occasionally wrong, but it can also “produce harmful instructions or biased content,” according to the same page.

Like many other language models, ChatGPT is trained on text data from the internet, which means it can reflect human biases, stereotypes, and misinformation.

“You put this in a clinical setting and it can have a lot of trouble. So we need to make sure that we do this in a safe way,” Fuentes said.

A recent Time investigation revealed that OpenAI hired low-wage workers in Kenya to filter toxic and harmful content from ChatGPT, a reminder that AI innovation still relies on human moderation and, at times, exploitation.

In addition to the concerns about errors and misinformation, Tseng said patient privacy is something that needs to be addressed if ChatGPT is used in a medical setting. Since ChatGPT is not HIPAA compliant, it could potentially expose patient data.

“I had to resist a lot of the momentum to push it more forcefully into patient care and actually step back and say: What are the actual milestones and benchmarks of transparency and fairness we want to make sure it hits before taking the next step?” Tseng said.

What This Means For You

While ChatGPT is an innovative AI tool, it should not be used in place of medical advice from a healthcare provider. Because this chatbot is trained on massive amounts of text pulled from the internet and is still early in its development, it is prone to biases, stereotypes, and misinformation.

1 Source
Verywell Health uses only high-quality sources, including peer-reviewed studies, to support the facts within our articles. Read our editorial process to learn more about how we fact-check and keep our content accurate, reliable, and trustworthy.
  1. Kung TH, Cheatham M, ChatGPT, et al. Performance of ChatGPT on USMLE: potential for ai-assisted medical education using large language models. medRxiv. Preprint posted online December 21, 2022. doi:10.1101/2022.12.19.22283643