Racial bias exists in photo-based medical diagnosis despite AI assistance
As in many fields, medical professionals are still figuring out whether artificial intelligence will help or hinder their work. Experts are also investigating how prejudices within society creep into the machines we create. A new study from Northwestern University finds that while a doctor-machine partnership improves diagnostic accuracy in dermatology, disparities in accuracy based on skin color still exist.
In a large-scale digital experiment on physician-AI collaboration in dermatology diagnosis, a research team led by Matthew Groh of Northwestern University's Kellogg School of Management sought to benchmark how well doctors diagnose skin conditions from images and to assess how AI assistance would affect their diagnoses.
The study, “Deep learning-aided decision support for diagnosis of skin disease across skin tones,” was published in Nature Medicine on February 5.
The research shows that decision support from a deep learning system (DLS) improves the diagnostic accuracy of both dermatologists and primary care physicians, but widens the gap in primary care physicians' accuracy across skin tones. The study examines the performance of both humans and AI because both are susceptible to systematic errors, especially when diagnosing underrepresented populations.
“While research shows that dark skin is underrepresented in textbooks and dermatology training programs, there is very little research on how accurate doctors are at diagnosing disease on light versus dark skin,” said Groh, an assistant professor of management and organizations. “Our study found that doctors' accuracy differs between light and dark skin. And in this case, it's not the AI that's biased, but the way doctors use it.”
Human vs. AI
Without AI assistance, dermatologists had a diagnostic accuracy of 38% and primary care physicians had a diagnostic accuracy of 19% across all skin tones and skin conditions in the experiment. The results were particularly striking among physicians with little experience of dark-skinned patients: primary care physicians who reported seeing mostly or only white patients were less accurate on dark skin than on light skin.
“We suspected bias, and it turns out that AI does not exacerbate that bias for specialists, but it does for primary care physicians,” Groh says. “Specialists look at the advice from the AI and weigh it against their own vast knowledge to make a diagnosis. Primary care physicians, on the other hand, may not have the same deep intuition for pattern matching, so they follow the AI's suggestions about the patterns it recognizes.”
When deep learning system (DLS) decision support was introduced, diagnostic accuracy increased by 33% for dermatologists and 69% for primary care physicians. Among dermatologists, DLS support improved accuracy relatively evenly across skin tones. The same was not true for primary care physicians, whose accuracy improved more for light skin tones than for dark skin tones. AI assistance widened primary care physicians' accuracy gap between light and dark skin by 5 percentage points, a statistically significant difference.
Until now, many researchers in machine learning for healthcare have approached medical problems by training models to classify diagnoses from an image or a series of images. But a DLS could instead be trained to generate a list of possible diagnoses, or even a description of the skin condition (size, shape, color, texture and so on) to guide the doctor's diagnosis. That would be even more helpful, Groh said.
Patients often send images to their doctors for remote evaluation. Apps like VisualDX, RXPhoto, and Piction Health are used by both doctors and patients. This study shows how accurate a diagnosis can be when it is based on a single image without additional context.
Challenges to other medical fields
Groh said he wants to continue looking at how society can solve problems more effectively with the help of AI. He hopes this research will lay the foundation for tackling similar challenges in other medical fields.
“Future research in human-computer interaction, machine learning, and psychology will need to assess how well models perform and how humans and machines adapt when problems arise,” Groh said.
“We must find ways to incorporate underrepresented demographics into research. That way, we will be ready to accurately implement these models in the real world and to build AI systems that serve as tools designed to avoid the kinds of systematic errors we know humans and machines are prone to. Then we can update curricula, change norms in various fields and, hopefully, make things better for everyone.”
