This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Closing the AI generalisation gap by adjusting for dermatology condition distribution differences across clinical settings
Citations: 2
Authors: 27
Year: 2025
Abstract
BACKGROUND: Generalisation of artificial intelligence (AI) models to a new setting is challenging. In this study, we seek to understand the robustness of a dermatology AI model and whether it generalises from telemedicine cases to a new setting that includes both patient-submitted photographs ("PAT") and clinician-taken photographs in-clinic ("CLIN").
METHODS: A retrospective cohort study involving 2500 cases previously unseen by the AI model, including both PAT and CLIN cases, from 22 clinics in the San Francisco Bay Area, spanning November 2015 to January 2021. The primary outcome measure for the AI model and dermatologists was top-3 accuracy, defined per case as whether their top 3 differential diagnoses contained the top reference diagnosis from a panel of dermatologists.
FINDINGS: The AI performed similarly between PAT and CLIN images (74% top-3 accuracy in CLIN vs. 71% in PAT); however, dermatologists were more accurate in PAT images (79% in CLIN vs. 87% in PAT). We demonstrate that demographic factors were not associated with AI or dermatologist errors; instead, several categories of conditions were associated with AI model errors (p < 0.05). Resampling CLIN and PAT to match skin condition distributions to the AI development dataset reduced the observed differences (AI: 84% CLIN vs. 79% PAT; dermatologists: 77% CLIN vs. 89% PAT). We demonstrate a series of steps to close the generalisation gap, requiring progressively more information about the new dataset, ranging from the condition distribution to additional training data for rarer conditions. When using additional training data and testing on the dataset without resampling to match AI development, we observed comparable performance from end-to-end AI model fine-tuning (85% in CLIN vs. 83% in PAT) vs. fine-tuning solely the classification layer on top of a frozen embedding model (86% in CLIN vs. 84% in PAT).
INTERPRETATION: AI algorithms can be efficiently adapted to new settings without additional training data by recalibrating the existing model, or with targeted data acquisition for rarer conditions and retraining just the final layer.
FUNDING: Google.
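As a rough illustration of two ideas in the abstract, the sketch below computes top-3 accuracy and then re-estimates it after adjusting the condition mix to a target distribution. It is not the authors' code: the function names, the condition labels, and the target distribution are hypothetical. It also uses importance weighting as a deterministic stand-in for the resampling the abstract describes, so the study's actual procedure may differ.

```python
from collections import Counter


def top3_accuracy(predictions, references):
    """Fraction of cases whose top-3 differential contains the reference diagnosis."""
    if not references:
        raise ValueError("at least one case is required")
    hits = sum(ref in preds[:3] for preds, ref in zip(predictions, references))
    return hits / len(references)


def reweighted_top3_accuracy(predictions, references, target_dist):
    """Top-3 accuracy with cases weighted so the condition mix matches target_dist.

    Each case gets weight = (target share of its condition) / (observed share).
    This is an assumed stand-in for resampling, not the paper's exact method.
    """
    counts = Counter(references)
    n = len(references)
    total = 0.0
    weighted_hits = 0.0
    for preds, ref in zip(predictions, references):
        weight = target_dist.get(ref, 0.0) / (counts[ref] / n)
        total += weight
        weighted_hits += weight * (ref in preds[:3])
    if total == 0.0:
        raise ValueError("no case has a condition present in target_dist")
    return weighted_hits / total


if __name__ == "__main__":
    # Hypothetical toy data: ranked differentials and panel reference diagnoses.
    predictions = [
        ["eczema", "psoriasis", "acne"],
        ["acne", "rosacea", "eczema"],
        ["psoriasis", "eczema", "tinea"],
        ["nevus", "melanoma", "seborrheic keratosis"],
    ]
    references = ["eczema", "eczema", "melanoma", "melanoma"]
    # Hypothetical condition mix of the development dataset.
    target_dist = {"eczema": 0.8, "melanoma": 0.2}

    print(top3_accuracy(predictions, references))
    print(reweighted_top3_accuracy(predictions, references, target_dist))
```

In this toy case the model misses one of the two melanoma cases, so down-weighting melanoma to its assumed share in the development data raises the estimated accuracy. This mirrors the direction of the AI results reported after resampling, though the numbers here are invented.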
Authors
- Rajeev Rikhye
- Aaron Loh
- Grace Hong
- Preeti Singh
- Margaret A. Smith
- Vijaytha Muralidharan
- Doris Wong
- Rory Sayres
- Ayush Jain
- Michelle Phung
- Nicolas J. Betancourt
- Bradley Fong
- Rachna Sahasrabudhe
- Khoban Nasim
- Alec Eschholz
- Basil Mustafa
- Jan Freyberg
- Terry Spitz
- Yossi Matias
- Greg S. Corrado
- Katherine Chou
- Dale R. Webster
- Peggy Bui
- Yuan Liu
- Yun Liu
- Justin Ko
- Steven Lin