This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating commercial multimodal AI for diabetic eye screening and implications for an alternative regulatory pathway
1 citation · 3 authors · 2025
Abstract
Autonomous AI for diabetic eye examination is among the most validated and trusted medical AI systems, supported by extensive real-world evidence demonstrating safety, efficacy, improved outcomes, increased productivity, and cost savings. Yet its adoption remains limited. In contrast, commercially available off-the-shelf generative AI models (OTSAIs) are being rapidly tested in medical settings despite a lack of such real-world validation. These models have shown strong performance on medical reasoning tasks, prompting interest in their potential for clinical deployment. We evaluated four OTSAIs, GPT-4o and GPT-4o-mini (OpenAI, San Francisco, CA), Grok (xAI, San Francisco, CA), and Gemini (Google, Mountain View, CA), on a specific diagnostic task: diabetic eye examination. The OTSAIs were bundled to ensure consistency, and performance was assessed against a level 3 reference standard using the publicly available Messidor-2 dataset. GPT-4o achieved the highest area under the receiver operating characteristic curve (AUC), 0.83. Grok achieved 0.63, and AUC was not calculable for Gemini. The AUC of retina specialists on the same task was estimated at 0.94, so the emergent performance of OTSAIs does not match that of clinical experts, nor does it approach FDA endpoints for consideration as a medical device. Nevertheless, if the performance of these OTSAIs approaches theoretical limits in the future, there might be a regulatory path through task-specific licensing by State Medical Boards for specific clinical tasks. This path may be modeled after licensing for physician assistants, where trust in the bundled OTSAI, used in an assistive fashion, is achieved through rigorous validation for safety and efficacy according to widely accepted regulatory considerations for both patient-facing AI and software as a medical device (SaMD) processes.
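For readers unfamiliar with the metric, the AUC values quoted in the abstract can be read as the probability that a randomly chosen diseased image receives a higher score than a randomly chosen healthy one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `auc`, the 0/1 label encoding, and the toy scores are assumptions made for illustration only.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0 (no disease) or 1 (disease); assumed encoding.
    scores: model outputs, higher meaning "more likely diseased".
    Returns None when only one class is present, since AUC is undefined then.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # diseased case ranked above healthy case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from Messidor-2.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(auc(toy_labels, toy_scores))
```

In this toy case three of the four diseased/healthy pairs are ranked correctly, so the estimate is 0.75. The quadratic loop is chosen for readability; a rank-based computation would be preferable for a dataset of realistic size.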