This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluating commercial multimodal AI for diabetic eye screening and implications for an alternative regulatory pathway

2025 · 1 citation · npj Digital Medicine · Open Access

Abstract

Autonomous AI for diabetic eye examination is among the most validated and trusted medical AI systems, supported by extensive real-world evidence demonstrating safety, efficacy, improved outcomes, increased productivity, and cost savings. Yet its adoption remains limited. In contrast, commercially available off-the-shelf generative AI models (OTSAIs) are being rapidly tested in medical settings despite a lack of such real-world validation. These models have shown strong performance on medical reasoning tasks, prompting interest in their potential for clinical deployment. We evaluated four OTSAIs, namely GPT-4o and GPT-4o-mini (OpenAI, San Francisco, CA), Grok (xAI, San Francisco, CA), and Gemini (Google, Mountain View, CA), on a specific diagnostic task: diabetic eye examination. The OTSAIs were bundled to ensure consistency, and performance was assessed using a level 3 reference standard, the publicly available Messidor-2 dataset. GPT-4o achieved the highest area under the receiver operating characteristic curve (AUC), 0.83. Grok achieved 0.63, and AUC was not calculable for Gemini. The AUC of retina specialists on the same task was estimated at 0.94, so the emergent performance of OTSAIs does not match that of clinical experts, nor does it approach FDA endpoints for consideration as a medical device. Nevertheless, as the performance of these OTSAIs approaches theoretical limits in the future, there might be a regulatory path through task-specific licensing by State Medical Boards for specific clinical tasks. This path may be modeled after licensing for physician assistants, where trust in the bundled OTSAI, to be used in an assistive fashion, is achieved through rigorous validation for safety and efficacy according to widely accepted regulatory considerations for both patient-facing AI and software as a medical device (SaMD) processes.
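
The abstract compares systems by AUC. As an illustration only, the following minimal Python sketch shows how an AUC can be computed from binary reference labels and model scores using the pairwise (Mann-Whitney) formulation. It is not the authors' evaluation code; the function name `roc_auc` and the toy data are hypothetical and are not taken from the paper or from Messidor-2.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive case receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        # AUC is undefined unless both classes are present.
        raise ValueError("AUC requires at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = disease present per the reference standard.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of 4 pairs are ranked correctly: 0.75
```

Under this formulation, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.83, 0.63, and 0.94 should be read.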

Topics

Retinal Imaging and Analysis; Artificial Intelligence in Healthcare and Education; Machine Learning in Healthcare
