This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Effect of Region-of-Interest Prompting on Gemini 2.5 Pro in MRI Classification of Anterior Cruciate Ligament Injury

2026 · 0 citations · Cureus · Open Access


Abstract

BACKGROUND: Artificial intelligence (AI) has shown promise in musculoskeletal imaging, yet the diagnostic contribution of large language models (LLMs) remains unclear. Prompt engineering may critically shape performance.

OBJECTIVE: To evaluate the diagnostic accuracy of Google Gemini 2.5 Pro in classifying anterior cruciate ligament (ACL) status on knee magnetic resonance imaging (MRI) and to compare three prompting strategies; the primary endpoint was weighted F1-score.

METHODS: A retrospective diagnostic study used 150 proton-density fat-suppressed (PD-FS) knee MRI volumes (50 each: healthy, partially injured, completely ruptured) drawn from a publicly available dataset (Clinical Hospital Centre Rijeka, Croatia; 2006-2014). Gemini 2.5 Pro received multimodal inputs via the official Python software development kit (SDK). Three prompts were tested: (P1) general series prompt, (P2) technical-description prompt, and (P3) region-of-interest (ROI)-focused prompt. Outputs (A = healthy, B = partial, C = ruptured) were compared with radiologist labels. Accuracy, precision, recall, specificity, F1-score, confusion matrices, and mean inference time were computed (scikit-learn v1.5.0). Ethical approval was waived because the data were de-identified and publicly available.

RESULTS: Mean inference time was 2.1 ± 0.3 seconds per volume. ROI prompting (P3) yielded the highest weighted F1-score (0.31), while macro recall (0.35) and macro specificity (0.67) were similar across prompts. Confusion matrices showed improved discrimination of completely ruptured ACLs with P3.

CONCLUSIONS: Despite a minor improvement in the weighted F1-score with Prompt 3, all prompts demonstrate poor overall classification performance, with low sensitivity and accuracy. The consistently overlapping confidence intervals indicate that prompt variations alone are insufficient to meaningfully enhance model performance. These findings suggest fundamental limitations in the model's ability to handle this task rather than suboptimal prompting.
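The summary metrics named in the abstract (weighted F1-score, macro recall, macro specificity) can be illustrated with a minimal sketch. The abstract states that the authors used scikit-learn v1.5.0; the plain-Python version below is not their code and only shows how these quantities follow from a three-class confusion matrix. The function names and the toy labels are hypothetical and are not study data, and returning 0.0 for an undefined ratio is an assumption made here.

```python
# Class codes as given in the abstract: A = healthy, B = partial, C = ruptured.
LABELS = ("A", "B", "C")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count matrix with true classes as rows and predicted classes as columns."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def _ratio(numerator, denominator):
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def per_class_metrics(matrix):
    """One-vs-rest precision, recall, specificity and F1 for each class."""
    total = sum(sum(row) for row in matrix)
    results = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # true class k, predicted otherwise
        fp = sum(row[k] for row in matrix) - tp  # predicted k, true class differs
        tn = total - tp - fn - fp
        precision = _ratio(tp, tp + fp)
        recall = _ratio(tp, tp + fn)
        results.append({
            "support": tp + fn,
            "precision": precision,
            "recall": recall,
            "specificity": _ratio(tn, tn + fp),
            "f1": _ratio(2 * precision * recall, precision + recall),
        })
    return results


def summarize(matrix):
    """Accuracy, support-weighted F1, and unweighted (macro) recall and specificity."""
    rows = per_class_metrics(matrix)
    n = sum(r["support"] for r in rows)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return {
        "accuracy": _ratio(correct, n),
        "weighted_f1": _ratio(sum(r["f1"] * r["support"] for r in rows), n),
        "macro_recall": sum(r["recall"] for r in rows) / len(rows),
        "macro_specificity": sum(r["specificity"] for r in rows) / len(rows),
    }


# Toy labels for illustration only (not taken from the study).
toy_true = ["A", "A", "B", "B", "C", "C"]
toy_pred = ["A", "B", "B", "C", "C", "C"]
summary = summarize(confusion_matrix(toy_true, toy_pred))
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With 50 volumes per class, as in the study design, the support weights are equal, so the weighted F1-score coincides with the unweighted mean of the three per-class F1-scores.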

Topics

Knee injuries and reconstruction techniques
Artificial Intelligence in Healthcare and Education
Osteoarthritis Treatment and Mechanisms
