This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Effect of Region-of-Interest Prompting on Gemini 2.5 Pro in MRI Classification of Anterior Cruciate Ligament Injury
Citations: 0
Authors: 8
Year: 2026
Abstract
BACKGROUND: Artificial intelligence (AI) has shown promise in musculoskeletal imaging, yet the diagnostic contribution of large language models (LLMs) remains unclear. Prompt engineering may critically shape performance.
OBJECTIVE: To evaluate the diagnostic accuracy of Google Gemini 2.5 Pro in classifying anterior cruciate ligament (ACL) status on knee magnetic resonance imaging (MRI) and to compare three prompting strategies; the primary endpoint was weighted F1-score.
METHODS: A retrospective diagnostic study used 150 proton-density fat-suppressed (PD-FS) knee MRI volumes (50 each: healthy, partially injured, completely ruptured) drawn from a publicly available dataset (Clinical Hospital Centre Rijeka, Croatia; 2006-2014). Gemini 2.5 Pro received multimodal inputs via the official Python software development kit (SDK). Three prompts were tested: (P1) general series prompt, (P2) technical-description prompt, and (P3) region-of-interest (ROI)-focused prompt. Outputs (A = healthy, B = partial, C = ruptured) were compared with radiologist labels. Accuracy, precision, recall, specificity, F1-score, confusion matrices, and mean inference time were computed (scikit-learn v1.5.0). Ethical approval was waived because the data were de-identified and publicly available.
RESULTS: Mean inference time was 2.1 ± 0.3 seconds per volume. ROI prompting (P3) yielded the highest weighted F1-score (0.31), while macro recall (0.35) and macro specificity (0.67) were similar across prompts. Confusion matrices showed improved discrimination of completely ruptured ACLs with P3.
CONCLUSIONS: Despite a minor improvement in the weighted F1-score with Prompt 3, all prompts demonstrate poor overall classification performance, with low sensitivity and accuracy. The consistently overlapping confidence intervals indicate that prompt variations alone are insufficient to meaningfully enhance model performance. These findings suggest fundamental limitations in the model's ability to handle this task rather than suboptimal prompting.
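The abstract reports weighted F1-score, macro recall, and macro specificity for a three-class problem. As an illustration of how such figures follow from a confusion matrix, the sketch below computes them in plain Python. It is not the authors' code (the abstract states that scikit-learn v1.5.0 was used); the function names and the six toy labels are hypothetical and are not study data.

```python
# Minimal sketch of the three-class metrics named in the abstract.
# Assumption: function names and toy labels are illustrative only.
LABELS = ["A", "B", "C"]  # A = healthy, B = partial, C = ruptured


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are reference classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_metrics(matrix):
    """One-vs-rest recall, specificity, and F1 for each class."""
    total = sum(sum(row) for row in matrix)
    results = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(row[k] for row in matrix) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"support": tp + fn, "recall": recall,
                        "specificity": specificity, "f1": f1})
    return results


def summarize(y_true, y_pred):
    matrix = confusion_matrix(y_true, y_pred)
    per_class = per_class_metrics(matrix)
    n, k = len(y_true), len(per_class)
    return {
        "accuracy": sum(matrix[i][i] for i in range(k)) / n,
        # Weighted F1: per-class F1 averaged with class support as weight.
        "weighted_f1": sum(m["f1"] * m["support"] for m in per_class) / n,
        # Macro averages: unweighted mean over the classes.
        "macro_recall": sum(m["recall"] for m in per_class) / k,
        "macro_specificity": sum(m["specificity"] for m in per_class) / k,
    }


# Toy example (made-up labels, two volumes per class).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "C", "C", "C"]
summary = summarize(y_true, y_pred)
```

When every class has the same number of cases, as in the 50/50/50 design described in the methods, macro recall and overall accuracy coincide, and weighted F1 equals macro F1.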