OpenAlex · Updated hourly · Last updated: 12.05.2026, 16:17

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Bot vs. doc—who is better at reading proximal humerus fracture x-rays?

2025 · 0 citations · JSES International · Open Access
Open full text at the publisher

Citations: 0

Authors: 6

Year: 2025

Abstract

Background: Artificial intelligence is increasingly used as a source of convenient, efficient, and cost-effective information. Given the potential utility of ChatGPT as an adjunct in clinical decision making, the current study evaluates (1) the accuracy of ChatGPT-5 in interpreting shoulder x-rays showing either normal findings or a proximal humerus fracture (PHFx) and (2) the inter-rater reliability between ChatGPT and orthopedic surgeons at different levels of training.

Methods: The publicly accessible Stanford University Musculoskeletal Radiographs dataset was used, and 70 x-rays (35 PHFx, 35 normal) were analyzed after inclusion and exclusion criteria were applied. X-rays were reviewed independently by an orthopedic surgery junior resident, a senior resident, a shoulder/elbow fellow, and a shoulder/elbow fellowship-trained attending. The x-rays for each patient were uploaded to ChatGPT-5, and questions were asked using a response-based algorithm.

Results: ChatGPT-5 demonstrated a sensitivity of 61.8%, a specificity of 74.3%, and an overall accuracy of 67.1% for PHFx x-rays. ChatGPT incorrectly diagnosed 25.7% of normal x-rays as showing a fracture or dislocation. It incorrectly diagnosed 23.5% of isolated PHFx x-rays as normal, 8.8% as an isolated glenohumeral dislocation without fracture, and 5.7% as a PHFx dislocation. Inter-rater reliability for ChatGPT was slight for displaced parts and poor for fractured part, Neer parts, and glenohumeral joint location. Junior and senior residents had moderate to substantial agreement with the attending reads (fractured part, displaced parts, Neer parts), while the fellow had substantial to almost perfect agreement.

Conclusion: This study demonstrates that ChatGPT-5 is highly inaccurate at identifying PHFx on shoulder x-rays, characterizing fracture patterns, and providing accurate interpretations of shoulder x-rays. Over-reliance on generative artificial intelligence to guide clinical decisions risks patient harm, and its output should be approached with limited credence.
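As a reading aid only: the abstract reports standard diagnostic-accuracy metrics (sensitivity, specificity, overall accuracy) and categorical agreement levels of the kind conventionally used for Cohen's kappa (slight, fair, moderate, substantial, almost perfect). The minimal Python sketch below shows how these quantities are typically computed from a 2x2 confusion matrix and from paired rater reads. All counts and reads in the sketch are hypothetical placeholders, not data from the study.

```python
# Minimal sketch (illustrative only; numbers below are hypothetical, not study data).
from collections import Counter

def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and accuracy from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),                 # true fractures called fracture
        "specificity": tn / (tn + fp),                 # true normals called normal
        "accuracy": (tp + tn) / (tp + fn + tn + fp),   # all correct calls / all films
    }

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n            # observed agreement
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)   # chance agreement
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Hypothetical 2x2 counts for a 70-film set (35 fracture, 35 normal).
    print(diagnostic_metrics(tp=24, fn=11, tn=27, fp=8))
    # Hypothetical paired reads: "fx" = fracture, "nl" = normal.
    attending = ["fx", "fx", "nl", "nl", "fx", "nl", "fx", "nl"]
    model     = ["fx", "nl", "nl", "nl", "fx", "fx", "fx", "nl"]
    print(round(cohens_kappa(attending, model), 2))   # 0.5, i.e. "moderate" on the usual bands
```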

Related works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · AI in Service Interactions