This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Benchmarking a deep learning model against healthcare practitioners for hip fracture detection in the emergency department

2026 · 0 citations · Singapore Medical Journal · Open Access

Abstract

INTRODUCTION: This study aimed to validate a deep learning (DL) model for automated hip fracture detection on pelvic X-rays in emergency departments (EDs) and to benchmark its performance against that of junior doctors and radiographers in the ED.

METHODS: We analysed 600 frontal pelvic radiographs for external validation of a DenseNet-121 DL model developed to detect hip fracture. The performance of the DL model was also compared with that of radiographers and junior doctors in the ED, who read with or without access to the DL model's outputs before making their reading decisions. Performance was assessed in terms of area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, specificity, and positive and negative predictive values. Ground truth for all sampled radiographs was based on the consensus findings of two musculoskeletal radiologists. The difference in classification errors was assessed using McNemar's test.

RESULTS: The DL model trained on 512 × 512 images achieved an AUROC of 0.96 and an AUPRC of 0.91, showing reduced performance compared with development metrics (AUROC 0.99, AUPRC 0.95). Reading the original high-resolution images, radiographers significantly outperformed the DL model (McNemar's test: P < 0.001), achieving a sensitivity of 99% compared with the model's sensitivity of 85%. There was no significant difference in performance between the DL model and ED junior doctors, whether the doctors read the original radiographs independently or with support from the DL model.

CONCLUSION: The DL model could not match radiographers' performance, highlighting the importance of clinical context in fracture detection. While the model's short reading time could reduce diagnostic delays, further development incorporating higher-resolution images and multimodal clinical data integration is needed before clinical deployment.
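The paired comparison described in the abstract (sensitivity and specificity per reader, then McNemar's test on discordant classifications) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels, and the choice of the exact binomial form of McNemar's test are all assumptions made for illustration, since the abstract does not state which variant was used or report the underlying counts.

```python
from math import comb


def sensitivity_specificity(truth, pred):
    """Return (sensitivity, specificity) for binary labels (1 = fracture)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def mcnemar_exact(truth, pred_a, pred_b):
    """Exact two-sided McNemar test on paired classifications.

    Only discordant pairs matter:
      b = cases reader A classified correctly and reader B did not
      c = cases reader B classified correctly and reader A did not
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, r in zip(truth, pred_a, pred_b) if a == t and r != t)
    c = sum(1 for t, a, r in zip(truth, pred_a, pred_b) if a != t and r == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    p_value = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, p_value)


# Hypothetical toy data (NOT from the study): 4 fractures, 4 normal films.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
model = [1, 1, 0, 0, 0, 0, 0, 1]
reader = [1, 1, 1, 1, 0, 0, 0, 0]

print(sensitivity_specificity(truth, model))
print(mcnemar_exact(truth, model, reader))
```

Because McNemar's test uses only the cases where the two readers disagree about correctness, it suits a design like this one, in which the model and the human readers assess the same set of radiographs.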

Topics

Artificial Intelligence in Healthcare and Education · Hip and Femur Fractures · Bone health and osteoporosis research
