This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Benchmarking a deep learning model against healthcare practitioners for hip fracture detection in the emergency department
Citations: 0
Authors: 6
Year: 2026
Abstract
INTRODUCTION: This study aimed to validate a deep learning (DL) model for automated hip fracture detection on pelvic X-rays in emergency departments (EDs) and to benchmark its performance against that of junior doctors and radiographers in the ED.
METHODS: We analysed 600 frontal pelvic radiographs for external validation of a DenseNet-121 DL model developed to detect hip fracture. The performance of the DL model was also compared with that of radiographers and junior doctors in the ED, who read the radiographs with or without access to the DL model's outputs before making their reading decisions. Performance was assessed in terms of area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, specificity, and positive and negative predictive values. Ground truth for all sampled radiographs was based on the consensus findings of two musculoskeletal radiologists. The difference in classification errors was assessed using McNemar's test.
RESULTS: The DL model trained on 512 by 512 images achieved an AUROC of 0.96 and an AUPRC of 0.91, showing reduced performance compared with development metrics (AUROC 0.99, AUPRC 0.95). On original high-resolution images, radiographers significantly outperformed the DL model (McNemar's test: P < 0.001), achieving a sensitivity of 99% compared with the model's sensitivity of 85%. There was no significant difference in performance between the DL model and ED junior doctors, whether the doctors read the original radiographs independently or with support from the DL model.
CONCLUSION: The DL model could not match radiographers' performance, highlighting the importance of clinical context in fracture detection. While the model's short reading time could reduce diagnostic delays, further development incorporating higher-resolution images and multimodal clinical data integration is needed before clinical deployment.
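The threshold-based metrics and the paired comparison named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names `diagnostic_metrics` and `mcnemar_exact` and the toy labels are hypothetical, and the exact (binomial) form of McNemar's test is assumed here because the abstract does not say which variant was used.

```python
from math import comb


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = fracture)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value on the discordant pairs of two readers."""
    # b: reader A correct, reader B wrong; c: reader A wrong, reader B correct.
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under H0 each discordant pair favours either reader with probability 0.5.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


# Toy data only (hypothetical, not taken from the study).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
reader = [1, 1, 1, 1, 0, 0, 0, 1]
model = [1, 1, 0, 0, 0, 0, 0, 0]

print(diagnostic_metrics(truth, reader))
print(diagnostic_metrics(truth, model))
print(mcnemar_exact(truth, reader, model))
```

Because McNemar's test uses only the cases on which the two readers disagree, it suits a design like this one, where every reader and the model assess the same set of radiographs.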
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,740 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,649 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,202 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,886 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations