This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluation of commercial AI algorithms for the detection of fractures, effusions, and dislocations on real-world clinical data: A prospective registry study
Citations: 4
Authors: 14
Year: 2025
Abstract
PURPOSE: To prospectively evaluate and directly compare the performance of three commercial AI algorithms (Gleamer, AZmed, and Radiobotics) for detecting fractures, dislocations, and joint effusions across multiple anatomical regions in real-world adult clinical radiography.

MATERIAL AND METHODS: In this single-center, prospective technical performance evaluation study, we assessed these algorithms on radiographs from adult patients (n = 1037; 2926 radiographs; 22 anatomical regions) at the Technical University of Munich (January-March 2025). Radiologists' reports served as the reference standard, with CT adjudication when available. Sensitivity, specificity, accuracy, and AUC were calculated; AUCs were compared using Bonferroni-corrected DeLong tests.

RESULTS: Fractures were identified in 29.60 % of patients; 13.69 % had acute fractures and 6.65 % had multiple fractures. For all fractures, Gleamer (AUC 83.95 %, sensitivity 75.57 %, specificity 92.33 %) and AZmed (AUC 84.88 %, sensitivity 79.48 %, specificity 90.27 %) outperformed Radiobotics (AUC 77.24 %, sensitivity 60.91 %, specificity 93.56 %). For acute fractures, AUCs were comparable (range: 84.81-87.78 %). For multiple fractures, performance was limited (AUCs 64.17-73.40 %). AZmed had higher AUC for dislocation (61.85 % vs. 54.48 % for Gleamer), while Gleamer and Radiobotics outperformed AZmed for effusion (AUC 69.59 % and 73.63 % vs. 57.99 %). No algorithm exceeded 91 % accuracy for acute fractures.

CONCLUSION: In this real-world, single-center study, commercial AI algorithms showed moderate to high performance for straightforward fracture detection but limited accuracy for complex scenarios such as multiple fractures and dislocations.

IMPLICATIONS FOR PRACTICE: Current tools should be used as adjuncts rather than replacements for radiologists and reporting radiographers. Multicenter validation and more diverse training data are necessary to improve generalizability and robustness.
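The metrics named in the abstract can be illustrated with a minimal Python sketch. The confusion-matrix counts below are hypothetical and not taken from the study. The helper `single_point_auc` assumes a yes/no algorithm output, for which the ROC curve has a single operating point and the trapezoidal AUC reduces to the mean of sensitivity and specificity. The all-fracture AUCs reported above match that mean to within rounding, but whether the authors computed AUC this way is an assumption, not something the abstract states.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one algorithm against the reference standard."""
    tp: int  # finding present, algorithm positive
    fp: int  # finding absent, algorithm positive
    tn: int  # finding absent, algorithm negative
    fn: int  # finding present, algorithm negative


def sensitivity(c: Counts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    return c.tn / (c.tn + c.fp)


def accuracy(c: Counts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def single_point_auc(sens: float, spec: float) -> float:
    """Trapezoidal ROC area through (0,0), (1-spec, sens), (1,1).

    Assumption: the algorithm output is binary, so only one operating point exists.
    """
    return (sens + spec) / 2


# Values from the abstract (all fractures), in percent: (sensitivity, specificity, AUC).
REPORTED = {
    "Gleamer": (75.57, 92.33, 83.95),
    "AZmed": (79.48, 90.27, 84.88),
    "Radiobotics": (60.91, 93.56, 77.24),
}


def matches_single_point(tol: float = 0.01) -> dict:
    """Check whether each reported AUC equals (sens + spec) / 2 within tol."""
    return {
        name: abs(single_point_auc(sens, spec) - auc) < tol
        for name, (sens, spec, auc) in REPORTED.items()
    }


if __name__ == "__main__":
    example = Counts(tp=80, fp=10, tn=90, fn=20)  # hypothetical counts
    print(sensitivity(example), specificity(example), accuracy(example))
    print(matches_single_point())
```

If the observation holds, the reported AUC values behave like balanced accuracy at one decision threshold and do not summarize performance across a range of thresholds.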
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,774 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,685 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,244 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations