This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Deriving the OTA/AO fracture classification from routinely collected radiology reports using a large language model
Citations: 0
Authors: 11
Year: 2026
Abstract
Objectives: Fracture classification plays a pivotal role in research and quality assurance; despite its wide acceptance, the OTA/AO classification is seldom documented in patients' electronic medical records, which impedes fracture registry creation and effective interdisciplinary communication. In this study, we investigate "off-the-shelf" large language models (LLMs) in translating free text in radiology reports into OTA/AO classification labels.

Methods: We employed a Health Insurance Portability and Accountability Act-compliant LLM to classify 109 fracture descriptions from randomly selected radiology reports in a deidentified electronic medical record database. Ground-truth classifications were assigned by expert orthopaedic traumatologists based on corresponding radiographs. Multiple prompting strategies were tested, including zero-shot prompting, zero-shot chain-of-thought prompting, and retrieval-augmented generation. We additionally asked the LLM to assign classification labels to "ideal" fracture descriptions written according to the 2018 OTA/AO Fracture and Dislocation Classification Compendium. Model performance was assessed using Cohen kappa and accuracy against ground-truth labels.

Results: [...] levels. Performance declined to slight agreement at the subgroup level. The best performance was observed using ideal fracture descriptions with retrieval-augmented generation, in which the agreement between the full LLM-generated and ground-truth labels remained moderate. Classification errors were largely due to imprecise descriptions, hallucinated information, or incorrect application of factually correct information.

Conclusions: Our study demonstrates some potential for LLMs to translate free-text fracture descriptions into OTA/AO classifications, allowing for efficient labeling of large datasets of radiology reports. Future work should focus on refining model classification capabilities using more sophisticated prompting methods.

Level of Evidence: Level III.
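The two evaluation metrics named in the Methods, accuracy and Cohen kappa, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the four example labels are hypothetical, chosen only to show how observed agreement is corrected for chance agreement. Qualitative terms such as "slight" and "moderate" are commonly read against the Landis and Koch bands (0.00 to 0.20 and 0.41 to 0.60 respectively), though the abstract does not state which convention was used.

```python
from collections import Counter


def accuracy(truth, pred):
    """Fraction of predictions that exactly match the ground-truth label."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def cohen_kappa(truth, pred):
    """Cohen kappa: (p_o - p_e) / (1 - p_e) for two label sequences."""
    n = len(truth)
    p_o = accuracy(truth, pred)  # observed agreement
    truth_counts = Counter(truth)
    pred_counts = Counter(pred)
    # Chance agreement: sum over labels of the product of marginal frequencies.
    p_e = sum(truth_counts[k] * pred_counts[k] for k in truth_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for illustration only (not data from the study).
ground_truth = ["32A", "32A", "42B", "42B"]
llm_labels = ["32A", "42B", "42B", "42B"]

print(accuracy(ground_truth, llm_labels))     # 0.75
print(cohen_kappa(ground_truth, llm_labels))  # 0.5
```

In this toy case three of four labels match (accuracy 0.75), but half of that agreement would be expected by chance given the label frequencies, so kappa is 0.5.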