This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Diagnostic Performance of a Large Language Model (ChatGPT‐4o) in Chronic Rhinosinusitis CT Scan Interpretation
Citations: 0
Authors: 5
Year: 2026
Abstract
Background: Large language models (LLMs), such as ChatGPT, are increasingly used by physicians for clinical decision support because of their ease of use and versatility. However, their performance in diagnostic imaging remains largely untested. This study prospectively evaluates ChatGPT's ability to interpret sinus computed tomography (CT) scans for chronic rhinosinusitis (CRS), using radiologist assessment as the reference standard.

Methods: In this prospective cohort study, 102 coronal sinus CT scans were evaluated by both a board‐certified radiologist and ChatGPT‐4o. Each scan was screen recorded and uploaded twice to ChatGPT to assess repeatability, resulting in 306 total interpretations. The radiologist reviewed the same screen recordings provided to ChatGPT. Both raters assessed 11 predefined binary anatomical features and generated Lund‐Mackay scores. Diagnostic performance was assessed using standard accuracy metrics, and inter‐rater agreement was evaluated using established reliability coefficients.

Results: ChatGPT's performance varied across anatomical features. Sensitivity ranged from 0.00 to 0.89, and specificity from 0.26 to 0.95. The model showed relatively high sensitivity for mucosal thickening (0.84) and sinus expansion (0.73), as well as strong agreement with the radiologist for the lamina papyracea (AC1 = 0.92) and anterior ethmoid artery (AC1 = 0.77). However, performance was poor for air‐fluid levels and bone thinning. Agreement with the radiologist was low for most features (AC1 < 0.4 in 82% of variables), and repeatability between the two ChatGPT runs was limited (mean AC1 = 0.29). For Lund‐Mackay scores, correlation between runs was weak (r = 0.11), and agreement with the radiologist was poor (ICC < 0.07).

Conclusion: ChatGPT demonstrates partial capability in identifying specific sinus CT findings; however, it lacks overall diagnostic consistency. Human radiologists remain essential, and the clinical use of LLMs in imaging should be approached with caution.
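For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how sensitivity, specificity, and an AC1 coefficient can be computed for one binary feature rated by two raters. It assumes that "AC1" refers to Gwet's first-order agreement coefficient; the abstract does not spell this out. The function names and the toy ratings are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(reference: Sequence[int],
                            predicted: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of `predicted` against `reference` (0/1 labels)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    # Undefined when the reference has no positives (or no negatives).
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def gwet_ac1(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Gwet's AC1 for two raters and a binary (0/1) feature.

    Assumption: this is the coefficient the abstract abbreviates as AC1.
    """
    n = len(rater_a)
    # Observed agreement: share of cases where both raters give the same label.
    p_obs = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Mean proportion of positive ratings across the two raters.
    pi = (sum(rater_a) + sum(rater_b)) / (2 * n)
    # Chance agreement under Gwet's model; at most 0.5 for binary data.
    p_chance = 2 * pi * (1 - pi)
    return (p_obs - p_chance) / (1 - p_chance)


# Hypothetical toy ratings for one feature across eight scans (not study data).
radiologist = [1, 1, 1, 0, 0, 0, 0, 0]
model = [1, 1, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(radiologist, model)
ac1 = gwet_ac1(radiologist, model)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AC1={ac1:.2f}")
```

Unlike Cohen's kappa, AC1 stays stable when one category is rare, which is a common reason to choose it for imaging features with low prevalence. Whether that motivated its use in this study is not stated in the abstract.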
Related works
European Position Paper on Rhinosinusitis and Nasal Polyps 2020
2020 · 5,460 citations
Therapy of CF-Patients with Amitriptyline and Placebo - a Randomised, Double-Blind, Placebo-Controlled Phase IIb Multicenter, Cohort-Study
2013 · 2,534 citations
EPOS 2012: European position paper on rhinosinusitis and nasal polyps 2012. A summary for otorhinolaryngologists
2012 · 2,403 citations
Efficacy, safety and immunogenicity of heptavalent pneumococcal conjugate vaccine in children
2000 · 2,298 citations
Prognosis of Cerebral Vein and Dural Sinus Thrombosis
2004 · 2,284 citations