This is an overview page with metadata for this scientific work. The full article is available from the publisher.
1344 Assessing the Diagnostic Accuracy of LLM (ChatGPT-4) in Sleep Medicine Cases
Citations: 0
Authors: 3
Year: 2025
Abstract
Introduction Large language models (LLMs), such as ChatGPT-4, have demonstrated promising potential as diagnostic tools across various medical disciplines. In sleep medicine, artificial intelligence tools are already used to analyze physiological sleep data and are being studied for their potential in phenotyping, endotyping, and predicting treatment responses. However, the effectiveness of LLMs in accurately diagnosing sleep disorders based on clinical history has not yet been studied. This study evaluates ChatGPT-4’s diagnostic performance using clinical vignettes.
Methods Nineteen clinical cases were selected from the Case Book of Sleep Medicine, Third Edition (AASM, 2019). Each case included patient history, examination findings, and test results. The vignettes were input into ChatGPT-4 using the following standardized prompt: “Based on the provided case details, generate the top 5 differential diagnoses and identify the most likely final diagnosis: (copy and paste clinical vignette).” The model was tasked with generating both differential diagnoses and a final diagnosis for each case. The AI’s results were compared to reference diagnoses from the case book. Differential diagnoses were measured as the number of matches, and accuracy was reported as a percentage. Final diagnoses were scored as 0 (no match), 1 (partial match), or 2 (full match).
Results The mean number of AI-generated differential diagnoses matching the AASM case differential diagnoses was 2.79 ± 0.71 (95% CI: 2.45–3.13). The mean accuracy percentage for differential diagnoses was 63.27% ± 15.61% (95% CI: 55.75%–70.79%), with scores ranging from 33.33% to 100%. For final diagnoses, ChatGPT-4 scored a total of 30 out of a possible 38 points, a mean score of 1.58 ± 0.61 out of 2 (95% CI: 1.29–1.87), and 74% of cases achieved a full match. Performance was higher in cases with fewer differential diagnoses, whereas accuracy decreased in more complex cases.
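The confidence intervals in the Results can be checked from the reported means, standard deviations, and the sample size of 19 cases. The sketch below assumes a two-sided 95% interval based on Student's t distribution with 18 degrees of freedom; the abstract does not state which method the authors used, so this is an illustration and not their procedure. The function name `t_confidence_interval` and the hard-coded critical value `T_CRIT_DF18` are introduced here for illustration only.

```python
import math

# Two-sided 95% critical value of Student's t for 18 degrees of freedom
# (n - 1 with n = 19 cases). Standard table value, hard-coded so the
# example needs only the standard library. Assumption: t-based interval.
T_CRIT_DF18 = 2.101

N_CASES = 19  # number of clinical vignettes stated in the Methods


def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


# (mean, standard deviation) pairs exactly as reported in the abstract.
reported = {
    "matching differential diagnoses": (2.79, 0.71),
    "differential accuracy (%)": (63.27, 15.61),
    "final diagnosis score (0-2)": (1.58, 0.61),
}

for label, (mean, sd) in reported.items():
    lower, upper = t_confidence_interval(mean, sd, N_CASES, T_CRIT_DF18)
    print(f"{label}: {lower:.2f} to {upper:.2f}")
```

Under this assumption, the three computed intervals agree with the reported ones to two decimal places (2.45 to 3.13, 55.75 to 70.79, and 1.29 to 1.87), which suggests the reported intervals are consistent with a t-based calculation on n = 19.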
Conclusion ChatGPT-4 showed moderate to high accuracy in generating both differential and final diagnoses for sleep disorders. These findings suggest that AI could become a valuable clinical decision-support tool in sleep medicine. However, its inconsistent performance in complex cases highlights the need for further refinement and clinical testing.