This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

1344 Assessing the Diagnostic Accuracy of LLM (ChatGPT-4) in Sleep Medicine Cases

2025 · 0 citations · SLEEP · Open Access

Abstract

Introduction: Large language models (LLMs), such as ChatGPT-4, have demonstrated promising potential as diagnostic tools across various medical disciplines. In sleep medicine, artificial intelligence tools are already used to analyze physiological sleep data and are being studied for their potential in phenotyping, endotyping, and predicting treatment responses. However, the effectiveness of LLMs in accurately diagnosing sleep disorders based on clinical history has not yet been studied. This study evaluates ChatGPT-4’s diagnostic performance using clinical vignettes.

Methods: Nineteen clinical cases were selected from the Case Book of Sleep Medicine, Third Edition (AASM, 2019). Each case included patient history, examination findings, and test results. The vignettes were input into ChatGPT-4 using the following standardized prompt: “Based on the provided case details, generate the top 5 differential diagnoses and identify the most likely final diagnosis: (copy and paste clinical vignette).” The model was tasked with generating both differential diagnoses and a final diagnosis for each case. The AI’s results were compared to reference diagnoses from the case book. Differential diagnoses were scored as the number of matches with the reference differential, and accuracy was reported as a percentage. Final diagnoses were scored as 0 (no match), 1 (partial match), or 2 (full match).

Results: The mean number of AI-generated differential diagnoses matching the AASM case differential diagnoses was 2.79 ± 0.71 (95% CI: 2.45–3.13). The mean accuracy for differential diagnoses was 63.27% ± 15.61% (95% CI: 55.75%–70.79%), with scores ranging from 33.33% to 100%. For final diagnoses, ChatGPT-4 scored a total of 30 out of a possible 38, for a mean score of 1.58 ± 0.61 (95% CI: 1.29–1.87) out of 2; 74% of cases achieved a full match. Performance was higher in cases with fewer differential diagnoses, whereas accuracy decreased in more complex cases.

Conclusion: ChatGPT-4 showed moderate to high accuracy in generating both differential and final diagnoses for sleep disorders. These findings suggest that AI could become a valuable clinical decision-support tool in sleep medicine. However, its inconsistent performance in complex cases highlights the need for further refinement and clinical testing.
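
The abstract does not say how its 95% confidence intervals were computed. As a rough check, the sketch below assumes a standard t-interval on the mean (mean ± t × SD / √n) with n = 19 cases and a tabulated critical value for 18 degrees of freedom. The function name `t_interval` and the dictionary labels are illustrative only and do not come from the study.

```python
import math

# Two-sided 95% critical value of Student's t with 18 degrees of freedom
# (n - 1 for the 19 cases), taken from a standard t-table.
# Assumption: the abstract does not state which interval method was used.
T_CRIT_DF18 = 2.101

N_CASES = 19


def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean ± t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Means and standard deviations as reported in the Results.
reported = {
    "differential matches": (2.79, 0.71),
    "differential accuracy (%)": (63.27, 15.61),
    "final diagnosis score": (1.58, 0.61),
}

intervals = {
    name: t_interval(mean, sd, N_CASES, T_CRIT_DF18)
    for name, (mean, sd) in reported.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{name}: {lower:.2f} to {upper:.2f}")
```

Under these assumptions the computed intervals agree, to two decimals, with the ones reported in the Results, and the total of 30 points over 19 cases matches the reported mean final-diagnosis score of 1.58.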

Topics

Acute Ischemic Stroke Management; Artificial Intelligence in Healthcare and Education; Trauma and Emergency Care Studies
