OpenAlex · Updated hourly · Last updated: 10 April 2026, 22:18

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Generative AI in degenerative lumbar spinal stenosis care: A NASS guideline-compliant comparative analysis of ChatGPT and DeepSeek

2025 · 0 citations · Journal of orthopaedic surgery · Open Access
Open full text at the publisher

Citations: 0
Authors: 7
Year: 2025

Abstract

Background: This study aims to compare the performance of two artificial intelligence (AI) models, ChatGPT-4.0 and DeepSeek-R1, in addressing clinical questions related to degenerative lumbar spinal stenosis (DLSS), using the North American Spine Society (NASS) guidelines as the benchmark.

Methods: Fifteen clinical questions spanning five domains (diagnostic criteria, non-surgical management, surgical indications, perioperative care, and emerging controversies) were designed based on the 2013 NASS evidence-based clinical guidelines for the diagnosis and management of DLSS. Responses from both models were independently evaluated by two board-certified spine surgeons across four metrics: accuracy, completeness, supplementality, and misinformation. Inter-rater reliability was assessed using Cohen's κ coefficient, while Mann-Whitney U and Chi-square tests were employed to analyze statistical differences between models.

Results: DeepSeek-R1 demonstrated superior performance over ChatGPT-4.0 in accuracy (median score: 3 vs 2, P = 0.009), completeness (2 vs 1, P = 0.010), and supplementality (2 vs 1, P = 0.018). Both models exhibited comparable performance in avoiding misinformation (P = 0.671). DeepSeek-R1 achieved higher inter-rater agreement in accuracy (κ = 0.727 vs 0.615), whereas ChatGPT-4.0 showed stronger consistency in supplementality (κ = 0.792 vs 0.762).

Conclusions: While both AI models demonstrate potential for clinical decision support, DeepSeek-R1 aligns more closely with NASS guidelines. ChatGPT-4.0 excels in providing supplementary insights but exhibits variability in accuracy. These findings underscore the need for domain-specific optimization of AI models to enhance reliability in medical applications.
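The abstract reports inter-rater reliability as Cohen's κ but does not say how it was computed. As a rough illustration only, the sketch below computes the unweighted form of κ for two raters. The abstract does not state whether a weighted or unweighted variant was used, so the unweighted form is an assumption. The function name `cohens_kappa` and the score lists are hypothetical and invented for this example; they are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items that received identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, derived from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        # Degenerate case: both raters used one identical category throughout.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up scores for illustration only; NOT data from the study.
surgeon_1 = [3, 2, 3, 1, 2, 3, 2, 1]
surgeon_2 = [3, 2, 2, 1, 2, 3, 3, 1]
kappa = cohens_kappa(surgeon_1, surgeon_2)
print(f"kappa = {kappa:.3f}")
```

κ equals 1 when the raters agree on every item and 0 when their agreement is no better than chance, which is the scale on which the reported values (for example 0.727 vs 0.615 for accuracy) are read.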

Topics

Artificial Intelligence in Healthcare and Education · Medical Imaging and Analysis · Spine and Intervertebral Disc Pathology