Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Generative AI in degenerative lumbar spinal stenosis care: A NASS guideline-compliant comparative analysis of ChatGPT and DeepSeek
0
Zitationen
7
Autoren
2025
Jahr
Abstract
BackgroundThis study aims to compare the performance of two artificial intelligence (AI) models, ChatGPT-4.0 and DeepSeek-R1, in addressing clinical questions related to degenerative lumbar spinal stenosis (DLSS) using the North American Spine Society (NASS) guidelines as the benchmark.Methods15 clinical questions spanning five domains (diagnostic criteria, non-surgical management, surgical indications, perioperative care, and emerging controversies) were designed based on the 2013 NASS evidence-based clinical guidelines for the diagnosis and management of DLSS. Responses from both models were independently evaluated by two board-certified spine surgeons across four metrics: accuracy, completeness, supplementality, and misinformation. Inter-rater reliability was assessed using Cohen's κ coefficient, while Mann-Whitney U and Chi-square tests were employed to analyze statistical differences between models.ResultsDeepSeek-R1 demonstrated superior performance over ChatGPT-4.0 in accuracy (median score: 3 vs 2, <i>P</i> = 0.009), completeness (2 vs 1, <i>P</i> = 0.010), and supplementality (2 vs 1, <i>P</i> = 0.018). Both models exhibited comparable performance in avoiding misinformation (<i>P</i> = 0.671). DeepSeek-R1 achieved higher inter-rater agreement in accuracy (κ = 0.727 vs 0.615), whereas ChatGPT-4.0 showed stronger consistency in ssupplementality (κ = 0.792 vs 0.762).ConclusionsWhile both AI models demonstrate potential for clinical decision support, DeepSeek-R1 aligns more closely with NASS guidelines. ChatGPT-4.0 excels in providing supplementary insights but exhibits variability in accuracy. These findings underscore the need for domain-specific optimization of AI models to enhance reliability in medical applications.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.418 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.288 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.726 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.781 Zit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5.516 Zit.