This is an overview page with metadata for this scientific work. The full article is available from the publisher.
A cross-sectional study on ChatGPT’s alignment with clinical practice guidelines in musculoskeletal rehabilitation
Citations: 11
Authors: 2
Year: 2025
Abstract
BACKGROUND: AI models like ChatGPT have the potential to support musculoskeletal rehabilitation by providing clinical insights. However, their alignment with evidence-based guidelines needs evaluation before integration into physiotherapy practice.
OBJECTIVE: To evaluate the performance of ChatGPT (GPT-4 model) in generating responses to musculoskeletal rehabilitation queries by comparing its recommendations with evidence-based clinical practice guidelines (CPGs).
DESIGN: Cross-sectional observational study.
METHODS: Twenty questions covering disease information, assessment, and rehabilitation were developed by two experienced physiotherapists specializing in musculoskeletal disorders. The questions were distributed across three anatomical regions: upper extremity (7 questions), lower extremity (9 questions), and spine (4 questions). ChatGPT's responses were obtained and evaluated independently by two raters using a 5-point Likert scale assessing relevance, accuracy, clarity, completeness, and consistency. Weighted kappa values were calculated to assess inter-rater agreement and consistency within each category.
RESULTS: ChatGPT's responses received the highest average score for clarity (4.85), followed by accuracy (4.62), relevance (4.50), and completeness (4.20). Consistency received the lowest score (3.85). The highest agreement (weighted kappa = 0.90) was observed in the disease information category, whereas the rehabilitation category showed lower agreement (weighted kappa = 0.56). Variability in consistency and moderate weighted kappa values in relevance and clarity highlighted areas requiring improvement.
CONCLUSIONS: This study demonstrates ChatGPT's potential to provide guideline-aligned information in musculoskeletal rehabilitation. However, given the observed limitations in consistency, completeness, and the ability to replicate nuanced clinical reasoning, it should serve as a supplementary resource rather than a primary decision-making tool. ChatGPT performed better in the disease information category, as evidenced by higher inter-rater agreement and scores, than in the rehabilitation category, which highlights the difficulty of addressing complex, nuanced therapeutic interventions. This variability in consistency and domain-specific reasoning underscores the need for further refinement to ensure reliability in complex clinical scenarios.
CLINICAL TRIAL NUMBER: Not applicable.
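The weighted kappa statistic named in the METHODS section can be illustrated with a minimal sketch for two raters scoring on a 5-point Likert scale. The abstract does not say whether linear or quadratic weights were used, so the `weighting` parameter is an assumption, and the function name `weighted_kappa` is hypothetical rather than taken from the study.

```python
def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5), weighting="linear"):
    """Weighted Cohen's kappa for two raters on an ordinal scale.

    Assumption: the weighting scheme ("linear" or "quadratic") is not
    stated in the abstract; both are offered here for illustration.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] is the share of items
    # scored categories[i] by rater A and categories[j] by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the distance between scores.
            weight = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - observed_disagreement / expected_disagreement
```

A call such as `weighted_kappa([5, 4, 4, 3, 5], [5, 4, 3, 3, 4])` uses made-up scores, not data from the study. Identical score lists give 1.0, and values near 0 indicate agreement no better than chance.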
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,707 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,613 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,159 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,875 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations