This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Quantitative evaluation of GPT-4’s performance on US and Chinese osteoarthritis treatment guideline interpretation and orthopaedic case consultation
Citations: 6
Authors: 6
Year: 2024
Abstract
OBJECTIVES: To evaluate GPT-4's performance in interpreting osteoarthritis (OA) treatment guidelines from the USA and China, and to assess its ability to diagnose and manage orthopaedic cases.
SETTING: The study was conducted using publicly available OA treatment guidelines and simulated orthopaedic case scenarios.
PARTICIPANTS: No human participants were involved. The evaluation focused on GPT-4's responses to clinical guidelines and case questions, assessed by two orthopaedic specialists.
OUTCOMES: Primary outcomes included the accuracy and completeness of GPT-4's responses to guideline-based queries and case scenarios. Metrics included the correct match rate, completeness score and stratification of case responses into predefined tiers of correctness.
RESULTS: In interpreting the American Academy of Orthopaedic Surgeons and Chinese OA guidelines, GPT-4 achieved a correct match rate of 46.4% and complete agreement with all score-2 recommendations. The accuracy score for guideline interpretation was 4.3±1.6 (95% CI 3.9 to 4.7), and the completeness score was 2.8±0.6 (95% CI 2.5 to 3.1). For case-based questions, GPT-4 demonstrated high performance, with over 88% of responses rated as comprehensive.
CONCLUSIONS: GPT-4 demonstrates promising capabilities as an auxiliary tool in orthopaedic clinical practice and patient education, with high levels of accuracy and completeness in guideline interpretation and clinical case analysis. However, further validation is necessary to establish its utility in real-world clinical settings.