This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Let’s chat (GPT) about adolescent idiopathic scoliosis: accuracy and reliability of chat responses to frequently asked questions
Citations: 0
Authors: 6
Year: 2025
Abstract
BACKGROUND: Patients increasingly seek online medical information, with artificial intelligence (AI) chatbots like ChatGPT emerging as potential resources for adolescent idiopathic scoliosis (AIS); however, their accuracy and reliability need assessment. This study aimed to evaluate the accuracy and reliability of ChatGPT, an AI model, in answering questions related to AIS.
METHODS: Sixty-four questions across four categories (general information, diagnosis and screening, treatment and follow-up, and quality of life [QoL]) were adapted from FAQs on professional association websites, the SOSORT consensus article, and QoL questionnaires. Two reviewers rated ChatGPT's responses on a scale from 1 (correct and comprehensive) to 4 (completely incorrect). Descriptive statistics were calculated to demonstrate the percentages of responses per score as well as the percentages of scores across categories. Each question was entered twice to assess reliability and response similarity. The percentage of responses that differed when the same query was entered twice into the system was calculated. The Cohen's Kappa statistic was utilized to assess the level of agreement between the two reviewers.
RESULTS: Of all the responses, 53.1% were rated as "correct and comprehensive," while 34.4% were rated as "correct but not comprehensive." ChatGPT performed best in the QoL category, with 13 out of 15 (86.7%) responses rated as correct. The second-best performance was in the diagnosis and screening category, with 7 out of 13 (53.8%) correct responses, followed by the general information category, with 9 out of 17 (52.9%) correct responses. The lowest performance was in the treatment and follow-up category, with 5 out of 19 (26.3%) correct responses. Consistency in ChatGPT's responses when questions were entered twice was 76.6%. Agreement between the reviewers' scores was excellent, as indicated by Cohen's Kappa statistic (Kappa: 0.82, 95% CI: 0.59 to 1.04; p = 0.0001).
CONCLUSIONS: ChatGPT demonstrated strong accuracy in addressing questions related to QoL in AIS, but its accuracy in treatment-related areas remains insufficient. Therefore, patients and parents are advised to consult medical professionals rather than rely solely on AI-generated information for AIS treatment and management.
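The per-category figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable names are illustrative. It also rests on one inference: the per-category "correct" counts appear to correspond to the "correct and comprehensive" rating, since they add up to 34 of 64 responses, or about 53.1%.

```python
# Counts taken from the abstract: (responses rated correct, questions in category).
# Variable names are illustrative and not from the original study.
correct_by_category = {
    "quality of life": (13, 15),
    "diagnosis and screening": (7, 13),
    "general information": (9, 17),
    "treatment and follow-up": (5, 19),
}

# Percentage of correct responses within each category.
category_pct = {
    name: 100 * correct / total
    for name, (correct, total) in correct_by_category.items()
}

# Totals across all four categories.
total_correct = sum(correct for correct, _ in correct_by_category.values())
total_questions = sum(total for _, total in correct_by_category.values())
overall_pct = 100 * total_correct / total_questions

for name, pct in category_pct.items():
    print(f"{name}: {pct:.3f}%")
print(f"overall: {total_correct}/{total_questions} = {overall_pct:.3f}%")
```

The category sizes sum to the 64 questions described in the methods, which suggests the reported counts are internally consistent.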