This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Can Chatbots Please Both Patients and Experts? Benchmarking AI and Clinical Guidelines for Hearing Loss
Citations: 3
Authors: 6
Year: 2025
Abstract
Objective: To evaluate the reliability, accuracy, and clarity of responses generated by 7 contemporary artificial intelligence chatbots in answering patient-focused questions about age-related hearing loss and sudden sensorineural hearing loss, and to compare these outputs with expert-authored guideline responses, using both expert and layperson ratings. Study design: Cross-sectional. Setting: Academic medical center. Patients: Not applicable. Ten independent layperson raters, all over the age of 18 and recruited from personal networks, assessed a subset of chatbot and expert responses. Interventions: Patient-centered questions, derived from official clinical practice guidelines for hearing loss, were submitted to 7 artificial intelligence chatbots from 3 major development groups. Responses were rated by a blinded panel of 5 otolaryngologists for accuracy, extensiveness, misleading content, quality of cited references, and overall reliability. Main outcome measures: Proportion of chatbot answers rated fully accurate by the expert panel; mean layperson clarity and trustworthiness scores; frequency of misleading information and of high-quality references. Results: The most advanced chatbots achieved full guideline-concordant accuracy for up to 50% of questions, while earlier models ranged from 25% to 37.5%. All models scored highly for extensiveness and reference quality. Layperson ratings were highest for gold-standard expert answers; the latest chatbots approached these levels for both clarity and trustworthiness (mean scores: 4.7 to 4.8 out of 5; 95% CI: 4.67–4.85), and differences between models were of moderate-to-large effect size (η² = 0.29 to 0.30). Misleading content was rare and typically not clinically significant.
Conclusions: Modern artificial intelligence chatbots can provide clear and generally reliable patient education for hearing loss, but full guideline concordance remains inconsistent. Expert oversight is advised to ensure clinical accuracy.