OpenAlex · Updated hourly · Last updated: 21.05.2026, 07:31

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Can Chatbots Please Both Patients and Experts? Benchmarking AI and Clinical Guidelines for Hearing Loss

2025 · 3 citations · Otology & Neurotology

Citations: 3 · Authors: 6 · Year: 2025

Abstract

Objective: To evaluate the reliability, accuracy, and clarity of responses generated by 7 contemporary artificial intelligence chatbots in answering patient-focused questions about age-related hearing loss and sudden sensorineural hearing loss, and to compare these outputs with expert-authored guideline responses as well as with layperson ratings.
Study design: Cross-sectional.
Setting: Academic medical center.
Patients: Not applicable. Ten independent layperson raters, all over the age of 18 and recruited from personal networks, assessed a subset of chatbot and expert responses.
Interventions: Patient-centered questions derived from official clinical practice guidelines for hearing loss were submitted to 7 artificial intelligence chatbots from 3 major development groups. Responses were rated by a blinded panel of 5 otolaryngologists for accuracy, extensiveness, misleading content, quality of cited references, and overall reliability.
Main outcome measures: Proportion of chatbot answers rated fully accurate by the expert panel; mean layperson clarity and trustworthiness scores; frequency of misleading information and of high-quality references.
Results: The most advanced chatbots achieved full guideline-concordant accuracy for up to 50% of questions, whereas earlier models ranged from 25% to 37.5%. All models scored highly for extensiveness and reference quality. Layperson ratings were highest for the gold-standard expert answers; the latest chatbots approached these levels for both clarity and trustworthiness (mean scores: 4.7 to 4.8 out of 5; 95% CI: 4.67–4.85), and differences between models were of moderate-to-large effect size (η² = 0.29 to 0.30). Misleading content was rare and typically not clinically significant.
Conclusions: Modern artificial intelligence chatbots can provide clear and generally reliable patient education for hearing loss, but full guideline concordance remains inconsistent. Expert oversight is advised to ensure clinical accuracy.

Topics

Artificial Intelligence in Healthcare and Education · AI in Service Interactions · Digital Mental Health Interventions