This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
The AI Doctor Is In: A Comparative Analysis of Chatbot Responses to Patient Questions About Fibroids
Citations: 0
Authors: 3
Year: 2026
Abstract
INTRODUCTION: Over the past several years, use of and trust in generative artificial intelligence (AI), and specifically in large language model (LLM) chatbots, has increased among the general population. Surveys suggest that 17% of U.S. adults now use chatbots to seek healthcare advice, with the highest uptake (25%) among younger adults aged 18–29. However, there is concern among users and the medical professional community alike that the information provided by these chatbots may sometimes be incomplete or inaccurate. Cohen et al. (2024, AJOG) demonstrated inter-chatbot variability as well as room for improvement in both the correctness and comprehensiveness of chatbots’ responses to commonly asked patient questions about endometriosis. Few other studies to date have examined the completeness and accuracy of medical information provided by chatbots.

OBJECTIVE: Inspired by the work of Cohen et al., our study assesses how correct and thorough the responses of three leading LLM chatbots are to frequently asked patient questions regarding fibroids.

METHODS: The authors posed eleven frequently asked patient questions regarding fibroids to three LLM chatbots: ChatGPT-4 (OpenAI), Claude (Anthropic), and Gemini (Google). Five minimally invasive gynecologic surgeons independently reviewed the chatbots’ responses against current guidelines and expert opinion on fibroids and rated them on the following scale: (1) completely inaccurate, (2) mostly inaccurate and some accurate, (3) mostly accurate and some inaccurate, (4) accurate but incomplete, (5) accurate and comprehensive. The five graders’ scores were averaged to calculate final scores.

RESULTS: Average scores were 4.44 (standard deviation 0.34) for Claude, 3.98 (0.48) for ChatGPT, and 3.80 (0.48) for Gemini. Claude answered all (100%) questions accurately and 7 (64%) questions both accurately and comprehensively according to a majority (≥3) of reviewers, compared to 7 (64%) and 4 (36%) for ChatGPT and 5 (45%) and 1 (9%) for Gemini, respectively. The question “How common are fibroids?” received the highest-scoring response (average 4.8) across all chatbots, followed by “Can fibroids cause bleeding?” (4.5). Inter-reviewer scoring varied more for questions about symptoms and diagnosis (e.g., “How do I know if I have fibroids?”) or treatment (e.g., “What is the treatment for fibroids?”) than for general questions (e.g., “How common are fibroids?”).

CONCLUSIONS: While chatbots may provide mostly accurate information in response to patient questions about fibroids, their responses often lack comprehensiveness. Generative AI has the potential to supplement the information provided by medical professionals to the public, but as its presence within the medical field grows, so too should investigation into how this new technology shapes patients’ health literacy.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,436 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,311 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,753 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,523 citations