This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Quality of artificial intelligence‐generated responses on pediatric celiac disease: Comparative assessment of Open AI ChatGPT and Google Gemini

2026 · 0 citations · JPGN Reports · Open Access

Abstract

Objectives: Many individuals use artificial intelligence tools such as OpenAI's ChatGPT and Google's Gemini to assess their symptoms or learn about a disease. However, studies evaluating the responses these tools provide are lacking. This study evaluates and compares ChatGPT and Gemini in answering pediatric celiac disease-specific questions across four domains: accuracy, safety, completeness, and understandability.

Methods: We developed 34 questions that mimicked those parents ask about pediatric celiac disease, in four categories: disease symptoms and definition, diagnosis, gluten challenge, and management. Questions were posed to ChatGPT 5.0 and Gemini Pro 2.5. Responses were compared with North American Society for Pediatric Gastroenterology, Hepatology, and Nutrition (NASPGHAN) and European Society for Pediatric Gastroenterology, Hepatology, and Nutrition (ESPGHAN) guidelines, or with clinical best practices when guidelines were absent. Three board-certified pediatric gastroenterologists rated each response on a 5-point Likert scale across the four domains. Word count and Flesch–Kincaid grade level (FKGL) were recorded.

Results: Average scores were ≥4.5 for ChatGPT and ≥4.3 for Gemini across all four domains. Gemini responses were significantly longer and had a higher FKGL. Intraclass correlation coefficients showed good agreement for all rating domains except Gemini safety ratings and understandability ratings for both models.

Conclusions: ChatGPT and Gemini both generated highly accurate, safe, complete, and understandable information for common questions about celiac disease. Performance decreased when guidelines were lacking or there was clinical ambiguity. However, FKGL readability scores for both models exceeded the sixth- to eighth-grade reading level (or below) recommended for patient education materials, suggesting there may be a barrier to understandability for the general population.

Topics

Celiac Disease Research and Management
Artificial Intelligence in Healthcare and Education
Nutrition, Genetics, and Disease
