This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Accuracy and Temporal Consistency of ChatGPT and Gemini in Responding to Textbook and Patient-Oriented Dental Bleaching Questions: A Multi-Session Comparative Study
Citations: 0
Authors: 3
Year: 2026
Abstract
OBJECTIVE: This study compared the accuracy and temporal consistency of ChatGPT and Gemini in responding to dental bleaching questions across three weekly sessions.

MATERIALS AND METHODS: A total of 280 true/false questions were developed, comprising 200 textbook-based questions and 80 patient-oriented frequently asked questions. Both chatbots were queried weekly under controlled conditions. Accuracy was compared using generalized estimating equations, consistency was assessed using Fleiss' kappa, and weekly stability was evaluated using Cochran's Q test. Open-ended responses were scored for quality and misinformation by two evaluators.

RESULTS: For textbook questions, ChatGPT achieved significantly higher accuracy than Gemini (77.7% versus 70.5%, p = 0.0009). For frequently asked questions, both chatbots performed comparably (92.9% versus 90.8%, p = 0.252). Temporal consistency was only fair for textbook questions but almost perfect for frequently asked questions in both chatbots. Both chatbots showed significant upward trends in textbook accuracy across sessions. Gemini received higher global quality scores for open-ended responses, while misinformation rates were similarly low.

CONCLUSIONS: Within the limitations of this study, ChatGPT achieved significantly higher accuracy than Gemini for textbook-based dental bleaching questions, while both chatbots performed comparably for patient-oriented questions. Temporal consistency differed markedly, with almost perfect consistency for patient-oriented questions and only fair consistency for textbook-based questions.

CLINICAL SIGNIFICANCE: Chatbot responses to common patient questions about dental bleaching are generally accurate and consistent, but their reliability drops substantially for specialized academic content, suggesting these tools should complement rather than replace professional clinical judgment.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,652 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,567 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,083 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,856 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations