This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing the Accuracy and Completeness of AI-Generated Dental Responses: An Evaluation of the Chat-GPT Model
Citations: 3
Authors: 6
Year: 2025
Abstract
<b>Background:</b> The rapid advancement of artificial intelligence (AI) in healthcare has opened new opportunities, yet the clinical validation of AI tools in dentistry remains limited.
<b>Objectives:</b> This study aimed to assess the performance of ChatGPT in generating accurate and complete responses to academic dental questions across multiple specialties, comparing the capabilities of GPT-4 and GPT-3.5 models.
<b>Methodology:</b> A panel of academic specialists from eight dental specialties collaboratively developed 48 clinical questions, classified by consensus as easy, medium, or hard, and as requiring either binary (yes/no) or descriptive responses. Each question was sequentially entered into both GPT-4 and GPT-3.5 models, with instructions to provide guideline-based answers. The AI-generated responses were independently evaluated by the specialists for accuracy (6-point Likert scale) and completeness (3-point Likert scale). Descriptive and inferential statistics were applied, including Mann-Whitney U and Kruskal-Wallis tests, with significance set at <i>p</i> < 0.05.
<b>Results:</b> GPT-4 consistently outperformed GPT-3.5 in both evaluation domains. The median accuracy score was 6.0 for GPT-4 and 5.0 for GPT-3.5 (<i>p</i> = 0.02), while the median completeness score was 3.0 for GPT-4 and 2.0 for GPT-3.5 (<i>p</i> < 0.001). GPT-4 demonstrated significantly higher overall accuracy (5.29 ± 1.1) and completeness (2.44 ± 0.71) compared to GPT-3.5 (4.5 ± 1.7 and 1.69 ± 0.62, respectively; <i>p</i> = 0.024 and <0.001). When stratified by specialty, notable improvements with GPT-4 were observed in Periodontology, Endodontics, Implantology, and Oral Surgery, particularly in completeness scores.
<b>Conclusions:</b> In academic dental settings, GPT-4 provided more accurate and complete responses than GPT-3.5. Despite both models showing potential, their clinical application should remain supervised by human experts.
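The abstract names the Mann-Whitney U test as one of the methods used to compare ordinal Likert ratings between the two models. As a rough illustration of how such a comparison works, the sketch below implements a two-sided Mann-Whitney U test in plain Python, using the tie-corrected normal approximation without continuity correction. The function names (`average_ranks`, `mann_whitney_u`) and the example ratings are hypothetical and are not taken from the study; the abstract does not state which software or test variant the authors used, and for small samples an exact test is usually preferable.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (u_x, u_y, p_value). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # all observations identical, no evidence of a difference
        return u1, u2, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, u2, p_value


# Hypothetical 6-point accuracy ratings, invented for illustration only.
ratings_model_a = [6, 6, 5, 6, 4, 6, 5, 6]
ratings_model_b = [5, 4, 3, 5, 2, 6, 4, 5]

u_a, u_b, p = mann_whitney_u(ratings_model_a, ratings_model_b)
print(f"U_a = {u_a}, U_b = {u_b}, p = {p:.4f}")
```

A rank-based test fits here because Likert scores are ordinal: the test compares the ordering of ratings rather than assuming the distances between scale points are equal.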
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,456 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,332 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,779 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,533 citations