OpenAlex · Updated hourly · Last updated: 15 April 2026, 05:00

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Assessing the Accuracy and Completeness of AI-Generated Dental Responses: An Evaluation of the Chat-GPT Model

2025 · 3 citations · Healthcare · Open Access

Citations: 3
Authors: 6
Year: 2025

Abstract

Background: The rapid advancement of artificial intelligence (AI) in healthcare has opened new opportunities, yet the clinical validation of AI tools in dentistry remains limited.

Objectives: This study aimed to assess the performance of ChatGPT in generating accurate and complete responses to academic dental questions across multiple specialties, comparing the capabilities of GPT-4 and GPT-3.5 models.

Methodology: A panel of academic specialists from eight dental specialties collaboratively developed 48 clinical questions, classified by consensus as easy, medium, or hard, and as requiring either binary (yes/no) or descriptive responses. Each question was sequentially entered into both GPT-4 and GPT-3.5 models, with instructions to provide guideline-based answers. The AI-generated responses were independently evaluated by the specialists for accuracy (6-point Likert scale) and completeness (3-point Likert scale). Descriptive and inferential statistics were applied, including Mann-Whitney U and Kruskal-Wallis tests, with significance set at p < 0.05.

Results: GPT-4 consistently outperformed GPT-3.5 in both evaluation domains. The median accuracy score was 6.0 for GPT-4 and 5.0 for GPT-3.5 (p = 0.02), while the median completeness score was 3.0 for GPT-4 and 2.0 for GPT-3.5 (p < 0.001). GPT-4 demonstrated significantly higher overall accuracy (5.29 ± 1.1) and completeness (2.44 ± 0.71) compared to GPT-3.5 (4.5 ± 1.7 and 1.69 ± 0.62, respectively; p = 0.024 and p < 0.001). When stratified by specialty, notable improvements with GPT-4 were observed in Periodontology, Endodontics, Implantology, and Oral Surgery, particularly in completeness scores.

Conclusions: In academic dental settings, GPT-4 provided more accurate and complete responses than GPT-3.5. Despite both models showing potential, their clinical application should remain supervised by human experts.
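The abstract names the Mann-Whitney U test as one of the methods used to compare the two models' ratings. As a rough illustration of what that statistic measures, here is a minimal sketch in plain Python. The function names (`average_ranks`, `mann_whitney_u`) and the ratings are hypothetical and invented for this example; they are not the study's code or data, and the sketch computes only the U statistic, not a p-value.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples of ordinal scores."""
    combined = list(sample_a) + list(sample_b)
    ranks = average_ranks(combined)
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical 6-point Likert accuracy ratings, NOT data from the study.
ratings_model_a = [6, 6, 5, 6, 4, 5]
ratings_model_b = [5, 4, 3, 5, 2, 4]

u_a, u_b = mann_whitney_u(ratings_model_a, ratings_model_b)
print(u_a, u_b)  # prints: 31.0 5.0
```

U_a counts the pairs in which a rating from the first sample exceeds one from the second, with ties counted as one half, so a value near n_a × n_b (here 36) indicates that the first sample tends to rank higher. In practice a statistics library, for example SciPy's `scipy.stats.mannwhitneyu`, would typically be used to obtain the p-value as well.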

Topics

Artificial Intelligence in Healthcare and Education · Academic integrity and plagiarism · Explainable Artificial Intelligence (XAI)