This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education
16
Citations
1
Author
2025
Year
Abstract
Artificial intelligence (AI) has emerged as a transformative tool in education, particularly in specialized fields such as dentistry. This study evaluated the performance of four advanced AI models, ChatGPT-4o (San Francisco, CA: OpenAI), ChatGPT-o1, Gemini 1.5 Pro (Mountain View, CA: Google LLC), and Gemini 2.0 Advanced, on the Turkish Dental Specialty Examination (DUS) for 2020 and 2021. A total of 240 questions, comprising 120 questions per year from basic and clinical sciences, were analyzed. The AI models were assessed on the accuracy of their answers against the official answer keys. For the 2020 DUS, ChatGPT-o1 and Gemini 2.0 Advanced achieved the highest accuracy rates of 93.70% and 96.80%, respectively, with net scores of 112.50 and 115 out of 120 questions. ChatGPT-4o and Gemini 1.5 Pro followed with accuracy rates of 83.33% and 85.40%. For the 2021 DUS, ChatGPT-o1 again demonstrated the highest accuracy at 97.88% (115.50 net score), closely followed by Gemini 2.0 Advanced at 96.82% (114.25 net score). ChatGPT-4o and Gemini 1.5 Pro scored lower for 2021, achieving accuracy rates of 88.35% and 93.64%, respectively. Combining results from both years (238 total questions), ChatGPT-o1 and Gemini 2.0 Advanced achieved accuracy rates of 97.46% (230 correct answers, 95% CI: 94.62%, 100.00%) and 97.90% (231 correct answers, 95% CI: 94.62%, 100.00%), respectively, significantly outperforming ChatGPT-4o (88.66%, 211 correct answers, 95% CI: 85.43%, 91.89%) and Gemini 1.5 Pro (91.60%, 218 correct answers, 95% CI: 87.75%, 95.45%). Statistical analysis revealed significant differences among the models (p = 0.0002). Pairwise comparisons demonstrated that ChatGPT-4o underperformed significantly compared to ChatGPT-o1 (p = 0.0016) and Gemini 2.0 Advanced (p = 0.0007) after Bonferroni correction.
The consistently high accuracy rates and narrow confidence intervals of the top-performing models underscore their reliability in answering DUS questions. Generative AI models such as ChatGPT-o1 and Gemini 2.0 have the potential to enhance dental board exam preparation through question evaluation. At the same time, because these models appear to outperform humans on DUS questions, the findings raise concerns about the ethical use of AI and about the validity of the DUS as a dental competency examination; a higher level of knowledge evaluation should be considered. This research contributes to the growing body of literature on AI applications in specialized knowledge domains and provides a foundation for further exploration of AI integration into dental education.
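The reported pairwise comparisons can be sanity-checked with a short sketch. Note the assumptions: six pairwise comparisons among the four models and a family-wise significance level of 0.05 are standard for Bonferroni correction but are not explicitly stated in the abstract; only the two reported p-values come from the study.

```python
# Sketch of the Bonferroni-corrected significance check (assumptions:
# six pairwise comparisons among the four models, family-wise alpha = 0.05;
# the abstract does not state these details explicitly).
from itertools import combinations

models = ["ChatGPT-4o", "ChatGPT-o1", "Gemini 1.5 Pro", "Gemini 2.0 Advanced"]
n_comparisons = len(list(combinations(models, 2)))  # 6 unordered pairs
alpha_adjusted = 0.05 / n_comparisons               # 0.05 / 6 ~ 0.0083

# p-values reported in the abstract for the two significant pairs
reported = {
    ("ChatGPT-4o", "ChatGPT-o1"): 0.0016,
    ("ChatGPT-4o", "Gemini 2.0 Advanced"): 0.0007,
}
for pair, p in reported.items():
    # Both reported p-values fall below the Bonferroni-adjusted threshold,
    # consistent with the abstract's claim of significance after correction.
    print(pair, p < alpha_adjusted)
```

Under these assumptions, both reported p-values remain below the adjusted threshold, which is consistent with the abstract's conclusion that ChatGPT-4o underperformed significantly against the two top models.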
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,418 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,288 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,726 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,516 citations