This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of ChatGPT 4, ChatGPT 3.5, Gemini 1.5, and Copilot in Solving Oral and Maxillofacial Surgery Questions Asked in the Turkish Dentistry Specialization Education Entrance Exam: Comparison Study
Citations: 0
Authors: 1
Year: 2026
Abstract
Objective: This study analyzes and compares the performance of 4 leading large language models (LLMs) in answering oral and maxillofacial surgery questions from the Turkish Dentistry Specialization Education Entrance Exam. Material and Methods: A total of 123 oral and maxillofacial surgery questions without figures or graphs, published between 2012 and 2021, were analyzed. The study evaluated the performance of ChatGPT 4, ChatGPT 3.5, Gemini 1.5, and Copilot. The LLMs' correct response rates were compared by the year in which each question was asked and by oral and maxillofacial surgery topic. Results: The highest correct response rate was obtained with ChatGPT 4 (91.06%), followed by Copilot (86.99%), ChatGPT 3.5 (82.11%), and Gemini 1.5 (79.67%). However, no statistically significant difference in correct response rates was observed among the 4 LLMs (p=0.059). All LLMs correctly answered 66.66% of orofacial infection questions, 80% of orthognathic surgery questions, and 100% of orofacial pain questions. ChatGPT 4 and Copilot answered 100% of dental implantology questions correctly. Conclusion: The LLMs examined exhibited acceptable correct response rates (79.67% to 91.06%), and their performances were similar to each other. The results demonstrate the potential of LLMs as educational support tools in oral and maxillofacial surgery education.
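The reported percentages can be checked against the stated sample of 123 questions. The sketch below recovers the integer number of correct answers behind each rate and then runs a Pearson chi-square test on the resulting 4 x 2 table (model by correct/incorrect). The abstract does not name the statistical test that was used, so the choice of test here is an assumption made for illustration, and the variable and function names are hypothetical.

```python
import math

N_QUESTIONS = 123  # number of questions, as stated in the abstract

# Correct-response rates (%) as reported in the abstract.
reported_rates = {
    "ChatGPT 4": 91.06,
    "Copilot": 86.99,
    "ChatGPT 3.5": 82.11,
    "Gemini 1.5": 79.67,
}

# Recover integer counts of correct answers from the rounded percentages.
correct = {m: round(r / 100 * N_QUESTIONS) for m, r in reported_rates.items()}


def pearson_chi2(correct_counts, n):
    """Pearson chi-square statistic for a (models x correct/incorrect) table.

    Every model answered the same n questions, so the expected number of
    correct answers is identical for each model.
    """
    exp_correct = sum(correct_counts) / len(correct_counts)
    exp_incorrect = n - exp_correct
    stat = 0.0
    for c in correct_counts:
        stat += (c - exp_correct) ** 2 / exp_correct
        stat += ((n - c) - exp_incorrect) ** 2 / exp_incorrect
    return stat


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Assumption: a Pearson chi-square test with (4 - 1) * (2 - 1) = 3 degrees of freedom.
chi2 = pearson_chi2(list(correct.values()), N_QUESTIONS)
p_value = chi2_sf_df3(chi2)

for model, c in correct.items():
    print(f"{model}: {c}/{N_QUESTIONS} = {100 * c / N_QUESTIONS:.2f}%")
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under these assumptions the recovered counts are 112, 107, 101, and 98 correct answers, and the resulting p-value lies close to the reported 0.059. Because all four models answered the same questions, a paired test such as Cochran's Q would also be a natural choice, but it requires per-question results that the abstract does not provide.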