This is an overview page with metadata for this scientific article. The full article is available from the publisher.

Comparative diagnostic accuracy of multiple large language models in oral and maxillofacial radiology specialty examinations: a 13-year analysis of performance and topic trends

2026 · 0 citations · BMC Oral Health · Open Access

Abstract

BACKGROUND: To the best of our knowledge, this is the first study in oral and maxillofacial radiology to include all examination questions comprehensively, to analyse their topic distribution systematically across years and examination periods, and to compare multiple contemporary large language models concurrently within a unified methodological framework. This study aimed to compare the accuracy of six artificial intelligence (AI) systems based on large language models (LLMs) on the oral and maxillofacial radiology questions asked in the Dental Specialization Examination (DSE) over the past 13 years, and to analyze the subject matter of those questions in detail.

METHODS: A total of 200 oral and maxillofacial radiology questions from the DSE held between 2012 and 2025 were included in the analysis. The questions were grouped by topic and divided into an early period (2012-2018) and a late period (2019-2025) to observe changes over time. ChatGPT-5.2, ChatGPT-4.0, Gemini-3, Claude 4.5, Microsoft Copilot, and Perplexity AI were tested using the original question formats. The models' answers were evaluated against the official answer key. In addition, the questions were analyzed according to examination year and period.

RESULTS: ChatGPT-5.2 achieved accuracy rates of 91.9% in the early period and 95.7% in the late period. It was followed by ChatGPT-4.0 (79.8% and 82.8%). Differences between the models were statistically significant in both periods (p < 0.001). Oral diseases and oral pathology remained prominent topics in both the early and late periods. Furthermore, while oral diseases and oral pathology were asked about more frequently in the spring examinations, advanced imaging techniques, radiation physics, and temporomandibular joint disorders also featured in the autumn examinations.

CONCLUSION: ChatGPT-5.2 demonstrated the highest and most consistent accuracy among the evaluated models on DSE oral and maxillofacial radiology questions. In the field of oral and maxillofacial radiology, these model prototypes have the potential to generate new knowledge. Oral diseases and pathology have remained important topics, while attention to jaw lesions and advanced imaging techniques has increased in recent years.
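The scoring step described in the abstract (comparing each model's answers with the official answer key, separately for the 2012-2018 and 2019-2025 periods) can be illustrated with a minimal sketch. This is not the authors' code: the function names `period_of` and `accuracy_by_period`, the record layout, and the four toy questions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def period_of(year: int) -> str:
    """Map an exam year to the early (2012-2018) or late (2019-2025) period."""
    if 2012 <= year <= 2018:
        return "early"
    if 2019 <= year <= 2025:
        return "late"
    raise ValueError(f"year {year} is outside 2012-2025")


def accuracy_by_period(questions, model_answers):
    """Return the percentage of answers matching the key, per period.

    questions: list of dicts with hypothetical fields "id", "year", "key".
    model_answers: dict mapping question id to the option the model chose.
    A missing answer counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        p = period_of(q["year"])
        total[p] += 1
        if model_answers.get(q["id"]) == q["key"]:
            correct[p] += 1
    return {p: 100.0 * correct[p] / total[p] for p in total}


# Hypothetical toy data, not taken from the study.
questions = [
    {"id": 1, "year": 2013, "key": "A"},
    {"id": 2, "year": 2016, "key": "C"},
    {"id": 3, "year": 2020, "key": "B"},
    {"id": 4, "year": 2024, "key": "E"},
]
model_answers = {1: "A", 2: "D", 3: "B", 4: "E"}

result = accuracy_by_period(questions, model_answers)
print(result)  # {'early': 50.0, 'late': 100.0}
```

Running the same function once per model would give the per-period accuracy figures that the abstract reports; the significance test behind the reported p-value is not named in the abstract and is therefore not sketched here.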

Topics

Artificial Intelligence in Healthcare and Education · Radiology practices and education · Radiomics and Machine Learning in Medical Imaging
