This is an overview page with metadata for this scientific work. The full article is available from the publisher.
A pilot study of the performance of Chat GPT and other large language models on a written final year periodontology exam
10
Citations
3
Authors
2025
Year
Abstract
Large Language Models (LLMs) such as Chat GPT are increasingly used by students in education and reportedly produce adequate academic responses. Chat GPT is expected to learn and improve with time. Thus, the aim was to longitudinally compare the performance of the current versions of Chat GPT-4/GPT4o with that of final-year DDS students on a written periodontology exam. Other current non-subscription LLMs were also compared to the students. Chat GPT-4, guided by the exam parameters, generated answers as 'Run 1' and, 6 months later, as 'Run 2'. Chat GPT-4o generated answers as 'Run 3' 15 months later. All LLM and student scripts were marked independently by two periodontology lecturers (Cohen's Kappa value 0.71). 'Run 1' and 'Run 3' achieved statistically significantly (p < 0.001) higher mean scores of 78% and 77% compared to the students (60%). The mean scores of Chat GPT-4 and GPT4o were also similar to that of the best student. 'Run 2' performed at the level of the students but underperformed relative to 'Run 1' and 'Run 3', with generalizations, more inaccuracies and incomplete answers. This variability in 'Run 2' may be due to outdated data sources, hallucinations and inherent LLM limitations such as online traffic and the availability of datasets and resources. Other non-subscription LLMs such as Claude, DeepSeek, Gemini and Le Chat also produced statistically significantly (p < 0.001) higher scores compared to the students. Claude was the best-performing LLM, with more comprehensive answers. LLMs such as Chat GPT may provide summaries and model answers in clinical undergraduate periodontology education. However, the results must be interpreted with caution regarding academic accuracy and credibility, especially in a health care profession.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,418 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,288 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,726 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,516 citations