This is an overview page with metadata for this scientific work. The full article is available from the publisher.
A pilot study of the performance of Chat GPT and other large language models on a written final year periodontology exam
10
Citations
3
Authors
2025
Year
Abstract
Large Language Models (LLMs) such as Chat GPT are increasingly used by students in education and reportedly produce adequate academic responses. Chat GPT is expected to learn and improve with time. Thus, the aim was to longitudinally compare the performance of the current versions of Chat GPT-4/GPT4o with that of final-year DDS students on a written periodontology exam. Other current non-subscription LLMs were also compared to the students. Chat GPT-4, guided by the exam parameters, generated answers as 'Run 1' and, 6 months later, as 'Run 2'. Chat GPT-4o generated answers as 'Run 3' 15 months later. All LLM and student scripts were marked independently by two periodontology lecturers (Cohen's Kappa value 0.71). 'Run 1' and 'Run 3' achieved statistically significantly (p < 0.001) higher mean scores of 78% and 77% compared to the students (60%). The mean scores of Chat GPT-4 and GPT4o were also similar to that of the best student. 'Run 2' performed at the level of the students but underperformed relative to 'Run 1' and 'Run 3', with generalizations, more inaccuracies and incomplete answers. This variability in 'Run 2' may be due to outdated data sources, hallucinations and inherent LLM limitations such as online traffic and the availability of datasets and resources. Other non-subscription LLMs such as Claude, DeepSeek, Gemini and Le Chat also produced statistically significantly (p < 0.001) higher scores compared to the students. Claude was the best-performing LLM, with more comprehensive answers. LLMs such as Chat GPT may provide summaries and model answers in clinical undergraduate periodontology education. However, the results must be interpreted with caution regarding academic accuracy and credibility, especially in a health care profession.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,418 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,288 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,726 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,516 citations