OpenAlex · Updated hourly · Last updated: 16.04.2026, 13:06

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance of ChatGPT-4 as an Auxiliary Tool: Evaluation of Accuracy and Repeatability on Orthodontic Radiology Questions

2025 · 1 citation · Bioengineering · Open Access

Citations: 1 · Authors: 6 · Year: 2025

Abstract

Background: Large language models (LLMs) are increasingly considered in dentistry, yet their accuracy in orthodontic radiology remains uncertain. This study evaluated the performance of ChatGPT-4 on questions aligned with current radiology guidelines.

Methods: Fifty short, guideline-anchored questions were authored; thirty were pre-selected a priori for their diagnostic relevance. Using the ChatGPT-4 web interface in March 2025, we obtained 30 answers per item (900 in total) across two user accounts and three times of day, each in a new chat with a standardised prompt. Two blinded experts graded all responses on a 3-point scale (0 = incorrect, 1 = partially correct, 2 = correct); disagreements were adjudicated. The primary outcome was strict accuracy (proportion of answers graded 2). Secondary outcomes were partial-credit performance (mean score on the 0-2 scale) and inter-rater agreement using multiple coefficients.

Results: Strict accuracy was 34.1% (95% CI 31.0-37.2), with wide item-level variability (0-100%). The mean partial-credit score was 1.09/2.00 (median 1.02; IQR 0.53-1.83). Inter-rater agreement was high (percent agreement: 0.938, with coefficients indicating substantial to almost-perfect reliability).

Conclusions: Under the conditions of this study, ChatGPT-4 demonstrated limited strict accuracy, while expert grading of its answers showed substantial reliability, when applied to orthodontic radiology questions. These findings underline its potential as a complementary educational and decision-support resource while also highlighting its present limitations. Its role should remain supportive and informative, never replacing the critical appraisal and professional judgement of the clinician.
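As a rough check on the reported primary outcome, the sketch below recomputes a 95% confidence interval for the strict accuracy. It rests on two assumptions that the abstract does not state: that the interval is a simple Wald (normal-approximation) interval over all 900 answers treated as independent, and that 307 answers were graded 2 (a count inferred from 34.1% of 900, not reported). The function name `wald_ci` is purely illustrative.

```python
import math


def wald_ci(successes, n, z=1.959964):
    """Return (proportion, lower, upper) for a normal-approximation interval.

    Assumption: the abstract does not name its CI method; Wald is used here
    only because it is the simplest candidate.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# 30 questions x 30 answers per question, as described in the Methods.
n_answers = 30 * 30

# Assumption: count inferred from the reported 34.1%; not given in the abstract.
n_correct = 307

p, lower, upper = wald_ci(n_correct, n_answers)
print(f"strict accuracy {p:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Under these assumptions the result rounds to the published 31.0-37.2 range. Because the 30 answers to each question are repeated measurements of the same item, an interval that accounts for clustering by question could be wider; the abstract does not say whether the authors adjusted for this.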

Similar works