This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Performance of ChatGPT-4 as an Auxiliary Tool: Evaluation of Accuracy and Repeatability on Orthodontic Radiology Questions
Citations: 1
Authors: 6
Year: 2025
Abstract
<b>Background:</b> Large language models (LLMs) are increasingly considered in dentistry, yet their accuracy in orthodontic radiology remains uncertain. This study evaluated the performance of ChatGPT-4 on questions aligned with current radiology guidelines. <b>Methods:</b> Fifty short, guideline-anchored questions were authored; thirty were pre-selected a priori for their diagnostic relevance. Using the ChatGPT-4 web interface in March 2025, we obtained 30 answers per item (900 in total) across two user accounts and three times of day, each in a new chat with a standardised prompt. Two blinded experts graded all responses on a 3-point scale (0 = incorrect, 1 = partially correct, 2 = correct); disagreements were adjudicated. The primary outcome was strict accuracy (proportion of answers graded 2). Secondary outcomes were partial-credit performance (mean 0-2 score) and inter-rater agreement using multiple coefficients. <b>Results:</b> Strict accuracy was 34.1% (95% CI 31.0-37.2), with wide item-level variability (0-100%). The mean partial-credit score was 1.09/2.00 (median 1.02; IQR 0.53-1.83). Inter-rater agreement was high (percent agreement: 0.938, with coefficients indicating substantial to almost-perfect reliability). <b>Conclusions:</b> Under the conditions of this study, ChatGPT-4 demonstrated limited strict accuracy on orthodontic radiology questions, while expert grading of its answers was highly reliable. These findings underline its potential as a complementary educational and decision-support resource while also highlighting its present limitations. Its role should remain supportive and informative, never replacing the critical appraisal and professional judgement of the clinician.
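As a rough plausibility check on the primary outcome, the sketch below computes a normal-approximation (Wald) confidence interval for a proportion. The raw count of 307 answers graded 2 is an assumption inferred from the reported 34.1% of 900; the abstract states neither the count nor the interval method the authors used, so this is an illustration and not their analysis.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# 30 items x 30 answers per item, as reported in the abstract.
N_ANSWERS = 30 * 30
# Assumption: hypothetical count of answers graded 2, chosen to match 34.1%.
N_CORRECT = 307

low, high = wald_interval(N_CORRECT, N_ANSWERS)
print(f"strict accuracy: {N_CORRECT / N_ANSWERS:.1%}")
print(f"95% CI: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval rounds to the reported 31.0-37.2%. Note that this simple interval treats all 900 answers as independent and ignores that answers are clustered within the 30 items.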
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,460 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,341 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,791 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,536 citations