This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Analyzing the Role of AI in Resident Education: An Evaluation of ChatGPT on Ophthalmology Trainee Examination Questions by Subtopic
Citations: 0
Authors: 7
Year: 2025
Abstract
Background: ChatGPT is a large language model trained on various datasets to learn, analyze, and generate human-like answers to users' questions. To assess its applicability to medical education, more information is needed on whether its analyses yield accurate and coherent responses. The aim of this study was to characterize ChatGPT's responses to ophthalmology questions by subtopic, to determine where the system might be used reliably in resident education and where its performance remains weak.

Methods: Ophthalmology questions were obtained from a widely used study resource, OphthoQuestions. Thirteen sections, each covering a different ophthalmic subtopic, were sampled, and questions were collected from each section. Questions containing images or tables were excluded. Of 163 questions and their respective answer choices, 131 were input into ChatGPT-3.5. ChatGPT's accuracy by subtopic was analyzed in Excel, and responses were evaluated for properties of natural coherence. Incorrect responses were categorized as logical fallacy, informational fallacy, or explicit fallacy. Statistical significance of categorical variables was assessed using the χ² test.

Results: ChatGPT answered 71 of 131 questions correctly (54.2%). Accuracy by subtopic was as follows: general medicine (90%), oculoplastics (70%), retina and vitreous (70%), cornea (30%), fundamentals (40%), optics (40%), pediatrics (40%), glaucoma (50%), lens and cataract (50%), neuro-ophthalmology (60%), pathology and tumors (60%), refractive surgery (55%), and uveitis (50%). Logical reasoning, internal information, and external information were identified in 82.4%, 100%, and 83.2% of responses, respectively. The use of logical reasoning (P = 0.003) and external information (P = 0.02) differed significantly between correct and incorrect responses.
Conclusion: ChatGPT scored higher in general medicine, oculoplastics, and retina and vitreous than in cornea, fundamentals, optics, and pediatrics. Identifying subtopics in which ChatGPT performs less well allows learners to acquire appropriate supplemental resources in these areas.
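The χ² analysis of categorical variables described in the Methods (e.g., presence of logical reasoning stratified by correct vs. incorrect responses) can be sketched in a few lines of pure Python. The marginal totals below (108 of 131 responses with logical reasoning, 71 correct) follow the abstract, but the individual cell counts are illustrative placeholders, not the study's data:

```python
import math

def chi2_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for a 2x2 contingency table, without continuity correction."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 dof, the chi-square variable is the square of a standard
    # normal, so P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Illustrative cell counts (NOT taken from the paper):
# rows = logical reasoning present / absent, cols = correct / incorrect
table = [[65, 43],
         [6, 17]]
stat, p = chi2_2x2(table)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

A library implementation such as `scipy.stats.chi2_contingency` would normally be used instead; the hand-rolled version is shown only to make the computation explicit.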
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,436 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,311 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,753 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,523 citations