This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Does AI have utility in medical student surgical education? A comparative analysis of chatbots in answering standardized surgical multiple-choice questions
Citations: 0 · Authors: 5 · Year: 2025
Abstract
Purpose
Artificial intelligence (AI) chatbots have potential as adjunctive medical education tools. AI chatbots can provide question-specific explanations with supplemental content that helps students learn from self-assessments. Chatbot performance on general surgery exams has not been studied at the medical student level. This study assesses the accuracy of popular low-cost chatbots—ChatGPT, Gemini, and Claude—in answering National Board of Medical Examiners (NBME) surgery practice questions for use in medical student education. Character count, as a proxy for question complexity, was assessed in relation to accuracy.

Methods
ChatGPT-4o mini, ChatGPT o3-mini, Gemini 2.0 Flash, and Claude 3.5 Sonnet were prompted to answer 20 multiple-choice questions from the NBME Surgery Sample Items and provide justification on three attempts. Character count, answer choice, and explanation were recorded for each question. A logistic regression model assessed the relationship between accuracy and question character count.

Results
ChatGPT o3-mini and Claude 3.5 Sonnet scored 100% on all three attempts. Gemini 2.0 Flash scored 95% on all three attempts, with an odds ratio of 0.904 [0.446, 1.831] (p = 0.7794). ChatGPT-4o mini averaged 95% over three attempts, with an odds ratio of 0.998 [0.993, 1.003] (p = 0.4669). There was no statistically significant relationship between character count and accuracy.

Conclusions
The lack of correlation between question length and response accuracy implies that question complexity may not impact the performance of these models. ChatGPT o3-mini and Claude 3.5 Sonnet outperform their counterparts on standardized general surgery exam questions, showcasing potential as supplementary tools for surgery students. ChatGPT-4o mini and Gemini 2.0 Flash have room for improvement before serving this purpose equally well. Future models can continue to incorporate core surgical concepts to provide more comprehensive explanations.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,418 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,288 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,726 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,516 citations