This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Large Language Models Evaluation of Medical Licensing Examination Using GPT-4.0, ERNIE Bot 4.0, and GPT-4o
Citations: 2
Authors: 6
Year: 2026
Abstract
This study systematically evaluated the performance of three advanced large language models (LLMs), namely GPT-4.0, ERNIE Bot 4.0, and GPT-4o, on the 2023 Chinese Medical Licensing Examination. Using a dataset of 600 standardized questions, we analyzed the accuracy of each model on questions from three comprehensive sections: Basic Medical Comprehensive, Clinical Medical Comprehensive, and Humanities and Preventive Medicine Comprehensive. Our results demonstrate that both ERNIE Bot 4.0 and GPT-4o significantly outperformed GPT-4.0, achieving accuracies above the national pass mark. The study further examined the strengths and limitations of each model, providing insights into their applicability in medical education and potential areas for future improvement. These findings underscore the promise and challenges of deploying LLMs in multilingual medical education and suggest a pathway toward integrating AI into medical training and assessment practices.