This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluation of the effectiveness of artificial intelligence models in the Polish State Specialization Exam in orthopedics and traumatology of the musculoskeletal system
0
Citations
4
Authors
2026
Year
Abstract
Introduction. The rapid development of language models such as ChatGPT and Google Gemini, and the growing interest in their use in medicine, has prompted an analysis of their potential in the context of specialty examinations. Their use in orthopaedics and traumatology, fields that require extensive theoretical and clinical knowledge, is particularly significant. This study aimed to evaluate whether ChatGPT-5.0 and Google Gemini could pass the Polish orthopaedics and traumatology specialisation examination using official questions from the Centre for Medical Examinations (CEM, Poland). Materials and methods. We analysed 468 validated questions from four Polish State Specialization Exams in orthopaedics and traumatology of the musculoskeletal system (2024–2025) and classified them into different categories. ChatGPT-5.0 and Gemini 2.5 answered all items in standardised, independent sessions. Accuracy was compared with the official answer key, inter-rater agreement was assessed using Cohen’s kappa, and group differences were evaluated with Chi-square/Fisher’s tests, Mann–Whitney U tests, and t-tests (α = 0.05). Results. ChatGPT-5.0 achieved consistently high accuracy across question types, cognitive domains, image versus text formats, and orthopaedic subspecialties, outperforming Gemini 2.5 in all comparisons. In four examination sessions, ChatGPT-5.0 reached performance comparable to or exceeding that of physicians, although the best performance in most sessions was achieved by a physician. Overall, ChatGPT-5.0 attained 80.2% accuracy, significantly higher than both Gemini 2.5 (75.7%) and physicians (74.3%), whereas Gemini 2.5 showed greater variability across domains and lower mean performance. Conclusions. Rapidly evolving AI models, such as ChatGPT-5.0, demonstrate high scores in orthopaedic examinations, highlighting their potential for education and clinical support. However, further research is required to evaluate their safety, limitations, and integration into clinical practice.
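The abstract reports inter-rater agreement via Cohen’s kappa. As a minimal illustration of how that statistic is computed (the example answer data below is hypothetical, not from the study), a self-contained sketch:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance from
    each rater's marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement from marginal frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_e = sum((count_a[l] / n) * (count_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical answers (A-D) of two models to eight exam questions
model_1 = ["A", "B", "C", "D", "A", "B", "C", "A"]
model_2 = ["A", "B", "C", "D", "B", "B", "C", "D"]
print(round(cohens_kappa(model_1, model_2), 3))  # → 0.673
```

Values near 1 indicate near-perfect agreement, values near 0 indicate agreement no better than chance; the raw agreement here is 6/8 = 0.75, which kappa discounts for chance overlap in the two answer distributions.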
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,402 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,270 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,702 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,507 citations