This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluation of the effectiveness of artificial intelligence models in the Polish State Specialization Exam in orthopedics and traumatology of the musculoskeletal system
0
Citations
4
Authors
2026
Year
Abstract
Introduction. The rapid development of language models such as ChatGPT and Google Gemini, and the growing interest in their use in medicine, has prompted an analysis of their potential in the context of specialty examinations. Their use in orthopaedics and traumatology, fields that require extensive theoretical and clinical knowledge, is particularly significant. This study aimed to evaluate whether ChatGPT-5.0 and Google Gemini could pass the Polish orthopaedics and traumatology specialisation examination using official questions from the Centre for Medical Examinations (CEM, Poland). Materials and methods. We analysed 468 validated questions from four Polish State Specialization Exams in orthopaedics and traumatology of the musculoskeletal system (2024–2025) and classified them into different categories. ChatGPT-5.0 and Gemini 2.5 answered all items in standardised, independent sessions. Accuracy was compared with the official answer key, inter-rater agreement was assessed using Cohen’s kappa, and group differences were evaluated with Chi-square/Fisher’s tests, Mann–Whitney U tests, and t-tests (α = 0.05). Results. ChatGPT-5.0 achieved consistently high accuracy across question types, cognitive domains, image versus text formats, and orthopaedic subspecialties, outperforming Gemini 2.5 in all comparisons. In four examination sessions, ChatGPT-5.0 reached performance comparable to or exceeding that of physicians, although the best performance in most sessions was achieved by a physician. Overall, ChatGPT-5.0 attained 80.2% accuracy, significantly higher than both Gemini 2.5 (75.7%) and physicians (74.3%), whereas Gemini 2.5 showed greater variability across domains and lower mean performance. Conclusions. Rapidly evolving AI models, such as ChatGPT-5.0, demonstrate high scores in orthopaedic examinations, highlighting their potential for education and clinical support. However, further research is required to evaluate their safety, limitations, and integration into clinical practice.
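The abstract reports inter-rater agreement via Cohen’s kappa. As a minimal illustration of how that statistic is computed (the example answer data below is hypothetical, not from the study), a self-contained sketch:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance from
    each rater's marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement from marginal frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_e = sum((count_a[l] / n) * (count_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical answers (A-D) of two models to eight exam questions
model_1 = ["A", "B", "C", "D", "A", "B", "C", "A"]
model_2 = ["A", "B", "C", "D", "B", "B", "C", "D"]
print(round(cohens_kappa(model_1, model_2), 3))  # → 0.673
```

Values near 1 indicate near-perfect agreement, values near 0 indicate agreement no better than chance; the raw agreement here is 6/8 = 0.75, which kappa discounts for chance overlap in the two answer distributions.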
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,402 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,270 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,702 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,507 citations