This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Performance and reliability of large language models on the European Board of Hand Surgery examination: a multi-model evaluation study

2026 · 0 citations · Journal of Hand Surgery (European Volume) · Open Access

Abstract

INTRODUCTION: Artificial intelligence (AI) has demonstrated transformative potential in medical education and assessment, with large language models achieving competitive results across multiple high-stakes examinations. In this study, we evaluated the performance and inter-run reliability of 10 widely adopted large language models (LLMs) on the European Board of Hand Surgery written examination. METHODS: ). RESULTS: of 0.739. The overall reliability across the LLMs was 0.821. CONCLUSIONS: Contemporary LLMs show robust and reproducible performance on a complex surgical certification examination, with proprietary architectures tending to outperform open-source counterparts. Although several models reached or exceeded an illustrative pass threshold, persistent gaps remain in subspecialty knowledge, such as congenital anomalies and complex reconstructions. Therefore, LLMs may assist in structured learning and examination preparation but require specialist oversight and remain unsuitable for independent subspecialty decision-making. LEVEL OF EVIDENCE: Not applicable.

Topics

Artificial Intelligence in Healthcare and Education · Diversity and Career in Medicine · Clinical Reasoning and Diagnostic Skills
