OpenAlex · Updated hourly · Last updated: 05.05.2026, 21:59

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

Performance and reliability of large language models on the European Board of Hand Surgery examination: a multi-model evaluation study

2026 · 0 citations · Journal of Hand Surgery (European Volume) · Open Access

Citations: 0 · Authors: 6 · Year: 2026

Abstract

INTRODUCTION: Artificial intelligence (AI) has demonstrated transformative potential in medical education and assessment, with large language models (LLMs) achieving competitive results across multiple high-stakes examinations. In this study, we evaluated the performance and inter-run reliability of 10 widely adopted LLMs on the European Board of Hand Surgery written examination. METHODS: ). RESULTS: of 0.739. The overall reliability across the LLMs was 0.821. CONCLUSIONS: Contemporary LLMs show robust and reproducible performance on a complex surgical certification examination, with proprietary architectures tending to outperform open-source counterparts. Although several models reached or exceeded an illustrative pass threshold, gaps persist in subspecialty knowledge, such as congenital anomalies and complex reconstructions. LLMs may therefore assist in structured learning and examination preparation, but they require specialist oversight and remain unsuitable for independent subspecialty decision-making. LEVEL OF EVIDENCE: Not applicable.

Related works