This is an overview page with metadata about this scientific paper. The full article is available from the publisher.
Performance and reliability of state-of-the-art LLMs in complex hand surgery scenarios: A prospective cross-sectional, double-blinded study
Citations: 0 · Authors: 1 · Year: 2026
Abstract
< 0.001). Notably, Gemini and Grok demonstrated consistently high performance with minimal variability, whereas ChatGPT and, in particular, DeepSeek exhibited considerable inconsistency in complex clinical judgments.
Conclusion
Gemini 2 and Grok 3 showed reliable and clinically relevant performance, positioning them as promising adjunctive tools for decision-making and education in hand surgery. The limitations of ChatGPT-5 and the significant shortcomings of DeepSeek underscore the need for cautious deployment and continued refinement.