This is an overview page with metadata for this scientific work. The full article is available from the publisher.
The evaluation illusion of large language models in medicine
Citations: 17
Authors: 4
Year: 2025
Abstract
While large language models (LLMs) hold promise for transforming clinical healthcare, current comparisons and benchmark evaluations of LLMs in medicine often fail to capture real-world efficacy. Specifically, we highlight how key discrepancies arising from choices of data, tasks, and metrics can limit meaningful assessment of translational impact and lead to misleading conclusions. Therefore, we advocate for rigorous, context-aware evaluations and experimental transparency across both research and deployment.