This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
ClinicRealm: Re-evaluating large language models with conventional machine learning for non-generative clinical prediction tasks
0
Citations
12
Authors
2026
Year
Abstract
Large Language Models (LLMs) are increasingly deployed in medicine. However, their utility for non-generative clinical prediction is under-evaluated, and they are often assumed to be inferior to specialized models, creating potential for misuse and misunderstanding. To address this, our ClinicRealm benchmark systematically evaluates 15 GPT-style LLMs, 5 BERT-style models, and 11 traditional methods on unstructured clinical notes and structured Electronic Health Records (EHRs) across dimensions including predictive performance, reasoning, and fairness. Our findings reveal a significant shift: on clinical notes, leading zero-shot LLMs (e.g., DeepSeek-V3.1-Think, GPT-5) now decisively outperform fine-tuned BERT models. On structured EHRs, while specialized models excel with ample data, advanced LLMs demonstrate potent zero-shot capabilities, often surpassing conventional models in data-scarce settings. Notably, leading open-source LLMs match or exceed their proprietary counterparts. These results provide compelling evidence that modern LLMs are competitive tools for clinical prediction, necessitating a re-evaluation of model selection strategies by health data scientists and developers.
Similar works
"Why Should I Trust You?"
2016 · 14,801 citations
Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data
2005 · 10,558 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,993 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,605 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,133 citations