This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating the reliability of large language models for clinical data extraction in bladder cancer prognosis
Citations: 1
Authors: 11
Year: 2025
Abstract
Advances in natural language processing (NLP) and machine learning could assist human users in extracting clinical data from unstructured electronic medical records (EMRs). This study investigates the accuracy and consistency of several large language models (LLMs), including Dolly, Vicuna, Llama, and GPT-4, in extracting critical clinical information pertinent to bladder cancer survival prediction. Using EMRs from 163 bladder cancer patients, we assessed how factors such as differences between trained models, model evolution, input text length, and the ordering of case inputs affect LLM performance. GPT-4 demonstrated superior performance, with Fleiss' kappa values exceeding 0.97, accuracy consistently above 93%, and survival prediction metrics closely aligned with the ground truth (AUC within ±0.02). Among offline models, Llama-2.0-13b and Llama-3.3-70b exhibited the highest reliability in both information extraction and survival prediction. This study underscores the potential of LLMs to automate clinical data extraction for predictive modeling while highlighting challenges related to LLM variability and reliability.
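The abstract reports agreement as Fleiss' kappa (values above 0.97 for GPT-4), which measures how consistently multiple raters, here repeated LLM runs, assign the same category to each item beyond chance. A minimal sketch of the standard formula is below; the function name and the toy table are illustrative assumptions, not taken from the paper.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a subjects-by-categories count table.

    Each row holds, for one subject (e.g. one extracted EMR field),
    the number of raters (e.g. repeated LLM runs) that assigned each
    category; every row must sum to the same number of raters n.
    """
    N = len(table)       # number of subjects
    n = sum(table[0])    # raters per subject
    k = len(table[0])    # number of categories

    # Per-subject observed agreement P_i, then its mean P_bar
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P) / N

    # Expected chance agreement P_e from marginal category proportions
    p = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)


if __name__ == "__main__":
    # Three hypothetical runs labelling two fields with perfect agreement
    print(fleiss_kappa([[3, 0], [0, 3]]))  # -> 1.0
```

Values near 1 indicate near-perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement.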
Related works
"Why Should I Trust You?"
2016 · 14,486 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,788 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,341 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,791 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,462 citations