OpenAlex · Updated hourly · Last updated: 16 April 2026, 02:05

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating the reliability of large language models for clinical data extraction in bladder cancer prognosis

2025 · 1 citation · Scientific Reports · Open Access

Citations: 1 · Authors: 11 · Year: 2025

Abstract

Advances in natural language processing (NLP) and machine learning could assist human users in extracting clinical data from unstructured electronic medical records (EMRs). This study investigates the accuracy and consistency of several large language models (LLMs), including Dolly, Vicuna, Llama, and GPT-4, in extracting critical clinical information pertinent to bladder cancer survival prediction. Using EMRs from 163 bladder cancer patients, we assessed how LLM performance is affected by factors such as differences between trained models, model evolution, input text length, and the sequencing of case inputs. GPT-4 demonstrated superior performance, with Fleiss' Kappa values exceeding 0.97, accuracy consistently above 93%, and survival prediction metrics closely aligned with ground truth (AUC ± 0.02). Among offline models, Llama-2.0-13b and Llama-3.3-70b exhibited the highest reliability in both information extraction and survival prediction. This study underscores the potential of LLMs to automate clinical data extraction for predictive modeling while highlighting the challenges related to LLM variability and reliability.
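
The abstract reports consistency as Fleiss' Kappa. As a rough illustration of what that statistic measures, the sketch below computes it from scratch for categorical labels. It assumes, purely for illustration, that each repeated LLM run on the same record is treated as one "rater"; the abstract does not spell out the exact setup, and the function name `fleiss_kappa` and the sample labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds the categorical labels that the
    raters (here assumed to be repeated LLM runs) assigned to item i.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every item must have the same number of ratings")

    # Per-item category counts n_ij.
    counts = [Counter(row) for row in ratings]

    # Per-item agreement P_i = (sum_j n_ij^2 - n) / (n * (n - 1)).
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e = sum_j p_j^2, with p_j the overall share of category j.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        # Only one category was ever used, so kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical labels: three repeated runs extracting a tumor stage for four records.
    runs = [
        ["T2", "T2", "T2"],
        ["T3", "T3", "T3"],
        ["T2", "T2", "T3"],
        ["T1", "T1", "T1"],
    ]
    print(f"Fleiss' kappa: {fleiss_kappa(runs):.3f}")
```

A value near 1 means the repeated runs almost always agree beyond what chance would produce, which is the sense in which a value above 0.97 indicates highly consistent extraction.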

Related works