This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative evaluation of proprietary and open-source large language models for systematic multi-source information extraction in interventional oncology
Citations: 0
Authors: 9
Year: 2025
Abstract
Purpose: To compare proprietary (GPT-4o, Gemini 1.5 Pro) and open-source (Llama 3.1 70B, Llama 3.1 405B) large language models (LLMs) for extracting clinically relevant variables from transarterial chemoembolization (TACE) reports of patients with hepatocellular carcinoma (HCC).

Methods: This retrospective study analyzed 556 anonymized longitudinal TACE-related reports (radiology, interventional procedure, and clinical follow-up) from 50 patients with HCC treated between 2012 and 2024 at a single tertiary center. Using a standardized system prompt and output template, the models extracted predefined binary variables (e.g., tumor response according to the modified Response Evaluation Criteria in Solid Tumors [mRECIST], alpha-fetoprotein [AFP] dynamics, Barcelona Clinic Liver Cancer [BCLC] stage) and ordinal variables (e.g., liver segment involvement, vascular invasion, follow-up assessment). Model performance was assessed by accuracy, ordinal scores, and longitudinal error rates, analyzed with mixed-effects regression models including patient-level random intercepts.

Results: Proprietary models outperformed open-source models. GPT-4o and Gemini 1.5 Pro achieved the highest mean accuracies for binary variables (0.87 ± 0.21 and 0.85 ± 0.16, respectively) and the highest mean scores for ordinal variables (4.15/5 and 4.10/5), significantly exceeding both Llama models (p < 0.05). GPT-4o showed the lowest longitudinal error rate for binary variables (0.01 vs 0.09 to 0.21 for the other models), indicating greater robustness over time. All models performed poorly in detecting vascular invasion and in follow-up assessment.

Conclusion: Proprietary LLMs can accurately extract most key TACE-related variables from routine clinical reports and may support decision-making in interventional oncology. However, because all models performed poorly in vascular invasion detection and follow-up assessment, expert human oversight remains essential.
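The abstract reports binary-variable accuracy as mean ± SD but does not say over which units the mean is taken. The following is a minimal sketch, not the authors' code, of how extracted binary labels could be scored against a reference standard. It assumes that accuracy is computed per variable and that the SD is taken across variables. The variable names (`mrecist_response`, `afp_increase`) and the toy labels are hypothetical, and the mixed-effects regression is not reproduced here.

```python
from statistics import mean, stdev

def per_variable_accuracy(reference, predicted):
    """Return {variable: fraction of reports where the model matches the reference}.

    Assumed (hypothetical) layout: reference[var] and predicted[var] are
    equal-length lists of 0/1 labels, one entry per report.
    """
    acc = {}
    for var, ref_labels in reference.items():
        pred_labels = predicted[var]
        if len(pred_labels) != len(ref_labels):
            raise ValueError(f"length mismatch for variable {var!r}")
        hits = sum(r == p for r, p in zip(ref_labels, pred_labels))
        acc[var] = hits / len(ref_labels)
    return acc

def summarize(acc):
    """Mean and sample SD of the per-variable accuracies (assumption: SD across variables)."""
    values = list(acc.values())
    sd = stdev(values) if len(values) > 1 else 0.0
    return mean(values), sd

# Toy, hypothetical labels for one model; not data from the study.
reference = {"mrecist_response": [1, 0, 1, 1], "afp_increase": [0, 0, 1, 0]}
predicted = {"mrecist_response": [1, 0, 0, 1], "afp_increase": [0, 0, 1, 0]}

acc = per_variable_accuracy(reference, predicted)
m, sd = summarize(acc)
print(f"mean accuracy {m:.3f} ± {sd:.3f}")
```

Summarizing across patients instead (for example, per-patient accuracy before averaging) would give different numbers. The abstract's use of patient-level random intercepts suggests that the patient was treated as a grouping unit in the statistical analysis.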
Similar works
New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1)
2008 · 29,091 citations
TNM Classification of Malignant Tumours
1987 · 16,123 citations
A survey on deep learning in medical image analysis
2017 · 13,806 citations
Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening
2011 · 10,844 citations
The American Joint Committee on Cancer: the 7th Edition of the AJCC Cancer Staging Manual and the Future of TNM
2010 · 9,125 citations