Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Enhancing large language model clinical support information with machine learning risk and explainability: a feasibility study
1
Zitationen
6
Autoren
2026
Jahr
Abstract
Abstract Background C urrent machine learning (ML) prediction models offer limited guidance for individualized actionable management. Large language models (LLMs) can transform ML model-predicted risk estimates with Shapley Additive Explanations (SHAP) into clinically meaningful support information, yet the added value of incorporating ML-derived data and the relative performance of different LLMs remain uncertain. To address these gaps, we used our previously developed IMPACT framework to evaluate the quality of LLM-generated outputs. Methods In this retrospective analysis of MIMIC-IV v3.1 intensive care unit (ICU) admissions, we applied a previously developed XGBoost model to estimate ICU mortality risk and derive corresponding SHAP values. GPT-4o transformed the predicted mortality risk, clinical predictors, and their SHAP values into risk interpretation, recommended examinations and management. The primary analysis examined whether augmenting LLM inputs with predicted mortality risk and SHAP values improved clinical response quality, as assessed by the IMPACT framework. We further compared GPT-4o with seven contemporary LLMs; all eight models generated clinical support responses that were scored by Claude 3.7 Sonnet to assess performance differences. Results Claude 3.7 Sonnet showed excellent agreement with human IMPACT ratings (intraclass correlation coefficient [ICC] 0.979, 95% CI 0.973–0.984) and o3-mini (ICC 0.971, 95% CI 0.964–0.980). In the primary analysis, adding predicted ICU mortality risk and SHAP values significantly increased GPT-4o IMPACT scores across prompting strategies. GPT-5 mini (96.0) and gpt-oss-120B (93.4) outperformed GPT-4o (90.4; both p < 0.001) for interpretability and quality. Conclusions Combining ML-derived risk, SHAP explanations and LLMs may modestly improve ICU clinical support information, while LLM-based evaluators demonstrated feasibility for scalable evaluation of generated clinical content.
Ähnliche Arbeiten
The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3)
2016 · 27.540 Zit.
pROC: an open-source package for R and S+ to analyze and compare ROC curves
2011 · 13.846 Zit.
APACHE II
1985 · 13.637 Zit.
Definitions for Sepsis and Organ Failure and Guidelines for the Use of Innovative Therapies in Sepsis
1992 · 13.190 Zit.
The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure
1996 · 11.540 Zit.