This is an overview page with metadata for this scholarly article. The full article is available from the publisher.

Enhancing large language model clinical support information with machine learning risk and explainability: a feasibility study

2026 · 1 citation · Intensive Care Medicine Experimental · Open Access

Abstract

Background: Current machine learning (ML) prediction models offer limited guidance for individualized, actionable management. Large language models (LLMs) can transform ML-predicted risk estimates and their Shapley Additive Explanations (SHAP) into clinically meaningful support information, yet the added value of incorporating ML-derived data and the relative performance of different LLMs remain uncertain. To address these gaps, we used our previously developed IMPACT framework to evaluate the quality of LLM-generated outputs.

Methods: In this retrospective analysis of intensive care unit (ICU) admissions in MIMIC-IV v3.1, we applied a previously developed XGBoost model to estimate ICU mortality risk and derived the corresponding SHAP values. GPT-4o transformed the predicted mortality risk, the clinical predictors, and their SHAP values into a risk interpretation and recommendations for examinations and management. The primary analysis examined whether augmenting LLM inputs with predicted mortality risk and SHAP values improved clinical response quality, as assessed with the IMPACT framework. We further compared GPT-4o with seven contemporary LLMs; all eight models generated clinical support responses, which Claude 3.7 Sonnet scored to assess performance differences.

Results: Claude 3.7 Sonnet (intraclass correlation coefficient [ICC] 0.979, 95% CI 0.973–0.984) and o3-mini (ICC 0.971, 95% CI 0.964–0.980) showed excellent agreement with human IMPACT ratings. In the primary analysis, adding predicted ICU mortality risk and SHAP values significantly increased GPT-4o IMPACT scores across prompting strategies. GPT-5 mini (96.0) and gpt-oss-120B (93.4) outperformed GPT-4o (90.4; both p < 0.001) in interpretability and quality.

Conclusions: Combining ML-derived risk, SHAP explanations, and LLMs may modestly improve ICU clinical support information, while LLM-based evaluators demonstrated feasibility for scalable evaluation of generated clinical content.
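The abstract states that the predicted mortality risk, the clinical predictors, and their SHAP values were passed to GPT-4o, but it does not give the input format. The sketch below shows one possible way to assemble such an input. It assumes a binary:logistic XGBoost model whose SHAP values are expressed in log-odds (margin) units, so that the base value plus the sum of a patient's SHAP values equals the model's log-odds output and the logistic function recovers the predicted risk. The `Contribution` class, the functions `risk_from_shap`, `describe_effect`, and `build_llm_context`, and the `top_k` cutoff are hypothetical names for illustration, not the authors' implementation, and the example inputs are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class Contribution:
    """One predictor's contribution for a single patient (hypothetical structure)."""
    feature: str   # predictor name
    value: float   # observed value for this patient
    shap: float    # SHAP value in log-odds (margin) units


def risk_from_shap(base_value: float, contributions: list) -> float:
    """Recover the predicted probability via SHAP additivity.

    Assumes a binary:logistic model explained in log-odds space:
    base_value + sum of SHAP values = model margin, and the logistic
    (sigmoid) function maps the margin to a probability.
    """
    margin = base_value + sum(c.shap for c in contributions)
    return 1.0 / (1.0 + math.exp(-margin))


def describe_effect(shap_value: float) -> str:
    """Translate the sign of a SHAP value into plain language."""
    if shap_value > 0:
        return "raises risk"
    if shap_value < 0:
        return "lowers risk"
    return "no effect"


def build_llm_context(base_value: float, contributions: list, top_k: int = 5) -> str:
    """Format the predicted risk and the top_k predictors (ranked by |SHAP|) as prompt text."""
    risk = risk_from_shap(base_value, contributions)
    ranked = sorted(contributions, key=lambda c: abs(c.shap), reverse=True)[:top_k]
    lines = [
        f"Predicted ICU mortality risk: {risk:.1%}",
        "Main predictors (SHAP value in log-odds; positive raises risk):",
    ]
    for c in ranked:
        lines.append(f"- {c.feature} = {c.value}: {c.shap:+.2f} ({describe_effect(c.shap)})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up, illustrative inputs only; not values from the study.
    example = [
        Contribution("lactate (mmol/L)", 4.1, 0.62),
        Contribution("age (years)", 78, 0.35),
        Contribution("Glasgow Coma Scale", 14, -0.20),
    ]
    print(build_llm_context(base_value=-2.5, contributions=example, top_k=3))
```

Ranking predictors by absolute SHAP value keeps the prompt short while preserving the direction of each driver. In practice the base value and SHAP values would come from the explainer rather than being hand-coded, and the formatting the study actually used may differ.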

Topics

Sepsis Diagnosis and Treatment · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
