This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Large Language Model-Assisted Point-in-Time Interpretation of Advanced Hemodynamics in Liver Transplant Recipients: A Pilot Evaluation of Content Quality and Safety
Citations: 1 · Authors: 7 · Year: 2026
Abstract
<b>Background:</b> Large language models (LLMs) are increasingly used in clinical medicine, yet their ability to interpret advanced intraoperative hemodynamic monitoring, particularly in the context of liver transplantation, remains largely unexplored. In this proof-of-concept study, we evaluated ChatGPT's capacity to interpret multimodal hemodynamic data derived from both standard anesthesia monitoring and the PiCCO system. The study also employed a structured assessment instrument (ARQuAT), adapted through a Delphi-based process to evaluate LLM-generated clinical interpretations. <b>Methods:</b> Ten key surgical-hemodynamic phases of liver transplantation were identified using a modified Delphi approach to capture the major physiological transitions of the procedure. Sequential screenshots representing these phases were obtained from five liver transplant recipients, yielding a total of 50 images. Each screenshot, along with standardized clinical background information, was submitted to ChatGPT. Five expert anesthesiologists independently assessed the model's responses using the modified ARQuAT tool, which includes six content-quality domains (Accuracy, Up-to-dateness, Contextual Consistency, Clinical Usability, Trustworthiness, Clarity) and a separate Catastrophic Risk item. Descriptive statistics were calculated for domain-level performance. Inter-rater reliability (Kendall's W) and internal consistency (Cronbach's alpha, McDonald's omega) were also analyzed. All statistical analyses and visualizations were performed using NumIQO. <b>Results:</b> ChatGPT demonstrated consistently high performance across all content-quality domains, with median scores ranging from 4.6 to 4.8 and more than 90% of all ratings classified as satisfactory. Lower scores appeared only in a small subset of frames associated with abrupt hemodynamic changes and did not indicate a recurring weakness in any specific domain.
Catastrophic Risk exhibited a pronounced floor effect, with 86% of ratings scored as 0 and only three isolated high-risk assessments across the dataset. Internal consistency of the six ARQuAT content domains was excellent, while inter-rater agreement was modest, reflecting ceiling effects and tied ratings among evaluators. <b>Conclusions:</b> ChatGPT generated clinically acceptable, contextually aligned interpretations of complex intraoperative hemodynamic data in liver transplant recipients, with minimal evidence of unsafe recommendations. These findings suggest preliminary promise for LLM-assisted interpretation of advanced monitoring, while underscoring the need for future studies involving larger datasets, dynamic physiological inputs, and expanded evaluator groups. The reliability characteristics observed also provide initial support for further refinement and broader validation of the Delphi-derived ARQuAT framework.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,436 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,311 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,753 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,523 citations