This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Large Language Model-Assisted Point-in-Time Interpretation of Advanced Hemodynamics in Liver Transplant Recipients: A Pilot Evaluation of Content Quality and Safety
Citations: 1 · Authors: 7 · Year: 2026
Abstract
<b>Background:</b> Large language models (LLMs) are increasingly used in clinical medicine, yet their ability to interpret advanced intraoperative hemodynamic monitoring, particularly in the context of liver transplantation, remains largely unexplored. In this proof-of-concept study, we evaluated ChatGPT's capacity to interpret multimodal hemodynamic data derived from both standard anesthesia monitoring and the PiCCO system. The study also employed a structured assessment instrument (ARQuAT), adapted through a Delphi-based process to evaluate LLM-generated clinical interpretations. <b>Methods:</b> Ten key surgical-hemodynamic phases of liver transplantation were identified using a modified Delphi approach to capture the major physiological transitions of the procedure. Sequential screenshots representing these phases were obtained from five liver transplant recipients, yielding a total of 50 images. Each screenshot, along with standardized clinical background information, was submitted to ChatGPT. Five expert anesthesiologists independently assessed the model's responses using the modified ARQuAT tool, which includes six content-quality domains (Accuracy, Up-to-dateness, Contextual Consistency, Clinical Usability, Trustworthiness, Clarity) and a separate Catastrophic Risk item. Descriptive statistics were calculated for domain-level performance. Inter-rater reliability (Kendall's W) and internal consistency (Cronbach's alpha, McDonald's omega) were also analyzed. All statistical analyses and visualizations were performed using NumIQO. <b>Results:</b> ChatGPT demonstrated consistently high performance across all content-quality domains, with median scores ranging from 4.6 to 4.8 and more than 90% of all ratings classified as satisfactory. Lower scores appeared only in a small subset of frames associated with abrupt hemodynamic changes and did not indicate a recurring weakness in any specific domain.
Catastrophic Risk exhibited a pronounced floor effect, with 86% of ratings scored as 0 and only three isolated high-risk assessments across the dataset. Internal consistency of the six ARQuAT content domains was excellent, while inter-rater agreement was modest, reflecting ceiling effects and tied ratings among evaluators. <b>Conclusions:</b> ChatGPT generated clinically acceptable, contextually aligned interpretations of complex intraoperative hemodynamic data in liver transplant recipients, with minimal evidence of unsafe recommendations. These findings suggest preliminary promise for LLM-assisted interpretation of advanced monitoring, while underscoring the need for future studies involving larger datasets, dynamic physiological inputs, and expanded evaluator groups. The reliability characteristics observed also provide initial support for further refinement and broader validation of the Delphi-derived ARQuAT framework.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,436 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,311 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,753 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,523 citations