This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Assessment of a zero-shot large language model in measuring documented goals-of-care discussions

2025 · 1 citation · medRxiv · Open Access

Abstract

Context: Goals-of-care (GOC) discussions and their documentation are important process measures in palliative care. However, existing natural language processing (NLP) models for identifying such documentation require costly task-specific training data. Large language models (LLMs) hold promise for measuring such constructs with fewer or no task-specific training data.

Objective: To evaluate the performance of a publicly available LLM with no task-specific training data (zero-shot prompting) for identifying documented GOC discussions.

Methods: We compared performance of two NLP models in identifying documented GOC discussions: Llama 3.3 using zero-shot prompting, and a task-specific BERT (Bidirectional Encoder Representations from Transformers)-based model trained on 4,642 manually annotated notes. We tested both models on records from a series of clinical trials enrolling adult patients with chronic life-limiting illness hospitalized over 2018-2023. We evaluated the area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), and maximal F1 score for both note-level and patient-level classification over a 30-day period.

Results: In our text corpora, GOC documentation represented <1% of text and was found in 7.3-9.9% of notes for 23-37% of patients. In a 617-patient held-out test set, Llama 3.3 (zero-shot) and BERT (task-specific, trained) exhibited comparable performance in identifying GOC documentation (Llama 3.3: AUC 0.979, AUPRC 0.873, and F1 0.83; BERT: AUC 0.981, AUPRC 0.874, and F1 0.83).

Conclusion: A zero-shot large language model with no task-specific training performed similarly to a task-specific trained BERT model in identifying documented goals-of-care discussions. This demonstrates the promise of LLMs in measuring novel clinical research outcomes.

Key message: This article reports the performance of a publicly available large language model with no task-specific training data in measuring the occurrence of documented goals-of-care discussions from electronic health records. The study demonstrates that newer large language models may allow investigators to measure novel outcomes without requiring costly training data.
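The three metrics named in the abstract (AUC, AUPRC, and maximal F1) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the toy patients, labels, and scores are invented for illustration, AUPRC is estimated here as average precision, and rolling note-level scores up to the patient level by taking the maximum note score is an assumption, since the abstract does not say how patient-level scores were derived.

```python
from collections import defaultdict

def roc_auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pr_points(labels, scores):
    """(threshold, precision, recall) per distinct score, predicting positive if score >= threshold."""
    total_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((t, tp / (tp + fp), tp / total_pos))
    return points

def average_precision(labels, scores):
    """AUPRC estimated as average precision: sum of precision times recall increment."""
    ap, prev_recall = 0.0, 0.0
    for _, precision, recall in pr_points(labels, scores):
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

def max_f1(labels, scores):
    """Largest F1 over all candidate thresholds."""
    best = 0.0
    for _, p, r in pr_points(labels, scores):
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best

def patient_level(patient_ids, labels, scores):
    """ASSUMED aggregation: patient is positive if any note is; patient score is the max note score."""
    y, s = defaultdict(int), defaultdict(float)
    for pid, label, score in zip(patient_ids, labels, scores):
        y[pid] = max(y[pid], label)
        s[pid] = max(s[pid], score)
    pids = sorted(y)
    return [y[p] for p in pids], [s[p] for p in pids]

# Hypothetical toy data: six notes from three patients (1 = note documents a GOC discussion).
patients = ["a", "a", "b", "b", "c", "c"]
labels = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.1, 0.35, 0.3]

note_auc = roc_auc(labels, scores)
note_auprc = average_precision(labels, scores)
note_f1 = max_f1(labels, scores)

p_labels, p_scores = patient_level(patients, labels, scores)
patient_auc = roc_auc(p_labels, p_scores)
```

Because GOC documentation is rare (7.3-9.9% of notes in the study corpora), AUPRC and maximal F1 are more sensitive to false positives than AUC, which is presumably why all three are reported. In practice a library such as scikit-learn would normally replace these hand-written helpers.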

Topics

Palliative Care and End-of-Life Issues
Topic Modeling
Artificial Intelligence in Healthcare and Education
