This is an overview page with metadata for this scholarly article. The full article is available from the publisher.

Exploring evaluation measures of large language models for family caregiver use: A scoping review

2026 · 0 citations · Digital Health · Open Access

Abstract

Background: Large language models have a large positive impact on various disciplines, including healthcare. Family caregivers are an essential part of the healthcare system; they need support and can benefit from the technology. However, there is no consensus on reliable and valid measures to evaluate large language models. Objective: This study aims to review the literature on the evaluation measures of large language models for caregivers. Methods: We conducted a scoping review guided by the Arksey and O'Malley methodology and the PRISMA-ScR checklist. A literature search of PubMed, EMBASE, CINAHL, and PsycINFO, covering 2018 through July 2024, was carried out. An additional rapid review was conducted to update the literature from July 2024 through November 2025. Results: All 10 publications that met the inclusion criteria, out of 1812, focused on ChatGPT, and three of them also addressed other large language models, such as Google Bard and Bing AI. The most commonly assessed core conceptual components of evaluation measures were accuracy, reliability, readability, and comprehensiveness. Overall, the included studies reported that large language models' responses were somewhat accurate and reliable, with mixed results for readability and comprehensiveness. The 14 publications included from the rapid review offered additional evidence of ChatGPT-centrism. Conclusions: This review provides a comprehensive overview of the measures for evaluating large language models and highlights the need to improve these models using reliable and valid measures. The findings guide the direction of future research and practice to maximize the benefits through continuous quality improvement.

Topics

Artificial Intelligence in Healthcare and Education · Mental Health via Writing · Digital Mental Health Interventions
