This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Exploring evaluation measures of large language models for family caregiver use: A scoping review
Citations: 0
Authors: 4
Year: 2026
Abstract
Background: Large language models have had a substantial positive impact on various disciplines, including healthcare. Family caregivers are an essential part of the healthcare system; they need support and can benefit from this technology. However, there is no consensus on reliable and valid measures for evaluating large language models. Objective: This study aims to review the literature on evaluation measures of large language models for caregivers. Methods: We conducted a scoping review guided by the Arksey and O'Malley methodology and the PRISMA-ScR checklist. We searched PubMed, EMBASE, CINAHL, and PsycINFO for literature published from 2018 through July 2024. An additional rapid review was conducted to update the literature from July 2024 through November 2025. Results: Of 1812 records, 10 publications met the inclusion criteria. All 10 focused on ChatGPT, and three of them also addressed other large language models, such as Google Bard and Bing AI. The most commonly assessed core conceptual components of evaluation measures were accuracy, reliability, readability, and comprehensiveness. Overall, the included studies reported that large language models' responses were somewhat accurate and reliable, with mixed results for readability and comprehensiveness. The 14 publications included from the rapid review offered additional evidence of ChatGPT-centrism. Conclusions: This review provides a comprehensive overview of the measures used to evaluate large language models and highlights the need to improve these models using reliable and valid measures. The findings can guide future research and practice to maximize the benefits of large language models through continuous quality improvement.