This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Beyond the First Generated Summarization: Comparing Human and AI-Generated Reports in Geriatric Outpatient Consultations on Content, Linguistic Form, and Healthcare Professionals’ Reporting Preferences (Preprint)
0
Citations
7
Authors
2025
Year
Abstract
<sec> <title>BACKGROUND</title> Administrative reporting is a major contributor to healthcare professionals’ (HCPs) workload, reduced time available for patients, HCP burnout rates, and high staffing costs. Automated medical reporting (AMR) systems, which use large language models to automatically generate medical reports, have been proposed as a solution, but despite their promised benefits, their clinical validity remains uncertain. While some studies address content accuracy, the linguistic form of AMR reports and their alignment with HCP preferences are understudied. </sec> <sec> <title>OBJECTIVE</title> The goal is (1) to evaluate the first development iteration of an AMR system in terms of content (what is reported), linguistic form (how it is reported), and HCPs’ reporting preferences; and (2) to generate insights for the AMR system’s further development. </sec> <sec> <title>METHODS</title> We evaluated the first development iteration of an AMR system for geriatric outpatient care, focusing on the history-taking section (or anamnesis) of the multifaceted and time-consuming Comprehensive Geriatric Assessment (CGA). This first iteration of the CGA reporting model was developed based on the Dutch CGA guidelines and the CGA blueprints used in the hospital where this study was conducted. A mixed-methods design was used, comprising (1) a content comparison of ten audio-recorded consultations and their corresponding conventional and AMR reports, (2) a focus group in which HCPs discussed reporting differences and their reporting preferences were elicited, and (3) a linguistic sentence complexity analysis as a proxy for linguistic form. </sec> <sec> <title>RESULTS</title> Compared to the conventional reports, AMR reports were shorter, contained less and different information (including hallucinations), repeated more content, had higher sentence complexity, and adhered to a rigid structure. 
HCPs evaluated AMR as too concise and conclusive, and as lacking specific textual elements they deemed important for medical and legal accountability. However, HCPs appreciated AMR’s structural organization. </sec> <sec> <title>CONCLUSIONS</title> This study yielded multiple insights for the further development of the AMR system used, specifically on information selection, how that information is linguistically realized, and HCPs’ reporting preferences. It shows that AMR implementations should be evaluated not only on report content but also on linguistic form and HCP reporting preferences. This methodology serves as an initial proof-of-concept validation framework that future research can use to evaluate AI technology beyond content-based measures. </sec>
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,719 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,628 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,176 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,880 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations