Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration Tasks
0
Zitationen
15
Autoren
2026
Jahr
Abstract
Healthcare administration accounts for over $1 trillion in annual spending, making it a promising target for LLM-based computer-use agents (CUAs). While clinical applications of LLMs have received significant attention, no benchmark exists for evaluating CUAs on end-to-end administrative workflows. To address this gap, we introduce HealthAdminBench, a benchmark comprising four realistic GUI environments: an EHR, two payer portals, and a fax system, and 135 expert-defined tasks spanning three administrative task types: Prior Authorization, Appeals and Denials Management, and Durable Medical Equipment (DME) Order Processing. Each task is decomposed into fine-grained, verifiable subtasks, yielding 1,698 evaluation points. We evaluate seven agent configurations under multiple prompting and observation settings and find that, despite strong subtask performance, end-to-end reliability remains low: the best-performing agent (Claude Opus 4.6 CUA) achieves only 36.3 percent task success, while GPT-5.4 CUA attains the highest subtask success rate (82.8 percent). These results reveal a substantial gap between current agent capabilities and the demands of real-world administrative workflows. HealthAdminBench provides a rigorous foundation for evaluating progress toward safe and reliable automation of healthcare administrative workflows.
Ähnliche Arbeiten
Machine Learning in Medicine
2019 · 3.823 Zit.
Systematic Review: Impact of Health Information Technology on Quality, Efficiency, and Costs of Medical Care
2006 · 3.176 Zit.
Effects of Computerized Clinical Decision Support Systems on Practitioner Performance and Patient Outcomes
2005 · 2.972 Zit.
Studies in health technology and informatics
2008 · 2.903 Zit.
An overview of clinical decision support systems: benefits, risks, and strategies for success
2020 · 2.744 Zit.