This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Leveraging large language models for heuristic usability assessment of medical software: Insights with the Radiation Planning Assistant
Citations: 0
Authors: 22
Year: 2026
Abstract
BACKGROUND: Usability engineering is essential for ensuring the safety and effectiveness of medical software, as design-related issues are a leading cause of use errors in clinical settings. Heuristic evaluation provides a practical approach to identifying usability problems, but its outcomes depend heavily on expert interpretation. Large Language Models (LLMs), such as ChatGPT, offer a potential means to augment heuristic evaluation by generating structured, context-aware usability feedback. This study explored the use of ChatGPT to support heuristic assessment of the Radiation Planning Assistant (RPA), a web-based radiotherapy planning tool designed to support clinical teams in low- and middle-income countries. METHODS: ChatGPT was provided with the RPA user and technical guides, training videos for each functional dashboard, and Zhang et al.'s 14 usability heuristics. The model was instructed to score each dashboard according to these heuristics, using Zhang's 0-4 severity scale, and to propose concrete interface improvements. The resulting feedback was reviewed and scored independently by the RPA developer team and by 13 users during a dedicated User Meeting. Comparative analysis was performed between ChatGPT, developer, and user ratings. RESULTS: ChatGPT identified 26 potential usability issues across six heuristic domains. The developer team considered nine of these actionable, though all were classified as minor (severity ≤ 2). User ratings showed wide variability, with nine suggestions achieving mean scores ≥ 1.5. Qualitative agreement between users and developers was limited, underscoring the importance of diverse perspectives in heuristic evaluation. Three suggestions (enhanced upload logs, reversible actions ("reopen request"), and stronger error prevention) were rated as potentially high priority by a minority of users. ChatGPT's ratings were consistent across dashboards.
CONCLUSIONS: While ChatGPT did not reveal any critical usability failures, its heuristic assessment proved valuable in prompting discussion, identifying minor refinements, and enriching both developer and user engagement with the RPA's interface design. This study demonstrates that LLMs can serve as an effective, low-cost complement to conventional heuristic evaluation, supporting early-stage usability review and stakeholder dialogue in the development of medical software.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,644 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,550 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,061 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,850 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Authors
- Laurence E. Court
- Jacobus Smit
- L.J. Strauss
- William V. Shaw
- Andrea Marais
- Christoph Trauernicht
- Nanette Joubert
- Elaine G. Smith
- Shona Badre
- Graeme L. Lazarus
- Thekiso Khotle
- Lauren Netherton
- Wanda van Heerden
- Carlos E. S. Cárdenas
- Monica Serban
- Jan Seuntjens
- Christine Chung
- Pavel Govyadinov
- Meena Khan
- Saurabh Sudhakaran Nair
- T. Netherton
- Lifei Zhang
Institutions
- The University of Texas MD Anderson Cancer Center (US)
- University of the Free State (ZA)
- Stellenbosch University (ZA)
- Tygerberg Hospital (ZA)
- Groote Schuur Hospital (ZA)
- Wentworth Hospital (ZA)
- Johannesburg Hospital (ZA)
- University of Houston (US)
- Research ICT Africa (ZA)
- University of Alabama at Birmingham (US)
- Princess Margaret Cancer Centre (CA)