This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Are Artificial Intelligence-generated Patient Leaflets Ready for Clinical Use? A Readability Comparison across Common Orthopaedic Procedures
Citations: 0
Authors: 2
Year: 2025
Abstract
Introduction: Readable patient information is central to informed consent, shared decision-making, and treatment adherence. With the emergence of large language models (LLMs) such as ChatGPT, Gemini, and DeepSeek, there is growing interest in their role in generating health education content. However, the readability of such AI-generated patient information leaflets (PILs) has not been systematically compared with that of professionally authored materials. Objective: This study aimed to compare the readability of PILs generated by three generative artificial intelligence (AI) platforms with those produced by the Royal College of Surgeons of England (RCS England) for three common orthopaedic procedures: carpal tunnel release, total hip replacement, and total knee replacement. Materials and Methods: A total of 12 PILs (four per procedure) were analyzed using five validated readability metrics: Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index, Simple Measure of Gobbledygook (SMOG) Index, and Coleman-Liau Index. Each AI model was prompted with a standardized instruction to generate a leaflet for the specified procedure. The RCS England leaflets served as the professional benchmark. Results: Across all metrics and procedures, RCS England leaflets demonstrated superior readability, with Flesch Reading Ease scores above 70 and Flesch-Kincaid Grade Levels between 5.52 and 7.15. In contrast, AI-generated leaflets frequently exceeded recommended complexity thresholds, with Grade Levels often above 12 and Gunning Fog and SMOG scores indicating post-secondary reading requirements. ChatGPT outputs were the most linguistically complex, while Gemini and DeepSeek produced intermediate but still suboptimal readability. Conclusion: While LLMs offer promising avenues for scalable health communication, current AI-generated PILs do not consistently meet recommended readability standards. Professionally authored leaflets remain more accessible for the average patient.
These findings highlight the ongoing need for clinician oversight and quality assurance when integrating AI into patient education materials.
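The metrics named in the abstract are standard surface-level formulas computed from word, sentence, and syllable counts. As a rough illustration only (not the study's actual tooling), the sketch below computes two of them — Flesch Reading Ease and Flesch-Kincaid Grade Level — using their published coefficients and a deliberately naive vowel-group syllable heuristic; validated readability software uses dictionary-based syllable counting instead.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels, then drop a
    # likely-silent trailing "e". Real tools use pronunciation dictionaries.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    return {
        # Flesch Reading Ease: higher = easier; 70+ is "fairly easy"
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: approximate US school grade needed
        "fk_grade_level": 0.39 * wps + 11.8 * spw - 15.59,
    }

# Hypothetical leaflet-style sentences for illustration.
scores = readability(
    "The doctor makes a small cut in your wrist. "
    "This frees the trapped nerve. Most people go home the same day."
)
```

Short, plain sentences like these score well above the Flesch Reading Ease threshold of 70 that the RCS England leaflets met, while long multi-clause sentences with polysyllabic medical terms push the grade level past 12, as reported for the AI-generated leaflets.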
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,652 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,567 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,083 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,856 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations