This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluating the Quality and Reliability of Large Language Models for Plastic Surgery Patient Education: A Comparative Analysis of ChatGPT and OpenEvidence
Citations: 1
Authors: 5
Year: 2025
Abstract
Background: Concerns regarding information inaccuracy when using general-purpose large language models have prompted the quest for alternative tools. OpenEvidence has emerged as a healthcare-focused large language model trained exclusively on data from peer-reviewed medical literature.

Objectives: This study compared the quality, accuracy, and readability of aesthetic surgery patient education materials generated by OpenEvidence and ChatGPT.

Methods: A standardized prompt requesting comprehensive postoperative discharge instructions for 20 of the most common aesthetic surgery procedures was entered into OpenEvidence and ChatGPT-5. Outputs were evaluated using 4 validated assessment tools: the DISCERN instrument for information quality (1-5), the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) for information understandability and actionability (0-100), the Flesch-Kincaid scale for estimated grade level (fifth grade to professional level) and reading ease (0-100), and a Likert scale for citation accuracy (1-4).

Results: OpenEvidence scored significantly higher than ChatGPT-5 on DISCERN (3.3 ± 0.4 vs 1.7 ± 0.4, P < .001) and on the citation accuracy scale (2.4 ± 1.3 vs 1.5 ± 0.7, P = .007). Scores were comparable between the two tools in PEMAT-P understandability (71 ± 5 vs 69 ± 0, P = .3) and actionability (52 ± 12 vs 54 ± 5, P = .6), as well as on the Flesch-Kincaid Grade Level (9.3 ± 1.0 vs 9.2 ± 0.6, P = .8) and the Flesch Reading Ease Score (40.0 ± 6.6 vs 41.0 ± 5.5, P = .6).

Conclusions: OpenEvidence generated materials of significantly higher quality and reliability than ChatGPT, suggesting it may serve as a more reliable alternative for patient education in aesthetic surgery practice.
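For readers unfamiliar with the two readability measures named in the abstract, the standard published formulas can be sketched as follows. This is a minimal illustration and not the authors' tooling: the abstract does not say which software computed the scores, and the word, sentence, and syllable counts below are hypothetical inputs that a real pipeline would have to derive from the text itself.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short passage (assumed, not taken from the study).
words, sentences, syllables = 100, 5, 150

ease = flesch_reading_ease(words, sentences, syllables)
grade = flesch_kincaid_grade(words, sentences, syllables)
print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")
```

Both formulas depend only on average sentence length and average syllables per word, so they measure surface readability and say nothing about accuracy. That is consistent with the abstract reporting similar readability for the two tools alongside very different DISCERN scores.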
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,707 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,613 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,159 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,875 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations