This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Accuracy of nine artificial intelligence chatbots in replying in accordance with the 2023 ESH guidelines for the management of arterial hypertension
Citations: 0
Authors: 10
Year: 2024
Abstract
Abstract Background/Introduction The emergence of artificial intelligence (AI) models has created new opportunities in the medical field. The potential of AI chatbots to deliver timely, reliable medical information is one of its promising features. Purpose Our goal was to assess how well online AI chatbots could respond in accordance with the 2023 ESH Guidelines for the management of arterial hypertension. Methods We structured 20 questions covering issues that have been included in the recommendations of the 2023 ESH Guidelines. Fifteen questions required simple answers (e.g. What is the systolic blood pressure threshold for initiation of drug therapy in patients ≥80 years? Should we use cuffless blood pressure devices for the evaluation of hypertension in clinical practice?). The questions were fed to nine free online chatbots. The responses were recorded and evaluated by three experienced cardiologists with special interest in hypertension. To assess consistency, each question was asked three times, though only the first response was included in the accuracy analysis. All questions were preceded by "According to the 2023 ESH Guidelines for the management of arterial hypertension". A response was considered "accurate" if it included all essential information, "inaccurate" if it was not in accordance with the guidelines, and "incomplete" if any essential information was missing. Results In total there were 180 responses recorded. A total of 82 (45.6%) responses were deemed accurate, ranging from only 4 out of 20 (20% for deepai.org) to 16 out of 20 (80% for Google-PaLM) (see Figure). Eighty (44.4%) of the responses were judged as inaccurate and 18 (10%) as incomplete. Only one question got accurate responses from all nine chatbots and there were three questions with accurate replies from only one chatbot (different chatbot for each question). Moreover, 293 out of the 360 regenerated responses were consistent with the initial answer (81.1%). 
No chatbot would have replied accurately to every question even if the regenerated responses were to be considered. Conclusion(s) The study resulted in a variation in the accuracy of the responses generated by nine popular online AI chatbots when asked about issues covered in the recommendations of the 2023 ESH Guidelines on arterial hypertension. While the use of chat-based AI in medicine is still in its early stages and current models are not intended for medical use, the potential for such technology is significant. The debate is still ongoing about what level of accuracy is thought to be acceptable.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,485 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,371 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,827 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,549 citations