This is an overview page with metadata for this scientific article. The full article is available from the publisher.
STAGER checklist: Standardized testing and assessment guidelines for evaluating generative artificial intelligence reliability
Citations: 0
Authors: 35
Year: 2024
Abstract
Generative artificial intelligence (AI) holds immense potential for medical applications, but the lack of a comprehensive evaluation framework and methodological deficiencies in existing studies hinder its effective implementation. Standardized assessment guidelines are crucial for ensuring reliable and consistent evaluation of generative AI in healthcare. Our objective is to develop robust, standardized guidelines tailored for evaluating generative AI performance in medical contexts. Through a rigorous literature review utilizing Web of Science, the Cochrane Library, PubMed, and Google Scholar, we focused on research testing generative AI capabilities in medicine. Our multidisciplinary team of experts conducted discussion sessions to develop a comprehensive 32-item checklist. This checklist encompasses critical evaluation aspects of generative AI in medical applications, addressing key dimensions such as question collection, querying methodologies, and assessment techniques. The checklist and its broader assessment framework provide a holistic evaluation of AI systems, delineating a clear pathway from question gathering to result assessment. It guides researchers through potential challenges and pitfalls, enhancing research quality and reporting and aiding the evolution of generative AI in medicine and life sciences. Our framework furnishes a standardized, systematic approach for testing generative AI's applicability in medicine. For a concise checklist, please refer to Table S or visit GenAIMed.org.
Highlights
- This work formulates the standardized testing and assessment guidelines for evaluating generative artificial intelligence (AI) reliability (STAGER) checklist, a 32-item framework offering standardized assessment guidelines tailored for evaluating generative AI systems in medical and life science contexts.
- It consists of key aspects, including question collection, querying approaches, and assessment techniques.
- It enhances research quality and facilitates advances in this emerging field.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,400 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,261 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,695 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,506 citations
Authors
- Chang Qi
- Weipu Mao
- Cheng Quan
- Ulf D. Kahlert
- Iain S. Whitaker
- Che Ok Jeon
- Shuqin Gu
- Jinghong Chen
- Zaoqu Liu
- Xu Sun
- Xin Chen
- Wenjie Shi
- Trunghieu Ngo
- Loïc Cabannes
- Haiyang Wu
- Anqi Lin
- Shuofeng Yuan
- Stephen R Ali
- Haojie Huang
- Bufu Tang
- Peng Luo
- Gerald Sng Gui Ren
- Baolei Jia
- Dongqiang Zeng
- Xiaofan Lu
- Aimin Jiang
- Shipeng Guo
- Shixiang Wang
- Yongbin He
- Kai Miao
- Weiming Mou
- Wisit Cheungpasitporn
- Zhongji Pu
- Lingxuan Zhu
- Jian‐Guo Zhou