This is an overview page with metadata for this scientific work. The full article is available from the publisher.
ID: 4349993
Qualitative Evaluation Framework for Comparing the Effectiveness of Large Language Models That Power Health Care Conversations Using Generative Artificial Intelligence in Atrial Fibrillation
Citations: 0
Authors: 5
Year: 2025
Abstract
Generative AI (GenAI) is employed across industries, including healthcare, where hallucinations (fabricated or incorrect information) can have dangerous consequences. Existing evaluation metrics for large language models (LLMs) are primarily generic: they emphasize correctness without addressing the domain knowledge and conceptual nuances that healthcare requires. This highlights the need for a tailored evaluation benchmark to ensure that outputs are Accurate, Relevant, Trustworthy (Fair, Robust, Explainable, Equitable: F.R.E.E), Empathetic, and Safe (A.R.T.E.S), as well as efficient for clinical use.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,740 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,649 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,202 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,886 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations