This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Protocol for human evaluation of generative artificial intelligence chatbots in clinical consultations
Citations: 3
Authors: 2
Year: 2025
Abstract
BACKGROUND: Generative artificial intelligence (GenAI) has the potential to revolutionise healthcare delivery. The nuances of real-life clinical practice and complex clinical environments demand a rigorous, evidence-based approach to ensure the safe and effective deployment of AI.
METHODS: We present a protocol for the systematic evaluation of large language models (LLMs) as GenAI chatbots in clinical microbiology and infectious diseases consultations. We aim to critically assess recommendations produced by four leading GenAI models: Claude 2, Gemini Pro, GPT-4.0, and a GPT-4.0-based custom AI chatbot.
DISCUSSION: A standardised, healthcare-specific, universal prompt template is developed to elicit clinically impactful AI responses. Generated responses will be graded by two panels of practising clinicians covering a wide spectrum of domain expertise in clinical microbiology and virology, as well as infectious diseases. Evaluations will use a 5-point Likert scale across four clinical domains: factual consistency, comprehensiveness, coherence, and medical harmfulness. Our study will offer insights into the feasibility, limitations, and boundaries of GenAI in clinical consultations, providing guidance for future research and clinical implementation. Ethical guidelines and safety guardrails should be developed to uphold patient safety and clinical standards.
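The grading scheme described in the abstract (5-point Likert scores, four domains, two clinician panels) could be tabulated roughly as in the sketch below. This is only an illustration under stated assumptions: the `Rating` record, the panel labels, and the choice of the median as the summary statistic are hypothetical and are not taken from the protocol, which may define its own data format and analysis. The abstract also does not say which end of the scale denotes more harm, so the sketch makes no assumption about score direction.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median

# The four evaluation domains named in the abstract.
DOMAINS = (
    "factual consistency",
    "comprehensiveness",
    "coherence",
    "medical harmfulness",
)


@dataclass(frozen=True)
class Rating:
    """One clinician's score for one model response (hypothetical record layout)."""

    model: str   # e.g. "Claude 2"
    domain: str  # one of DOMAINS
    panel: str   # hypothetical panel label, e.g. "panel A"
    score: int   # 5-point Likert score, 1 to 5

    def __post_init__(self):
        # Reject anything outside the domains and scale given in the abstract.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")
        if not 1 <= self.score <= 5:
            raise ValueError(f"score must be between 1 and 5, got {self.score}")


def median_scores(ratings):
    """Return the median score for each (model, domain) pair.

    The median is an assumption of this sketch; it is a common summary for
    ordinal Likert data, but the protocol may specify a different analysis.
    """
    grouped = defaultdict(list)
    for rating in ratings:
        grouped[(rating.model, rating.domain)].append(rating.score)
    return {key: median(scores) for key, scores in grouped.items()}
```

A caller would build one `Rating` per clinician, response, and domain, then pass the full list to `median_scores` to compare models domain by domain.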
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,578 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,470 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,984 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,814 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations