This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A Multiassessment and Multiprofessional Agents Approach for Medical Chatbot Risk Estimation: Development and Evaluation Study.
Citations: 0
Authors: 5
Year: 2026
Abstract
BACKGROUND: Assessing chatbot responses across 3 domains (medical, ethical, and legal) is essential to ensuring the safe use of artificial intelligence in health care. Although large language models (LLMs) have brought significant improvements in evaluating question-answer datasets, such as multiple-choice medical exams, existing systems use general LLMs without incorporating specialized domain knowledge. They rely on standardized instructions without integrating real-world information, and ensemble methods such as majority voting fail to resolve disagreements among agents, resulting in misclassification and challenges in risk assessment.

OBJECTIVE: This study aims to design, develop, and evaluate a synergistic approach for assessing risks associated with chatbot responses using multiassessment (MA) and multiprofessional agents (MPAs).

METHODS: [...] -score difference (Δ) as supporting metrics to assess the approach's effectiveness.

RESULTS: [...] -score gains ranging from +0.176 to +0.214 across systems. The MPA approach performed better when integrated with MA and external knowledge, with paired bootstrap estimates showing a gain of +0.037 (95% CI 0.003-0.074) over baseline; however, joint accuracy gains were not evident (95% CI -2.9% to 7.7%), and gains relative to the enhanced prompt were small. Notably, MA alone achieved higher joint accuracy than retrieval-augmented generation (RAG) (62.7% vs 60.3%), indicating a metric-specific trade-off rather than consistent superiority across all metrics.

CONCLUSIONS: The MA-MPA approach shows potential for improving risk estimation in chatbot responses. The results suggest that the framework is particularly useful for enhancing balanced overall performance, especially when combined with external knowledge, although the medical risk domain remains challenging. In addition, more specialized LLMs may further improve contextually grounded risk estimation.
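The abstract reports paired bootstrap estimates with 95% confidence intervals for the gain of one system over a baseline. As a rough illustration of how such an interval can be obtained, the sketch below resamples the same test items for both systems and takes percentiles of the resampled accuracy differences. Everything in it is an assumption made for illustration: the function name `paired_bootstrap_ci`, the use of plain accuracy as the metric, the number of resamples, and the toy correctness vectors. It does not reproduce the study's data, metrics, or exact procedure.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(B) - accuracy(A) on paired items.

    correct_a, correct_b: equal-length sequences of 0/1 flags, one per test
    item, marking whether system A / system B classified that item correctly.
    (Hypothetical inputs; the study's own metrics are not reproduced here.)
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same items")
    n = len(correct_a)
    if n == 0:
        raise ValueError("need at least one item")

    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Draw item indices with replacement; the SAME indices are used for
        # both systems, which is what makes the bootstrap "paired".
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(correct_a[i] for i in idx) / n
        acc_b = sum(correct_b[i] for i in idx) / n
        diffs.append(acc_b - acc_a)

    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    point = (sum(correct_b) - sum(correct_a)) / n  # observed difference
    return point, lower, upper


if __name__ == "__main__":
    # Toy data: 100 items, baseline correct on 60, candidate correct on 65.
    baseline = [1] * 60 + [0] * 40
    candidate = [1] * 65 + [0] * 35
    point, lower, upper = paired_bootstrap_ci(baseline, candidate)
    print(f"gain = {point:+.3f}, 95% CI [{lower:+.3f}, {upper:+.3f}]")
```

An interval whose lower bound lies above zero, like the +0.037 (95% CI 0.003-0.074) gain reported above, suggests a gain at the chosen confidence level, whereas an interval spanning zero, like the joint accuracy result, does not.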
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,719 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,628 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,176 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,880 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations