This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Toward trustworthy chatbots: a protocol for red teaming for health related conversations
Citations: 0
Authors: 5
Year: 2026
Abstract
Health-related chatbots require safety assurance beyond factual correctness. We propose a red-teaming protocol for patient-facing AI structured around three pillars: error stratification, dual-pronged testing, and vulnerability-informed mitigation. We distinguish Document Adherence (DA) from Instruction Adherence (IA), deploying adversarial “attacks” across both single-turn and multi-turn exchanges to provoke system failures. We then apply layered mitigations informed by the vulnerabilities these attacks reveal. We evaluate this framework on a retrieval-augmented generation (RAG)-based chatbot designed to assist with health-related social needs (HRSN). The protocol identified behavioral noncompliance as the dominant risk. While robust in DA (0/60 errors), the system struggled with IA (15% error rate). Crucially, multi-turn stress tests revealed vulnerabilities hidden in single-turn checks: error rates spiked to 50% for advice queries and 40% for user distress. All high-severity failures occurred during these sustained interactions. Of our mitigations, prompt augmentation reduced total errors by 60%, while document augmentation mitigated single-turn distress errors. Combined, they eliminated high-severity errors entirely by forcing “safe failure” loops. We suggest this cycle of stratified analysis, depth-based testing, and targeted mitigation can be a guiding framework for securing clinical conversational agents.
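The error stratification described in the abstract (DA versus IA, single-turn versus multi-turn) can be illustrated with a small tally. This is a minimal sketch only: the `Trial` record, the `error_rates` helper, and the toy trials are hypothetical names and made-up data chosen for illustration. They are not the authors' tooling and do not reproduce the reported results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One red-teaming attempt (hypothetical record layout)."""
    error_type: str  # "DA" (Document Adherence) or "IA" (Instruction Adherence)
    depth: str       # "single" or "multi" (turn depth of the exchange)
    failed: bool     # True if the chatbot response was judged an error


def error_rates(trials):
    """Return the failure rate for each (error_type, depth) stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [failures, total]
    for t in trials:
        key = (t.error_type, t.depth)
        counts[key][0] += int(t.failed)
        counts[key][1] += 1
    return {key: fails / total for key, (fails, total) in counts.items()}


# Toy data for illustration only; NOT taken from the paper.
trials = [
    Trial("DA", "single", False),
    Trial("DA", "multi", False),
    Trial("IA", "single", False),
    Trial("IA", "single", True),
    Trial("IA", "multi", True),
    Trial("IA", "multi", True),
]

rates = error_rates(trials)
for stratum in sorted(rates):
    print(stratum, f"{rates[stratum]:.0%}")
```

Keeping the stratum as a tuple key makes it straightforward to add further dimensions, such as query category or severity, should a protocol call for them.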