Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Toward trustworthy chatbots: a protocol for red teaming for health related conversations

2026·0 Zitationen·Scientific ReportsOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2026

Jahr

Abstract

Health-related chatbots require safety assurance beyond factual correctness. We propose a red-teaming protocol for patient-facing AI structured around three pillars: error stratification, dual-pronged testing, and vulnerability-informed mitigation. We distinguish Document Adherence (DA) from Instruction Adherence (IA), deploying adversarial “attacks” across both single-turn and multi-turn exchanges to provoke system failures. We then applied layered mitigations informed by the vulnerabilities revealed by these attacks. We evaluate this framework on a retrieval-augmented generation (RAG) based chatbot designed to assist with health-related social needs (HRSN).The protocol identified behavioral noncompliance as the dominant risk. While robust in DA (0/60 errors), the system struggled with IA (15% error rate). Crucially, multi-turn stress tests revealed vulnerabilities hidden in single-turn checks: error rates spiked to 50% for advice queries and 40% for user distress. All high-severity failures occurred during these sustained interactions. Of our mitigations, prompt augmentation reduced total errors by 60%, while document augmentation mitigated single-turn distress errors. Combined, they eliminated high-severity errors entirely by forcing “safe failure” loops. We suggest this cycle of stratified analysis, depth-based testing, and targeted mitigation can be a guiding framework for securing clinical conversational agents.

Autoren

Institutionen

Themen

AI in Service InteractionsDigital Mental Health InterventionsArtificial Intelligence in Healthcare and Education

Volltext beim Verlag öffnen

Toward trustworthy chatbots: a protocol for red teaming for health related conversations

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen