This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Suicide-Related Responses from AI Chatbots through Consumer-Facing Interfaces and APIs (Preprint)
Citations: 0
Authors: 11
Year: 2026
Abstract
<sec> <title>BACKGROUND</title> Generative AI chatbots are frequently used for mental health-related questions, including suicide-related queries. Prior evaluations have focused on user interfaces (UI), which may include additional safety controls not present in direct application programming interface (API) access. </sec> <sec> <title>OBJECTIVE</title> To examine how frequently AI chatbot models provide direct responses to suicide-related prompts across access pathways and to determine whether safer behavior is intrinsic to the model or depends on UI-level safeguards. </sec> <sec> <title>METHODS</title> This was an observational cross-sectional study conducted in February and March 2026. We evaluated how five widely used consumer AI models (ChatGPT, Claude, Gemini, Grok, and Llama) responded to 30 previously vetted suicide-related prompts spanning five clinician-assigned risk levels. Each prompt was submitted 100 times through both the public-facing UI and direct API access, yielding 30,000 total responses. The primary outcome was whether a response directly answered the suicide-related prompt. A direct response was defined as one that provided specific information or guidance related to the question asked, rather than refusing to answer, redirecting the user to a crisis resource, or responding only with general safety language. AI responses were categorized using a blinded large language model–based classifier. We estimated mixed-effects logistic regression models that predicted a direct response. The primary predictors were the AI model, access mode (UI vs. API), and prompt risk category. </sec> <sec> <title>RESULTS</title> Overall, 69.8% of responses to the suicide prompts were direct. Direct responses were more common through APIs than UIs (77.4% vs. 62.2%). 
Differences in the likelihood of a direct response were most pronounced for higher-risk prompts: among very high-risk prompts, 24.8% of API responses were direct compared with 4.6% of UI responses; among high-risk prompts, 80.1% of API responses were direct compared with 48.4% of UI responses. Claude and Gemini had the highest direct response rates (78.1% and 73.9%), whereas ChatGPT, Grok, and Llama had lower rates (64.6%, 64.5%, and 67.8%). In mixed-effects models, UI access was associated with lower odds of a direct response than API access (odds ratio, 0.09; 95% CI, 0.08-0.10). Higher prompt risk was associated with lower direct response probability, and access-mode differences varied by risk level and model. </sec> <sec> <title>CONCLUSIONS</title> AI safety behavior depends on how the user accesses the model. The safety observed in an AI chatbot’s UI should not be assumed to generalize to direct model access through APIs or to downstream applications built on those APIs. Evaluations and policies must consider access channels to ensure comprehensive safety protections for all individuals interacting with AI chatbots. </sec>
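The arithmetic behind the reported design and headline rates can be sketched as follows. This is only an illustrative check, not the authors' analysis: the function names are hypothetical, the pooled rate assumes equal numbers of UI and API responses (as the design implies), and the crude odds ratio is computed from the raw percentages alone. The reported odds ratio of 0.09 comes from a mixed-effects model with additional predictors, so the two figures need not agree.

```python
# Illustrative sketch only; helper names are hypothetical, and the
# percentages are taken from the abstract.

def total_responses(models=5, prompts=30, repeats=100, access_modes=2):
    """Number of responses implied by the stated study design."""
    return models * prompts * repeats * access_modes

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

def crude_odds_ratio(p_ui, p_api):
    """Unadjusted odds ratio of a direct response, UI relative to API."""
    return odds(p_ui) / odds(p_api)

def pooled_rate(p_ui, p_api):
    """Overall rate, assuming equal numbers of UI and API responses."""
    return (p_ui + p_api) / 2.0

if __name__ == "__main__":
    print(total_responses())                         # design size
    print(round(pooled_rate(0.622, 0.774), 3))       # overall direct rate
    print(round(crude_odds_ratio(0.622, 0.774), 2))  # all prompts
    print(round(crude_odds_ratio(0.046, 0.248), 2))  # very high-risk prompts
    print(round(crude_odds_ratio(0.484, 0.801), 2))  # high-risk prompts
```

Under these assumptions the design yields 30,000 responses, and the pooled rate reproduces the reported 69.8%. The crude odds ratios are unadjusted summaries and should not be read as a substitute for the model-based estimate.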
Similar works
Amazon's Mechanical Turk
2011 · 10,048 citations
The Epidemiology of Major Depressive Disorder
2003 · 7,981 citations
The Transtheoretical Model of Health Behavior Change
1997 · 7,743 citations
Acute and Longer-Term Outcomes in Depressed Outpatients Requiring One or Several Treatment Steps: A STAR*D Report
2006 · 5,487 citations
Depression Is a Risk Factor for Noncompliance With Medical Treatment
2000 · 4,150 citations