OpenAlex · Updated hourly · Last updated: 19.04.2026, 09:52

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating the effect of mental health fine-tuning relative to other model characteristics on LLM safety performance

2026 · 0 citations · medRxiv · Open Access

0 citations · 6 authors · 2026

Abstract

Large language models (LLMs) are increasingly used in mental health applications, yet it remains unclear whether mental health–specific fine-tuning meaningfully improves safety-relevant performance beyond gains from model scale or architecture. We evaluated 127 publicly available open-source LLMs across three model families, multiple architecture generations, parameter scales (270M–70B), and fine-tuning strategies on three psychiatrist-reviewed synthetic classification tasks: suicidal ideation detection, identification of user requests for therapy, and detection of explicit therapy-like interactions in multi-turn conversations. Performance was summarized using F1 score, with multivariable regression and paired comparisons used to estimate independent effects of model characteristics. Across tasks, newer architectures and larger models consistently showed superior performance. General instruction tuning improved detection of therapy requests and engagement, whereas mental health–specific, medical, or safety fine-tuning conferred no consistent benefit and was sometimes associated with reduced performance. These findings suggest that baseline model capability is more consequential than domain-specific fine-tuning for certain safety-relevant mental health classification tasks, underscoring the importance of careful model selection and task-specific evaluation.
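The abstract's summary metric, the F1 score, is the harmonic mean of precision and recall on the positive (safety-relevant) class. A minimal sketch of how such a per-task score could be computed is below; the labels are synthetic placeholders for illustration only, not data from the study, and this is not the authors' evaluation code.

```python
# Illustrative only: computing an F1 score for a binary safety
# classification task (e.g. flagging suicidal ideation).
# All labels below are made-up placeholders, not study data.

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives means precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground truth (1 = safety-relevant case) vs. model predictions
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(f1_score(y_true, y_pred))  # → 0.75 (precision 3/4, recall 3/4)
```

In the study, per-model F1 scores like this would then serve as the outcome variable in the multivariable regressions over model characteristics (family, generation, scale, fine-tuning strategy).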
