Laurène Vaugrante
5 papers · 3 citations
Relevant papers
Most-cited publications in Health & MedTech
Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment
2026 · 0 citations · ArXiv.org
Compromising Honesty and Harmlessness in Language Models via Deception Attacks
2025 · 0 citations · ArXiv.org
Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment
2026 · 0 citations · Open MIND