This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Mission Impossible: Universal LLM Moral Alignment
0
Citations
4
Authors
2026
Year
Abstract
Universal moral alignment for large language models (LLMs) is often framed as the goal of learning a single policy that behaves in accordance with human values. This framing assumes that sufficiently capable models can approximate a coherent and universally valid moral objective. We argue that this assumption is false in pluralistic settings. Drawing on a preference-learning view of alignment and insights from social choice theory, we show that when different groups hold internally coherent but conflicting moral judgments over the same context-action pairs, no non-degenerate single policy can satisfy all groups simultaneously. Under stronger forms of disagreement, aggregation can even produce policies that are misaligned with every group. This is therefore not merely an engineering bottleneck to be overcome with more data, larger models, or improved optimization, but a structural limitation of the universal-alignment objective itself. This paper clarifies a conceptual limit of current reward-modeling and preference-aggregation paradigms. We outline a constructive agenda that replaces universal moral alignment with pluralistic and procedurally explicit alternatives, including normative governance mechanisms, impossibility-aware evaluation, and richer representations of human preferences that make disagreement visible rather than averaging it away.
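The abstract's central claim, that pairwise aggregation of internally coherent but conflicting group preferences can yield a collective preference no single policy can satisfy, is the classic Condorcet cycle from social choice theory. The following minimal sketch (illustrative only; the group names, actions, and majority rule are assumptions, not taken from the paper) shows three groups with transitive rankings whose majority aggregate is intransitive:

```python
# Three groups, each with an internally coherent (transitive) ranking
# over three candidate actions for the same context. Names are
# illustrative, not drawn from the paper.
group_rankings = {
    "group_1": ["a", "b", "c"],  # a > b > c
    "group_2": ["b", "c", "a"],  # b > c > a
    "group_3": ["c", "a", "b"],  # c > a > b
}

def majority_prefers(x, y):
    """True if a strict majority of groups ranks action x above action y."""
    votes = sum(r.index(x) < r.index(y) for r in group_rankings.values())
    return votes > len(group_rankings) / 2

# Pairwise majority yields a cycle: a > b, b > c, and c > a,
# so no transitive ordering (and hence no single optimal policy)
# agrees with the aggregate preference.
for x, y in [("a", "b"), ("b", "c"), ("c", "a")]:
    print(f"majority prefers {x} over {y}: {majority_prefers(x, y)}")
```

Each group's ranking is perfectly coherent on its own; the intransitivity appears only in the aggregate, which is the structural point the abstract makes against a single universal reward model.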
Related Works
The global landscape of AI ethics guidelines
2019 · 4,620 citations
The Limitations of Deep Learning in Adversarial Settings
2016 · 3,876 citations
Trust in Automation: Designing for Appropriate Reliance
2004 · 3,435 citations
Fairness through awareness
2012 · 3,293 citations
Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer
1987 · 3,184 citations