This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Mission Impossible: Universal LLM Moral Alignment
0
Citations
4
Authors
2026
Year
Abstract
Universal moral alignment for large language models (LLMs) is often framed as the goal of learning a single policy that behaves in accordance with human values. This framing assumes that sufficiently capable models can approximate a coherent and universally valid moral objective. We argue that this assumption is false in pluralistic settings. Drawing on a preference-learning view of alignment and insights from social choice theory, we show that when different groups hold internally coherent but conflicting moral judgments over the same context-action pairs, no non-degenerate single policy can satisfy all groups simultaneously. Under stronger forms of disagreement, aggregation can even produce policies that are misaligned with every group. This is therefore not merely an engineering bottleneck to be overcome with more data, larger models, or improved optimization, but a structural limitation of the universal-alignment objective itself. This paper clarifies a conceptual limit of current reward-modeling and preference-aggregation paradigms. We outline a constructive agenda that replaces universal moral alignment with pluralistic and procedurally explicit alternatives, including normative governance mechanisms, impossibility-aware evaluation, and richer representations of human preferences that make disagreement visible rather than averaging it away.
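The abstract's central claim, that pairwise aggregation of internally coherent but conflicting group preferences can yield a collective preference no single policy can satisfy, is the classic Condorcet cycle from social choice theory. The following minimal sketch (illustrative only; the group names, actions, and majority rule are assumptions, not taken from the paper) shows three groups with transitive rankings whose majority aggregate is intransitive:

```python
# Three groups, each with an internally coherent (transitive) ranking
# over three candidate actions for the same context. Names are
# illustrative, not drawn from the paper.
group_rankings = {
    "group_1": ["a", "b", "c"],  # a > b > c
    "group_2": ["b", "c", "a"],  # b > c > a
    "group_3": ["c", "a", "b"],  # c > a > b
}

def majority_prefers(x, y):
    """True if a strict majority of groups ranks action x above action y."""
    votes = sum(r.index(x) < r.index(y) for r in group_rankings.values())
    return votes > len(group_rankings) / 2

# Pairwise majority yields a cycle: a > b, b > c, and c > a,
# so no transitive ordering (and hence no single optimal policy)
# agrees with the aggregate preference.
for x, y in [("a", "b"), ("b", "c"), ("c", "a")]:
    print(f"majority prefers {x} over {y}: {majority_prefers(x, y)}")
```

Each group's ranking is perfectly coherent on its own; the intransitivity appears only in the aggregate, which is the structural point the abstract makes against a single universal reward model.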
Related Works
The global landscape of AI ethics guidelines
2019 · 4,620 citations
The Limitations of Deep Learning in Adversarial Settings
2016 · 3,876 citations
Trust in Automation: Designing for Appropriate Reliance
2004 · 3,435 citations
Fairness through awareness
2012 · 3,293 citations
Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer
1987 · 3,184 citations