OpenAlex · Updated hourly · Last updated: 25.04.2026, 22:40

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

The Verbosity Premium: What RLHF-Induced Token Inflation Costs the AI Industry

2026 · 0 citations · Zenodo (CERN European Organization for Nuclear Research) · Open Access
Open full text at the publisher

Citations: 0 · Authors: 1 · Year: 2026

Abstract

We aggregate published measurements of RLHF-induced response length inflation across the literature and compute the first industry-scale estimate of its economic cost. Alignment training systematically inflates output length: sentences triple in length after SFT, DPO doubles response length within the first 10% of training, and on one benchmark 98% of PPO reward improvement is attributable to length alone. Verbosity compensation rates range from 13.6% to 74.2% across 14 models, and output tokens cost 4-8x more than input tokens across all frontier providers. Combining published verbosity rates, real-world token volumes, and current API pricing, we estimate the annual verbosity premium at $500M to $1.8B, with a central estimate of $1.2B (approximately 14% of total industry inference spend). We survey 12 training-side mitigations and show that all target response length rather than information density. A 500-token response containing 50 atomic facts is efficient; a response of the same length containing 10 facts restated five ways is waste. Length penalties cannot distinguish these cases. Drawing on rate-distortion theory and evidence that factual precision degrades with response length, we argue the correct optimization target is information density (supported facts per token) and present two concrete density-aware reward formulations.
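The abstract's core distinction can be made concrete with a minimal sketch. The function names and the penalty coefficient below are hypothetical illustrations (the paper's actual reward formulations are not reproduced here); the fact counts and token counts are the abstract's own worked example of an efficient versus a redundant 500-token response.

```python
# Illustrative sketch: a pure length penalty cannot separate two responses
# of equal length, while a density measure (supported facts per token) can.
# `alpha` and both function names are assumptions for this sketch.

def length_penalty(tokens: int, alpha: float = 0.001) -> float:
    """Reward term that depends on length alone."""
    return -alpha * tokens

def information_density(supported_facts: int, tokens: int) -> float:
    """Supported facts per token: the proposed optimization target."""
    return supported_facts / tokens

# The abstract's example: 500 tokens with 50 atomic facts vs. 10 facts
# restated five ways.
efficient = information_density(50, 500)   # 0.1 facts per token
redundant = information_density(10, 500)   # 0.02 facts per token

# Identical under a length penalty, separated by density.
assert length_penalty(500) == length_penalty(500)
assert efficient > redundant
```

A density-aware reward built from this ratio rewards the efficient response and penalizes the redundant one, whereas the length penalty scores them identically.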


Topics

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI