This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Benchmarking LLM Agent Efficiency in Production Systems: An Observational Prospective Methodology
Citations: 0
Authors: 2
Year: 2026
Abstract
Existing large language model (LLM) benchmarks measure model capability on synthetic tasks, but none address the operational efficiency of multi-agent LLM systems executing real production work over sustained sessions. This paper introduces an observational prospective methodology for benchmarking production LLM agent efficiency and applies it to a complete, instrumented production session (2026-04-03, 4.1 hours, Gallora ecosystem). We report the first end-to-end token accounting of a multi-agent session: 64,853,375 effective tokens processed at a 94.2% cache hit rate, producing 19 artifacts at a cost of $36.74 USD ($1.93 per artifact). We propose a standardized suite of six operational metrics — Cache Hit Rate (CHR), Output Density (OD), Agent Cost Multiplier (ACM), Cost Per Artifact (CPA), Tool Execution Ratio (TER), and Turns Per Hour (TPH) — as a reproducible benchmark framework for production agentic systems. Key finding: in production multi-agent systems, cost is dominated by context complexity (93.7% cache reads), not task complexity — a result with significant architectural implications for system design and cost governance.
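The abstract reports per-session totals alongside derived metrics such as Cost Per Artifact. As a minimal sketch, the formulas below are assumptions inferred from the metric names (the paper's exact definitions may differ), applied to the reported session figures:

```python
def cost_per_artifact(total_cost_usd: float, artifacts: int) -> float:
    """Cost Per Artifact (CPA): total session cost divided by artifact count."""
    return total_cost_usd / artifacts

def cache_hit_rate(cached_tokens: int, total_tokens: int) -> float:
    """Cache Hit Rate (CHR): fraction of processed tokens served from cache."""
    return cached_tokens / total_tokens

# Reported session figures (2026-04-03, 4.1 hours, Gallora ecosystem):
total_cost = 36.74   # USD
artifacts = 19       # artifacts produced

print(f"CPA = ${cost_per_artifact(total_cost, artifacts):.2f}")  # → CPA = $1.93
```

Under these assumed definitions, the reported $1.93 per artifact follows directly from the $36.74 session total and 19 artifacts.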
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,561 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,452 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,948 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,797 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations