OpenAlex · Updated hourly · Last updated: 21 Apr 2026, 19:38

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Benchmarking and Enhancing Rule Knowledge-Driven Reasoning of Large Language Models

2026 · 0 citations · Proceedings of the AAAI Conference on Artificial Intelligence · Open Access
Open full text at publisher

Citations: 0

Authors: 8

Year: 2026

Abstract

Large Language Models (LLMs) have demonstrated strong capabilities across diverse tasks under the example-driven learning paradigm. However, in high-stakes domains such as emergency response and industrial safety, historical incidents are scarce, confidential, or both, while concise rule books are abundant. We formalize this underexplored setting as rule knowledge-driven reasoning and ask: Can LLMs reason reliably when rules are plentiful but examples are nearly absent? To study this question, we introduce RULER, an automatic benchmark that generates 32K rigorously verified questions from 1K expert-curated emergency response rules to probe three core abilities: rule memorization, single-rule application, and multi-rule complex reasoning. RULER is further equipped with a hallucination-aware evaluation suite and novel relational metrics. A comprehensive empirical study of five representative LLMs and five enhancement strategies shows that, even when models achieve reliable performance on rule memorization and single-rule application, multi-rule complex reasoning plateaus at 5.4 on a 10-point scale. To address this limitation, we propose RAMPS, a Rule knowledge-Aware Monte Carlo Tree Search Process-reward Supervision framework. RAMPS injects rule knowledge priors into MCTS, distills 12K step-level traces without human annotation, and trains an advantage-based reward model that scores candidate reasoning paths during beam search inference. Experimental results show that RAMPS significantly improves multi-rule complex reasoning performance to 7.7.
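The abstract's description of RAMPS inference (a reward model scoring candidate reasoning paths during beam search) can be sketched as follows. This is an illustrative toy, not the paper's implementation: the step generator and the process-reward function below are hypothetical stand-ins for an LLM step proposer and the learned advantage-based reward model.

```python
# Minimal sketch of process-reward-guided beam search over reasoning steps.
# `expand` and `reward` are toy stand-ins (assumptions, not from the paper).

def expand(path):
    """Toy step generator: propose candidate next steps for a partial path."""
    return [path + [step] for step in ("a", "b", "c")]

def reward(path):
    """Toy process-reward model scoring a partial reasoning path.
    In RAMPS this role is played by a learned advantage-based reward model."""
    return sum(1 for step in path if step == "a") / max(len(path), 1)

def beam_search(depth=3, beam_width=2):
    beams = [[]]  # start from the empty reasoning path
    for _ in range(depth):
        candidates = [p for path in beams for p in expand(path)]
        # Keep only the beam_width highest-scoring partial paths.
        beams = sorted(candidates, key=reward, reverse=True)[:beam_width]
    return beams[0]  # highest-scoring complete path

print(beam_search())  # → ['a', 'a', 'a'] under the toy reward
```

The key design point mirrored here is that the reward model prunes at every step rather than only scoring finished answers, which is what "process" (step-level) supervision refers to.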

Related works

Authors

Institutions

Topics

Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education