InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning

2026 · cs.AI · arXiv 2603.17310

1 Pith paper cites this work. Polarity classification is still being indexed.

abstract

Large Language Models (LLMs) with extended reasoning capabilities often generate verbose and redundant reasoning traces, incurring unnecessary computational cost. While existing reinforcement learning approaches address this by optimizing final response length, they neglect the quality of intermediate reasoning steps, leaving models vulnerable to reward hacking. We argue that verbosity is not merely a length problem, but a symptom of poor intermediate reasoning quality. To investigate this, we conduct an empirical study tracking the per-token predictive entropy of large reasoning models across reasoning trajectories. We find that high-quality reasoning traces exhibit two consistent properties: low uncertainty convergence and fast uncertainty descent. These findings suggest that high-quality reasoning traces are informationally dense, that is, reasoning steps contribute to reaching a low uncertainty level relative to the total reasoning length. Motivated by this, we propose InfoDensity, a reward framework for RL training that captures both properties through a single suffix-max envelope of the entropy trajectory, weighted by a length scaling term that favors achieving equivalent quality more concisely. Experiments on mathematical and general reasoning benchmarks demonstrate that InfoDensity outperforms state-of-the-art baselines on the accuracy-efficiency trade-off.
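The abstract's two ingredients, per-token predictive entropy and a suffix-max envelope of the entropy trajectory, can be illustrated with a minimal Python sketch. Only the entropy and suffix-max computations follow from standard definitions; `toy_density_score`, its `ref_len` parameter, and the way it combines the envelope with a length term are assumptions made here for illustration, since the abstract does not give the paper's reward formula.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def suffix_max_envelope(entropies):
    """Return envelope[t] = max(entropies[t:]), computed right to left.

    The envelope is non-increasing, so a late entropy spike raises every
    earlier position as well.
    """
    envelope = [0.0] * len(entropies)
    running = float("-inf")
    for t in range(len(entropies) - 1, -1, -1):
        running = max(running, entropies[t])
        envelope[t] = running
    return envelope


def toy_density_score(entropies, ref_len=100):
    """Hypothetical score, NOT the paper's reward.

    quality: 1 minus the mean envelope normalized by its peak, so traces
             whose uncertainty drops early and stays low score higher.
    length_scale: assumed form ref_len / (ref_len + n), which shrinks as
             the trace gets longer.
    """
    if not entropies:
        return 0.0
    env = suffix_max_envelope(entropies)
    peak = env[0]
    quality = 1.0 - (sum(env) / len(env)) / peak if peak > 0 else 1.0
    length_scale = ref_len / (ref_len + len(entropies))
    return quality * length_scale


# A late spike (2.0 at index 2) lifts the envelope at index 1.
print(suffix_max_envelope([3.0, 1.0, 2.0, 0.5]))  # [3.0, 2.0, 2.0, 0.5]
```

Under these assumptions, a trace whose entropy falls quickly and stays low scores higher than one of the same length that stays uncertain until the end, and padding a trace with extra low-entropy tokens lowers the length term.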

representative citing papers

Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization

cs.AI · 2026-06-07 · unverdicted · novelty 6.0

ISPO densifies GRPO rewards with sequence-level informativeness and token-level directional signals from policy probabilities to reduce zero-advantage collapse and hallucinated certainty on math benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization · cs.AI · 2026-06-07 · unverdicted · none · ref 13 · internal anchor
