ECHO: Entropy-confidence hybrid optimization for test-time reinforcement learning

Zhiyuan Zhao, Yifan Yang, et al · 2026 · arXiv 2602.02150

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning

stat.ML · 2026-05-06 · unverdicted · novelty 7.0 · ref 29 · internal anchor

InfoTree casts intermediate state selection in tree search as monotone submodular maximization under fixed rollout budgets, yielding closed-form UUCB terms and lifting mixed-outcome ratios while outperforming flat GRPO and prior tree variants on nine benchmarks.
