Train less, learn more: Adaptive efficient rollout optimization for group-based reinforcement learning

5 Pith papers cite this work. Polarity classification is still indexing.

Years: 2026 (5)

Verdicts: unverdicted (5)

representative citing papers

Cross-Epoch Adaptive Rollout Optimization for RL Post-Training

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

CERO uses Beta posteriors and Fenchel-dual online optimization to adaptively allocate a fixed rollout budget across prompts and epochs in LLM RL, outperforming fixed-allocation GRPO on math reasoning benchmarks.
