Pith

Title resolution pending

1 Pith paper cites this work. Polarity classification is still being indexed.

Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

representative citing papers

Cross-Epoch Adaptive Rollout Optimization for RL Post-Training

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

CERO uses Beta posteriors and Fenchel-dual online optimization to adaptively allocate a fixed rollout budget across prompts and epochs in LLM RL, outperforming fixed-allocation GRPO on math reasoning benchmarks.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • Cross-Epoch Adaptive Rollout Optimization for RL Post-Training cs.LG · 2026-06-04 · unverdicted · none · ref 5

    CERO uses Beta posteriors and Fenchel-dual online optimization to adaptively allocate a fixed rollout budget across prompts and epochs in LLM RL, outperforming fixed-allocation GRPO on math reasoning benchmarks.