verl: Volcano engine reinforcement learning for LLMs

verl project · 2025

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime

cs.LG · 2026-05-06 · unverdicted · novelty 6.0 · 3 refs

Prefix Sampling replays self-generated trajectory prefixes to control rollout pass rates near 50% in binary-reward RL, delivering wall-clock speedups and modest performance gains on SWE-bench Verified and AIME tasks.
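Here is one common way to see why a pass rate near 50% is informative for binary rewards. A 0/1 reward with pass probability p has variance p(1 - p), which peaks at p = 0.5. Suppose advantages are normalized within a group of G rollouts per prompt. That is a GRPO-style assumption, not something this page confirms for the paper. Under it, a group where every rollout passes or every rollout fails gives zero advantage. The sketch below is a minimal illustration under those assumptions. The function names are hypothetical, and it does not implement Prefix Sampling itself.

```python
def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 reward whose pass probability is p."""
    return p * (1.0 - p)


def degenerate_group_prob(p: float, group_size: int) -> float:
    """Probability that all rollouts in a group receive the same reward
    (all pass or all fail). Assumes independent rollouts; under
    group-normalized advantages (an assumption), such a group carries
    no learning signal."""
    return p ** group_size + (1.0 - p) ** group_size


if __name__ == "__main__":
    # Compare a few pass rates with a hypothetical group size of 8.
    for p in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(
            f"p={p:.2f}  variance={bernoulli_variance(p):.4f}  "
            f"P(degenerate, G=8)={degenerate_group_prob(p, 8):.4f}"
        )
```

With G = 8, the chance of a zero-signal group is 2/256 at p = 0.5 but about 0.66 at p = 0.05. That gap is the intuition behind steering rollout pass rates toward the middle.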

citing papers explorer

Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime cs.LG · 2026-05-06 · unverdicted · none · ref 36 · 3 links
