Scaling up the refinement size yields highly marginal improvements across all benchmarks

As shown in the table, our method demonstrates remarkable robustness to this hyperparameter · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

cs.LG · 2026-04-23 · unverdicted · novelty 6.0 · ref 22

DDRL reduces spurious reward noise in test-time RL for math by excluding ambiguous samples, using fixed advantages, and adding consensus-based updates, outperforming prior TTRL methods on math benchmarks.
