One training example via RLVR boosts LLM math reasoning from 17.6% to 35.7% average across six benchmarks.
Rethinking reflection in pre-training. arXiv preprint arXiv:2504.04022
3 Pith papers cite this work.
Fields: cs.LG (3)
Years: 2025 (3)
Representative citing papers:
DARS adaptively increases rollouts on hard problems in RLVR to improve Pass@K, and when paired with batch scaling for breadth, achieves gains in both Pass@K and Pass@1 by treating depth and breadth as complementary exploration dimensions.