A curriculum sampling questions with high variance in success rate improves reinforcement learning performance for LLM reasoning tasks.
This allows prioritizing samples that maximize the change in loss, i.e., the model's learning progress [52, 11, 53, 49].
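The selection criterion in the summary can be sketched as follows. This sketch makes several assumptions. Each question's success rate p is estimated from a fixed number of sampled rollouts. "Variance in success rate" is taken to mean the Bernoulli variance p(1 - p), which is 0 for questions the model always or never solves and peaks at 0.25 when p = 1/2. The function names `success_variance` and `select_frontier` are hypothetical. A greedy top-k selection stands in for whatever sampling scheme the paper actually uses.

```python
from typing import Sequence


def success_variance(successes: int, rollouts: int) -> float:
    """Bernoulli variance p(1 - p) of the empirical success rate p = successes / rollouts."""
    if rollouts <= 0:
        raise ValueError("rollouts must be positive")
    p = successes / rollouts
    return p * (1.0 - p)


def select_frontier(successes: Sequence[int], rollouts: int, k: int) -> list[int]:
    """Return indices of the k questions with the highest success-rate variance.

    Hypothetical helper: questions that are always solved (p = 1) or never
    solved (p = 0) score 0 and are picked last. Questions near p = 0.5 score
    highest (0.25).
    """
    scores = [success_variance(s, rollouts) for s in successes]
    # Sort by descending variance. Python's sort is stable, so ties keep the lower index first.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]


# Example: successes out of 8 rollouts for six questions.
counts = [8, 4, 0, 2, 7, 5]
print(select_frontier(counts, rollouts=8, k=3))  # prints [1, 5, 3]
```

In the example, question 1 (4/8 solved) has the highest variance, 0.25. It is followed by question 5 (5/8, 0.234375) and question 3 (2/8, 0.1875). Questions 0 and 2, which are always or never solved, are ranked last because they offer no learning signal under this criterion.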
1 Pith paper cites this work (cs.LG, 2025):
Learning to Reason at the Frontier of Learnability