Failure-prefix conditioning unlocks learning from saturated reasoning problems by conditioning on failure prefixes, improving recovery from misleading early steps and matching gains from new medium-difficulty problems.
How to explore to scale rl training of llms on hard problems? https://blog.ml.cmu.edu/2025/1 1/26/how-to-explore-to-scale-rl-train ing-of-llms-on-hard-problems
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
Failure-prefix conditioning unlocks learning from saturated reasoning problems by conditioning on failure prefixes, improving recovery from misleading early steps and matching gains from new medium-difficulty problems.