CRPS synthesizes reasoning paths by contrasting high- and low-quality MCTS trajectories. Models trained on 60K examples match or exceed those trained on 590K standard examples and generalize better out of domain.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories