STILL-2 uses imitation of distilled long-form thoughts, multi-rollout exploration on difficult problems, and iterative self-improvement of the dataset to train reasoning models that reach competitive performance on three challenging benchmarks.
1 Pith paper cites this work. Polarity classification is still indexing.
Pith papers citing it: 1
fields: cs.AI (1)
years: 2024 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems