SFT on divergent branch-heavy CoT from DeepSeek-R1 yields worse generalization than convergent CoT from gpt-oss despite lower loss, but filtering frequent branches improves average performance by 3.6% on five reasoning benchmarks.
It represents the "divergent" phase of reasoning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning