SFT delivers uniform procedural skill gains of 4-7.5 points across 0.8B-4B models while pre-SFT performance follows a W-shape, making SFT most effective where base models struggle.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- Procedural-skill SFT across capacity tiers: A W-Shaped pre-SFT Trajectory and Regime-Asymmetric Mechanism on 0.8B-4B Qwen3.5 Models