TOPD augments on-policy distillation by using near-future trajectory signals to suppress non-divergent high-loss tokens and distribute guidance, raising average accuracy from 47.8% to 52.2% on reasoning benchmarks.
IEEE Transactions on Emerging Topics in Computational Intelligence , year =
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance