TOPD improves on-policy distillation for LLM reasoning by using near-future guidance to identify divergent states, raising average accuracy from 47.8% to 52.2% on math benchmarks including AIME24 and AIME25.
IEEE Transactions on Emerging Topics in Computational Intelligence
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Empirical study of on-policy distillation finds 'Rock Tokens' that resist correction, account for up to 18% of output tokens, and add little to model reasoning despite high optimization cost.
citing papers explorer
- Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance
TOPD improves on-policy distillation for LLM reasoning by using near-future guidance to identify divergent states, raising average accuracy from 47.8% to 52.2% on math benchmarks including AIME24 and AIME25.
- Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation
Empirical study of on-policy distillation finds 'Rock Tokens' that resist correction, account for up to 18% of output tokens, and add little to model reasoning despite high optimization cost.