LLM-oriented token-adaptive knowledge distillation, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
CaMOPD recovers general capabilities in domain-specialized LLMs via alternating training and gap-based sample selection in multi-teacher on-policy distillation while preserving domain behavior.
citing papers explorer
- On the Position Bias of On-Policy Distillation
  Position bias in on-policy distillation degrades later-token supervision; IW-OPD weights tokens by accumulated discrepancy, yielding faster convergence and up to 6.9 point gains on AIME-2025.
- Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation
  CaMOPD recovers general capabilities in domain-specialized LLMs via alternating training and gap-based sample selection in multi-teacher on-policy distillation while preserving domain behavior.