Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Roles: other (1)
Polarities: unclear (1)
Citing papers explorer
- Self-Policy Distillation via Capability-Selective Subspace Projection
  Self-Policy Distillation extracts a capability subspace from model gradients on correctness tokens, projects KV activations into it for self-generation, and fine-tunes LLMs to achieve up to 13-16% gains over baselines without external signals.
- MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate
  MAD-OPD recasts on-policy distillation teachers as a debating collective to supply better supervision, lifting agentic and code performance over single-teacher OPD across multiple model sizes.
- Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning
  Prune-OPD detects prefix drift via top-k overlap and dynamically prunes unreliable teacher rewards in OPD, cutting training time 37.6-68% on AMC/AIME/HMMT while preserving performance.