arXiv: 2605.11739 · 2 revisions
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation