arxiv: 2605.09725 · 2 revisions
On-Policy Distillation with Best-of-N Teacher Rollout Selection