arxiv: 2605.18141 · 2 revisions
A Brief Overview: On-Policy Self-Distillation In Large Language Models