arXiv: 2605.11505 · 2 revisions
Selective Off-Policy Reference Tuning with Plan Guidance