arxiv: 2605.06094 · 3 revisions
VISD: Enhancing Video Reasoning via Structured Self-Distillation