Video-OPD uses on-policy distillation from a frontier teacher to turn sparse episode rewards into dense step-wise signals for more efficient post-training of MLLMs on temporal video grounding.
Gaussian-Weighted Difference Sampling (GWDS) We consider a probabilistic sampling strategy based on a Gaussian weighting over IoU differences
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation
Video-OPD uses on-policy distillation from a frontier teacher to turn sparse episode rewards into dense step-wise signals for more efficient post-training of MLLMs on temporal video grounding.