Datasets and recipes for video temporal grounding via reinforcement learning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation
Video-OPD uses on-policy distillation from a frontier teacher to turn sparse episode rewards into dense step-wise signals for more efficient post-training of MLLMs on temporal video grounding.