InternVideo2: Scaling foundation models for multimodal video understanding

Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Zun Wang, Yansong Shi, et al. · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

cs.CV · 2025-04-09 · unverdicted · novelty 5.0

Reinforcement fine-tuning with temporal rewards produces VideoChat-R1, a video MLLM that shows large gains on spatio-temporal perception benchmarks, such as +31.8 on temporal grounding and +31.2 on object tracking.

citing papers explorer

Showing 1 of 1 citing paper.

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

cs.CV · 2025-04-09 · unverdicted · none · ref 28

Reinforcement fine-tuning with temporal rewards produces VideoChat-R1, a video MLLM that shows large gains on spatio-temporal perception benchmarks, such as +31.8 on temporal grounding and +31.2 on object tracking.
