MSR-VTT: A large video description dataset for bridging video and language

Jun Xu, Tao Mei, Ting Yao, Yong Rui · 2016

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

cs.CV · 2023-07-13 · unverdicted · novelty 6.0

InternVid supplies 7M videos and LLM captions to train ViCLIP, which reaches leading zero-shot action recognition and competitive retrieval performance.

MagicVideo: Efficient Video Generation With Latent Diffusion Models

cs.CV · 2022-11-20 · unverdicted · novelty 6.0

MagicVideo generates 256x256 text-conditioned video clips via latent diffusion with a custom 3D U-Net, achieving roughly 64 times lower compute than prior video diffusion models.

citing papers explorer

Showing 2 of 2 citing papers.

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

cs.CV · 2023-07-13 · unverdicted · none · ref 45

InternVid supplies 7M videos and LLM captions to train ViCLIP, which reaches leading zero-shot action recognition and competitive retrieval performance.

MagicVideo: Efficient Video Generation With Latent Diffusion Models

cs.CV · 2022-11-20 · unverdicted · none · ref 50

MagicVideo generates 256x256 text-conditioned video clips via latent diffusion with a custom 3D U-Net, achieving roughly 64 times lower compute than prior video diffusion models.
