Masked conditional video diffusion for prediction, generation, and interpolation. arXiv preprint arXiv:2205.09853

Vikram Voleti, Alexia Jolicoeur-Martineau, Christopher Pal · 2022 · arXiv 2205.09853

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Phenaki: Variable Length Video Generation From Open Domain Textual Description

cs.CV · 2022-10-05 · unverdicted · novelty 7.0

Phenaki generates arbitrary-length videos from sequences of text prompts by tokenizing videos with causal temporal attention and generating tokens with a text-conditioned masked transformer, trained jointly on images and videos.

Inferring Dynamic Physical Properties from Video Foundation Models

cs.CV · 2025-10-02 · unverdicted · novelty 6.0

Video foundation models infer dynamic physical properties such as elasticity, viscosity, and friction from videos at levels close to classical oracles while outperforming current MLLMs with suitable prompting.

Latent Video Diffusion Models for High-Fidelity Long Video Generation

cs.CV · 2022-11-23 · unverdicted · novelty 6.0

Latent-space hierarchical diffusion models with targeted error-correction techniques generate realistic videos exceeding 1000 frames while using less compute than prior pixel-space approaches.

citing papers explorer

Showing 3 of 3 citing papers.

Phenaki: Variable Length Video Generation From Open Domain Textual Description · cs.CV · 2022-10-05 · unverdicted · none · ref 47
Inferring Dynamic Physical Properties from Video Foundation Models · cs.CV · 2025-10-02 · unverdicted · none · ref 11
Latent Video Diffusion Models for High-Fidelity Long Video Generation · cs.CV · 2022-11-23 · unverdicted · none · ref 39
