Generating Videos with Scene Dynamics

Antonio Torralba; Carl Vondrick; Hamed Pirsiavash

arxiv: 1609.02612 · v3 · pith:JNQ7XYFBnew · submitted 2016-09-08 · 💻 cs.CV · cs.GR · cs.LG

Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

classification 💻 cs.CV cs.GR cs.LG

keywords video scene dynamics model experiments generative tasks videos

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background. Experiments suggest this model can generate tiny videos up to a second at full frame rate better than simple baselines, and we show its utility at predicting plausible futures of static images. Moreover, experiments and visualizations show the model internally learns useful features for recognizing actions with minimal supervision, suggesting scene dynamics are a promising signal for representation learning. We believe generative video models can impact many applications in video understanding and simulation.
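
As a rough illustration of what untangling foreground from background can look like, the sketch below blends a moving foreground stream over a single static background image using a per-pixel mask. The abstract does not give the exact formulation, so the function name `compose_video`, the array layout, and the 32-frame, 64x64 clip size are assumptions made for this example, and random arrays stand in for the outputs of a trained generator.

```python
import numpy as np


def compose_video(foreground, mask, background):
    """Blend a moving foreground over a static background (illustrative only).

    foreground: (T, H, W, C) array, one frame per time step
    mask:       (T, H, W, 1) array in [0, 1]; 1 selects the foreground
    background: (H, W, C) array, a single image reused for every frame
    """
    # NumPy broadcasting replicates the background across time
    # and the mask across color channels.
    return mask * foreground + (1.0 - mask) * background


# Hypothetical sizes: 32 frames of 64x64 RGB, chosen only for the example.
rng = np.random.default_rng(0)
T, H, W, C = 32, 64, 64, 3
foreground = rng.uniform(-1.0, 1.0, size=(T, H, W, C))
background = rng.uniform(-1.0, 1.0, size=(H, W, C))
mask = rng.uniform(0.0, 1.0, size=(T, H, W, 1))

video = compose_video(foreground, mask, background)
print(video.shape)
```

Where the mask is zero, every frame shows the same background pixels, so motion can only appear in regions the mask assigns to the foreground. That is one plausible way a model could be pushed to keep a stationary scene separate from the objects moving through it.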

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Phenaki: Variable Length Video Generation From Open Domain Textual Description
cs.CV 2022-10 unverdicted novelty 7.0

Phenaki generates arbitrary-length videos from sequences of text prompts by tokenizing videos with causal temporal attention and generating tokens with a text-conditioned masked transformer, trained jointly on images ...

Imagen Video: High Definition Video Generation with Diffusion Models
cs.CV 2022-10 unverdicted novelty 7.0

Imagen Video generates high-definition text-conditional videos via a cascade of base and super-resolution diffusion models, achieving high fidelity and controllability.

Planning Robot Motion using Deep Visual Prediction
cs.RO 2019-06 unverdicted novelty 3.0

PROM-Net performs unsupervised visual prediction of robot motion from raw frames and integrates the predictions into model predictive control for navigation in unknown dynamic settings.