Diffusion probabilistic modeling for video generation

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 3 · baseline: 1

citation-polarity summary

years

2023: 3 · 2022: 5

representative citing papers

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

Video Diffusion Models

cs.CV · 2022-04-07 · unverdicted · novelty 7.0

A diffusion model for video generation extends image architectures with joint image-video training and improved conditional sampling, delivering first large-scale text-to-video results and state-of-the-art performance on video prediction and unconditional generation benchmarks.

ModelScope Text-to-Video Technical Report

cs.CV · 2023-08-12 · unverdicted · novelty 4.0

ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.

citing papers explorer

Showing 8 of 8 citing papers.

  • Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow · cs.LG · 2022-09-07 · unverdicted · none · ref 92

    Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.

  • Learning Interactive Real-World Simulators · cs.AI · 2023-10-09 · conditional · none · ref 175

    UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

  • Phenaki: Variable Length Video Generation From Open Domain Textual Description · cs.CV · 2022-10-05 · unverdicted · none · ref 56

    Phenaki generates arbitrary-length videos from sequences of text prompts by tokenizing videos with causal temporal attention and generating tokens with a text-conditioned masked transformer, trained jointly on images and videos.

  • Imagen Video: High Definition Video Generation with Diffusion Models · cs.CV · 2022-10-05 · unverdicted · none · ref 21

    Imagen Video generates high-definition text-conditional videos via a cascade of base and super-resolution diffusion models, achieving high fidelity and controllability.

  • Video Diffusion Models · cs.CV · 2022-04-07 · unverdicted · none · ref 63

    A diffusion model for video generation extends image architectures with joint image-video training and improved conditional sampling, delivering first large-scale text-to-video results and state-of-the-art performance on video prediction and unconditional generation benchmarks.

  • eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers · cs.CV · 2022-11-02 · unverdicted · none · ref 86

    An ensemble of stage-specialized text-to-image diffusion models improves prompt alignment over single shared-parameter models while preserving visual quality and inference speed.

  • I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models · cs.CV · 2023-11-07 · unverdicted · none · ref 52

    I2VGen-XL applies cascaded diffusion models with a base stage for semantic preservation via hierarchical encoders and a refinement stage for detail and resolution, trained on 35 million text-video and 6 billion text-image pairs.

  • ModelScope Text-to-Video Technical Report · cs.CV · 2023-08-12 · unverdicted · none · ref 66

    ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.