pith. sign in

hub Canonical reference

Latent Video Diffusion Models for High-Fidelity Long Video Generation

Canonical reference. 92% of citing Pith papers cite this work as background.

39 Pith papers citing it
Background 92% of classified citations
abstract

AI-generated content has attracted lots of attention recently, but photo-realistic video synthesis is still challenging. Although many attempts using GANs and autoregressive models have been made in this area, the visual quality and length of generated videos are far from satisfactory. Diffusion models have shown remarkable results recently but require significant computational resources. To address this, we introduce lightweight video diffusion models by leveraging a low-dimensional 3D latent space, significantly outperforming previous pixel-space video diffusion models under a limited computational budget. In addition, we propose hierarchical diffusion in the latent space such that longer videos with more than one thousand frames can be produced. To further overcome the performance degradation issue for long video generation, we propose conditional latent perturbation and unconditional guidance that effectively mitigate the accumulated errors during the extension of video length. Extensive experiments on small domain datasets of different categories suggest that our framework generates more realistic and longer videos than previous strong baselines. We additionally provide an extension to large-scale text-to-video generation to demonstrate the superiority of our work. Our code and models will be made publicly available.

hub tools

citation-role summary

background 11 method 1

citation-polarity summary

representative citing papers

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

History-Guided Video Diffusion

cs.LG · 2025-02-10 · unverdicted · novelty 7.0

DFoT enables flexible history conditioning in video diffusion, with history guidance methods that boost temporal consistency and support long rollouts.

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

DiffATS: Diffusion in Aligned Tensor Space

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

DiffATS trains diffusion models directly on aligned Tucker tensor primitives that are proven to be homeomorphisms, delivering efficient unconditional and conditional generation across images, videos, and PDE data with high compression.

LongLive: Real-time Interactive Long Video Generation

cs.CV · 2025-09-26 · conditional · novelty 6.0

LongLive is a causal autoregressive video generator that produces up to 240-second interactive videos at 20.7 FPS on one H100 GPU after 32 GPU-days of fine-tuning from a 1.3B short-clip model.

ReSim: Reliable World Simulation for Autonomous Driving

cs.CV · 2025-06-11 · unverdicted · novelty 6.0

ReSim is a controllable video world model trained on heterogeneous real and simulated driving data that achieves higher fidelity and controllability for both expert and non-expert actions, plus a Video2Reward module for estimating action quality from simulated futures.

citing papers explorer

Showing 39 of 39 citing papers.