A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
Arlon: Boosting diffusion transformers with autoregressive models for long video generation
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 6roles
background 3representative citing papers
Stream-T1 is a test-time scaling framework for streaming video generation using scaled noise propagation from history, reward pruning across short and long windows, and feedback-guided memory sinking to improve temporal consistency and visual quality.
AdaCluster delivers a training-free adaptive query-key clustering framework for sparse attention in video DiTs, yielding 1.67-4.31x inference speedup with negligible quality loss on CogVideoX-2B, HunyuanVideo, and Wan-2.1.
Rolling Forcing generates multi-minute videos in real time by jointly denoising frames at increasing noise levels, anchoring attention to early frames, and using windowed distillation to limit error accumulation.
A geometry-aware 4D video generation model trained with cross-view pointmap alignment to produce spatio-temporally consistent future videos from novel viewpoints for robot manipulation.
citing papers explorer
-
Efficient Video Diffusion Models: Advancements and Challenges
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
-
Stream-T1: Test-Time Scaling for Streaming Video Generation
Stream-T1 is a test-time scaling framework for streaming video generation using scaled noise propagation from history, reward pruning across short and long windows, and feedback-guided memory sinking to improve temporal consistency and visual quality.
-
AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation
AdaCluster delivers a training-free adaptive query-key clustering framework for sparse attention in video DiTs, yielding 1.67-4.31x inference speedup with negligible quality loss on CogVideoX-2B, HunyuanVideo, and Wan-2.1.
-
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
Rolling Forcing generates multi-minute videos in real time by jointly denoising frames at increasing noise levels, anchoring attention to early frames, and using windowed distillation to limit error accumulation.
-
Geometry-aware 4D Video Generation for Robot Manipulation
A geometry-aware 4D video generation model trained with cross-view pointmap alignment to produce spatio-temporally consistent future videos from novel viewpoints for robot manipulation.
- VRAG: Learning World Models for Interactive Video Generation