A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
hub
From reusing to forecasting: Accel- erating diffusion models with taylorseers
12 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
LayerCache enables per-layer-group caching in flow matching models via adaptive JVP span selection and greedy 3D scheduling, delivering 1.37x speedup with PSNR 37.46 dB, SSIM 0.9834, and LPIPS 0.0178 on Qwen-Image.
DRIFT uses resilience analysis, targeted DVFS, and adaptive rollback ABFT to deliver 36% average energy savings or 1.7x speedup in diffusion model inference while preserving generation quality.
Chorus accelerates video DiT serving up to 45% via inter-request caching reuse in a three-stage denoising strategy with token-guided attention amplification.
RT-Lynx shifts DiT sparsity from weights to activations, reports up to 1.55x linear-layer speedup while preserving generation quality across multiple diffusion models.
MotionCache accelerates autoregressive video generation up to 6.28x by motion-weighted cache reuse based on inter-frame differences, with negligible quality loss on SkyReels-V2 and MAGI-1.
CoCoDiff achieves 3.6x average and 8.4x peak speedup for distributed DiT inference on up to 96 GPU tiles via tile-aware all-to-all, V-first scheduling, and selective V communication.
DreamZero uses a 14B video diffusion model as a World Action Model to achieve over 2x better zero-shot generalization on real robots than state-of-the-art VLAs, real-time 7Hz closed-loop control, and cross-embodiment transfer with 10-30 minutes of data.
FlashEdit delivers real-time localized text-guided image editing under 0.2 seconds via cycle-consistent one-step inversion, background shield, and sparsified spatial cross-attention, achieving over 150x speedup on PIE-Bench.
OTCache uses optimal transport to interpolate caching schedules between a graph-based reference and an Optuna-optimized anchor, delivering 3.66x-4.7x speedups on FLUX.1, Qwen-Image and HunyuanVideo with improved fidelity.
DVG dynamically selects content-aware spatio-temporal acceleration strategies for diffusion-based video generation, delivering up to 7x speedup with near-lossless quality on models like HunyuanVideo.
A commutator-zero condition enables training-free generation of perceptually consistent low-resolution previews for high-resolution diffusion model outputs, achieving up to 33% computation reduction.
citing papers explorer
No citing papers match the current filters.