DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.
Accvideo: Accelerating video diffusion model with synthetic dataset
6 Pith papers cite this work.
representative citing papers
PARE applies structure-aware head pruning and timestep/content-conditioned block routing to compress video DiTs, reducing per-step compute while preserving quality on Wan2.1-14B.
Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.
SURF accelerates high-resolution video generation up to 12.5× by using noise reshifting for low-res previews from pretrained models and a shifting-window Refiner for efficient upscaling that retains original signatures.
ActDiff-VC partitions video into segments, transmits adaptive keyframes and budget-aware point trajectories, and reconstructs frames via conditional diffusion, reporting up to 64.6% bitrate reduction at matched NIQE on UVG and MCL-JCV.