Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity. arXiv preprint arXiv:2502.01776

Haocheng Xi, Shuo Yang, Yilong Zhao, Chenfeng Xu, Muyang Li, Xiuyu Li, Yujun Lin, Han Cai, Jintao Zhang, Dacheng Li, et al. · 2025 · arXiv 2502.01776

25 Pith papers cite this work.

citation-role summary: background (4)

citation-polarity summary: background (4)

representative citing papers

DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adaptive caching.

HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.

CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

CoReDiT reduces self-attention FLOPs in DiTs by up to 55% via linear-time spatial coherence pruning and neighbor-based reconstruction, delivering 1.33x-1.72x speedups with maintained quality.

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms (step distillation, efficient attention, model compression, and cache/trajectory optimization) and outlines open challenges for practical use.

Attention Sparsity is Input-Stable: Training-Free Sparse Attention for Video Generation via Offline Sparsity Profiling and Online QK Co-Clustering

cs.CV · 2026-03-19 · conditional · novelty 7.0

Attention sparsity in video DiTs is an input-stable layer-wise property, enabling offline profiling and online bidirectional QK co-clustering for up to 1.93x speedup with PSNR up to 29 dB.

SyncCache: Exploiting Asymmetric Dynamics for Fast Audio-Driven Portrait Animation

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

SyncCache accelerates DiT-based audio-driven portrait animation up to 4.12x via spatially-asymmetric probing and modality-decoupled caching while preserving near-lossless quality and audio sync.

EcoVideo: Entropy-Orchestrated Video Generation Paradigm in Cloud-Edge Dynamics

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

EcoVideo introduces entropy-driven dynamic frame selection for cloud-edge DiT video generation, yielding up to 2.9x speedup with adaptive keyframe budgets.

RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

RhymeFlow is a training-free acceleration framework that decouples denoising trajectories across video frames by dense processing of semantic keyframes and asynchronous skipping for non-keyframes, augmented by a latent trajectory projection module to maintain consistency.

Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

Light Interaction accelerates interactive video world models up to 2.59x via adaptive context management, denoising cache acceleration, and 3D block sparse attention without retraining.

LVSA: Training-Free Sparse Attention for Long Video Diffusion

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

LVSA is a training-free block-sparse attention technique combining structured windows with rotating global anchors that reduces inference compute 2.98-3.33x on video diffusion models at extended horizons while remaining quality-neutral or positive.

Veda: Scalable Video Diffusion via Distilled Sparse Attention

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

Veda formulates tile selection in video diffusion attention as a reconstruction problem from full attention maps, using statistics-aware and head-aware scoring to enable high sparsity with maintained quality and hardware speedups up to 5.1x end-to-end.

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

RT-Lynx shifts DiT sparsity from weights to activations and reports up to 1.55x linear-layer speedup while preserving generation quality across multiple diffusion models.

SparseSAM: Structured Sparsification of Activations in Segment Anything Models

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

SparseSAM achieves 2x faster inference and 2.8x memory reduction in SAM with only 0.004 mIoU loss at 0.4 density via Stripe-Sort Attention and Residual-Consistency MLP.

DynamicRad: Content-Adaptive Sparse Attention for Long Video Diffusion

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

DynamicRad achieves 1.7x-2.5x inference speedups in long video diffusion with over 80% sparsity by grounding adaptive selection in a radial locality prior, using dual-mode static/dynamic strategies and offline Bayesian optimization (BO) with a semantic motion router.

AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

AdaCluster delivers a training-free adaptive query-key clustering framework for sparse attention in video DiTs, yielding 1.67-4.31x inference speedup with negligible quality loss on CogVideoX-2B, HunyuanVideo, and Wan-2.1.

Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.

Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

cs.CV · 2026-04-11 · conditional · novelty 6.0

Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.

Video Compression Meets Video Generation: Latent Inter-Frame Pruning with Attention Recovery

cs.CV · 2026-03-06 · unverdicted · novelty 6.0

LIPAR prunes redundant inter-frame latent patches in video generation and recovers attention to deliver 1.53x speedup at 19.3 FPS with no quality drop or extra training.

S2O: Early Stopping for Sparse Attention via Online Permutation

cs.LG · 2026-02-26 · unverdicted · novelty 6.0

S2O uses online permutation and importance-based early stopping to increase effective sparsity in attention, delivering 7.51x attention and 3.81x end-to-end speedups on Llama-3.1-8B at 128K context with preserved accuracy.

Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

cs.LG · 2026-02-03 · unverdicted · novelty 6.0

Quant VideoGen reduces KV-cache memory by up to 7x in autoregressive video diffusion models via semantic-aware smoothing and progressive residual quantization, achieving better quality than baselines with under 4% latency overhead.

SURF: Signature-Retained Fast Video Generation

cs.GR · 2025-11-25 · unverdicted · novelty 6.0

SURF accelerates high-resolution video generation up to 12.5x by using noise reshifting for low-res previews from pretrained models and a shifting-window Refiner for efficient upscaling that retains original signatures.

DiSC: Resolution-Scalable Acceleration of Diffusion Models by Exploiting Sparsity and Cached Token Reuse with Hash-based Distribution

cs.AR · 2026-05-25 · unverdicted · novelty 5.0

DiSC accelerates DiT and PixArt-Sigma diffusion models 3.47-4.74x over A100 GPUs by reusing cached tokens across denoising steps and reusing sparsity masks in attention, using hash-based bank distribution to run sparse workloads on standard compute units.

Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution

cs.CV · 2025-09-28 · unverdicted · novelty 5.0

OASIS reduces redundancy in diffusion models for real-world video super-resolution via attention specialization routing and progressive training, delivering state-of-the-art quality with 6.2x faster inference than prior one-step baselines.

OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning

cs.CV · 2026-05-27 · unverdicted · novelty 4.0

OSP-Next reports 83.73% VBench score and up to 2.27x speedup via hybrid sparse attention, SSP parallelism, HiF8 quantization, and Mix-GRPO on diffusion transformers.

citing papers explorer

SURF: Signature-Retained Fast Video Generation cs.GR · 2025-11-25 · unverdicted · none · ref 39
SURF accelerates high-resolution video generation up to 12.5x by using noise reshifting for low-res previews from pretrained models and a shifting-window Refiner for efficient upscaling that retains original signatures.
Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution cs.CV · 2025-09-28 · unverdicted · none · ref 16
OASIS reduces redundancy in diffusion models for real-world video super-resolution via attention specialization routing and progressive training, delivering state-of-the-art quality with 6.2x faster inference than prior one-step baselines.