ORBIS uses output-guided token reduction and DATM to achieve 2x higher token reduction than AsymRnR, with up to 4.5x speedup and 79.3% energy savings versus A100 GPU for video DiT models.
hub
Ryoo, and Tian Xie
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 11roles
background 2polarities
background 2representative citing papers
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.
Dual-Rate Diffusion interleaves sparse heavy context encoding with frequent light denoising to cut diffusion sampling cost by 2-4x on ImageNet while matching baseline quality and remaining compatible with distillation.
CoCoDiff achieves 3.6x average and 8.4x peak speedup for distributed DiT inference on up to 96 GPU tiles via tile-aware all-to-all, V-first scheduling, and selective V communication.
S2O uses online permutation and importance-based early stopping to increase effective sparsity in attention, delivering 7.51x attention and 3.81x end-to-end speedups on Llama-3.1-8B at 128K context with preserved accuracy.
HERO accelerates world model inference 1.73x via hierarchical patch-wise refresh in shallow layers and linear extrapolation in deeper layers with minimal quality loss.
BAC accelerates transformer-based Diffusion Policy up to 3x by block-level adaptive feature caching using an Adaptive Caching Scheduler and Bubbling Union Algorithm to control error propagation.
DVG dynamically selects content-aware spatio-temporal acceleration strategies for diffusion-based video generation, delivering up to 7x speedup with near-lossless quality on models like HunyuanVideo.
AllocMV uses a global planner to build a structured persistent state then solves a Multiple-Choice Knapsack Problem to allocate High-Gen, Mid-Gen, and Reuse compute branches, achieving an optimal Cost-Quality Ratio under budget and rhythmic constraints.
AdaCorrection adaptively corrects offset caches in DiT inference via on-the-fly spatio-temporal validity checks to maintain near-original FID with moderate acceleration.
citing papers explorer
-
ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration
ORBIS uses output-guided token reduction and DATM to achieve 2x higher token reduction than AsymRnR, with up to 4.5x speedup and 79.3% energy savings versus A100 GPU for video DiT models.
-
Efficient Video Diffusion Models: Advancements and Challenges
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
-
DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching
DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.
-
Dual-Rate Diffusion: Accelerating diffusion models with an interleaved heavy-light network
Dual-Rate Diffusion interleaves sparse heavy context encoding with frequent light denoising to cut diffusion sampling cost by 2-4x on ImageNet while matching baseline quality and remaining compatible with distillation.
-
CoCoDiff: Optimizing Collective Communications for Distributed Diffusion Transformer Inference Under Ulysses Sequence Parallelism
CoCoDiff achieves 3.6x average and 8.4x peak speedup for distributed DiT inference on up to 96 GPU tiles via tile-aware all-to-all, V-first scheduling, and selective V communication.
-
S2O: Early Stopping for Sparse Attention via Online Permutation
S2O uses online permutation and importance-based early stopping to increase effective sparsity in attention, delivering 7.51x attention and 3.81x end-to-end speedups on Llama-3.1-8B at 128K context with preserved accuracy.
-
HERO: Hierarchical Extrapolation and Refresh for Efficient World Models
HERO accelerates world model inference 1.73x via hierarchical patch-wise refresh in shallow layers and linear extrapolation in deeper layers with minimal quality loss.
-
Block-wise Adaptive Caching for Accelerating Diffusion Policy
BAC accelerates transformer-based Diffusion Policy up to 3x by block-level adaptive feature caching using an Adaptive Caching Scheduler and Bubbling Union Algorithm to control error propagation.
-
Dynamic Video Generation: Shaping Video Generation Across Time and Space
DVG dynamically selects content-aware spatio-temporal acceleration strategies for diffusion-based video generation, delivering up to 7x speedup with near-lossless quality on models like HunyuanVideo.
-
AllocMV: Optimal Resource Allocation for Music Video Generation via Structured Persistent State
AllocMV uses a global planner to build a structured persistent state then solves a Multiple-Choice Knapsack Problem to allocate High-Gen, Mid-Gen, and Reuse compute branches, achieving an optimal Cost-Quality Ratio under budget and rhythmic constraints.
-
AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers
AdaCorrection adaptively corrects offset caches in DiT inference via on-the-fly spatio-temporal validity checks to maintain near-original FID with moderate acceleration.