ORBIS uses output-guided token reduction and DATM to achieve 2x higher token reduction than AsymRnR, with up to 4.5x speedup and 79.3% energy savings relative to an A100 GPU for video DiT models.
Ryoo, and Tian Xie
13 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 13
roles: background 2
polarities: background 2
representative citing papers
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.
TCC calibrates cached representations in diffusion sampling via an offline iterative procedure that accounts for trajectory shifts, improving FID from 29.83 to 27.35 on PixArt-alpha while preserving reuse policies.
Dual-Rate Diffusion interleaves sparse heavy context encoding with frequent light denoising to cut diffusion sampling cost by 2-4x on ImageNet while matching baseline quality and remaining compatible with distillation.
CoCoDiff achieves 3.6x average and 8.4x peak speedup for distributed DiT inference on up to 96 GPU tiles via tile-aware all-to-all, V-first scheduling, and selective V communication.
S2O uses online permutation and importance-based early stopping to increase effective sparsity in attention, delivering 7.51x attention and 3.81x end-to-end speedups on Llama-3.1-8B at 128K context with preserved accuracy.
HERO accelerates world model inference 1.73x via hierarchical patch-wise refresh in shallow layers and linear extrapolation in deeper layers with minimal quality loss.
BAC accelerates transformer-based Diffusion Policy up to 3x by block-level adaptive feature caching using an Adaptive Caching Scheduler and Bubbling Union Algorithm to control error propagation.
DVG dynamically selects content-aware spatio-temporal acceleration strategies for diffusion-based video generation, delivering up to 7x speedup with near-lossless quality on models like HunyuanVideo.
AllocMV uses a global planner to build a structured persistent state, then solves a Multiple-Choice Knapsack Problem to allocate High-Gen, Mid-Gen, and Reuse compute branches, achieving an optimal Cost-Quality Ratio under budget and rhythmic constraints.
AdaCorrection adaptively corrects offset caches in DiT inference via on-the-fly spatio-temporal validity checks to maintain near-original FID with moderate acceleration.
Coherence-first rendering with 15 FPS anchors plus FSR4 upsampling to 30 FPS preserves scene geometry and identity longer than native 30 FPS generation across tested forest, sword, desert, and snow scenes, with LPIPS favoring the coherence branch.
citing papers explorer
Fewer, Better Frames: A Compute-Normalized Proof of Concept for Coherence-First World-Model Rendering with Model-Guided FSR4 Frame Generation