HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention
Pith reviewed 2026-05-15 05:14 UTC · model grok-4.3
The pith
Head-wise adaptive sparse attention accelerates pretrained video diffusion models by up to 1.93× without retraining, reusing temporal masks across denoising steps and calibrating sparsity per attention head.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that two plug-in components, Temporal Mask Reuse (skipping mask prediction when query-key drift is low) and Error-guided Budgeted Calibration (assigning per-head top-p thresholds that minimize measured model-output error under a global sparsity budget), together form a head-wise adaptive sparse attention scheme that consistently improves prior training-free methods, reaching up to 1.93× speedup at 720p on Wan2.1-1.3B and Wan2.1-14B while preserving competitive video-quality and similarity metrics.
What carries the argument
A head-wise adaptive sparse attention framework: Temporal Mask Reuse skips unnecessary mask prediction when query-key drift is low, and Error-guided Budgeted Calibration sets per-head sparsity thresholds that minimize measured model-output error under a fixed global budget.
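The top-p selection that both the baselines and the calibrated variant rely on can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function name and the use of pooled per-block scores are assumptions.

```python
import math

def topp_block_mask(block_scores, p):
    """Select attention blocks for one head until their softmax mass
    reaches the top-p threshold. Returns indices of kept blocks.
    block_scores: raw (pre-softmax) pooled scores, one per key block."""
    # Numerically stabilized softmax over block scores.
    m = max(block_scores)
    exps = [math.exp(s - m) for s in block_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Greedily keep highest-probability blocks until cumulative mass >= p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return sorted(kept)

# One dominant block and one moderate block cover 80% of the mass:
# topp_block_mask([2.0, 0.1, 0.1, 1.5], 0.8) -> [0, 3]
```

A smaller p keeps fewer blocks (more sparsity, more error); the paper's contribution is choosing p per head rather than sharing one value.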
Load-bearing premise
That measured model-output error under a global sparsity budget reliably predicts perceptual video quality across heads, and that temporal query-key drift stays stable enough for safe mask reuse without visible artifacts.
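A minimal sketch of the reuse gate this premise supports. The drift metric (relative L2 between flattened query/key activations at consecutive denoising steps) and the threshold value are assumptions for illustration, not the paper's exact definitions.

```python
import math

def relative_drift(prev, curr):
    """Relative L2 distance between flattened Q (or K) activations
    from two consecutive denoising steps."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(prev, curr)))
    den = math.sqrt(sum(a * a for a in prev)) or 1.0  # guard all-zero prev
    return num / den

def should_reuse_mask(prev_qk, curr_qk, tau=0.05):
    """Temporal Mask Reuse gate: keep the previous step's sparse mask when
    drift stays below tau; otherwise re-run mask prediction for this step."""
    return relative_drift(prev_qk, curr_qk) < tau
```

If drift grows in late denoising steps, as the referee report below worries, this gate degrades gracefully: it simply stops firing and mask prediction runs again.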
What would settle it
Run the method on Wan2.1 models at 720p. The claim would fail if perceptual video quality metrics such as FVD drop clearly, if visible temporal flickering or artifacts are traceable to reused masks, or if the reported wall-clock speedup fails to materialize on standard inference hardware.
Original abstract
Diffusion-based video generation has advanced substantially in visual fidelity and temporal coherence, but practical deployment remains limited by the quadratic complexity of full attention. Training-free sparse attention is attractive because it accelerates pretrained models without retraining, yet existing online top-$p$ sparse attention still spends non-negligible cost on mask prediction and applies shared thresholds despite strong head-level heterogeneity. We show that these two overlooked factors limit the practical speed-quality trade-off of training-free sparse attention in Video DiTs. To address them, we introduce a head-wise adaptive framework with two plug-in components: Temporal Mask Reuse, which skips unnecessary mask prediction based on query-key drift, and Error-guided Budgeted Calibration, which assigns per-head top-$p$ thresholds by minimizing measured model-output error under a global sparsity budget. On Wan2.1-1.3B and Wan2.1-14B, our method consistently improves XAttention and SVG2, achieving up to 1.93 times speedup at 720P while maintaining competitive video quality and similarity metrics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HASTE, a training-free head-wise adaptive sparse attention framework for accelerating pretrained Video DiTs. It adds two plug-in components—Temporal Mask Reuse (skipping mask prediction via query-key drift) and Error-guided Budgeted Calibration (per-head top-p thresholds chosen to minimize measured model-output error under a global sparsity budget)—and reports that the method improves XAttention and SVG2 on Wan2.1-1.3B and Wan2.1-14B, reaching up to 1.93× speedup at 720P while preserving competitive video quality and similarity metrics.
Significance. If the speedup and quality claims are substantiated with rigorous ablations and perceptual validation, the work would offer a practical, training-free acceleration path for large video generation models. The explicit handling of head-level heterogeneity and the reuse of masks address two real bottlenecks in online sparse attention for DiTs; the training-free nature is a clear strength.
major comments (3)
- [Experimental results (abstract and §4)] The central empirical claim (up to 1.93× speedup at 720P with competitive quality) rests on the Error-guided Budgeted Calibration, yet the manuscript provides no error bars, exact per-metric tables, or ablation isolating the calibration objective from the global sparsity budget. Without these, it is impossible to verify that the reported gains are statistically reliable or that the per-head thresholds actually improve the speed-quality frontier over uniform top-p baselines.
- [Error-guided Budgeted Calibration] The calibration minimizes a scalar model-output error (latent-space L2 or equivalent) subject to the global sparsity budget. This proxy is not shown to correlate with perceptual video quality, especially temporal coherence and motion artifacts; latent error frequently decouples from visible flickering once temporal attention patterns shift across denoising steps. The paper must demonstrate that the chosen thresholds preserve human-visible quality (e.g., via FVD, user studies, or temporal stability metrics) rather than only frame-wise similarity scores.
- [Temporal Mask Reuse] Temporal Mask Reuse assumes query-key drift remains small enough for safe mask reuse across timesteps. Drift magnitude typically increases as noise decreases and semantic structure appears; if this assumption fails for later denoising steps, visible artifacts can appear that are invisible to the calibration objective. The manuscript should quantify drift statistics and show that reuse does not degrade temporal consistency on the evaluated models.
minor comments (2)
- [Abstract] The abstract states “competitive video quality and similarity metrics” without naming the concrete metrics (FVD, CLIP-T, etc.) or reporting numerical values; this should be clarified in the abstract and results section.
- [Method overview] Notation for the global sparsity budget and per-head top-p thresholds should be introduced with a single consistent symbol set and a small illustrative diagram showing how the budget is allocated across heads.
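For concreteness, the budget allocation the referee asks to be notated cleanly can be illustrated with a brute-force stand-in for the calibration: pick one measured operating point (top-p value, compute cost, output error) per head so that total cost fits the global budget and summed error is minimal. The exhaustive search here replaces the paper's optimization; all numbers and names are hypothetical.

```python
from itertools import product

def calibrate_heads(per_head_points, budget):
    """Pick one (top_p, cost, error) operating point per head, minimizing
    summed measured output error subject to total cost <= budget.
    per_head_points: for each head, a list of candidate tuples."""
    best_p, best_err = None, float("inf")
    for combo in product(*per_head_points):
        cost = sum(pt[1] for pt in combo)
        err = sum(pt[2] for pt in combo)
        if cost <= budget and err < best_err:
            best_p, best_err = [pt[0] for pt in combo], err
    return best_p, best_err

# Toy measurements (cost in compute units, error in 1e-3 latent-L2 units):
# head 0 is error-sensitive, head 1 tolerates sparsity well.
head0 = [(0.9, 90, 10), (0.7, 60, 300), (0.5, 40, 600)]
head1 = [(0.9, 90, 20), (0.7, 60, 30), (0.5, 40, 50)]
thresholds, err = calibrate_heads([head0, head1], budget=130)
# Heterogeneous allocation (dense head 0, sparse head 1) beats uniform 0.7/0.7.
```

With these toy numbers the search returns thresholds [0.9, 0.5] at error 60, while the uniform (0.7, 0.7) choice under the same 130-unit budget costs error 330: exactly the head-level heterogeneity argument the abstract makes.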
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will incorporate the suggested analyses and ablations in the revised manuscript to strengthen the empirical validation.
Point-by-point responses
-
Referee: [Experimental results (abstract and §4)] The central empirical claim (up to 1.93× speedup at 720P with competitive quality) rests on the Error-guided Budgeted Calibration, yet the manuscript provides no error bars, exact per-metric tables, or ablation isolating the calibration objective from the global sparsity budget. Without these, it is impossible to verify that the reported gains are statistically reliable or that the per-head thresholds actually improve the speed-quality frontier over uniform top-p baselines.
Authors: We agree that error bars, full tables, and isolating ablations are needed for rigor. In revision we will add standard deviations over 3 random seeds for all reported metrics, include complete per-metric tables, and provide a dedicated ablation comparing Error-guided Budgeted Calibration against uniform top-p under identical global sparsity budgets. This will isolate the per-head adaptation benefit and confirm statistical reliability of the 1.93× speedup. revision: yes
-
Referee: [Error-guided Budgeted Calibration] The calibration minimizes a scalar model-output error (latent-space L2 or equivalent) subject to the global sparsity budget. This proxy is not shown to correlate with perceptual video quality, especially temporal coherence and motion artifacts; latent error frequently decouples from visible flickering once temporal attention patterns shift across denoising steps. The paper must demonstrate that the chosen thresholds preserve human-visible quality (e.g., via FVD, user studies, or temporal stability metrics) rather than only frame-wise similarity scores.
Authors: We acknowledge that latent L2 is a proxy and will expand evaluation in revision by reporting Fréchet Video Distance (FVD) and temporal stability metrics (e.g., frame-to-frame difference variance). We will also add discussion of observed correlation between the calibration objective and these perceptual metrics on Wan2.1. While full user studies exceed current scope, we will include additional qualitative temporal-coherence examples and note that our similarity metrics already remain competitive; the revision will prioritize the requested quantitative perceptual metrics. revision: partial
-
Referee: [Temporal Mask Reuse] Temporal Mask Reuse assumes query-key drift remains small enough for safe mask reuse across timesteps. Drift magnitude typically increases as noise decreases and semantic structure appears; if this assumption fails for later denoising steps, visible artifacts can appear that are invisible to the calibration objective. The manuscript should quantify drift statistics and show that reuse does not degrade temporal consistency on the evaluated models.
Authors: We will add a new subsection quantifying query-key drift magnitude (L2 distance between consecutive Q/K) across all denoising timesteps for both Wan2.1-1.3B and 14B. We will also report temporal consistency metrics (e.g., temporal PSNR and motion artifact scores) with and without mask reuse to demonstrate that reuse preserves coherence. Our internal checks show drift remains below the threshold that triggers visible artifacts, but the explicit statistics will directly address the concern. revision: yes
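The frame-to-frame difference variance the authors promise is simple to compute. A sketch, treating frames as flat lists of pixel values; the exact metric definition is an assumption, not taken from the paper.

```python
from statistics import pvariance

def frame_diff_variance(frames):
    """Variance of mean absolute frame-to-frame differences.
    A crude temporal-stability proxy: steady motion gives near-zero
    variance, while flicker spikes it. frames: list of flat pixel lists."""
    diffs = [
        sum(abs(x - y) for x, y in zip(a, b)) / len(a)
        for a, b in zip(frames, frames[1:])
    ]
    return pvariance(diffs)

steady = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]                 # uniform motion
flicker = [[0.0, 0.0], [5.0, 5.0], [0.0, 0.0], [0.0, 0.0]]    # one-frame spike
```

Comparing this quantity with and without mask reuse would directly address the referee's concern without requiring a user study.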
Circularity Check
No circularity: the empirical calibration and reuse rules are independent of the claimed outputs.
Full rationale
The paper introduces two plug-in components (Temporal Mask Reuse based on query-key drift, and Error-guided Budgeted Calibration that selects per-head top-p thresholds by minimizing measured model-output error under a global sparsity budget). These are procedural heuristics whose parameters are set by direct measurement on the target model, not by a derivation that reduces the reported speedup or quality metrics to the inputs by construction. No equations equate a 'prediction' to a fitted quantity; the central claims are empirical speedups (up to 1.93× at 720P) validated on Wan2.1 models with competitive similarity metrics. No load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the provided text. The method is therefore validated against external benchmarks rather than against its own construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- global sparsity budget
axioms (1)
- domain assumption: head-level heterogeneity in attention patterns is stable enough that adaptive per-head thresholds preserve output quality
Reference graph
Works this paper leans on
- [1] Jiazi Bu, Pengyang Ling, Yujie Zhou, Yibin Wang, Yuhang Zang, Dahua Lin, and Jiaqi Wang. DiCache: Let Diffusion Model Determine Its Own Cache. arXiv:2508.17356, 2025.
- [2] Aiyue Chen, Bin Dong, Jingru Li, Jing Lin, Kun Tian, Yiwu Yao, and Gongyi Wang. RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy. arXiv:2505.21036, 2025.
- [3] Aiyue Chen, Yaofu Liu, Junjian Huang, Guang Lian, Yiwu Yao, Wangli Lan, Jing Lin, Zhixin Ma, Tingting Zhou, and Harry Yang. RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention. arXiv:2512.24086, 2025.
- [4] Pengtao Chen, Xianfang Zeng, Maosen Zhao, Peng Ye, Mingzhu Shen, Wei Cheng, Gang Yu, and Tao Chen. Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers. arXiv:2506.03065, 2025.
- [5] Liang Feng, Shikang Zheng, Jiacheng Liu, Yuqi Lin, Qinming Zhou, Peiliang Cai, Xinyu Wang, Junjie Chen, Chang Zou, Yue Ma, et al. HiCache: Training-Free Acceleration of Diffusion Models via Hermite Polynomial-Based Feature Caching. arXiv:2508.16984, 2025.
- [6] Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One Step Diffusion via Shortcut Models. arXiv:2410.12557, 2024.
- [7] Youping Gu, Xiaolong Li, Yuhao Hu, Minqi Chen, and Bohan Zhuang. BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation. arXiv:2508.10774, 2025.
- [8] Junxian Guo, Haotian Tang, Shang Yang, Zhekai Zhang, Zhijian Liu, and Song Han. Block Sparse Attention. https://github.com/mit-han-lab/Block-Sparse-Attention, 2024.
- [9] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [10] Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, et al. VBench: Comprehensive Benchmark Suite for Video Generative Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21807–21818, 2024.
- [11] Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. HunyuanVideo: A Systematic Framework for Large Video Generative Models. arXiv, 2025.
- [12] Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, and Song Han. SVDQuant: Absorbing Outliers by Low-Rank Components for 4-bit Diffusion Models. arXiv:2411.05007, 2024.
- [13] Qirui Li, Guangcong Zheng, Qi Zhao, Jie Li, Bin Dong, Yiwu Yao, and Xi Li. Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation. arXiv:2508.12969, 2025.
- [14] Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, and Song Han. Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation. arXiv:2506.19852, 2025.
- [15] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow Matching for Generative Modeling. arXiv:2210.02747, 2022.
- [16] Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, and Fang Wan. Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 7353–7363, 2025.
- [17] Jiacheng Liu, Peiliang Cai, Qinming Zhou, Yuqi Lin, Deyang Kong, Benhao Huang, Yupei Pan, Haowen Xu, Chang Zou, Junshu Tang, et al. FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching. arXiv:2510.08669, 2025.
- [18] Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Junjie Chen, and Linfeng Zhang. From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15853–15863, 2025.
- [20] Jiayi Luo, Jiayu Chen, Jiankun Wang, Cong Wang, Hanxin Zhu, Qingyun Sun, Chen Gao, Zhibo Chen, and Jianxin Li. Training-Free Sparse Attention for Fast Video Generation via Offline Layer-Wise Sparsity Profiling and Online Bidirectional Co-Clustering. arXiv:2603.18636, 2026.
- [21] Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, Bizhu Huang, Bo Wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu, Chenguang...
- [22] Xin Ma, Yaohui Wang, Xinyuan Chen, Gengyun Jia, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, and Yu Qiao. Latte: Latent Diffusion Transformer for Video Generation. arXiv:2401.03048, 2024.
- [23] William Peebles and Saining Xie. Scalable Diffusion Models with Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
- [24] Xuan Shen, Chenxia Han, Yufa Zhou, Yanyue Xie, Yifan Gong, Quanyi Wang, Yiwei Wang, Yanzhi Wang, Pu Zhao, and Jiuxiang Gu. DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance. arXiv:2505.14708, 2025.
- [26] Dor Shmilovich, Tony Wu, Aviad Dahan, and Yuval Domb. LiteAttention: A Temporal Sparse Attention for Diffusion Transformers. arXiv:2511.11062, 2025.
- [27] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising Diffusion Implicit Models. arXiv:2010.02502, 2020.
- [28] Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Zhao Jin, and Dacheng Tao. AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration. URL http://arxiv.org/abs/2412.1, 2024.
- [30] Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and Advanced Large-Scale Video Generative Models. arXiv, 2025.
- [31] Zhou Wang, Hamid R. Sheikh, Alan C. Bovik, et al. Objective Video Quality Assessment. In The Handbook of Video Databases: Design and Applications, volume 41, pages 1041–1078. CRC Press, 2003.
- [32] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- [33] Jianzong Wu, Liang Hou, Haotian Yang, Xin Tao, Ye Tian, Pengfei Wan, Di Zhang, and Yunhai Tong. VMoBA: Mixture-of-Block Attention for Video Diffusion Models. arXiv:2506.23858, 2025.
- [34] Xinjian Wu, Hongmei Wang, Yuan Zhou, and Qinglin Lu. USV: Unified Sparsification for Accelerating Video Diffusion Models. arXiv:2512.05754, 2025.
- [35] Haocheng Xi, Shuo Yang, Yilong Zhao, Chenfeng Xu, Muyang Li, Xiuyu Li, Yujun Lin, Han Cai, Jintao Zhang, Dacheng Li, Jianfei Chen, Ion Stoica, Kurt Keutzer, and Song Han. Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity. arXiv:2502.01776, 2025.
- [36] Yifei Xia, Suhan Ling, Fangcheng Fu, Yujie Wang, Huixia Li, Xuefeng Xiao, and Bin Cui. Training-free and Adaptive Sparse Attention for Efficient Long Video Generation. arXiv:2502.21079, 2025.
- [37] Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, and Song Han. XAttention: Block Sparse Attention with Antidiagonal Scoring. arXiv:2503.16428, 2025.
- [38] Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Kelly Peng, Jianfei Chen, Song Han, Kurt Keutzer, and Ion Stoica. Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation. arXiv:2505.18875, 2025.
- [39] Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Yuxuan Zhang, Weihan Wang, Yean Cheng, Bin Xu, Xiaotao Gu, Yuxiao Dong, and Jie Tang. CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. arXiv:2408.06072, 2025.
- [40] Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T. Freeman. Improved Distribution Matching Distillation for Fast Image Synthesis. Advances in Neural Information Processing Systems, 37:47455–47487, 2024.
- [41] Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, and Taesung Park. One-Step Diffusion with Distribution Matching Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6613–6623, 2024.
- [42] Chenlu Zhan, Wen Li, Chuyu Shen, Jun Zhang, Suhui Wu, and Hao Zhang. Bidirectional Sparse Attention for Faster Video Diffusion Training. arXiv:2509.01085, 2025.
- [43] Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, and Jianfei Chen. SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-Thread INT4 Quantization. arXiv:2411.10958, 2024.
- [44] Jintao Zhang, Jia Wei, Haofeng Huang, Pengle Zhang, Jun Zhu, and Jianfei Chen. SageAttention: Accurate 8-bit Attention for Plug-and-Play Inference Acceleration. arXiv:2410.02367, 2024.
- [45] Jintao Zhang, Haoxu Wang, Kai Jiang, Shuo Yang, Kaiwen Zheng, Haocheng Xi, Ziteng Wang, Hongzhou Zhu, Min Zhao, Ion Stoica, Joseph E. Gonzalez, Jun Zhu, and Jianfei Chen. SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention. arXiv:2509.24006, 2025.
- [46] Jintao Zhang, Chendong Xiang, Haofeng Huang, Jia Wei, Haocheng Xi, Jun Zhu, and Jianfei Chen. SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference. arXiv:2502.18137, 2025.
- [47] Peiyuan Zhang, Yongqi Chen, Haofeng Huang, Will Lin, Zhengzhong Liu, Ion Stoica, Eric Xing, and Hao Zhang. VSA: Faster Video Diffusion with Trainable Sparse Attention. arXiv:2505.13389, 2025.
- [48] Peiyuan Zhang, Yongqi Chen, Runlong Su, Hangliang Ding, Ion Stoica, Zhengzhong Liu, and Hao Zhang. Fast Video Generation with Sliding Tile Attention. arXiv:2502.04507, 2025.
- [49] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
- [50] Wentai Zhang, Ronghui Xi, Shiyao Peng, Jiayu Huang, Haoran Luo, Zichen Tang, et al. Ride the Wave: Precision-Allocated Sparse Attention for Smooth Video Generation. arXiv:2604.12219, 2026.
- [51] Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, and Jiaya Jia. Training-Free Efficient Video Generation via Dynamic Token Carving. arXiv:2505.16864, 2025.
- [52] Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, et al. ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation. arXiv:2406.02540, 2024.
- [53] Tianchen Zhao, Ke Hong, Xinhao Yang, Xuefeng Xiao, Huixia Li, Feng Ling, Ruiqi Xie, Siqi Chen, Hongyu Zhu, Yichong Zhang, and Yu Wang. PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models. arXiv:2506.16054, 2025.