pith. sign in

arxiv: 2505.02242 · v2 · submitted 2025-05-04 · 💻 cs.CV

Sampling-Aware Quantization for Diffusion Models

Pith reviewed 2026-05-22 15:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords diffusion modelsquantizationfast samplingtrajectory alignmentprobability flowdenoisingimage generationmodel compression
0
0 comments X

The pith

Quantization noise disrupts directional estimates in diffusion sampling, but mixed-order trajectory alignment restores accurate few-step generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that quantizing the noise-estimation network in diffusion models adds errors that throw off the direction calculated at every denoising step. These errors compound in higher-order samplers that rely on precise numerical solutions to the underlying equations, pushing the generation path away from the intended trajectory through probability space. To achieve both faster model inference and fewer sampling steps without retraining, the authors introduce a sampling-aware quantization method built around Mixed-Order Trajectory Alignment. This technique applies tighter error limits at each step to keep the probability flow more linear and thereby preserve the fast convergence of advanced samplers. Experiments across datasets confirm that image quality stays high even when both the model and the sampler are accelerated.

Core claim

We uncover that quantization-induced noise disrupts directional estimation at each sampling step, further distorting the precise directional estimations of higher-order samplers when solving the sampling equations through discretized numerical methods, thereby altering the optimal sampling trajectory. To attain dual acceleration with high fidelity, we propose a sampling-aware quantization strategy, wherein a Mixed-Order Trajectory Alignment technique is devised to impose a more stringent constraint on the error bounds at each sampling step, facilitating a more linear probability flow.

What carries the argument

Mixed-Order Trajectory Alignment technique that imposes a more stringent constraint on the error bounds at each sampling step to facilitate a more linear probability flow.

Load-bearing premise

Tightening error bounds via Mixed-Order Trajectory Alignment will produce a sufficiently linear probability flow and preserve high-order sampler convergence properties without introducing new distortions or requiring model retraining.

What would settle it

Measure whether the sampled trajectory after alignment stays within the claimed error bounds and whether few-step generated images match full-precision quality; significant deviation or quality drop would falsify the central claim.

Figures

Figures reproduced from arXiv: 2505.02242 by Huiqiong Wang, Jie Song, Mingli Song, Qian Zeng, Yuanyu Wan.

Figure 1
Figure 1. Figure 1: Comparison of generated samples on the ImageNet [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Direction estimation in reverse diffusion sampling. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Sampling-aware quantization workflow. (a) Module-level reconstruction process employed in SA-PTQ, where [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Visualization of the generative performance of our SA-QLoRA under [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Latent space feature trajectories of LDM4 under 20-step sampling on the ImageNet 256 [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of generative performance on the ImageNet 256 [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of generative performance on the ImageNet 256 [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Comparison of generative performance between the full-precision LDM8 and its W4A8 quantized counterpart, utilizing our [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Generative performance comparison of W4A8 quantized LDM8 models, employing PTQD and our proposed SA-QLoRA, on [PITH_FULL_IMAGE:figures/full_fig_p016_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Generative performance comparison of W4A4 quantized LDM8 models, employing Q-diffusion and our proposed SA-QLoRA, [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Comparison of generative performance between the full-precision LDM4 and its W4A8 quantized counterpart, utilizing our [PITH_FULL_IMAGE:figures/full_fig_p016_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Generation performance of our SA-QLoRA under W8A8 quantization. [PITH_FULL_IMAGE:figures/full_fig_p017_12.png] view at source ↗
read the original abstract

Diffusion models have recently emerged as the dominant approach in visual generation tasks. However, the lengthy denoising chains and the computationally intensive noise estimation networks hinder their applicability in low-latency and resource-limited environments. Previous research has endeavored to address these limitations in a decoupled manner, utilizing either advanced samplers or efficient model quantization techniques. In this study, we uncover that quantization-induced noise disrupts directional estimation at each sampling step, further distorting the precise directional estimations of higher-order samplers when solving the sampling equations through discretized numerical methods, thereby altering the optimal sampling trajectory. To attain dual acceleration with high fidelity, we propose a sampling-aware quantization strategy, wherein a Mixed-Order Trajectory Alignment technique is devised to impose a more stringent constraint on the error bounds at each sampling step, facilitating a more linear probability flow. Extensive experiments on sparse-step fast sampling across multiple datasets demonstrate that our approach preserves the rapid convergence characteristics of high-speed samplers while maintaining superior generation quality. Code is publicly available at: https://github.com/TaylorJocelyn/Sampling-aware-Quantization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that quantization noise in diffusion models disrupts per-step directional estimates and distorts higher-order samplers' trajectories, and that a sampling-aware quantization method using Mixed-Order Trajectory Alignment can tighten per-step error bounds to restore a more linear probability flow. This enables simultaneous acceleration via fast sampling and quantization while preserving generation quality, without model retraining. The claim is supported by experiments on sparse-step sampling across multiple datasets, with code released publicly.

Significance. If the alignment technique demonstrably preserves the convergence order of high-order samplers under quantization, the work would be significant for practical low-latency deployment of diffusion models. The public code release strengthens reproducibility and allows direct verification of the empirical results.

major comments (2)
  1. [§3.2] §3.2 (Mixed-Order Trajectory Alignment): the manuscript provides no derivation or local truncation error analysis showing that the alignment operator preserves the order of accuracy of the underlying high-order sampler (e.g., second- or third-order) once quantization noise is present. Without this, the central claim that stricter per-step bounds yield a sufficiently linear flow while retaining rapid convergence remains unproven.
  2. [§4.1] §4.1 and Eq. (7): the error-bound constraint introduced by the alignment is stated to be 'more stringent,' yet no quantitative comparison is given to the original sampler's truncation error or to the quantization-induced perturbation term; this makes it impossible to verify that the net trajectory deviation remains controlled.
minor comments (2)
  1. [§3.2] Notation for the alignment operator is introduced without a clear distinction from the standard denoising step; a single-line definition or pseudocode would improve readability.
  2. [Figure 3] Figure 3 caption does not specify the exact sampler order and quantization bit-width used in each curve, making direct comparison to the text claims difficult.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, providing clarifications and committing to revisions that strengthen the theoretical grounding of our claims without altering the core contributions of the work.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Mixed-Order Trajectory Alignment): the manuscript provides no derivation or local truncation error analysis showing that the alignment operator preserves the order of accuracy of the underlying high-order sampler (e.g., second- or third-order) once quantization noise is present. Without this, the central claim that stricter per-step bounds yield a sufficiently linear flow while retaining rapid convergence remains unproven.

    Authors: We acknowledge that a formal local truncation error analysis would provide stronger theoretical justification for the claim that the alignment preserves sampler order under quantization. Our current arguments rest on the design of the Mixed-Order Trajectory Alignment to enforce tighter per-step bounds that counteract quantization-induced directional errors, supported by extensive empirical results across sparse-step regimes. In the revised manuscript we will add a dedicated derivation in §3.2 that analyzes the local truncation error of the aligned update, showing that the leading-order terms of the original high-order method remain dominant when the quantization perturbation is constrained by the alignment operator. revision: yes

  2. Referee: [§4.1] §4.1 and Eq. (7): the error-bound constraint introduced by the alignment is stated to be 'more stringent,' yet no quantitative comparison is given to the original sampler's truncation error or to the quantization-induced perturbation term; this makes it impossible to verify that the net trajectory deviation remains controlled.

    Authors: We agree that an explicit quantitative comparison of the error terms would improve verifiability. We will revise §4.1 to include both analytical expressions and numerical evaluations comparing (i) the truncation error of the unquantized high-order sampler, (ii) the additional perturbation introduced by quantization, and (iii) the tightened bound enforced by Mixed-Order Trajectory Alignment. These comparisons will be presented alongside the existing experimental results to demonstrate that the net trajectory deviation stays within acceptable limits for the reported sampling budgets. revision: yes

Circularity Check

0 steps flagged

No circularity: new alignment technique presented without reduction to inputs

full rationale

The paper introduces quantization-induced disruption of directional estimation as an observed issue and proposes Mixed-Order Trajectory Alignment as a new technique to tighten per-step error bounds for linear probability flow. No equations, derivations, or self-citations appear in the provided abstract that would equate the claimed preservation of high-order sampler convergence to a fitted parameter, self-definition, or prior result by construction. The central strategy is framed as an independent constraint rather than a re-expression of existing quantities, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are named. The approach implicitly assumes standard diffusion sampling equations and quantization error models from prior literature.

pith-pipeline@v0.9.0 · 5716 in / 1152 out tokens · 35149 ms · 2026-05-22T15:55:54.903911+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 7 internal anchors

  1. [1]

    Generative adversarial network: An overview of theory and applications.International Journal of Information Manage- ment Data Insights, 1(1):100004, 2021

    Alankrita Aggarwal, Mamta Mittal, and Gopi Battineni. Generative adversarial network: An overview of theory and applications.International Journal of Information Manage- ment Data Insights, 1(1):100004, 2021. 1

  2. [2]

    Analytic-dpm: an an- alytic estimate of the optimal reverse variance in diffusion probabilistic models

    Fan Bao, Chongxuan Li, Jun Zhu, and Bo Zhang. Analytic- dpm: an analytic estimate of the optimal reverse vari- ance in diffusion probabilistic models.arXiv preprint arXiv:2201.06503, 2022. 3

  3. [3]

    Modeling temporal data as continuous functions with process diffusion

    Marin Bilo ˇs, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, and Stephan G ¨unnemann. Modeling temporal data as continuous functions with process diffusion. 2022. 1

  4. [4]

    Ld-pruner: Efficient pruning of latent dif- fusion models using task-agnostic insights.arXiv preprint arXiv:2404.11936, 2024

    Thibault Castells, Hyoung-Kyu Song, Bo-Kyeong Kim, and Shinkook Choi. Ld-pruner: Efficient pruning of latent dif- fusion models using task-agnostic insights.arXiv preprint arXiv:2404.11936, 2024. 2

  5. [5]

    dparallel: Learnable parallel decoding for dllms

    Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, and Xinchao Wang. dparallel: Learnable parallel decoding for dllms. InInternational Conference on Learning Representa- tions, 2026. 3

  6. [6]

    Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021

    Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021. 6

  7. [7]

    Towards accurate post- training quantization for vision transformer

    Yifu Ding, Haotong Qin, Qinghua Yan, Zhenhua Chai, Junjie Liu, Xiaolin Wei, and Xianglong Liu. Towards accurate post- training quantization for vision transformer. InProceedings of the 30th ACM international conference on multimedia, pages 5380–5388, 2022. 2

  8. [8]

    Score- based generative modeling with critically-damped langevin diffusion.arXiv preprint arXiv:2112.07068, 2021

    Tim Dockhorn, Arash Vahdat, and Karsten Kreis. Score- based generative modeling with critically-damped langevin diffusion.arXiv preprint arXiv:2112.07068, 2021. 1

  9. [9]

    Structural pruning for diffusion models.Advances in neural informa- tion processing systems, 36, 2024

    Gongfan Fang, Xinyin Ma, and Xinchao Wang. Structural pruning for diffusion models.Advances in neural informa- tion processing systems, 36, 2024. 2

  10. [10]

    Efficientdm: Efficient quantization-aware fine-tuning of low-bit diffusion models.arXiv preprint arXiv:2310.03270, 2023

    Yefei He, Jing Liu, Weijia Wu, Hong Zhou, and Bo- han Zhuang. Efficientdm: Efficient quantization-aware fine-tuning of low-bit diffusion models.arXiv preprint arXiv:2310.03270, 2023. 1, 2, 3

  11. [11]

    Ptqd: Accurate post-training quantization for diffusion models.Advances in Neural Information Pro- cessing Systems, 36, 2024

    Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, and Bohan Zhuang. Ptqd: Accurate post-training quantization for diffusion models.Advances in Neural Information Pro- cessing Systems, 36, 2024. 3, 5, 7

  12. [12]

    Denoising dif- fusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising dif- fusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020. 1, 3

  13. [13]

    Explicit expo- nential runge–kutta methods for semilinear parabolic prob- lems.SIAM Journal on Numerical Analysis, 43(3):1069– 1090, 2005

    Marlis Hochbruck and Alexander Ostermann. Explicit expo- nential runge–kutta methods for semilinear parabolic prob- lems.SIAM Journal on Numerical Analysis, 43(3):1069– 1090, 2005. 1

  14. [14]

    Exponential integrators.Acta Numerica, 19:209–286, 2010

    Marlis Hochbruck and Alexander Ostermann. Exponential integrators.Acta Numerica, 19:209–286, 2010. 1

  15. [15]

    Variational diffusion models.Advances in neural infor- mation processing systems, 34:21696–21707, 2021

    Diederik Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models.Advances in neural infor- mation processing systems, 34:21696–21707, 2021. 1, 2

  16. [16]

    On fast sampling of diffusion probabilistic models,

    Zhifeng Kong and Wei Ping. On fast sampling of diffu- sion probabilistic models.arXiv preprint arXiv:2106.00132,

  17. [17]

    Proteinsgm: Score-based generative modeling for de novo protein design

    Jin Sub Lee, Jisun Kim, and Philip M Kim. Proteinsgm: Score-based generative modeling for de novo protein design. bioRxiv, pages 2022–07, 2022. 1

  18. [18]

    Q-diffusion: Quantizing diffusion models

    Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, and Kurt Keutzer. Q-diffusion: Quantizing diffusion models. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 17535–17545, 2023. 1, 2, 3, 5, 7

  19. [19]

    Brecq: Pushing the limit of post-training quantization by block reconstruction.arXiv preprint arXiv:2102.05426,

    Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, and Shi Gu. Brecq: Pushing the limit of post-training quantization by block reconstruc- tion.arXiv preprint arXiv:2102.05426, 2021. 5

  20. [20]

    Awq: Activation-aware weight quantization for on-device llm compression and acceleration

    Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for on-device llm compression and acceleration. Proceedings of Machine Learning and Systems, 6:87–100,

  21. [21]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer vision–ECCV 2014: 13th European conference, zurich, Switzerland, September 6-12, 2014, proceedings, part v 13, pages 740–755. Springer, 2014. 6

  22. [22]

    Pseudo numerical methods for diffusion models on manifolds

    Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. Pseudo numerical methods for diffusion models on manifolds.arXiv preprint arXiv:2202.09778, 2022. 1, 6, 2

  23. [23]

    Llm-qat: Data-free quantization aware training for large language models

    Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, and Vikas Chandra. Llm-qat: Data-free quantization aware training for large language models.arXiv preprint arXiv:2305.17888, 2023. 2

  24. [24]

    A fast ode solver for diffusion probabilistic model sampling in around 10 steps

    C Lu, Y Zhou, F Bao, J Chen, and C Li. A fast ode solver for diffusion probabilistic model sampling in around 10 steps. Proc. Adv. Neural Inf. Process. Syst., New Orleans, United States, pages 1–31, 2022. 1

  25. [25]

    Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps.Advances in Neural Information Processing Systems, 35:5775–5787,

    Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps.Advances in Neural Information Processing Systems, 35:5775–5787,

  26. [26]

    DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

    Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongx- uan Li, and Jun Zhu. Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models.arXiv preprint arXiv:2211.01095, 2022. 1, 2

  27. [27]

    Repaint: Inpainting using denoising diffusion probabilistic models

    Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. Repaint: Inpainting using denoising diffusion probabilistic models. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11461–11471, 2022. 1

  28. [28]

    A White Paper on Neural Network Quantization

    Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yely- sei Bondarenko, Mart van Baalen, and Tijmen Blankevoort. A white paper on neural network quantization. arxiv 2021. arXiv preprint arXiv:2106.08295. 2

  29. [29]

    Up or down? adap- tive rounding for post-training quantization

    Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Chris- tos Louizos, and Tijmen Blankevoort. Up or down? adap- tive rounding for post-training quantization. InInternational Conference on Machine Learning, pages 7197–7206. PMLR,

  30. [30]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 6

  31. [31]

    Progressive Distillation for Fast Sampling of Diffusion Models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models.arXiv preprint arXiv:2202.00512, 2022. 2

  32. [32]

    Post-training quantization on diffusion models

    Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, and Yan Yan. Post-training quantization on diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 1972–1981, 2023. 1, 2, 3, 5, 7

  33. [33]

    Make-A-Video: Text-to-Video Generation without Text-Video Data

    Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, et al. Make-a-video: Text-to-video generation without text-video data.arXiv preprint arXiv:2209.14792,

  34. [34]

    Temporal dynamic quantization for dif- fusion models.Advances in Neural Information Processing Systems, 36, 2024

    Junhyuk So, Jungwon Lee, Daehyun Ahn, Hyungjun Kim, and Eunhyeok Park. Temporal dynamic quantization for dif- fusion models.Advances in Neural Information Processing Systems, 36, 2024. 3

  35. [35]

    Deep unsupervised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. InInternational confer- ence on machine learning, pages 2256–2265. PMLR, 2015. 1

  36. [36]

    Denoising Diffusion Implicit Models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.arXiv preprint arXiv:2010.02502, 2020. 1, 6, 2

  37. [37]

    Generative modeling by esti- mating gradients of the data distribution.Advances in neural information processing systems, 32, 2019

    Yang Song and Stefano Ermon. Generative modeling by esti- mating gradients of the data distribution.Advances in neural information processing systems, 32, 2019. 1

  38. [38]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Ab- hishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equa- tions.arXiv preprint arXiv:2011.13456, 2020. 1

  39. [39]

    Sinsr: Diffusion-based image super- resolution in a single step.arXiv preprint arXiv:2311.14760,

    Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C Kot, and Bihan Wen. Sinsr: Diffusion-based image super- resolution in a single step.arXiv preprint arXiv:2311.14760,

  40. [40]

    Sparsed: Sparse attention for diffusion lan- guage models

    Zeqing Wang, Gongfan Fang, Xinyin Ma, Xingyi Yang, and Xinchao Wang. Sparsed: Sparse attention for diffusion lan- guage models. InInternational Conference on Learning Representations, 2026. 3

  41. [41]

    Restart sampling for im- proving generative processes.Advances in Neural Informa- tion Processing Systems, 36:76806–76838, 2023

    Yilun Xu, Mingyang Deng, Xiang Cheng, Yonglong Tian, Ziming Liu, and Tommi Jaakkola. Restart sampling for im- proving generative processes.Advances in Neural Informa- tion Processing Systems, 36:76806–76838, 2023. 2

  42. [42]

    LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

    Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop.arXiv preprint arXiv:1506.03365, 2015. 6

  43. [43]

    Inversion-based style transfer with diffusion models

    Yuxin Zhang, Nisha Huang, Fan Tang, Haibin Huang, Chongyang Ma, Weiming Dong, and Changsheng Xu. Inversion-based style transfer with diffusion models. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10146–10156, 2023. 1

  44. [44]

    Trajectory consistency distillation.arXiv preprint arXiv:2402.19159, 2024

    Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, and Tat-Jen Cham. Trajectory consistency distillation.arXiv preprint arXiv:2402.19159, 2024. 2

  45. [45]

    Fast ode-based sampling for diffusion models in around 5 steps

    Zhenyu Zhou, Defang Chen, Can Wang, and Chun Chen. Fast ode-based sampling for diffusion models in around 5 steps. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 7777–7786,

  46. [46]

    Theoretical Analysis A.1

    1, 2 Sampling-Aware Quantization for Diffusion Models Supplementary Material A. Theoretical Analysis A.1. High-Order Approximation via Intermediate Point Evaluations in Numerical Integration For the following ODE [25]: dxt dt =αx t +N(x t, t),(21) whereα∈RandN(x t, t)∈R D is a non-linear function ofx t. Given an initial valuex t at timet, forh >0, the tru...

  47. [47]

    accelerates inference by introducing sparse attention. C. Experimental Details and Results C.1. Quantization Settings. To comprehensively evaluate the proposed sampling-aware quantization framework, we conduct experiments under three quantization configurations:W8A8,W4A8, and W4A4. For theW8A8setting, we assess the performance of the proposed SA-PTQ, whil...

  48. [48]

    The basic QLoRA strategy fixes the sampling steps and aligns the full-precision and quantized outputs at each step of the sampler

    To further enhance the alignment of sparse-step sam- pling trajectories, we design a mixstep progressive LoRA strategy. The basic QLoRA strategy fixes the sampling steps and aligns the full-precision and quantized outputs at each step of the sampler. In contrast, the mixstep progres- sive LoRA strategy iterates over a list of sampling steps set to steps =...