RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization

Haotong Qin; Kaicheng Yang; Kaisen Yang; Xianglong Yan; Xun Zhang; Yucheng Lin; Yulun Zhang

arxiv: 2509.23582 · v2 · pith:4YPHQ4TYnew · submitted 2025-09-28 · 💻 cs.CV

Kaicheng Yang, Xun Zhang, Haotong Qin, Yucheng Lin, Kaisen Yang, Xianglong Yan, Yulun Zhang

Pith reviewed 2026-05-22 13:08 UTC · model grok-4.3

classification 💻 cs.CV

keywords: diffusion transformer · quantization aware training · low-bit quantization · image generation · activation quantization · ternary weights · mixed precision


The pith

A new training framework lets Diffusion Transformers generate competitive images on ImageNet using ternary weights and average 2-bit activations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to prove that activation quantization, not weight quantization, is the main obstacle to running large Diffusion Transformers at extremely low precision. It introduces a quantization-aware training method that first builds a solid ternary-weight baseline and then adds a RobustQuantizer step. This step applies the Hadamard transform to reshape per-token activation statistics into normal distributions that are easier to quantize. An additional activation-only mixed-precision scheme then assigns different bit widths to individual layers to avoid information loss. If the approach holds, it would make high-quality image generation feasible with far less memory and compute than current full-precision DiTs require.

Core claim

The central claim is that the Hadamard transform converts unknown per-token activation distributions into per-token normal distributions, which in turn supports reliable quantization to an average of 2 bits. When this is paired with an activation-only mixed-precision network that keeps ternary weights everywhere while varying activation precision layer by layer, the resulting model produces stable and competitive unconditional and conditional images on ImageNet-1K, establishing the first such result at this bit width.
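The intuition behind this claim can be sketched with a standard central-limit argument. The derivation below is illustrative only: it assumes independent, zero-mean coordinates within a token, which the summary above does not state and which real DiT activations need not satisfy, and it is not a reproduction of the paper's proof. The symbols $x$, $H_n$, $y$, $\sigma_i$, $\rho_i$, and the constant $C$ are introduced here for illustration.

```latex
% Orthonormal Hadamard rotation of one token x \in \mathbb{R}^n (n a power of two)
y = \frac{1}{\sqrt{n}} H_n x, \qquad
H_n \in \{-1,+1\}^{n \times n}, \qquad
H_n H_n^{\top} = n I_n, \qquad
\lVert y \rVert_2 = \lVert x \rVert_2 .

% Each output coordinate is a signed, normalized sum of all input coordinates
y_j = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (H_n)_{ji}\, x_i .

% Assumption: x_1, \dots, x_n independent, E[x_i] = 0, Var(x_i) = \sigma_i^2,
% E|x_i|^3 = \rho_i < \infty. Since |(H_n)_{ji}| = 1, the Berry-Esseen theorem gives
\sup_{t \in \mathbb{R}}
\left| \Pr\!\left( \frac{\sqrt{n}\, y_j}{\sqrt{\sum_{i} \sigma_i^2}} \le t \right) - \Phi(t) \right|
\;\le\; C \, \frac{\sum_{i} \rho_i}{\left( \sum_{i} \sigma_i^2 \right)^{3/2}} .
```

For identically distributed coordinates the right-hand side reduces to $C\rho / (\sigma^3 \sqrt{n})$, so under these assumptions each rotated coordinate approaches a normal distribution at rate $1/\sqrt{n}$ in the token dimension. How far this carries over to correlated activations is exactly what the premise below depends on.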

What carries the argument

RobustQuantizer, which applies the Hadamard transform to per-token activations before quantization so that their distribution is reshaped toward a normal one, together with AMPN (the Activation-only Mixed-Precision Network pipeline), which allocates a different activation precision to each layer while holding all weights at ternary precision.
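The rotate-then-quantize idea can be tried on synthetic data. The sketch below is a toy experiment, not the paper's RobustQuantizer: the heavy-tailed tokens (cubed Gaussians), the per-token absmax scaling, and the 4-level mid-rise grid are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def quantize_2bit(vec):
    """Assumed quantizer: per-token absmax onto the levels {-1.5, -0.5, 0.5, 1.5} * step."""
    absmax = max(abs(v) for v in vec)
    if absmax == 0.0:
        return list(vec)
    step = absmax / 1.5
    out = []
    for v in vec:
        level = math.floor(v / step) + 0.5      # mid-rise grid
        level = min(1.5, max(-1.5, level))      # clip to the 4 available levels
        out.append(level * step)
    return out


def relative_mse(ref, approx):
    """Squared error divided by the energy of the reference vector."""
    num = sum((r - a) ** 2 for r, a in zip(ref, approx))
    den = sum(r * r for r in ref)
    return num / den


def compare(num_tokens=16, dim=512, seed=0):
    """Average 2-bit relative MSE without and with a Hadamard rotation."""
    rng = random.Random(seed)
    plain, rotated = 0.0, 0.0
    for _ in range(num_tokens):
        # Synthetic heavy-tailed token; a stand-in for real activations.
        x = [rng.gauss(0.0, 1.0) ** 3 for _ in range(dim)]
        plain += relative_mse(x, quantize_2bit(x))
        # Rotate, quantize, rotate back (the orthonormal FWHT is its own inverse).
        x_hat = fwht(quantize_2bit(fwht(x)))
        rotated += relative_mse(x, x_hat)
    return plain / num_tokens, rotated / num_tokens


plain_err, rotated_err = compare()
print(f"2-bit relative MSE without rotation:       {plain_err:.3f}")
print(f"2-bit relative MSE with Hadamard rotation: {rotated_err:.3f}")
```

On this toy input the rotated path should show the lower error, because a few large entries no longer dictate the quantization step. The exact numbers depend on the assumed grid and scaling, and say nothing about real DiT layers.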

Load-bearing premise

The Hadamard transform can reliably turn arbitrary per-token activation distributions into normal distributions that quantize accurately at low bit widths.

What would settle it

A sharp rise in FID score or visible degradation in generated ImageNet images when the Hadamard transform is removed from the activation quantization path.

Figures

Figures reproduced from arXiv: 2509.23582 by Haotong Qin, Kaicheng Yang, Kaisen Yang, Xianglong Yan, Xun Zhang, Yucheng Lin, Yulun Zhang.

**Figure 1.** RobuQ enables DiTs to generate competitive results at an ultra-low-bit setting. We select 256×256 images from W1.58A3 quantized DiT-XL/2 trained on ImageNet-1K.

**Figure 2.** Overall Framework of Our Quantization Pipeline.

**Figure 3.** Illustration of how the Hadamard transforms per-token unknown distributions (left) into a …

**Figure 4.** An illustration of why PTQ sensitivity metrics fail for ultra-low-bit QAT mixed-precision.

**Figure 5.** Visualization of the performance and efficiency of RobuQ and comparative approaches.

**Figure 6.** FLOPs and Memory Breakdown in DiT-XL/2 Model.

**Figure 7.** Schematic diagram of actual deployment. For simplicity, we have omitted the AdaLN …

**Figure 8.** Visualization of Activation Bit-Width Distribution

**Figure 9.** Per-Block Activation Statistics. Top: average activation bits per block; Bottom: normal …

**Figure 10.** W1.58 DiT-XL/2 samples at 256×256. Labels = [360, 985, 309, 207, 387, 279, 417, 973]. Cfg = 4.0, sampling steps = 250.

Original abstract

Diffusion Transformers (DiTs) have recently emerged as a powerful backbone for image generation, demonstrating superior scalability and performance over U-Net architectures. However, their practical deployment is hindered by substantial computational and memory costs. While Quantization-Aware Training (QAT) has shown promise for U-Nets, its application to DiTs faces unique challenges, primarily due to the sensitivity and distributional complexity of activations. In this work, we identify activation quantization as the primary bottleneck for pushing DiTs to extremely low-bit settings. To address this, we propose a systematic QAT framework for DiTs, named RobuQ. We start by establishing a strong ternary weight (W1.58A4) DiT baseline. Building upon this, we propose RobustQuantizer to achieve robust activation quantization. Our theoretical analyses show that the Hadamard transform can convert unknown per-token distributions into per-token normal distributions, providing a strong foundation for this method. Furthermore, we propose AMPN, the first Activation-only Mixed-Precision Network pipeline for DiTs. This method applies ternary weights across the entire network while allocating different activation precisions to each layer to eliminate information bottlenecks. Through extensive experiments on unconditional and conditional image generation, our RobuQ framework achieves state-of-the-art performance for DiT quantization in sub-4-bit configurations. To the best of our knowledge, RobuQ is the first to achieve stable and competitive image generation on large datasets like ImageNet-1K with activations quantized to an average of 2 bits. The code and models will be available at https://github.com/racoonykc/RobuQ.
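The "W1.58" in the abstract refers to ternary weights: three possible values carry log2(3) ≈ 1.58 bits each. The abstract does not say how the weights are ternarized, so the sketch below assumes the absmean rule popularized by BitNet b1.58 (divide by the mean absolute value, round, clip to [-1, 1]). The function names and the sample weights are hypothetical.

```python
import math


def ternarize_absmean(weights, eps=1e-8):
    """Map weights to codes in {-1, 0, +1} with one shared scale (assumed absmean rule)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes and the shared scale."""
    return [c * scale for c in codes]


# Made-up weights for illustration only.
weights = [0.42, -0.05, -0.91, 0.10, 0.00, 0.63, -0.33, 0.02]
codes, scale = ternarize_absmean(weights)
approx = dequantize(codes, scale)

print("codes:", codes)
print(f"scale: {scale:.4f}")
print(f"bits per ternary weight: {math.log2(3):.3f}")
```

In a QAT setting the rounding step would normally be paired with a gradient approximation such as a straight-through estimator; that part is omitted here because the abstract gives no details about the training procedure.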

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims to introduce RobuQ, a systematic QAT framework for DiTs that achieves W1.58A2 quantization. It starts with a strong ternary weight W1.58A4 baseline, proposes RobustQuantizer based on theoretical analysis that the Hadamard transform converts per-token distributions to normal distributions for robust activation quantization, and introduces AMPN for activation-only mixed-precision to allocate different precisions per layer. Through experiments on unconditional and conditional image generation on ImageNet-1K, it achieves SOTA for sub-4-bit DiT quantization and is the first to achieve stable competitive performance with average 2-bit activations.

Significance. If the results and theoretical justification hold, this is a significant advance in quantizing large generative models. Pushing DiTs to extremely low-bit settings with competitive performance on large datasets like ImageNet-1K could enable more efficient deployment of diffusion models. The use of Hadamard transform for activation normalization and the AMPN pipeline are potentially impactful contributions to the field of model quantization for vision transformers.

major comments (1)

[Theoretical Analysis] Theoretical analysis section: The justification for RobustQuantizer rests on the Hadamard transform converting arbitrary per-token activation distributions into per-token normal distributions that are easy to quantize at 2 bits. This distributional property is invoked to argue that activation quantization is no longer the primary bottleneck, but the manuscript provides no direct empirical verification (e.g., histograms, QQ plots, or statistical tests) on activation statistics from actual DiT attention and MLP blocks. Without this, the quantization error bounds and the central W1.58A2 claim on ImageNet-1K rest on an unconfirmed assumption.

minor comments (2)

The abstract states that code and models will be released; ensure the repository includes full reproduction scripts, exact hyperparameter settings for AMPN and QAT, and the per-layer bit-allocation tables.
Experimental results would benefit from reporting standard deviations or multiple random seeds to substantiate claims of 'stable' performance at W1.58A2.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and insightful comments, which have helped us identify areas to strengthen the manuscript. We address the major comment point by point below.


Referee: Theoretical analysis section: The justification for RobustQuantizer rests on the Hadamard transform converting arbitrary per-token activation distributions into per-token normal distributions that are easy to quantize at 2 bits. This distributional property is invoked to argue that activation quantization is no longer the primary bottleneck, but the manuscript provides no direct empirical verification (e.g., histograms, QQ plots, or statistical tests) on activation statistics from actual DiT attention and MLP blocks. Without this, the quantization error bounds and the central W1.58A2 claim on ImageNet-1K rest on an unconfirmed assumption.

Authors: We appreciate the referee's point that direct empirical verification would strengthen the theoretical justification. While the analysis derives the normalization property from the Hadamard matrix's orthogonality and its effect on per-token statistics, we agree that showing this on real DiT activations is valuable. In the revised manuscript we will add histograms, Q-Q plots, and summary statistics (skewness, kurtosis, and Shapiro-Wilk p-values) for activations extracted from both attention and MLP blocks of the DiT model, comparing distributions before and after the Hadamard transform. These additions will empirically support the claim that activation quantization ceases to be the dominant bottleneck and will reinforce the reported W1.58A2 results.

revision: yes
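The kind of before/after diagnostic promised in this response can be prototyped without the model. The sketch below computes per-token skewness and excess kurtosis before and after an orthonormal Walsh-Hadamard transform. It runs on synthetic heavy-tailed tokens (cubed Gaussians), which are an assumption standing in for activations that would have to be captured from real attention and MLP blocks; the helper names are hypothetical.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def shape_stats(vec):
    """Return (skewness, excess kurtosis) of the values in one token."""
    n = len(vec)
    mean = sum(vec) / n
    m2 = sum((v - mean) ** 2 for v in vec) / n
    m3 = sum((v - mean) ** 3 for v in vec) / n
    m4 = sum((v - mean) ** 4 for v in vec) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def summarize(num_tokens=16, dim=512, seed=0):
    """Average the per-token shape statistics before and after the transform."""
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(num_tokens):
        # Synthetic heavy-tailed token; replace with captured activations in practice.
        token = [rng.gauss(0.0, 1.0) ** 3 for _ in range(dim)]
        before.append(shape_stats(token))
        after.append(shape_stats(fwht(token)))

    def mean_of(rows, k):
        return sum(r[k] for r in rows) / len(rows)

    return {
        "skewness_before": mean_of(before, 0),
        "skewness_after": mean_of(after, 0),
        "kurtosis_before": mean_of(before, 1),
        "kurtosis_after": mean_of(after, 1),
    }


stats = summarize()
for name, value in stats.items():
    print(f"{name}: {value:+.3f}")
```

A normal distribution has zero skewness and zero excess kurtosis, so values near zero after the transform are consistent with the premise on this toy input. For a formal test on real activations, `scipy.stats.shapiro` provides the Shapiro-Wilk p-values mentioned in the response.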

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent theoretical derivation and empirical results


The paper derives RobustQuantizer from a theoretical analysis of the Hadamard transform's effect on per-token activation distributions, presents this as a first-principles property within the manuscript, and validates the overall W1.58A2 framework through direct experiments on ImageNet-1K. No step reduces a claimed prediction or uniqueness result to a fitted parameter from the target data, a self-citation chain, or a renaming of known patterns; the central performance claims remain externally falsifiable via the reported generation metrics.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework introduces two new algorithmic components (RobustQuantizer and AMPN) whose correctness depends on the Hadamard normality claim and on the empirical observation that mixed activation precision removes bottlenecks. No new physical entities are postulated.

free parameters (1)

per-layer activation bit allocations
Chosen to eliminate information bottlenecks; values are not stated in the abstract but must be selected or searched for each architecture.
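To make the bookkeeping behind an "average 2-bit" allocation concrete, the sketch below computes a weighted mean bit-width over layers. Everything in it is hypothetical: the layer names, the bit-widths, and the choice to weight each layer by its activation element count are assumptions, since this page does not state how the paper defines or weights the average.

```python
def average_activation_bits(layers):
    """Weighted mean bit-width; each entry is (name, activation_bits, weight)."""
    total_weight = sum(weight for _, _, weight in layers)
    return sum(bits * weight for _, bits, weight in layers) / total_weight


def within_budget(layers, budget_bits):
    """Check whether an allocation meets an average-bit budget."""
    return average_activation_bits(layers) <= budget_bits


# Hypothetical allocation: names, bit-widths, and element counts are made up.
allocation = [
    ("block0.attn.qkv", 3, 1_000),
    ("block0.attn.proj", 2, 1_000),
    ("block0.mlp.fc1", 2, 1_000),
    ("block0.mlp.fc2", 1, 1_000),
]

avg_bits = average_activation_bits(allocation)
print(f"average activation bits: {avg_bits}")
print(f"meets a 2-bit budget: {within_budget(allocation, 2.0)}")
```

With equal weights this reduces to a plain mean; unequal weights (for example, per-layer compute) would shift the average toward the heavier layers, which is why the weighting convention matters when comparing reported bit-widths.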

axioms (1)

Domain assumption: the Hadamard transform converts unknown per-token distributions into per-token normal distributions.
Invoked to justify RobustQuantizer; treated as a mathematical property that holds for the activation statistics encountered in DiTs.



Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 4 internal anchors

[1]

Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, and James Hensman

Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, and James Hensman. Quarot: Outlier-free 4-bit inference in rotated llms. In NeurIPS, 2024

work page 2024
[2]

A note on the inception score

Shane Barratt and Rishi Rharma. A note on the inception score. In ICML Workshop, 2018

work page 2018
[3]

A lyapunov type bound in R ^d

Vidmantas Bentkus. A lyapunov type bound in R ^d . Theory of Probability & Its Applications, 1997

work page 1997
[4]

Probability and Measure

Patrick Billingsley. Probability and Measure. 1995

work page 1995
[5]

Sergey G. Bobkov. Refinements of berry--esseen inequalities in terms of lyapunov coefficients. Journal of Fourier Analysis and Applications, 2023

work page 2023
[6]

Cl \'e ment L. Canonne. A short note on an inequality between kl and tv. arXiv:2202.07198, 2022

work page arXiv 2022
[7]

Q-dit: Accurate post-training quantization for diffusion transformers

Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jinyan Jiang, Xin Wang, Zhi Wang, and Wenwu Zhu. Q-dit: Accurate post-training quantization for diffusion transformers. In CVPR, 2025

work page 2025
[8]

Wavegrad: Estimating gradients for waveform generation

Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. Wavegrad: Estimating gradients for waveform generation. In ICLR, 2020

work page 2020
[9]

Hierarchical integration diffusion model for realistic image deblurring

Zheng Chen, Yulun Zhang, Ding Liu, Bin Xia, Jinjin Gu, Linghe Kong, and Xin Yuan. Hierarchical integration diffusion model for realistic image deblurring. In NeurIPS, 2023

work page 2023
[10]

Binarized diffusion model for image super-resolution

Zheng Chen, Haotong Qin, Yong Guo, Xiongfei Su, Xin Yuan, Linghe Kong, and Yulun Zhang. Binarized diffusion model for image super-resolution. In NeurIPS, 2024

work page 2024
[11]

Cover and Joy A

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2006

work page 2006
[12]

Diffusion models in vision: A survey

Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. TPAMI, 2023

work page 2023
[13]

Information Theory: Coding Theorems for Discrete Memoryless Systems

Imre Csisz \'a r and J \'a nos K \"o rner. Information Theory: Coding Theorems for Discrete Memoryless Systems. 2011

work page 2011
[14]

Diffusion models beat gans on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. In NeurIPS, 2021

work page 2021
[15]

Mpq-dm: Mixed precision quantization for extremely low bit diffusion models

Weilun Feng, Haotong Qin, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Renshuai Tao, Yongjun Xu, and Michele Magno. Mpq-dm: Mixed precision quantization for extremely low bit diffusion models. In AAAI, 2025 a

work page 2025
[16]

Mpq-dmv2: Flexible residual mixed precision quantization for low-bit diffusion models with temporal distillation

Weilun Feng, Chuanguang Yang, Haotong Qin, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Boyu Diao, Fuzhen Zhuang, Michele Magno, Yongjun Xu, Yingli Tian, and Tingwen Huang. Mpq-dmv2: Flexible residual mixed precision quantization for low-bit diffusion models with temporal distillation. In arXiv preprint arXiv:2507.04290, 2025 b

work page arXiv 2025
[17]

Unified matrix treatment of the fast walsh--hadamard transform

Bernard Fino and Vadim Algazi. Unified matrix treatment of the fast walsh--hadamard transform. IEEE Transactions on Computers, 1976

work page 1976
[18]

Limit Distributions for Sums of Independent Random Variables

Boris Vladimirovich Gnedenko and Andrey Nikolaevich Kolmogorov. Limit Distributions for Sums of Independent Random Variables. 1954

work page 1954
[19]

Reti-diff: Illumination degradation image restoration with retinex-based latent diffusion model

Chunming He, Chengyu Fang, Yulun Zhang, Kai Li, Longxiang Tang, Chenyu You, Fengyang Xiao, Zhenhua Guo, and Xiu Li. Reti-diff: Illumination degradation image restoration with retinex-based latent diffusion model. arXiv preprint arXiv:2311.11638, 2023

work page arXiv 2023
[20]

Diffusion models in low-level vision: A survey

Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, and Xiu Li. Diffusion models in low-level vision: A survey. arXiv preprint arXiv:2406.11138, 2024 a

work page arXiv 2024
[21]

Efficientdm: Efficient quantization-aware fine-tuning of low-bit diffusion models

YeFei He, Jing Liu, Weijia Wu, Hong Zhou, and Bohan Zhuang. Efficientdm: Efficient quantization-aware fine-tuning of low-bit diffusion models. In ICLR, 2024 b

work page 2024
[22]

Gans trained by a two time-scale update rule converge to a local nash equilibrium

Martin Heusel, Hubert Ransauer, Thomas Unterhiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In NeurIPS, 2017

work page 2017
[23]

Classifier-Free Diffusion Guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv:2207.12598, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[24]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020

work page 2020
[25]

St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning

Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, Junchi Yan, and Dacheng Tao. St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning. In ECCV, 2022

work page 2022
[26]

Ostquant: Refining large language model quantization with orthogonal and scaling transformations for better distribution fitting

Xing Hu, Yuan Cheng, Dawei Yang, Zukang Xu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang, and Sifan Zhou. Ostquant: Refining large language model quantization with orthogonal and scaling transformations for better distribution fitting. In ICLR, 2025

work page 2025
[27]

Tq-dit: Efficient time-aware quantization for diffusion transformers

Younghye Hwang, Hyojin Lee, and Joonhyuk Kang. Tq-dit: Efficient time-aware quantization for diffusion transformers. In arXiv preprint arXiv:2502.04056, 2025

work page arXiv 2025
[28]

Edwin T. Jaynes. Information theory and statistical mechanics. 1957

work page 1957
[29]

Learned step size quantization

Steven K Esser, Jeffrey L McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S Modha. Learned step size quantization. In ICLR, 2019

work page 2019
[30]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In CVPR, 2019

work page 2019
[31]

Mixdit: Accelerating image diffusion transformer inference with mixed-precision mx quantization

Daeun Kim, Jinwoo Hwang, Changhun Oh, and Jongse Park. Mixdit: Accelerating image diffusion transformer inference with mixed-precision mx quantization. In arXiv preprint arXiv:2504.08398, 2025

work page arXiv 2025
[32]

Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization

Chris Kolb, Christian L. Müller, Bernd Bisch, and David Rügamer. Smoothing the edges: Smooth optimization for sparse regularization using hadamard overparametrization. In arXiv preprint arXiv:2307.03571, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[33]

Svdquant: Absorbing outliers by low-rank components for 4-bit diffusion models

Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, and Song Han. Svdquant: Absorbing outliers by low-rank components for 4-bit diffusion models. In ICLR, 2025

work page 2025
[34]

Snapfusion: Text-to-image diffusion model on mobile devices within two seconds

Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, and Jian Ren. Snapfusion: Text-to-image diffusion model on mobile devices within two seconds. In NeurIPS, 2024

work page 2024
[35]

Mpgraf: a modular and pre-trained graphformer for learning to rank at web-scale

Yuchen Li, Haoyi Xiong, Linghe Kong, Zeyi Sun, Hongyang Chen, Shuaiqiang Wang, and Dawei Yin. Mpgraf: a modular and pre-trained graphformer for learning to rank at web-scale. In ICDM, 2023 a

work page 2023
[36]

Mhrr: Moocs recommender service with meta hierarchical reinforced ranking

Yuchen Li, Haoyi Xiong, Linghe Kong, Rui Zhang, Fanqin Xu, Guihai Chen, and Minglu Li. Mhrr: Moocs recommender service with meta hierarchical reinforced ranking. TSC, 2023 b

work page 2023
[37]

Duquant: Distributing outliers via dual transformation makes stronger quantized llms

Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, and Ying Wei. Duquant: Distributing outliers via dual transformation makes stronger quantized llms. In NeurIPS, 2025

work page 2025
[38]

Intelligent grimm-open-ended visual storytelling via latent diffusion models

Chang Liu, Haoning Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, and Weidi Xie. Intelligent grimm-open-ended visual storytelling via latent diffusion models. In CVPR, 2024

work page 2024
[39]

Bimacosr: Binary one-step diffusion model leveraging flexible matrix compression for real super-resolution

Kai Liu, Kaicheng Yang, Zheng Chen, Zhiteng Li, Yong Guo, Wenbo Li, Linghe Kong, and Yulun Zhang. Bimacosr: Binary one-step diffusion model leveraging flexible matrix compression for real super-resolution. In ICML, 2025 a

work page 2025
[40]

Hq-dit: Efficient diffusion transformer with fp4 hybrid quantization

Wenxuan Liu and Sai Qian Zhang. Hq-dit: Efficient diffusion transformer with fp4 hybrid quantization. In arXiv preprint arXiv:2405.19751, 2024

work page arXiv 2024
[41]

Spinquant: Llm quantization with learned rotations

Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, and Tijmen Blankevoort. Spinquant: Llm quantization with learned rotations. In ICLR, 2025 b

work page 2025
[42]

Lloyd and Bell Laboratories

S. Lloyd and Bell Laboratories. Least squares quantization in pcm. In IEEEXplore, 1982

work page 1982
[43]

Decoupled weight decay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In ICLR, 2019

work page 2019
[44]

Terdit: Ternary diffusion models with transformers

Xudong Lu, Aojun Zhou, Ziyi Lin, Yuhui Liu, Qi adn Xu, Renrui Zhang, Xue Yang, Junchi Yan, Peng Gao, and Hongsheng Li. Terdit: Ternary diffusion models with transformers. In arXiv preprint arXiv:2405.14854, 2024

work page arXiv 2024
[45]

Ptq4sam: Post-training quantization for segment anything

Chengtao Lv, Hong Chen, Jingyang Guo, Yifu Ding, and Xianglong Liu. Ptq4sam: Post-training quantization for segment anything. In CVPR, 2024

work page 2024
[46]

A calculus proof of the Cram\'er-Wold theorem

Russell Lyons and Kevin Zumbrun. A calculus proof of the cram \'e r--wold theorem. Proceedings of the American Mathematical Society, 2017. arXiv:1607.03206

work page internal anchor Pith review Pith/arXiv arXiv 2017
[47]

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Shunming Ma, Hongyu Wang, Lingxiao Ma, Wenhui Wang, Lei adn Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, and Furu Wei. The era of 1-bit llms: All large language models are in 1.58 bits. In arXiv preprint arXiv:2402.17764, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[48]

Generating images with sparse representations

Charlie Nash, Jacob Menick, Sander Dieleman, and Peter W Battaglia. Generating images with sparse representations. In arXiv preprint arXiv:2103.03841, 2021

work page arXiv 2021
[49]

Pytorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019

work page 2019
[50]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, 2023

work page 2023
[51]

The matrix cookbook, 2012

Kaare Brandt Petersen and Michael Syskind Pedersen. The matrix cookbook, 2012

work page 2012
[52]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj \"o rn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022

work page 2022
[53]

U-net: Convolutional networks for biomedical image segmentation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015

work page 2015
[54]

Imagenet large scale visual recognition challenge

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. IJCV, 2015

work page 2015
[55]

Improved techniques for training gans

Tim Salimans, Ian Goodfellow, Wojciench Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In NeurIPS, 2016

work page 2016
[56]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017

work page 2017
[57]

Quest: Low-bit diffusion model quantization via efficient selective finetuning

Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junchi Wu, and Yan Yan. Quest: Low-bit diffusion model quantization via efficient selective finetuning. In ICCV, 2025 a

work page 2025
[58]

Bitnet v2: Native 4-bit activations with hadamard transformation for 1-bit llms

Hongyu Wang, Shuming Ma, and Furu Wei. Bitnet v2: Native 4-bit activations with hadamard transformation for 1-bit llms. In arXiv preprint arXiv:2504.18415, 2025 b

work page arXiv 2025
[59]

Ptq4dit: Post-training quantization for diffusion transformers

Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, and Yan Yan. Ptq4dit: Post-training quantization for diffusion transformers. In NeurIPS, 2024

work page 2024
[60]

Diffusion models: A comprehensive survey of methods and applications

Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Diffusion models: A comprehensive survey of methods and applications. In ACM Computing Surveys, 2023

work page 2023
[61]

Hadamard Matrix Analysis and Synthesis: With Applications to Communications and Signal/Image Processing

Rao Yarlagadda and John Hershey. Hadamard Matrix Analysis and Synthesis: With Applications to Communications and Signal/Image Processing. 1993

work page 1993
[62]

F. Yates. A fast algorithm for hadamard transform. Mathematical Proceedings of the Cambridge Philosophical Society, 1968

work page 1968
[63]

Flexible residual binarization for image super-resolution

Yulun Zhang, Haotong Qin, Zixiang Zhao, Xianglong Liu, Martin Danelljan, and Fisher Yu. Flexible residual binarization for image super-resolution. In ICML, 2024

work page 2024
[64]

Shengen, Guohao Dai, and Yu Wang

Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Yan. Shengen, Guohao Dai, and Yu Wang. Mixdq: Memory-efficient few-step text-to-image diffusion models with metric-decoupled mixed precision quantization. In ECCV, 2024 a

work page 2024
[65]

Vidit-q: Efficient and accurate quantization of diffusion transformers for image and video generation

Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Xuefei Yang, Huazhong aand Nong, and Yu Wang. Vidit-q: Efficient and accurate quantization of diffusion transformers for image and video generation. In ICLR, 2025

work page 2025
[66]

Dc-solver: Improving predictor-corrector diffusion sampler via dynamic compensation

Wenliang Zhao, Haolin Wang, Jie Zhou, and Jiwen Lu. Dc-solver: Improving predictor-corrector diffusion sampler via dynamic compensation. In arXiv preprint arXiv:2409.03755, 2024, 2024 b

work page arXiv 2024
[67]

Bidm: Pushing the limit of quantization for diffusion models

Xinyu Zheng, Xianglong Liu, Yichen Bian, Xudong Ma, Yulun Zhang, Jiakai Wang, Jingyang Guo, and Haotong Qin. Bidm: Pushing the limit of quantization for diffusion models. In NeurIPS, 2024

work page 2024
[68]

Binarydm: Accurate weight binarization for efficient diffusion models

Xinyu Zheng, Xianglong Liu, Haotong Qin, Xudong Ma, Mingyuan Zhang, Haojie Hao, Jiakai Wang, Zixiang Zhao, Jingyang Guo, and Michele Magno. Binarydm: Accurate weight binarization for efficient diffusion models. In ICLR, 2025

work page 2025
[69]

Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients

Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. ICLR, 2016

work page 2016

[1] [1]

Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, and James Hensman

Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, and James Hensman. Quarot: Outlier-free 4-bit inference in rotated llms. In NeurIPS, 2024

work page 2024

[2] [2]

A note on the inception score

Shane Barratt and Rishi Rharma. A note on the inception score. In ICML Workshop, 2018

work page 2018

[3] [3]

A lyapunov type bound in R ^d

Vidmantas Bentkus. A lyapunov type bound in R ^d . Theory of Probability & Its Applications, 1997

work page 1997

[4] [4]

Probability and Measure

Patrick Billingsley. Probability and Measure. 1995

work page 1995

[5] [5]

Sergey G. Bobkov. Refinements of berry--esseen inequalities in terms of lyapunov coefficients. Journal of Fourier Analysis and Applications, 2023

work page 2023

[6] [6]

Cl \'e ment L. Canonne. A short note on an inequality between kl and tv. arXiv:2202.07198, 2022

work page arXiv 2022

[7] [7]

Q-dit: Accurate post-training quantization for diffusion transformers

Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jinyan Jiang, Xin Wang, Zhi Wang, and Wenwu Zhu. Q-dit: Accurate post-training quantization for diffusion transformers. In CVPR, 2025

work page 2025

[8] [8]

Wavegrad: Estimating gradients for waveform generation

Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. Wavegrad: Estimating gradients for waveform generation. In ICLR, 2020

work page 2020

[9] [9]

Hierarchical integration diffusion model for realistic image deblurring

Zheng Chen, Yulun Zhang, Ding Liu, Bin Xia, Jinjin Gu, Linghe Kong, and Xin Yuan. Hierarchical integration diffusion model for realistic image deblurring. In NeurIPS, 2023

work page 2023

[10] Zheng Chen, Haotong Qin, Yong Guo, Xiongfei Su, Xin Yuan, Linghe Kong, and Yulun Zhang. Binarized diffusion model for image super-resolution. In NeurIPS, 2024

[11] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2006

[12] Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. TPAMI, 2023

[13] Imre Csiszár and János Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. 2011

[14] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. In NeurIPS, 2021

[15] Weilun Feng, Haotong Qin, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Renshuai Tao, Yongjun Xu, and Michele Magno. Mpq-dm: Mixed precision quantization for extremely low bit diffusion models. In AAAI, 2025a

[16] Weilun Feng, Chuanguang Yang, Haotong Qin, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Boyu Diao, Fuzhen Zhuang, Michele Magno, Yongjun Xu, Yingli Tian, and Tingwen Huang. Mpq-dmv2: Flexible residual mixed precision quantization for low-bit diffusion models with temporal distillation. arXiv preprint arXiv:2507.04290, 2025b

[17] Bernard Fino and Vadim Algazi. Unified matrix treatment of the fast walsh-hadamard transform. IEEE Transactions on Computers, 1976

[18] Boris Vladimirovich Gnedenko and Andrey Nikolaevich Kolmogorov. Limit Distributions for Sums of Independent Random Variables. 1954

[19] Chunming He, Chengyu Fang, Yulun Zhang, Kai Li, Longxiang Tang, Chenyu You, Fengyang Xiao, Zhenhua Guo, and Xiu Li. Reti-diff: Illumination degradation image restoration with retinex-based latent diffusion model. arXiv preprint arXiv:2311.11638, 2023

[20] Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, and Xiu Li. Diffusion models in low-level vision: A survey. arXiv preprint arXiv:2406.11138, 2024a

[21] Yefei He, Jing Liu, Weijia Wu, Hong Zhou, and Bohan Zhuang. Efficientdm: Efficient quantization-aware fine-tuning of low-bit diffusion models. In ICLR, 2024b

[22] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In NeurIPS, 2017

[23] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv:2207.12598, 2022

[24] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020

[25] Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, Junchi Yan, and Dacheng Tao. St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning. In ECCV, 2022

[26] Xing Hu, Yuan Cheng, Dawei Yang, Zukang Xu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang, and Sifan Zhou. Ostquant: Refining large language model quantization with orthogonal and scaling transformations for better distribution fitting. In ICLR, 2025

[27] Younghye Hwang, Hyojin Lee, and Joonhyuk Kang. Tq-dit: Efficient time-aware quantization for diffusion transformers. arXiv preprint arXiv:2502.04056, 2025

[28] Edwin T. Jaynes. Information theory and statistical mechanics. 1957

[29] Steven K Esser, Jeffrey L McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S Modha. Learned step size quantization. In ICLR, 2019

[30] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In CVPR, 2019

[31] Daeun Kim, Jinwoo Hwang, Changhun Oh, and Jongse Park. Mixdit: Accelerating image diffusion transformer inference with mixed-precision mx quantization. arXiv preprint arXiv:2504.08398, 2025

[32] Chris Kolb, Christian L. Müller, Bernd Bischl, and David Rügamer. Smoothing the edges: Smooth optimization for sparse regularization using hadamard overparametrization. arXiv preprint arXiv:2307.03571, 2023

[33] Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, and Song Han. Svdquant: Absorbing outliers by low-rank components for 4-bit diffusion models. In ICLR, 2025

[34] Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, and Jian Ren. Snapfusion: Text-to-image diffusion model on mobile devices within two seconds. In NeurIPS, 2024

[35] Yuchen Li, Haoyi Xiong, Linghe Kong, Zeyi Sun, Hongyang Chen, Shuaiqiang Wang, and Dawei Yin. Mpgraf: a modular and pre-trained graphformer for learning to rank at web-scale. In ICDM, 2023a

[36] Yuchen Li, Haoyi Xiong, Linghe Kong, Rui Zhang, Fanqin Xu, Guihai Chen, and Minglu Li. Mhrr: Moocs recommender service with meta hierarchical reinforced ranking. TSC, 2023b

[37] Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, and Ying Wei. Duquant: Distributing outliers via dual transformation makes stronger quantized llms. In NeurIPS, 2025

[38] Chang Liu, Haoning Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, and Weidi Xie. Intelligent grimm-open-ended visual storytelling via latent diffusion models. In CVPR, 2024

[39] Kai Liu, Kaicheng Yang, Zheng Chen, Zhiteng Li, Yong Guo, Wenbo Li, Linghe Kong, and Yulun Zhang. Bimacosr: Binary one-step diffusion model leveraging flexible matrix compression for real super-resolution. In ICML, 2025a

[40] Wenxuan Liu and Sai Qian Zhang. Hq-dit: Efficient diffusion transformer with fp4 hybrid quantization. arXiv preprint arXiv:2405.19751, 2024

[41] Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, and Tijmen Blankevoort. Spinquant: Llm quantization with learned rotations. In ICLR, 2025b

[42] S. Lloyd. Least squares quantization in pcm. IEEE Transactions on Information Theory, 1982

[43] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In ICLR, 2019

[44] Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Xue Yang, Junchi Yan, Peng Gao, and Hongsheng Li. Terdit: Ternary diffusion models with transformers. arXiv preprint arXiv:2405.14854, 2024

[45] Chengtao Lv, Hong Chen, Jingyang Guo, Yifu Ding, and Xianglong Liu. Ptq4sam: Post-training quantization for segment anything. In CVPR, 2024

[46] Russell Lyons and Kevin Zumbrun. A calculus proof of the Cramér-Wold theorem. Proceedings of the American Mathematical Society, 2017. arXiv:1607.03206

[47] Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, and Furu Wei. The era of 1-bit llms: All large language models are in 1.58 bits. arXiv preprint arXiv:2402.17764, 2024

[48] Charlie Nash, Jacob Menick, Sander Dieleman, and Peter W Battaglia. Generating images with sparse representations. arXiv preprint arXiv:2103.03841, 2021

[49] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019

[50] William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, 2023

[51] Kaare Brandt Petersen and Michael Syskind Pedersen. The matrix cookbook, 2012

[52] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022

[53] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015

[54] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. IJCV, 2015

[55] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In NeurIPS, 2016

[56] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017

[57] Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junchi Wu, and Yan Yan. Quest: Low-bit diffusion model quantization via efficient selective finetuning. In ICCV, 2025a

[58] Hongyu Wang, Shuming Ma, and Furu Wei. Bitnet v2: Native 4-bit activations with hadamard transformation for 1-bit llms. arXiv preprint arXiv:2504.18415, 2025b

[59] Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, and Yan Yan. Ptq4dit: Post-training quantization for diffusion transformers. In NeurIPS, 2024

[60] Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys, 2023

[61] Rao Yarlagadda and John Hershey. Hadamard Matrix Analysis and Synthesis: With Applications to Communications and Signal/Image Processing. 1993

[62] F. Yates. A fast algorithm for hadamard transform. Mathematical Proceedings of the Cambridge Philosophical Society, 1968

[63] Yulun Zhang, Haotong Qin, Zixiang Zhao, Xianglong Liu, Martin Danelljan, and Fisher Yu. Flexible residual binarization for image super-resolution. In ICML, 2024

[64] Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, and Yu Wang. Mixdq: Memory-efficient few-step text-to-image diffusion models with metric-decoupled mixed precision quantization. In ECCV, 2024a

[65] Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, and Yu Wang. Vidit-q: Efficient and accurate quantization of diffusion transformers for image and video generation. In ICLR, 2025

[66] Wenliang Zhao, Haolin Wang, Jie Zhou, and Jiwen Lu. Dc-solver: Improving predictor-corrector diffusion sampler via dynamic compensation. arXiv preprint arXiv:2409.03755, 2024b

[67] Xinyu Zheng, Xianglong Liu, Yichen Bian, Xudong Ma, Yulun Zhang, Jiakai Wang, Jingyang Guo, and Haotong Qin. Bidm: Pushing the limit of quantization for diffusion models. In NeurIPS, 2024

[68] Xinyu Zheng, Xianglong Liu, Haotong Qin, Xudong Ma, Mingyuan Zhang, Haojie Hao, Jiakai Wang, Zixiang Zhao, Jingyang Guo, and Michele Magno. Binarydm: Accurate weight binarization for efficient diffusion models. In ICLR, 2025

[69] Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. ICLR, 2016