Post-Training Pruning for Diffusion Transformers

Chengzhi Hu; Jing Zhang; Mengjuan Chen; Qingyi Gu; Xuewen Liu; Zhikai Li

arxiv: 2607.00927 · v1 · pith:ZQBX6DWInew · submitted 2026-07-01 · 💻 cs.CV · cs.AI

Post-Training Pruning for Diffusion Transformers

Chengzhi Hu , Xuewen Liu , Jing Zhang , Mengjuan Chen , Zhikai Li , Qingyi Gu This is my paper

Pith reviewed 2026-07-02 13:52 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords diffusion transformerspost-training pruningDiT pruningmodel compressionimage generationsaliency metricenergy-based pruningclustering granularity

0 comments

The pith

DiT-Pruning uses an energy-based saliency metric and clustering-aware granularity to prune diffusion transformers while preserving image quality at high sparsity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard pruning techniques from LLMs degrade DiT performance because they ignore DiT-specific weight magnitudes and structural variations. It introduces DiT-Pruning with a custom metric that balances weight and activation contributions from an energy standpoint plus a pruning granularity that exploits observed two-dimensional clustering in weight space. A sympathetic reader would care because DiTs power high-quality image generation yet carry heavy compute costs; if the method works, it enables practical sparsity levels without meaningful quality drop. The central demonstration is near-zero CLIP score loss on FLUX.1-dev at 50 percent sparsity on MJHQ.

Core claim

DiT-Pruning improves post-training pruning of diffusion transformers by replacing LLM-derived saliency approximations with an energy-based criterion that balances weights and activations, while replacing uniform granularity with a clustering-aware allocation that matches distinct two-dimensional patterns in DiT weight space; this combination maintains generation quality far better than prior methods under high sparsity.

What carries the argument

Energy-based saliency metric that balances weight and activation contributions, paired with clustering-aware pruning granularity derived from two-dimensional weight-space patterns.

If this is right

DiT-Pruning produces only a 0.001 CLIP score drop on FLUX.1-dev at 512x512 resolution and 50 percent sparsity on MJHQ.
The method consistently outperforms recent pruning approaches on multiple DiT models while preserving image quality.
Clustering-aware granularity enables more effective sparse allocation than structure-agnostic baselines.
Post-training pruning becomes viable for DiTs without the severe degradation seen in LLM-adapted methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same energy-balance approach might reduce memory footprint enough to run larger DiTs on consumer hardware.
If the clustering pattern generalizes, similar granularity rules could apply to other attention-heavy generative architectures.
Combining this pruning with quantization could compound efficiency gains beyond what either technique achieves alone.

Load-bearing premise

The two-dimensional clustering patterns observed in DiT weight space remain stable enough across architectures and datasets that the same energy-balance parameter works without retuning.

What would settle it

Apply DiT-Pruning at 50 percent sparsity to a DiT variant outside the evaluated set and measure whether CLIP score degradation on a standard benchmark stays below 0.005.

Figures

Figures reproduced from arXiv: 2607.00927 by Chengzhi Hu, Jing Zhang, Mengjuan Chen, Qingyi Gu, Xuewen Liu, Zhikai Li.

**Figure 3.** Figure 3: Overview.(Left) Existing saliency metrics amplify the relative contribution of weights due to a series of approximations. We introduce a squared transformation on the weights (STW) to balance them. (Right) The importance scores exhibit clear clustering patterns in the two-dimensional weight space, enabling our clustering-aware granularity (CAG) for sparse allocation. Whereas Wanda in Eq. (4) further simpli… view at source ↗

**Figure 4.** Figure 4: (Left) Pruning loss increase grows quadratically with parameter magnitude. (Right) Analogous to elastic potential energy, loss increase can be interpreted as perturbation energy, we square the weights to restore the quadratic energy structure. Iij = (|Wij |) 2 · (∥Xj∥2) 1 (7) Motivated by this observation, we instantiate our parametric formulation by applying a squared transformation to the weight term (S… view at source ↗

**Figure 5.** Figure 5: Remaining layers’ weight distribution of DiT trans [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Calibration timestep evaluation. structured pruning results on FLUX [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: Random samples generated by the pruned DiT [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 9.** Figure 9: Samples generated by pruned PixArt model at 50% [PITH_FULL_IMAGE:figures/full_fig_p013_9.png] view at source ↗

**Figure 11.** Figure 11: Samples generated by pruned Flux model at 50% [PITH_FULL_IMAGE:figures/full_fig_p014_11.png] view at source ↗

**Figure 12.** Figure 12: Samples generated by pruned Flux model at 50% [PITH_FULL_IMAGE:figures/full_fig_p015_12.png] view at source ↗

read the original abstract

Diffusion Transformers (DiTs) have demonstrated impressive performance in image generation but suffer from substantial computational overhead and resource consumption. Post-training pruning offers a promising solution; however, due to DiTs' unique architectural design and parameter distribution, traditional pruning methods are inapplicable, leading to significant performance degradation. Specifically, prior methods developed for LLMs, which derive metrics through a series of approximations, amplify the relative contribution of weights in the saliency metric. In addition, weights in DiTs exhibit significantly larger magnitudes than those in LLMs. Moreover, existing pruning granularity overlooks variations in model structures. In this paper, we propose DiT-Pruning, which improves pruning performance by introducing customized saliency criteria and pruning granularity. We design a novel metric that balances the contributions of weights and activations from an energy-based perspective, enabling more effective identification of important elements. Furthermore, we observe distinct clustering patterns in the two-dimensional weight space. Accordingly, we adopt a clustering-aware pruning granularity, enabling effective sparse allocation. Extensive evaluations on various DiTs show that our method consistently preserves image quality, especially under high sparsity. For FLUX.1-dev at 512x512 resolution on MJHQ, DiT-Pruning achieves only a 0.001 loss in CLIP score at 50% sparsity, dramatically outperforming recent pruning methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DiT-Pruning adapts energy-based saliency and clustering granularity to DiTs with strong reported sparsity results, but the metric lacks derivation and generalization checks.

read the letter

The main point is that this paper gives a post-training pruning method for Diffusion Transformers that uses a custom energy-based saliency score balancing weights and activations, plus pruning granularity informed by observed 2D weight clustering. This setup is presented as necessary because standard LLM pruning metrics and uniform granularity do not transfer well to DiTs due to larger weight magnitudes and structural differences.

What stands out is the empirical side: on FLUX.1-dev at 512x512 on MJHQ, they report only a 0.001 CLIP score drop at 50% sparsity while outperforming recent baselines. That kind of preservation at high sparsity is the practical win for inference cost on image generation models.

The approach is new in its specific combination for DiTs, and the clustering observation leads to a reasonable adjustment in how sparsity is allocated. The paper does a service by targeting a deployed model class rather than staying in the LLM pruning lane.

Soft spots are clear from the abstract. There is no formula or derivation shown for the energy metric, no error bars on the numbers, and no account of how the balance scalar or clustering thresholds were picked. The stress-test concern lands: the headline result depends on the clustering patterns and single balance parameter transferring without retuning, yet the abstract supplies no cross-architecture or sensitivity checks. If those do not hold, the margin over baselines shrinks.

This is for people working on efficient deployment of generative models on limited hardware. It has enough concrete results and a timely problem to deserve peer review, though referees will need to see the metric details and ablations before the claims can be taken as general.

Recommendation: send it for review with requests for the missing derivations, error analysis, and transfer tests.

Referee Report

3 major / 1 minor

Summary. The paper proposes DiT-Pruning, a post-training method for pruning Diffusion Transformers that replaces standard saliency metrics with a custom energy-based criterion balancing weight and activation contributions and adopts clustering-aware granularity based on observed 2D weight-space patterns. It reports that the approach preserves generation quality far better than prior LLM-derived methods at high sparsity levels, with a 0.001 CLIP-score drop at 50% sparsity on FLUX.1-dev (512×512, MJHQ) while outperforming recent baselines across multiple DiT models.

Significance. If the energy metric and clustering structure prove stable without per-model retuning, the work would supply a practical, architecture-aware pruning recipe that materially reduces the inference cost of large DiTs while keeping image quality nearly intact; the headline empirical margin is large enough to matter for deployment. The absence of derivation details, hyper-parameter sensitivity, and cross-architecture validation, however, leaves the central claim dependent on unverified assumptions.

major comments (3)

[§3.1] §3.1 (Energy-based Saliency Metric): the balance scalar that weights the energy contributions of weights versus activations is introduced without an explicit equation, derivation, or procedure for its selection; because the entire saliency ranking depends on this scalar, the lack of any sensitivity analysis or cross-model stability test makes the reported 0.001 CLIP margin impossible to reproduce or generalize.
[§3.2] §3.2 (Clustering-aware Granularity): the claim that 2D weight-space clustering patterns are stable across DiT architectures and permit effective sparse allocation is supported only by observations on the models used for the main tables; no ablation transferring the same clustering thresholds or patterns to held-out DiT variants is provided, directly undermining the assertion of consistent outperformance.
[Results tables] Results tables (e.g., Table 2, FLUX.1-dev row): the headline metrics are reported as single-point values with neither error bars nor multiple random seeds, so it is impossible to determine whether the margin over baselines exceeds run-to-run variance.

minor comments (1)

[Abstract / §3.2] The abstract states that patterns were 'observed' but supplies no figure or quantitative description of the 2D clustering; a brief illustration in §3.2 would clarify the granularity choice.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects for improving reproducibility and generalizability. We address each major comment below and will revise the manuscript to incorporate the suggested additions.

read point-by-point responses

Referee: [§3.1] §3.1 (Energy-based Saliency Metric): the balance scalar that weights the energy contributions of weights versus activations is introduced without an explicit equation, derivation, or procedure for its selection; because the entire saliency ranking depends on this scalar, the lack of any sensitivity analysis or cross-model stability test makes the reported 0.001 CLIP margin impossible to reproduce or generalize.

Authors: We agree that the balance scalar requires an explicit formulation for reproducibility. In the revised version, we will add the precise equation defining the energy-based saliency metric (incorporating the scalar λ that balances weight and activation energy terms), a brief derivation from the energy perspective, and the selection procedure (calibration on a small held-out set). We will also include a sensitivity analysis varying λ across a range of values and report results on multiple DiT models to demonstrate stability of the 0.001 CLIP margin. revision: yes
Referee: [§3.2] §3.2 (Clustering-aware Granularity): the claim that 2D weight-space clustering patterns are stable across DiT architectures and permit effective sparse allocation is supported only by observations on the models used for the main tables; no ablation transferring the same clustering thresholds or patterns to held-out DiT variants is provided, directly undermining the assertion of consistent outperformance.

Authors: The clustering patterns were derived from empirical observations on the primary models. To strengthen the claim of stability, the revision will add an ablation study that applies the identical clustering thresholds and granularity rules to at least one additional held-out DiT architecture, reporting the resulting performance to confirm transferability and consistent outperformance. revision: yes
Referee: Results tables (e.g., Table 2, FLUX.1-dev row): the headline metrics are reported as single-point values with neither error bars nor multiple random seeds, so it is impossible to determine whether the margin over baselines exceeds run-to-run variance.

Authors: We acknowledge that single-point reporting limits variance assessment. Although the core pruning procedure is deterministic given fixed calibration data, the revision will include additional runs with varied random seeds for calibration-set sampling and will report means accompanied by standard deviations (error bars) in the updated tables. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical design choices evaluated on held-out metrics

full rationale

The paper introduces an energy-based saliency metric and clustering-aware granularity based on observed 2D weight patterns, presented as empirical observations rather than a derivation chain. No equations, self-citations, or fitted parameters are shown that reduce the reported CLIP-score preservation or outperformance claims to quantities defined from the same data by construction. The strongest claim is an empirical result on held-out MJHQ evaluations, making the proposal self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields limited visibility into parameters; the energy balance appears to introduce at least one tunable scalar and clustering requires distance or density thresholds.

free parameters (2)

energy balance scalar
Balances weight and activation contributions in the novel saliency metric; value not stated and must be chosen to achieve reported results.
clustering thresholds
Parameters controlling how weights are grouped in 2D space before sparse allocation.

pith-pipeline@v0.9.1-grok · 5773 in / 1197 out tokens · 17661 ms · 2026-07-02T13:52:22.218834+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 10 canonical work pages · 7 internal anchors

[1]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Scalable diffusion models with transformers , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=
[2]

Chen, Junsong and Yu, Jincheng and Ge, Chongjian and Yao, Lewei and Xie, Enze and Wu, Yue and Wang, Zhongdao and Kwok, James and Luo, Ping and Lu, Huchuan and others , journal=. Pixart-
[3]

FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

FLUX. 1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space , author=. arXiv preprint arXiv:2506.15742 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Advances in neural information processing systems , volume=

Optimal brain damage , author=. Advances in neural information processing systems , volume=
[5]

IEEE international conference on neural networks , pages=

Optimal brain surgeon and general network pruning , author=. IEEE international conference on neural networks , pages=. 1993 , organization=

1993
[6]

Advances in neural information processing systems , volume=

Learning both weights and connections for efficient neural network , author=. Advances in neural information processing systems , volume=
[7]

Advances in Neural Information Processing Systems , volume=

Optimal brain compression: A framework for accurate post-training quantization and pruning , author=. Advances in Neural Information Processing Systems , volume=
[8]

Rethinking the Value of Network Pruning

Rethinking the value of network pruning , author=. arXiv preprint arXiv:1810.05270 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Proceedings of machine learning and systems , volume=

What is the state of neural network pruning? , author=. Proceedings of machine learning and systems , volume=
[10]

Pruning Convolutional Neural Networks for Resource Efficient Inference

Pruning convolutional neural networks for resource efficient inference , author=. arXiv preprint arXiv:1611.06440 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Advances in neural information processing systems , volume=

Are sixteen heads really better than one? , author=. Advances in neural information processing systems , volume=
[12]

International conference on machine learning , pages=

Sparsegpt: Massive language models can be accurately pruned in one-shot , author=. International conference on machine learning , pages=. 2023 , organization=

2023
[13]

A Simple and Effective Pruning Approach for Large Language Models

A simple and effective pruning approach for large language models , author=. arXiv preprint arXiv:2306.11695 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[14]

Advances in Neural Information Processing Systems , year=

Structural pruning for diffusion models , author=. Advances in Neural Information Processing Systems , year=
[15]

International Conference on Medical image computing and computer-assisted intervention , pages=

U-net: Convolutional networks for biomedical image segmentation , author=. International Conference on Medical image computing and computer-assisted intervention , pages=. 2015 , organization=

2015
[16]

Advances in neural information processing systems , volume=

Attention is all you need , author=. Advances in neural information processing systems , volume=
[17]

The Thirteenth International Conference on Learning Representations , year=

SANA: Efficient high-resolution text-to-image synthesis with linear diffusion transformers , author=. The Thirteenth International Conference on Learning Representations , year=
[18]

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding , author=. arXiv preprint arXiv:1510.00149 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[19]

Advances in Neural Information Processing Systems , volume=

A fast post-training pruning framework for transformers , author=. Advances in Neural Information Processing Systems , volume=
[20]

arXiv preprint arXiv:2002.04809 , year=

Lookahead: A far-sighted alternative of magnitude-based pruning , author=. arXiv preprint arXiv:2002.04809 , year=

work page arXiv 2002
[21]

Advances in neural information processing systems , volume=

Llm-pruner: On the structural pruning of large language models , author=. Advances in neural information processing systems , volume=
[22]

Advances in Neural Information Processing Systems , volume=

Slimgpt: Layer-wise structured pruning for large language models , author=. Advances in Neural Information Processing Systems , volume=
[23]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Structured optimal brain pruning for large language models , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024
[24]

European conference on computer vision , pages=

Microsoft coco: Common objects in context , author=. European conference on computer vision , pages=. 2014 , organization=

2014
[25]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Q-diffusion: Quantizing diffusion models , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[26]

European Conference on Computer Vision , pages=

Mixdq: Memory-efficient few-step text-to-image diffusion models with metric-decoupled mixed precision quantization , author=. European Conference on Computer Vision , pages=. 2024 , organization=

2024
[27]

International conference on machine learning , pages=

Improved denoising diffusion probabilistic models , author=. International conference on machine learning , pages=. 2021 , organization=

2021
[28]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Post-training quantization on diffusion models , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=
[29]

Advances in neural information processing systems , volume=

Ptq4dit: Post-training quantization for diffusion transformers , author=. Advances in neural information processing systems , volume=
[30]

International journal of computer vision , volume=

Imagenet large scale visual recognition challenge , author=. International journal of computer vision , volume=. 2015 , publisher=

2015
[31]

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation

Playground v2. 5: Three insights towards enhancing aesthetic quality in text-to-image generation , author=. arXiv preprint arXiv:2402.17245 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[32]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

Q-dit: Accurate post-training quantization for diffusion transformers , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=
[33]

Advances in neural information processing systems , volume=

Gans trained by a two time-scale update rule converge to a local nash equilibrium , author=. Advances in neural information processing systems , volume=
[34]

arXiv preprint arXiv:2103.03841 , year=

Generating images with sparse representations , author=. arXiv preprint arXiv:2103.03841 , year=

work page arXiv
[35]

Improved Techniques for Training GANs , volume =

Salimans, Tim and Goodfellow, Ian and Zaremba, Wojciech and Cheung, Vicki and Radford, Alec and Chen, Xi and Chen, Xi , booktitle =. Improved Techniques for Training GANs , volume =
[36]

A Note on the Inception Score

A note on the inception score , author=. arXiv preprint arXiv:1801.01973 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[37]

Proceedings of the 2021 conference on empirical methods in natural language processing , pages=

Clipscore: A reference-free evaluation metric for image captioning , author=. Proceedings of the 2021 conference on empirical methods in natural language processing , pages=

2021
[38]

Advances in Neural Information Processing Systems , volume=

Imagereward: Learning and evaluating human preferences for text-to-image generation , author=. Advances in Neural Information Processing Systems , volume=
[39]

Advances in neural information processing systems , volume=

Improved precision and recall metric for assessing generative models , author=. Advances in neural information processing systems , volume=
[40]

IEEE transactions on image processing , volume=

Image quality assessment: from error visibility to structural similarity , author=. IEEE transactions on image processing , volume=. 2004 , publisher=

2004
[41]

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , pages=

Dasp: Specific dense matrix multiply-accumulate units accelerated general sparse matrix-vector multiplication , author=. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , pages=
[42]

arXiv preprint arXiv:2511.13061 , year=

MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity , author=. arXiv preprint arXiv:2511.13061 , year=

work page arXiv

[1] [1]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Scalable diffusion models with transformers , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

[2] [2]

Chen, Junsong and Yu, Jincheng and Ge, Chongjian and Yao, Lewei and Xie, Enze and Wu, Yue and Wang, Zhongdao and Kwok, James and Luo, Ping and Lu, Huchuan and others , journal=. Pixart-

[3] [3]

FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

FLUX. 1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space , author=. arXiv preprint arXiv:2506.15742 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Advances in neural information processing systems , volume=

Optimal brain damage , author=. Advances in neural information processing systems , volume=

[5] [5]

IEEE international conference on neural networks , pages=

Optimal brain surgeon and general network pruning , author=. IEEE international conference on neural networks , pages=. 1993 , organization=

1993

[6] [6]

Advances in neural information processing systems , volume=

Learning both weights and connections for efficient neural network , author=. Advances in neural information processing systems , volume=

[7] [7]

Advances in Neural Information Processing Systems , volume=

Optimal brain compression: A framework for accurate post-training quantization and pruning , author=. Advances in Neural Information Processing Systems , volume=

[8] [8]

Rethinking the Value of Network Pruning

Rethinking the value of network pruning , author=. arXiv preprint arXiv:1810.05270 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

Proceedings of machine learning and systems , volume=

What is the state of neural network pruning? , author=. Proceedings of machine learning and systems , volume=

[10] [10]

Pruning Convolutional Neural Networks for Resource Efficient Inference

Pruning convolutional neural networks for resource efficient inference , author=. arXiv preprint arXiv:1611.06440 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

Advances in neural information processing systems , volume=

Are sixteen heads really better than one? , author=. Advances in neural information processing systems , volume=

[12] [12]

International conference on machine learning , pages=

Sparsegpt: Massive language models can be accurately pruned in one-shot , author=. International conference on machine learning , pages=. 2023 , organization=

2023

[13] [13]

A Simple and Effective Pruning Approach for Large Language Models

A simple and effective pruning approach for large language models , author=. arXiv preprint arXiv:2306.11695 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[14] [14]

Advances in Neural Information Processing Systems , year=

Structural pruning for diffusion models , author=. Advances in Neural Information Processing Systems , year=

[15] [15]

International Conference on Medical image computing and computer-assisted intervention , pages=

U-net: Convolutional networks for biomedical image segmentation , author=. International Conference on Medical image computing and computer-assisted intervention , pages=. 2015 , organization=

2015

[16] [16]

Advances in neural information processing systems , volume=

Attention is all you need , author=. Advances in neural information processing systems , volume=

[17] [17]

The Thirteenth International Conference on Learning Representations , year=

SANA: Efficient high-resolution text-to-image synthesis with linear diffusion transformers , author=. The Thirteenth International Conference on Learning Representations , year=

[18] [18]

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding , author=. arXiv preprint arXiv:1510.00149 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[19] [19]

Advances in Neural Information Processing Systems , volume=

A fast post-training pruning framework for transformers , author=. Advances in Neural Information Processing Systems , volume=

[20] [20]

arXiv preprint arXiv:2002.04809 , year=

Lookahead: A far-sighted alternative of magnitude-based pruning , author=. arXiv preprint arXiv:2002.04809 , year=

work page arXiv 2002

[21] [21]

Advances in neural information processing systems , volume=

Llm-pruner: On the structural pruning of large language models , author=. Advances in neural information processing systems , volume=

[22] [22]

Advances in Neural Information Processing Systems , volume=

Slimgpt: Layer-wise structured pruning for large language models , author=. Advances in Neural Information Processing Systems , volume=

[23] [23]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Structured optimal brain pruning for large language models , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024

[24] [24]

European conference on computer vision , pages=

Microsoft coco: Common objects in context , author=. European conference on computer vision , pages=. 2014 , organization=

2014

[25] [25]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Q-diffusion: Quantizing diffusion models , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

[26] [26]

European Conference on Computer Vision , pages=

Mixdq: Memory-efficient few-step text-to-image diffusion models with metric-decoupled mixed precision quantization , author=. European Conference on Computer Vision , pages=. 2024 , organization=

2024

[27] [27]

International conference on machine learning , pages=

Improved denoising diffusion probabilistic models , author=. International conference on machine learning , pages=. 2021 , organization=

2021

[28] [28]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Post-training quantization on diffusion models , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

[29] [29]

Advances in neural information processing systems , volume=

Ptq4dit: Post-training quantization for diffusion transformers , author=. Advances in neural information processing systems , volume=

[30] [30]

International journal of computer vision , volume=

Imagenet large scale visual recognition challenge , author=. International journal of computer vision , volume=. 2015 , publisher=

2015

[31] [31]

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation

Playground v2. 5: Three insights towards enhancing aesthetic quality in text-to-image generation , author=. arXiv preprint arXiv:2402.17245 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[32] [32]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

Q-dit: Accurate post-training quantization for diffusion transformers , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

[33] [33]

Advances in neural information processing systems , volume=

Gans trained by a two time-scale update rule converge to a local nash equilibrium , author=. Advances in neural information processing systems , volume=

[34] [34]

arXiv preprint arXiv:2103.03841 , year=

Generating images with sparse representations , author=. arXiv preprint arXiv:2103.03841 , year=

work page arXiv

[35] [35]

Improved Techniques for Training GANs , volume =

Salimans, Tim and Goodfellow, Ian and Zaremba, Wojciech and Cheung, Vicki and Radford, Alec and Chen, Xi and Chen, Xi , booktitle =. Improved Techniques for Training GANs , volume =

[36] [36]

A Note on the Inception Score

A note on the inception score , author=. arXiv preprint arXiv:1801.01973 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[37] [37]

Proceedings of the 2021 conference on empirical methods in natural language processing , pages=

Clipscore: A reference-free evaluation metric for image captioning , author=. Proceedings of the 2021 conference on empirical methods in natural language processing , pages=

2021

[38] [38]

Advances in Neural Information Processing Systems , volume=

Imagereward: Learning and evaluating human preferences for text-to-image generation , author=. Advances in Neural Information Processing Systems , volume=

[39] [39]

Advances in neural information processing systems , volume=

Improved precision and recall metric for assessing generative models , author=. Advances in neural information processing systems , volume=

[40] [40]

IEEE transactions on image processing , volume=

Image quality assessment: from error visibility to structural similarity , author=. IEEE transactions on image processing , volume=. 2004 , publisher=

2004

[41] [41]

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , pages=

Dasp: Specific dense matrix multiply-accumulate units accelerated general sparse matrix-vector multiplication , author=. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , pages=

[42] [42]

arXiv preprint arXiv:2511.13061 , year=

MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity , author=. arXiv preprint arXiv:2511.13061 , year=

work page arXiv