Cross-Resolution Diffusion Models via Network Pruning

Huan Wang; Jiaxuan Ren; Junhan Zhu

arXiv: 2604.05524 · v1 · submitted 2026-04-07 · 💻 cs.CV

Pith reviewed 2026-05-10 19:57 UTC · model grok-4.3

classification 💻 cs.CV

keywords diffusion models · image synthesis · network pruning · cross-resolution generation · UNet architecture · perceptual fidelity · semantic coherence · prompt refinement

The pith

Pruning certain weights in diffusion models improves image quality at resolutions not seen during training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Diffusion models trained at fixed resolutions produce lower-quality images when asked to generate at other sizes. The paper traces this to network parameters that help at the training size but become harmful at other sizes, disrupting the model's internal structure. CR-Diff removes these weights with a block-wise pruning process and then amplifies the pruned model's output to clean up the result. The authors report better images across scales on multiple model types, with little loss at the original resolution. The method also lets users refine results for specific text prompts. The finding matters for applications that need flexible image sizes without retraining the entire system.

Core claim

The core discovery is that resolution shifts cause certain weights in the UNet of diffusion models to become adverse, weakening semantic alignment and causing instability. By selectively pruning these adverse weights in a block-wise manner and amplifying the pruned predictions, CR-Diff achieves improved perceptual fidelity and semantic coherence at unseen resolutions across various backbones while preserving default performance and enabling prompt-specific refinements.

What carries the argument

Block-wise pruning of resolution-dependent adverse weights in the diffusion UNet, followed by pruned output amplification to purify predictions.
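Below is a minimal sketch of what block-wise pruning could look like. It assumes a magnitude criterion applied within each block group. Figure 2 studies magnitude-based unstructured pruning, but the page does not confirm that this is CR-Diff's exact selection rule. The down/mid/up grouping follows the ratio configuration r = (r_down, r_mid, r_up) quoted from the supplementary material in the reference graph. The function name and the dict-of-arrays layout are hypothetical.

```python
import numpy as np

def blockwise_magnitude_masks(blocks, ratios):
    """Return 0/1 masks that prune the smallest-|w| fraction of each block group.

    blocks: dict mapping a group name ('down', 'mid', 'up') to a list of weight
            arrays (hypothetical layout, not the paper's code).
    ratios: dict mapping the same group names to a pruning fraction in [0, 1).
    """
    masks = {}
    for group, weights in blocks.items():
        r = ratios[group]
        # Pool magnitudes across the whole group so the ratio is applied
        # per block group rather than per layer.
        flat = np.concatenate([np.abs(w).ravel() for w in weights])
        k = int(r * flat.size)
        if k == 0:
            masks[group] = [np.ones_like(w) for w in weights]
            continue
        # The k-th smallest magnitude is the group's pruning threshold.
        threshold = np.partition(flat, k - 1)[k - 1]
        masks[group] = [(np.abs(w) > threshold).astype(w.dtype) for w in weights]
    return masks
```

To apply a mask, take the elementwise product (w * m). Pooling magnitudes across a whole group is what makes the ratio block-wise, instead of thresholding each layer on its own. Ties at the threshold are all pruned, so the realized sparsity can slightly exceed the target.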

Load-bearing premise

That the degradation at different resolutions stems mainly from identifiable adverse weights removable by block-wise pruning without causing new problems in the model.

What would settle it

Running the unpruned model and the pruned model on the same set of prompts at a shifted resolution and checking whether the pruned version shows measurably higher perceptual quality and fewer structural artifacts; failure to improve would challenge the claim.
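One hedged way to make "measurably higher" concrete is a paired test. Score the dense and pruned models on the same prompts, seeds, and shifted resolution, then bootstrap the mean per-prompt difference. This assumes a per-image metric where higher is better, such as CLIPScore [20] or ImageReward [52], both in the reference list. Set-level metrics such as FID [21] do not decompose per prompt and would need a different test. The function is an illustrative sketch, not the paper's protocol.

```python
import numpy as np

def paired_bootstrap_ci(scores_pruned, scores_dense, n_boot=10_000, alpha=0.05, seed=0):
    """Bootstrap CI for the mean per-prompt difference (pruned - dense).

    Both inputs hold one score per prompt, generated with the same prompts,
    seeds, and shifted resolution; higher scores are assumed to be better.
    """
    a = np.asarray(scores_pruned, dtype=float)
    b = np.asarray(scores_dense, dtype=float)
    if a.shape != b.shape:
        raise ValueError("scores must be paired per prompt")
    diff = a - b
    rng = np.random.default_rng(seed)
    # Resample prompts with replacement, keeping each (pruned, dense) pair together.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), (float(lo), float(hi))
```

A confidence interval that excludes zero would support the claim at that resolution. One that straddles zero would not.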

Figures

Figures reproduced from arXiv: 2604.05524 by Huan Wang, Jiaxuan Ren, Junhan Zhu.

**Figure 1.** This paper presents CR-Diff, a method to improve the cross-resolution visual consistency of UNet-based diffusion models by masking out some parameters in the model, i.e., network pruning, a technique that has been widely used for reducing model size; here, we novelly repurpose it for generalizing diffusion models to unseen resolutions. The samples above compare the original SDXL [33] model with its …

**Figure 2.** Effects of magnitude-based unstructured pruning on …

**Figure 3.** Overview of CR-Diff. Most UNet-based diffusion models exhibit resolution-dependent degradation when generating at unseen scales. CR-Diff addresses this issue through a two-stage pruning and optimizing process, consisting of a block-wise (B-W) pruning ratio strategy and a pruned output amplification (POA) mechanism. As shown in …

**Figure 4.** Simulated annealing (SA) search process for deter…

**Figure 5.** Block-wise (B-W) pruning applies differentiated …

**Figure 6.** Visual comparison across three generation settings.

**Figure 7.** Ablation results of block-wise pruning. (a) Performance comparison under uniform and block-wise pruning strategies …

**Figure 8.** Radar comparison across pruning strategies on …

**Figure 9.** Additional cross-resolution comparisons between SDXL […]

**Figure 10.** Additional cross-resolution comparisons between SDXL […]

**Figure 11.** Additional cross-resolution comparison on a subset of 5K prompts from the MS-COCO 2014 validation set […]

**Figure 12.** Additional cross-resolution comparison on a subset of 5K prompts from the MS-COCO 2014 validation set […]

**Figure 13.** Additional cross-resolution comparison on a subset of 5K prompts from the MS-COCO 2014 validation set […]

**Figure 14.** Visual comparison across two generation settings.

Original abstract

Diffusion models have demonstrated impressive image synthesis performance, yet many UNet-based models are trained at certain fixed resolutions. Their quality tends to degrade when generating images at out-of-training resolutions. We trace this issue to resolution-dependent parameter behaviors, where weights that function well at the default resolution can become adverse when spatial scales shift, weakening semantic alignment and causing structural instability in the UNet architecture. Based on this analysis, this paper introduces CR-Diff, a novel method that improves the cross-resolution visual consistency by pruning some parameters of the diffusion model. Specifically, CR-Diff has two stages. It first performs block-wise pruning to selectively eliminate adverse weights. Then, a pruned output amplification is conducted to further purify the pruned predictions. Empirically, extensive experiments suggest that CR-Diff can improve perceptual fidelity and semantic coherence across various diffusion backbones and unseen resolutions, while largely preserving the performance at default resolutions. Additionally, CR-Diff supports prompt-specific refinement, enabling quality enhancement on demand.
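The abstract does not define the amplification operator, and the page only hints at a factor k > 1. One plausible form, stated here purely as an assumption, is a guidance-style extrapolation from the dense prediction toward the pruned one. In symbols, eps_out = eps_dense + k (eps_pruned - eps_dense), where k = 1 recovers the pruned output. The sketch implements that assumed form for a single denoising step. The name amplify_pruned_output is hypothetical.

```python
import numpy as np

def amplify_pruned_output(eps_dense, eps_pruned, k=1.5):
    """Hypothetical pruned output amplification for one denoising step.

    Extrapolates from the dense model's noise prediction past the pruned
    model's prediction: k = 1 returns eps_pruned, and k > 1 pushes further
    along the direction that pruning introduced. The functional form is an
    assumption, not the paper's definition.
    """
    if k < 1:
        raise ValueError("this sketch assumes k >= 1")
    eps_dense = np.asarray(eps_dense, dtype=float)
    eps_pruned = np.asarray(eps_pruned, dtype=float)
    return eps_dense + k * (eps_pruned - eps_dense)
```

Under this form, each sampling step needs both a dense and a pruned UNet evaluation, which roughly doubles inference cost. The alternative, to which the desk editor's note alludes, is pure rescaling: k times the pruned output alone. Only the paper's actual definition can say which form it uses.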

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

CR-Diff applies block-wise pruning plus output amplification to diffusion UNets for better out-of-training resolutions, but the gains are not shown to come from the claimed adverse-weight mechanism rather than generic regularization.

The paper's core idea is straightforward: diffusion UNets degrade at unseen resolutions because some weights that work at the training scale become harmful when spatial dimensions change. CR-Diff tries to fix this in two steps. It applies block-wise pruning to drop those weights, then amplifies the pruned model's output to clean up its predictions. The combination is presented as new for diffusion models, even though pruning itself is well established elsewhere. The practical focus is useful; many people run these models at varying sizes and would like a lightweight way to stabilize them without retraining from scratch. The authors also report that the method works across backbones and can be applied per prompt, which is a reasonable engineering angle.

The main weakness is that the causal claim is not isolated. The abstract describes no ablations showing that the block-wise criterion outperforms random or uniform pruning, or that amplification does more than simple scaling, and the stress-test note flags the same gap. Without those controls or reported metrics, it is hard to rule out that any structured sparsity plus post-processing would produce similar regularization effects. The full paper presumably contains the experiments, but the current description leaves the central analysis vulnerable.

This work is aimed at computer-vision practitioners who deploy diffusion models and need resolution flexibility. Someone building on UNet variants or model-compression techniques could extract the two-stage recipe and test it themselves. It is coherent enough on its own terms to warrant a serious referee, even if the mechanism needs tighter validation. I would send it to review rather than desk-reject it, with the expectation that reviewers will press for quantitative ablations and comparisons to other resolution-robust baselines.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CR-Diff, a two-stage post-training procedure for UNet-based diffusion models. It first performs block-wise pruning to remove weights that exhibit adverse behavior at out-of-training resolutions (identified via analysis of resolution-dependent parameter effects), then applies pruned output amplification to refine the predictions. The central claim is that this selectively improves perceptual fidelity and semantic coherence at unseen resolutions across multiple backbones while largely preserving performance at the default training resolution, and that it additionally supports prompt-specific refinement.

Significance. If the hypothesized causal mechanism is isolated and the empirical gains are reproducible, the work would provide a lightweight, training-free adaptation strategy for increasing resolution flexibility in pre-trained diffusion models. This addresses a practical limitation in current generative pipelines without the cost of full retraining or architectural changes.

major comments (2)

[Method (two-stage procedure)] The central claim requires that resolution-dependent adverse weights can be reliably identified and that their removal (plus amplification) produces gains beyond generic pruning effects. No ablation is described that compares the block-wise adverse-weight criterion against random pruning, magnitude-based pruning, or pruning without the subsequent amplification stage; without such controls, improvements could be explained by sparsity-induced regularization rather than the proposed mechanism.
[Abstract and Experiments] The abstract asserts that 'extensive experiments suggest' improvements in perceptual fidelity and semantic coherence, yet supplies no quantitative metrics, tables of FID/LPIPS scores, ablation tables, or error bars at specific unseen resolutions. This absence makes it impossible to assess effect sizes or consistency of the cross-resolution gains.
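The first major comment asks for controls that hold the parameter budget fixed. The sketch below covers two such controls and assumes masks are stored as 0/1 arrays grouped by block, a hypothetical layout. The first is a random mask that removes exactly as many weights as the reference mask in every tensor. The second is a single uniform ratio that removes the same total number of weights as a block-wise configuration. Magnitude pruning at that uniform ratio would be a third control.

```python
import numpy as np

def random_masks_like(masks, seed=0):
    """Random-pruning control: for every tensor, zero exactly as many entries
    as the reference mask does, but at uniformly random positions."""
    rng = np.random.default_rng(seed)
    out = {}
    for group, group_masks in masks.items():
        out[group] = []
        for m in group_masks:
            n_zero = int((m == 0).sum())
            flat = np.ones(m.size, dtype=m.dtype)
            # Sample positions without replacement so exactly n_zero are pruned.
            flat[rng.choice(m.size, size=n_zero, replace=False)] = 0
            out[group].append(flat.reshape(m.shape))
    return out

def uniform_ratio(ratios, counts):
    """Single pruning ratio that removes the same total number of weights as a
    block-wise configuration, given the parameter count of each block group."""
    total = sum(counts[g] for g in ratios)
    return sum(ratios[g] * counts[g] for g in ratios) / total
```

Holding the parameter budget fixed is what separates a mechanism-specific effect from generic sparsity-induced regularization.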

minor comments (2)

[Method] The term 'pruned output amplification' is introduced without a formal definition, equation, or pseudocode; a precise formulation of the amplification operator would improve reproducibility.
[Method] The manuscript should clarify whether the block-wise pruning decisions are made once per backbone or recomputed per prompt, as the prompt-specific refinement claim implies the latter but the description is ambiguous.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the requested controls and quantitative results.

Referee: [Method (two-stage procedure)] The central claim requires that resolution-dependent adverse weights can be reliably identified and that their removal (plus amplification) produces gains beyond generic pruning effects. No ablation is described that compares the block-wise adverse-weight criterion against random pruning, magnitude-based pruning, or pruning without the subsequent amplification stage; without such controls, improvements could be explained by sparsity-induced regularization rather than the proposed mechanism.

Authors: We agree that the manuscript would be strengthened by explicit ablations isolating the block-wise adverse-weight criterion. In the revised version we will add comparisons to random pruning, magnitude-based pruning, and the pruning stage without amplification. These controls will be reported under the same evaluation protocol to test whether the observed gains exceed generic sparsity effects and arise from the resolution-dependent analysis. revision: yes
Referee: [Abstract and Experiments] The abstract asserts that 'extensive experiments suggest' improvements in perceptual fidelity and semantic coherence, yet supplies no quantitative metrics, tables of FID/LPIPS scores, ablation tables, or error bars at specific unseen resolutions. This absence makes it impossible to assess effect sizes or consistency of the cross-resolution gains.

Authors: We acknowledge that the abstract and main text currently lack the requested quantitative tables and error bars. The revised manuscript will expand the abstract to reference key metrics and will include new tables reporting FID, LPIPS, and other scores with standard deviations across multiple unseen resolutions and backbones. This will allow direct assessment of effect sizes and reproducibility. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper traces degradation to resolution-dependent adverse weights via empirical analysis, then applies block-wise pruning followed by output amplification as a two-stage procedure. This chain does not reduce any central claim to a self-defined quantity, a fitted parameter renamed as prediction, or a load-bearing self-citation; the method is presented as an external, experimentally validated intervention whose effectiveness is tested on multiple backbones and unseen resolutions rather than derived tautologically from its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract alone does not specify any free parameters, axioms, or invented entities; pruning thresholds and amplification factors are likely implicit but unstated.
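The supplementary excerpt under reference [59] describes a simulated annealing search over r = (r_down, r_mid, r_up). Its listed hyperparameters are an initial temperature T_init, a cooling rate α, an iteration budget N_iter, candidate seeds S_seeds, and a restart limit R_max. A minimal sketch of such a search follows. It omits seeds and restarts. The objective score(r) is a hypothetical stand-in for whatever Algorithm 1 optimizes, for instance a quality metric measured at the target resolution.

```python
import math
import random

def anneal_ratios(score, r0=(0.0, 0.0, 0.0), t_init=1.0, alpha=0.95,
                  n_iter=500, step=0.05, r_max=0.5, seed=0):
    """Simulated annealing over r = (r_down, r_mid, r_up), maximizing score(r).

    score is a hypothetical callable, e.g. an image-quality metric measured
    at the target resolution after pruning with ratios r. Candidate seeds and
    restarts (S_seeds, R_max) are omitted; r0 should lie in [0, r_max].
    """
    rng = random.Random(seed)
    cur = tuple(r0)
    cur_s = score(cur)
    best, best_s = cur, cur_s
    t = t_init
    for _ in range(n_iter):
        # Perturb one block's ratio and clip it to [0, r_max].
        i = rng.randrange(len(cur))
        cand = list(cur)
        cand[i] = min(r_max, max(0.0, cand[i] + rng.uniform(-step, step)))
        cand = tuple(cand)
        cand_s = score(cand)
        delta = cand_s - cur_s
        # Metropolis rule: always accept improvements; accept worse moves
        # with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            cur, cur_s = cand, cand_s
            if cur_s > best_s:
                best, best_s = cur, cur_s
        # Geometric cooling, floored to avoid dividing by zero.
        t = max(t * alpha, 1e-12)
    return best, best_s
```

Each call to score would mean generating and scoring images at the candidate ratios, so N_iter dominates the cost. This search is also where the otherwise unstated pruning ratios would enter the method.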

pith-pipeline@v0.9.0 · 5464 in / 1021 out tokens · 21465 ms · 2026-05-10T19:57:07.150872+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Passage: "We trace this issue to resolution-dependent parameter behaviors, where weights that function well at the default resolution can become adverse when spatial scales shift... block-wise pruning to selectively eliminate adverse weights. Then, a pruned output amplification..."
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Passage: "CR-Diff has two stages. It first performs block-wise pruning... pruned output amplification... k > 1"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 1 internal anchor

[1] Alireza Aghasi, Afshin Abdi, Nam Nguyen, and Justin Romberg. Net-Trim: Convex pruning of deep neural networks with performance guarantee. In NeurIPS, 2017.
[2] Black Forest Labs. Flux. https://blackforestlabs.ai/, 2024. Accessed: 2025-09-
[3] Thibault Castells, Hyoung-Kyu Song, Bo-Kyeong Kim, and Shinkook Choi. LD-Pruner: Efficient pruning of latent diffusion models using task-agnostic insights. In CVPR, 2024.
[4] Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, and Zhenguo Li. PixArt-Σ: Weak-to-strong training of diffusion transformer for 4K text-to-image generation. In ECCV, 2024.
[5] Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, and Zhenguo Li. PixArt-Σ: Weak-to-strong training of diffusion transformer for 4K text-to-image generation. In ECCV. Springer, 2024.
[6] Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Zhongdao Wang, James T Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li. PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis. In ICLR, 2024.
[7] Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Song Han, and Enze Xie. SANA-Sprint: One-step diffusion with continuous-time consistency distillation. In ICCV, 2025.
[8] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. In NeurIPS, 2021.
[9] Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR.
[10] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In ICML, 2024.
[11] Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, and Xinchao Wang. DepGraph: Towards any structural pruning. In CVPR, 2023.
[12] Gongfan Fang, Xinyin Ma, and Xinchao Wang. Structural pruning for diffusion models. In NeurIPS, 2023.
[13] Gongfan Fang, Kunjun Li, Xinyin Ma, and Xinchao Wang. TinyFusion: Diffusion transformers learned shallow. In CVPR, 2025.
[14] Sicheng Feng, Keda Tao, and Huan Wang. Is oracle pruning the true oracle? arXiv preprint arXiv:2412.00143.
[15] Elias Frantar and Dan Alistarh. Optimal brain compression: A framework for accurate post-training quantization and pruning. In NeurIPS, 2022.
[16] Elias Frantar and Dan Alistarh. SparseGPT: Massive language models can be accurately pruned in one-shot. In ICML, 2023.
[17] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In NeurIPS, 2015.
[18] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding. In ICLR.
[19] Babak Hassibi, David G Stork, and Gregory J Wolff. Optimal brain surgeon and general network pruning. In NeurIPS, 1992.
[20] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. CLIPScore: A reference-free evaluation metric for image captioning. In EMNLP. Association for Computational Linguistics, 2021.
[21] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017.
[22] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.
[23] Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, and Shinkook Choi. BK-SDM: A lightweight, fast, and cheap version of Stable Diffusion. In ECCV, 2024.
[24] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-Pic: An open dataset of user preferences for text-to-image generation. In NeurIPS, 2023.
[25] Yann LeCun, John Denker, and Sara Solla. Optimal brain damage. In NeurIPS, 1989.
[26] Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, and Jian Ren. SnapFusion: Text-to-image diffusion model on mobile devices within two seconds. In NeurIPS, 2023.
[27] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[28] Gui Ling, Ziyang Wang, and Qingwen Liu. SlimGPT: Layer-wise structured pruning for large language models. In NeurIPS, 2024.
[29] Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, and Jan Kautz. Importance estimation for neural network pruning. In CVPR, 2019.
[30] Alexander Quinn Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. In ICML, 2022.
[31] NovelAI. NovelAI improvements on Stable Diffusion. https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac, 2022.
[32] William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, 2023.
[33] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. In ICLR, 2024.
[34] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022.
[35] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
[36] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
[37] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. In NeurIPS, 2022.
[38] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In ICLR, 2022.
[39] Xuan Shen, Hangyu Zheng, Yifan Gong, Zhenglun Kong, Changdi Yang, Zheng Zhan, Yushu Wu, Xue Lin, Yanzhi Wang, Pu Zhao, et al. Sparse learning for state space models on mobile. In ICLR, 2025.
[40] Ibne Farabi Shihab, Sanjeda Akter, and Anuj Sharma. Efficient unstructured pruning of Mamba state-space models for resource-constrained environments. In EMNLP.
[41] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.
[42] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.
[43] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In NeurIPS, 2019.
[44] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.
[45] Mingjie Sun, Zhuang Liu, Anna Bair, and J Zico Kolter. A simple and effective pruning approach for large language models. In ICLR, 2024.
[46] Kaiwen Tuo and Huan Wang. SparseSSM: Efficient selective structured state space models can be pruned in one-shot. arXiv preprint arXiv:2506.09613, 2025.
[47] Huan Wang and Yun Fu. Trainability preserving neural pruning. In ICLR, 2023.
[48] Huan Wang, Can Qin, Yulun Zhang, and Yun Fu. Neural pruning via growing regularization. In ICLR, 2021.
[49] Jiateng Wei, Quan Lu, Ning Jiang, Siqi Li, Jingyang Xiang, Jun Chen, and Yong Liu. Structured optimal brain pruning for large language models. In EMNLP, 2024.
[50] Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, et al. SANA: Efficient high-resolution image synthesis with linear diffusion transformers. In ICLR, 2025.
[51] Enze Xie, Junsong Chen, Yuyang Zhao, Jincheng Yu, Ligeng Zhu, Yujun Lin, Zhekai Zhang, Muyang Li, Junyu Chen, Han Cai, et al. SANA 1.5: Efficient scaling of training-time and inference-time compute in linear diffusion transformer. In ICML, 2025.
[52] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. In NeurIPS, 2023.
[53] Dingkun Zhang, Sijia Li, Chen Chen, Qingsong Xie, and Haonan Lu. LAPTOP-Diff: Layer pruning and normalized distillation for compressing diffusion models. arXiv preprint arXiv:2404.11098, 2024.
[54] Yang Zhang, Er Jin, Yanfei Dong, Ashkan Khakzar, Philip Torr, Johannes Stegmaier, and Kenji Kawaguchi. Effortless efficiency: Low-cost pruning of diffusion models. arXiv preprint arXiv:2412.02852, 2024.
[55] Yang Zhao, Yanwu Xu, Zhisheng Xiao, Haolin Jia, and Tingbo Hou. MobileDiffusion: Instant text-to-image generation on mobile devices. In ECCV, 2024.
[56] Junhan Zhu, Hesong Wang, Mingluo Su, Zefang Wang, and Huan Wang. OBS-Diff: Accurate pruning for diffusion models in one-shot. arXiv preprint arXiv:2510.06751, 2025.
[57] Block-wise Pruning Ratio Configurations. As discussed in Section 3.1, the UNet architecture comprises downsampling, middle, and upsampling blocks, which differ in redundancy and tolerance to parameter removal. This is further supported by our pruning ratio search experiments across multiple diffusion model families and sampling resolutions, with the re...
[58] Full Ablation Study of POA. To more comprehensively illustrate the effect of the pruned output amplification (POA) mechanism, we provide the full ablation results across models and resolutions in Table 7, which were omitted from the main paper due to space constraints. This output-level refinement consistently improves generative quality across architectures and resolutions...
[59] Simulated Annealing (SA) Algorithm. Algorithm 1 summarizes the simulated annealing (SA) routine used to search for the optimal pruning ratio configuration r = (r_down, r_mid, r_up). The hyperparameters include the initial temperature T_init, cooling rate α, iteration budget N_iter, a set of candidate seeds S_seeds, and a restart limit R_max. Starting from the be...
[60] Analyses on Unseen Resolutions. Beyond the detailed analysis in Section 4.2, which demonstrates consistent improvements under CR-Diff at unseen resolutions, we provide additional analyses at higher resolutions for SDXL. SDXL, natively trained at 1024×1024 with a resampler and high-resolution cross-attention, effectively internalizes dense object structures and sharp boundaries...
[61] Expanded Qualitative Analyses. Representative Teaser Results. In Figures 9 and 10, we present additional representative teaser examples following the style of Figure 1, further illustrating the effectiveness of CR-Diff in enhancing cross-resolution visual consistency over the dense SDXL [33]. Results on the 5K Dataset. In Figures 11, 12, and 13, we pre...

[1] Alireza Aghasi, Afshin Abdi, Nam Nguyen, and Justin Romberg. Net-trim: Convex pruning of deep neural networks with performance guarantee. In NeurIPS, 2017.

[2] Black Forest Labs. Flux. https://blackforestlabs.ai/, 2024. Accessed: 2025-09-

[3] Thibault Castells, Hyoung-Kyu Song, Bo-Kyeong Kim, and Shinkook Choi. Ld-pruner: Efficient pruning of latent diffusion models using task-agnostic insights. In CVPR, 2024.

[4] Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, and Zhenguo Li. Pixart-σ: Weak-to-strong training of diffusion transformer for 4k text-to-image generation. In ECCV, 2024.

[5] Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, and Zhenguo Li. Pixart-σ: Weak-to-strong training of diffusion transformer for 4k text-to-image generation. In ECCV. Springer, 2024.

[6] Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Zhongdao Wang, James T Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li. Pixart-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis. In ICLR, 2024.

[7] Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Song Han, and Enze Xie. Sana-sprint: One-step diffusion with continuous-time consistency distillation. In ICCV, 2025.

[8] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. In NeurIPS, 2021.

[9] Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR,

[10] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In ICML, 2024.

[11] Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, and Xinchao Wang. Depgraph: Towards any structural pruning. In CVPR, 2023.

[12] Gongfan Fang, Xinyin Ma, and Xinchao Wang. Structural pruning for diffusion models. In NeurIPS, 2023.

[13] Gongfan Fang, Kunjun Li, Xinyin Ma, and Xinchao Wang. Tinyfusion: Diffusion transformers learned shallow. In CVPR, 2025.

[14] Sicheng Feng, Keda Tao, and Huan Wang. Is oracle pruning the true oracle? arXiv preprint arXiv:2412.00143,

[15] Elias Frantar and Dan Alistarh. Optimal brain compression: A framework for accurate post-training quantization and pruning. In NeurIPS, 2022.

[16] Elias Frantar and Dan Alistarh. Sparsegpt: Massive language models can be accurately pruned in one-shot. In ICML, 2023.

[17] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In NeurIPS, 2015.

[18] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. In ICLR,

[19] Babak Hassibi, David G Stork, and Gregory J Wolff. Optimal brain surgeon and general network pruning. In NeurIPS, 1992.

[20] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. Clipscore: A reference-free evaluation metric for image captioning. In EMNLP. Association for Computational Linguistics, 2021.

[21] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In NeurIPS, 2017.

[22] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.

[23] Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, and Shinkook Choi. Bk-sdm: A lightweight, fast, and cheap version of stable diffusion. In ECCV, 2024.

[24] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation. In NeurIPS, 2023.

[25] Yann LeCun, John Denker, and Sara Solla. Optimal brain damage. In NeurIPS, 1989.

[26] Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, and Jian Ren. Snapfusion: Text-to-image diffusion model on mobile devices within two seconds. In NeurIPS, 2023.

[27] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014.

[28] Gui Ling, Ziyang Wang, and Qingwen Liu. Slimgpt: Layer-wise structured pruning for large language models. In NeurIPS, 2024.

[29] Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, and Jan Kautz. Importance estimation for neural network pruning. In CVPR, 2019.

[30] Alexander Quinn Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob Mcgrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In ICML, 2022.

[31] NovelAI. NovelAI improvements on Stable Diffusion. https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac, 2022.

[32] William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, 2023.

[33] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis. In ICLR, 2024.

[34] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.

[35] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.

[36] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.

[37] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. In NeurIPS, 2022.

[38] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In ICLR, 2022.

[39] Xuan Shen, Hangyu Zheng, Yifan Gong, Zhenglun Kong, Changdi Yang, Zheng Zhan, Yushu Wu, Xue Lin, Yanzhi Wang, Pu Zhao, et al. Sparse learning for state space models on mobile. In ICLR, 2025.

[40] Ibne Farabi Shihab, Sanjeda Akter, and Anuj Sharma. Efficient unstructured pruning of mamba state-space models for resource-constrained environments. In EMNLP,

[41] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.

[42] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.

[43] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In NeurIPS, 2019.

[44] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.

[45] Mingjie Sun, Zhuang Liu, Anna Bair, and J Zico Kolter. A simple and effective pruning approach for large language models. In ICLR, 2024.

[46] Kaiwen Tuo and Huan Wang. Sparsessm: Efficient selective structured state space models can be pruned in one-shot. arXiv preprint arXiv:2506.09613, 2025.

[47] Huan Wang and Yun Fu. Trainability preserving neural pruning. In ICLR, 2023.

[48] Huan Wang, Can Qin, Yulun Zhang, and Yun Fu. Neural pruning via growing regularization. In ICLR, 2021.

[49] Jiateng Wei, Quan Lu, Ning Jiang, Siqi Li, Jingyang Xiang, Jun Chen, and Yong Liu. Structured optimal brain pruning for large language models. In EMNLP, 2024.

[50] Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, et al. Sana: Efficient high-resolution image synthesis with linear diffusion transformers. In ICLR, 2025.

[51] Enze Xie, Junsong Chen, Yuyang Zhao, Jincheng Yu, Ligeng Zhu, Yujun Lin, Zhekai Zhang, Muyang Li, Junyu Chen, Han Cai, et al. Sana 1.5: Efficient scaling of training-time and inference-time compute in linear diffusion transformer. In ICML, 2025.

[52] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: Learning and evaluating human preferences for text-to-image generation. In NeurIPS, 2023.

[53] Dingkun Zhang, Sijia Li, Chen Chen, Qingsong Xie, and Haonan Lu. Laptop-diff: Layer pruning and normalized distillation for compressing diffusion models. arXiv preprint arXiv:2404.11098, 2024.

[54] Yang Zhang, Er Jin, Yanfei Dong, Ashkan Khakzar, Philip Torr, Johannes Stegmaier, and Kenji Kawaguchi. Effortless efficiency: Low-cost pruning of diffusion models. arXiv preprint arXiv:2412.02852, 2024.

[55] Yang Zhao, Yanwu Xu, Zhisheng Xiao, Haolin Jia, and Tingbo Hou. Mobilediffusion: Instant text-to-image generation on mobile devices. In ECCV, 2024.

[56] Junhan Zhu, Hesong Wang, Mingluo Su, Zefang Wang, and Huan Wang. Obs-diff: Accurate pruning for diffusion models in one-shot. arXiv preprint arXiv:2510.06751, 2025.

Cross-Resolution Diffusion Models via Network Pruning: Supplementary Material

[57] Block-wise Pruning Ratio Configurations. As discussed in Section 3.1, the UNet architecture comprises downsampling, middle, and upsampling blocks, which differ in redundancy and tolerance to parameter removal. This is further supported by our pruning ratio search experiments across multiple diffusion model families and sampling resolutions, with the re...
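To make "block-wise" concrete, the Python sketch below applies a separate pruning ratio to each of the downsampling, middle, and upsampling blocks. It is only an illustration under stated assumptions. It uses unstructured magnitude pruning as a stand-in criterion, because this excerpt does not say how CR-Diff scores adverse weights. Each block's ratio is applied to every tensor whose name carries that block's prefix, and parameters outside the three blocks stay dense. The prefixes mirror the `down_blocks` / `mid_block` / `up_blocks` naming of the UNet in Hugging Face diffusers. The toy state dict and the helpers `prune_tensor` and `blockwise_prune` are hypothetical.

```python
import numpy as np

# Block prefixes following diffusers' UNet parameter naming (assumed mapping).
BLOCK_PREFIXES = {"down": "down_blocks.", "mid": "mid_block.", "up": "up_blocks."}


def prune_tensor(w: np.ndarray, ratio: float) -> np.ndarray:
    """Zero the `ratio` fraction of entries with the smallest magnitude.

    Magnitude is a stand-in criterion, not necessarily the one used by CR-Diff.
    """
    k = int(round(ratio * w.size))
    out = w.copy()
    if k > 0:
        idx = np.argsort(np.abs(w), axis=None)[:k]  # flat indices of the k smallest |w|
        out.flat[idx] = 0.0
    return out


def blockwise_prune(state: dict, ratios: dict) -> dict:
    """Apply the down/mid/up ratio to every tensor in the matching UNet block."""
    pruned = {}
    for name, w in state.items():
        block = next((b for b, p in BLOCK_PREFIXES.items() if name.startswith(p)), None)
        r = ratios.get(block, 0.0) if block is not None else 0.0
        pruned[name] = prune_tensor(w, r)
    return pruned


# Toy usage with a synthetic state dict (hypothetical tensor names and ratios).
rng = np.random.default_rng(0)
state = {
    "down_blocks.0.conv.weight": rng.standard_normal((8, 8)),
    "mid_block.attn.weight": rng.standard_normal((8, 8)),
    "up_blocks.0.conv.weight": rng.standard_normal((8, 8)),
    "conv_in.weight": rng.standard_normal((8, 8)),  # outside the three blocks: left dense
}
ratios = {"down": 0.1, "mid": 0.5, "up": 0.25}
pruned = blockwise_prune(state, ratios)
for name, w in pruned.items():
    print(name, f"sparsity={np.mean(w == 0):.3f}")
```

Each tensor gets the same ratio as the rest of its block, so blocks with lower tolerance to parameter removal can be given a smaller ratio without touching the others.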

[58] Full Ablation Study of POA. To more comprehensively illustrate the effect of the pruned output amplification (POA) mechanism, we provide the full ablation results across models and resolutions in Table 7, which were omitted from the main paper due to space constraints. This output-level refinement consistently improves generative quality across architectures and resolutions...

[59] Simulated Annealing (SA) Algorithm. Algorithm 1 summarizes the simulated annealing (SA) routine used to search for the optimal pruning ratio configuration $\mathbf{r} = (r_{\text{down}}, r_{\text{mid}}, r_{\text{up}})$. The hyperparameters include the initial temperature $T_{\text{init}}$, cooling rate $\alpha$, iteration budget $N_{\text{iter}}$, a set of candidate seeds $S_{\text{seeds}}$, and a restart limit $R_{\max}$. Starting from the be...
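The Python sketch below shows what such an annealing search over the three block ratios could look like. It is not Algorithm 1. Several details are assumptions:

- A hypothetical `score(r, seed)` callable returns a quality metric for a ratio triple and generation seed, where higher is better.
- Each candidate is scored by averaging over the candidate seeds.
- The search starts from the unpruned model (0, 0, 0).
- Each of the `r_restarts` restarts resumes from the best configuration found so far, with the temperature reset to `t_init`.
- A neighbor perturbs one block ratio by a fixed `step`.

The `step` and `r_max` bounds and the toy objective are illustrative only.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Ratios = Tuple[float, float, float]  # (r_down, r_mid, r_up)


def neighbor(r: Ratios, step: float, rng: random.Random, r_max: float) -> Ratios:
    """Perturb one block's ratio by +/- step, clipped to [0, r_max] (assumed move)."""
    i = rng.randrange(3)
    new = list(r)
    new[i] = min(r_max, max(0.0, new[i] + rng.choice((-step, step))))
    return (new[0], new[1], new[2])


def anneal_ratios(
    score: Callable[[Ratios, int], float],  # hypothetical quality score, higher is better
    seeds: Sequence[int],                    # S_seeds, assumed to be generation seeds
    t_init: float = 1.0,                     # T_init
    alpha: float = 0.9,                      # cooling rate
    n_iter: int = 50,                        # N_iter per restart
    r_restarts: int = 3,                     # R_max
    step: float = 0.05,                      # assumed neighbor step size
    r_max: float = 0.5,                      # assumed upper bound on any block ratio
    rng_seed: int = 0,
) -> Tuple[Ratios, float]:
    rng = random.Random(rng_seed)

    def evaluate(r: Ratios) -> float:
        # Average over all candidate seeds to reduce sampling noise.
        return sum(score(r, s) for s in seeds) / len(seeds)

    best: Ratios = (0.0, 0.0, 0.0)  # assumed start: the unpruned model
    best_val = evaluate(best)
    for _ in range(r_restarts):
        cur, cur_val, temp = best, best_val, t_init  # restart from the best so far
        for _ in range(n_iter):
            cand = neighbor(cur, step, rng, r_max)
            cand_val = evaluate(cand)
            delta = cand_val - cur_val
            # Accept improvements; accept worse moves with probability exp(delta / T).
            if delta >= 0 or (temp > 0 and rng.random() < math.exp(delta / temp)):
                cur, cur_val = cand, cand_val
                if cur_val > best_val:
                    best, best_val = cur, cur_val
            temp *= alpha  # geometric cooling
    return best, best_val


def toy_score(r: Ratios, seed: int) -> float:
    """Toy objective peaked at (0.1, 0.3, 0.2); the seed is unused here."""
    target = (0.1, 0.3, 0.2)
    return -sum((a - b) ** 2 for a, b in zip(r, target))


best, best_val = anneal_ratios(toy_score, seeds=[0, 1, 2])
print(best, best_val)
```

In a real search, `toy_score` would be replaced by generating images with the pruned model at the target resolution and scoring them with a chosen metric. Because the best configuration is tracked separately, the returned score never falls below that of the starting configuration.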

[60] Analyses on Unseen Resolutions. Beyond the detailed analysis in Section 4.2, which demonstrates consistent improvements under CR-Diff at unseen resolutions, we provide additional analyses at higher resolutions for SDXL. SDXL, natively trained at 1024×1024 with a resampler and high-resolution cross-attention, effectively internalizes dense object structures and sharp boundaries...

[61] Expanded Qualitative Analyses. Representative Teaser Results. In Figures 9 and 10, we present additional representative teaser examples following the style of Figure 1, further illustrating the effectiveness of CR-Diff in enhancing cross-resolution visual consistency over the dense SDXL [33]. Results on the 5K Dataset. In Figures 11, 12, and 13, we pre...