Efficient One-Step Diffusion Restoration Model with Compact Token Compression and Linear Attention

Bingtian Qiao; Guangtao Zhai; Jiezhang Cao; Yingjie Zhou; Yong Guo; Yue Shi

arxiv: 2605.23451 · v1 · pith:6LMTFGM5new · submitted 2026-05-22 · 💻 cs.CV

Pith reviewed 2026-05-25 04:39 UTC · model grok-4.3

classification 💻 cs.CV

keywords: real-world image super-resolution · diffusion models · token compression · linear attention · efficient inference · image restoration · DiT · one-step generation

The pith

SANA-SR restores real-world images via 32x token compression and linear-attention DiT in a single diffusion step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies excessive token redundancy and quadratic-cost token interactions as the core barrier to practical high-resolution real-world image super-resolution. It counters this by first applying a deep compression autoencoder with a 32x compression ratio, which sharply reduces the number of latent tokens while keeping restoration-relevant structures, then running a linear-attention diffusion transformer with LoRA fine-tuning on that compact space. According to the abstract, the resulting one-step model matches or exceeds existing methods on standard benchmarks in both quantitative scores and visual texture realism. A reader would care because the pruned version is reported to run in 0.019 seconds with 407.95G MACs and 344M parameters, which the authors present as evidence of potential for mobile deployment.

Core claim

SANA-SR is an efficient one-step restoration framework that employs a deep compression autoencoder with a 32x compression ratio to drastically reduce latent tokens while preserving restoration-relevant structures and textures. On top of this compact latent space, a linear-attention DiT with LoRA fine-tuning performs high-resolution restoration with linear-complexity token mixing. Extensive experiments on all benchmark datasets show that SANA-SR achieves highly competitive and often superior quantitative performance against existing methods while restoring clearer and more realistic textures, and the deployed model runs in 0.019s with 407.95G MACs and 344M parameters.

What carries the argument

Deep compression autoencoder at 32x ratio combined with linear-attention DiT for token mixing.
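To make the token-count argument concrete, here is a minimal arithmetic sketch. It assumes that the "32x compression ratio" refers to per-side spatial downsampling of the autoencoder and that each latent position becomes one token; the page does not define the ratio precisely, so both are assumptions. The 1024x1024 input size and the 8x comparison point are hypothetical, chosen only for illustration.

```python
def latent_tokens(height: int, width: int, factor: int, patch: int = 1) -> int:
    """Number of latent tokens for an image.

    Assumes `factor` is the per-side spatial downsampling of the
    autoencoder and `patch` is the DiT patch size (both hypothetical).
    """
    stride = factor * patch
    if height % stride or width % stride:
        raise ValueError("image size must be divisible by factor * patch")
    return (height // stride) * (width // stride)


def pairwise_interactions(n_tokens: int) -> int:
    """Token pairs visited by full (quadratic) attention: n^2."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    # Compare a hypothetical 8x autoencoder with a 32x one at 1024x1024.
    for factor in (8, 32):
        n = latent_tokens(1024, 1024, factor)
        print(f"factor={factor}: tokens={n}, pairs={pairwise_interactions(n)}")
```

Under these assumptions, moving from 8x to 32x cuts the token count by 16 and the number of pairwise interactions by 256, which is the redundancy the paper targets before attention cost is even considered.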

If this is right

The model matches or exceeds existing Real-ISR methods on quantitative metrics across all tested benchmarks.
Restored images exhibit clearer and more realistic textures than prior generative approaches.
After pruning, inference completes in 0.019 seconds using 407.95G MACs and 344M parameters.
The linear-complexity design removes the unfavorable scaling of computation and memory with image resolution.
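The linear-complexity claim in the last point can be illustrated with a small NumPy sketch of kernelized (linear) attention. This is not the paper's implementation: the ReLU-based feature map, the single head, and the tensor sizes are assumptions made for illustration. The sketch shows that reordering the matrix products gives the same output as the quadratic form while never building the N x N score matrix.

```python
import numpy as np


def feature_map(x: np.ndarray) -> np.ndarray:
    # Assumed feature map: ReLU plus a small constant so the normalizer
    # stays strictly positive. The paper's exact choice is not stated here.
    return np.maximum(x, 0.0) + 1e-6


def quadratic_form(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Reference: kernelized attention computed via the (N, N) score matrix."""
    fq, fk = feature_map(q), feature_map(k)
    scores = fq @ fk.T                                # (N, N), cost O(N^2 d)
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


def linear_form(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Same result with cost linear in N: compute K^T V first."""
    fq, fk = feature_map(q), feature_map(k)
    kv = fk.T @ v                                     # (d, d_v), cost O(N d d_v)
    z = fk.sum(axis=0)                                # (d,), shared normalizer
    return (fq @ kv) / (fq @ z)[:, None]
```

Both functions take arrays of shape (N, d) and return (N, d_v); only the order of multiplication differs, which is why compute and memory grow linearly with the number of tokens in the second form.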

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same compression-plus-linear-attention pattern could be tested on related tasks such as real-world denoising or deblurring.
Further increases in compression ratio beyond 32x could be explored if the autoencoder continues to retain high-frequency texture cues.
The LoRA fine-tuning step on the linear DiT suggests a route for adapting the model to new degradation distributions without full retraining.
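As a rough illustration of the LoRA adaptation mentioned in the last point, the following NumPy sketch adds a low-rank update to a frozen linear layer. The class name, rank, scaling value, and initialization scale are hypothetical and are not taken from the paper; the sketch only follows the general LoRA recipe of a frozen weight plus a scaled product of two small matrices, with one of them initialized to zero.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer with a trainable low-rank update (illustrative)."""

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                 # frozen base weight
        self.a = rng.standard_normal((rank, d_in)) * 0.01    # trainable down-projection
        self.b = np.zeros((d_out, rank))                     # trainable, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Base path plus low-rank path; x has shape (n, d_in).
        return x @ self.weight.T + self.scale * (x @ self.a.T) @ self.b.T

    def merged_weight(self) -> np.ndarray:
        # The update can be folded into the base weight for deployment.
        return self.weight + self.scale * self.b @ self.a
```

Because `b` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `rank * (d_in + d_out)` parameters would need training when adapting to a new degradation distribution.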

Load-bearing premise

The 32x compression autoencoder preserves all restoration-relevant structures and textures without introducing artifacts that later stages cannot correct.

What would settle it

Running SANA-SR on the standard Real-ISR benchmark suites and finding either lower perceptual quality scores or visible uncorrectable artifacts compared with quadratic-attention baselines would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2605.23451 by Bingtian Qiao, Guangtao Zhai, Jiezhang Cao, Yingjie Zhou, Yong Guo, Yue Shi.

**Figure 1.** SANA-SR achieves a strong quality–efficiency trade-off for real-world image super-resolution. Left: qualitative comparison on a real LR input against seven baselines; the yellow box marks the zoomed region. Right: DRealSR scatter of normalized perceptual score and inference time; marker color encodes method family and size scales with parameters. SANA-SR yields the best perceptual score at the lowest latency.

**Figure 2.** Overview of SANA-SR. Given an LQ input, SANA-SR first maps the image into a compact latent space with a frozen deep-compression VAE, then restores the latent with a prompt-conditioned one-step LinearDiT adapted by LoRA. Training is regularized by frozen-prior alignment and adapter consistency, and the final model is further compressed by prompt-aware structured pruning for efficient deployment.

**Figure 3.** Qualitative comparison on challenging examples from DRealSR.

**Figure 5.** Additional qualitative comparison on examples from DIV2K-Val, RealSR, and DRealSR.

Original abstract

Real-world image super-resolution aims to recover high-quality images from complex and unknown real-world degradations. However, existing generative Real-ISR methods largely inherit the dense latent representations and quadratic-cost global modeling paradigm developed for high-resolution image synthesis, causing computation, memory usage, and inference latency to scale unfavorably with resolution and thus limiting practical deployment. We argue that the key bottleneck lies not in insufficient restoration priors, but in excessive token redundancy and costly token interactions during high-resolution restoration. Motivated by this observation, we revisit Real-ISR from the perspectives of compact latent representation and linear-complexity modeling, and propose SANA-SR, an efficient one-step restoration framework. Specifically, SANA-SR employs a deep compression autoencoder with a 32x compression ratio to drastically reduce latent tokens while preserving restoration-relevant structures and textures. On top of this compact latent space, we introduce a linear-attention DiT with LoRA fine-tuning, enabling efficient high-resolution restoration with linear-complexity token mixing. Extensive experiments on all benchmark datasets demonstrate that SANA-SR achieves highly competitive and often superior quantitative performance against existing methods, while restoring clearer and more realistic textures. Moreover, after pruning, the deployed model runs in 0.019s with 407.95G MACs and 344M parameters, highlighting its strong potential for practical mobile deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SANA-SR pairs 32x compression with linear-attention DiT for one-step Real-ISR, but the autoencoder's ability to retain needed details remains the untested hinge.

The paper's core move is to treat token redundancy as the main limiter in generative real-world super-resolution and respond with a deep 32x compression autoencoder plus a linear-attention DiT fine-tuned via LoRA. This produces a one-step model aimed at mobile speeds, with the abstract citing 0.019 s inference, 407.95 G MACs, and 344 M parameters after pruning. The framing that existing dense-latent quadratic methods scale poorly with resolution is straightforward and on point for deployment work. The concrete efficiency targets are also useful to see stated explicitly. What is new is the particular combination of that compression ratio with linear token mixing for restoration rather than synthesis. The paper does a service by shifting attention from adding more priors to removing unnecessary computation.

The soft spot is exactly the one flagged in the stress-test note. The claim that the 32x autoencoder preserves all restoration-relevant structures and textures is asserted without visible supporting measurements, ablations, or failure-case analysis in the abstract. Linear attention already has reduced modeling power compared with standard attention, so any high-frequency loss upstream cannot be easily fixed downstream. If that assumption fails, the competitive quality numbers and the low-latency numbers cannot both be true. No error analysis or baseline tables appear here, which leaves the central performance claim uncheckable.

This work is aimed at engineers who need fast generative restoration on constrained hardware. A reader already working on compact latent spaces or linear transformers for vision might pick up the architecture sketch, but the lack of verifiable results limits how far the claims can be taken. It should go to peer review so the experiments can be examined directly rather than desk-rejected on the abstract alone.

Referee Report

2 major / 1 minor

Summary. The paper proposes SANA-SR, an efficient one-step diffusion-based framework for real-world image super-resolution. It identifies token redundancy and quadratic attention costs as the primary bottlenecks in existing generative Real-ISR methods and addresses them via a deep compression autoencoder (32x ratio) to produce compact latent tokens while preserving structures and textures, followed by a linear-attention DiT backbone with LoRA fine-tuning for linear-complexity token mixing. The authors report that the resulting model achieves highly competitive or superior quantitative performance (PSNR/SSIM/perceptual metrics) on standard benchmarks, restores clearer textures, and after pruning runs at 0.019 s inference with 407.95 G MACs and 344 M parameters.

Significance. If the central claims hold, the work would be significant for enabling practical, mobile deployment of high-quality generative Real-ISR by demonstrating that extreme latent compression combined with linear attention can maintain restoration fidelity at dramatically reduced compute and latency. The explicit focus on token redundancy rather than prior insufficiency, together with the reported efficiency numbers, offers a concrete path toward scalable restoration models.

major comments (2)

[Abstract, §3] Abstract and §3 (method): The claim that the 32x deep compression autoencoder 'preserves restoration-relevant structures and textures' is load-bearing for both the performance and efficiency assertions, yet no ablation quantifies information loss at this ratio or demonstrates that degradation-specific high-frequency cues remain recoverable by the linear-attention DiT (even with LoRA). If the autoencoder discards unrecoverable details, the reported competitive metrics and 0.019 s latency cannot simultaneously hold.
[§4] §4 (experiments): The abstract asserts 'highly competitive and often superior quantitative performance' and 'clearer and more realistic textures' but supplies no tables, baselines, or error bars in the visible text; without these, the cross-method comparison and the claim that linear attention suffices cannot be evaluated.

minor comments (1)

[§3] Notation for the linear-attention mechanism and the precise definition of the 32x compression ratio should be introduced with equations in the method section for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to strengthen the paper where the concerns are valid.

Referee: [Abstract, §3] Abstract and §3 (method): The claim that the 32x deep compression autoencoder 'preserves restoration-relevant structures and textures' is load-bearing for both the performance and efficiency assertions, yet no ablation quantifies information loss at this ratio or demonstrates that degradation-specific high-frequency cues remain recoverable by the linear-attention DiT (even with LoRA). If the autoencoder discards unrecoverable details, the reported competitive metrics and 0.019 s latency cannot simultaneously hold.

Authors: We agree that an explicit ablation quantifying information loss at the 32x ratio and demonstrating recoverability of degradation-specific cues would strengthen the central claim. In the revised manuscript we will add: (1) a compression-ratio ablation (8x/16x/32x) reporting reconstruction PSNR/SSIM on both clean and degraded inputs, (2) latent-space visualizations and high-frequency energy spectra before/after encoding, and (3) a controlled study measuring how much of the final restoration quality is attributable to the autoencoder versus the DiT. These additions will directly address whether the linear-attention DiT can recover the necessary cues.

Revision: yes

Referee: [§4] §4 (experiments): The abstract asserts 'highly competitive and often superior quantitative performance' and 'clearer and more realistic textures' but supplies no tables, baselines, or error bars in the visible text; without these, the cross-method comparison and the claim that linear attention suffices cannot be evaluated.

Authors: Section 4 of the full manuscript contains multiple tables comparing SANA-SR against recent Real-ISR baselines on standard benchmarks (PSNR, SSIM, LPIPS, MUSIQ, etc.), together with qualitative results. We will ensure all tables are clearly referenced from the abstract and §3, add standard-error bars from three independent runs where they were omitted, and include an additional table isolating the contribution of linear attention versus quadratic attention under identical latent tokens. If any tables were missing from the reviewed version due to rendering, we apologize and will correct the submission.

Revision: partial
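The measurements promised in the first response (reconstruction PSNR and high-frequency energy before and after encoding) could be prototyped along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, the 0.25 cycles-per-pixel cutoff is arbitrary, and `box_downsample_upsample` is a crude stand-in for an autoencoder round trip, not the paper's deep compression autoencoder.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


def high_freq_energy_fraction(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Share of spectral energy at radial frequency above `cutoff`.

    Frequencies are in cycles per pixel, so Nyquist is 0.5 along each axis.
    """
    spectrum = np.abs(np.fft.fft2(image)) ** 2
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return float(spectrum[radius > cutoff].sum() / spectrum.sum())


def box_downsample_upsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Stand-in for an encode/decode round trip: average-pool, then
    nearest-neighbor upsample. Image sides must be divisible by `factor`."""
    h, w = image.shape
    pooled = image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)
```

Running `psnr` and `high_freq_energy_fraction` on an input and on its round-trip reconstruction at several factors (for example 8, 16, 32) would give the kind of compression-ratio table the referee asks for, with a real autoencoder substituted for the stand-in.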

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper's central proposal rests on an architectural argument (token redundancy as the primary bottleneck) followed by a design choice (32x deep compression autoencoder + linear-attention DiT with LoRA) and empirical reporting on external benchmarks. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citation chains appear in the provided text that would reduce the performance claims to the inputs by construction. The autoencoder fidelity assumption is stated explicitly as a design premise rather than derived from prior self-work, and the reported metrics (PSNR/SSIM, latency, MACs) are positioned as measured outcomes on standard datasets. This satisfies the criteria for a self-contained derivation.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities can be extracted beyond the stated design choices.

free parameters (1)

compression ratio
32x ratio is presented as a design choice to reduce tokens while preserving structures.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 2 internal anchors

[1]

Deep learning for image super-resolution: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3365–3387, 2020

Zhihao Wang, Jian Chen, and Steven CH Hoi. Deep learning for image super-resolution: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3365–3387, 2020

work page 2020
[2]

Toward real-world single image super-resolution: A new benchmark and a new model

Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 3086–3095, 2019

work page 2019
[3]

Exploiting diffusion prior for real-world image super-resolution.International Journal of Computer Vision, 132(12):5929–5949, 2024

Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin CK Chan, and Chen Change Loy. Exploiting diffusion prior for real-world image super-resolution.International Journal of Computer Vision, 132(12):5929–5949, 2024

work page 2024
[4]

Ntire 2020 challenge on real-world image super-resolution: Methods and results

Andreas Lugmayr, Martin Danelljan, and Radu Timofte. Ntire 2020 challenge on real-world image super-resolution: Methods and results. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 494–495, 2020

work page 2020
[5]

Quantized image super-resolution on mobile npus, mobile ai 2025 challenge: Report

Andrey Ignatov, Georgy Perevozchikov, Radu Timofte, Zhiyu Zhang, Tianxiao Gao, Yukun Yang, Shiai Zhu, Shihao Wang, Kihwan Yoon, Ganzorig Gankhuyag, et al. Quantized image super-resolution on mobile npus, mobile ai 2025 challenge: Report. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 1908– 1921, 2025

work page 2025
[6]

Reversible primitive–composition alignment for continual vision–language learning

Canran Xiao, Tianxiang Xu, Siyuan Ma, Yiyang Jiang, Haoyu Gao, and Yuhan Wu. Reversible primitive–composition alignment for continual vision–language learning. InInternational Conference on Learning Representations, 2026

work page 2026
[7]

Diffbir: Toward blind image restoration with generative diffusion prior

Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Yu Qiao, Wanli Ouyang, and Chao Dong. Diffbir: Toward blind image restoration with generative diffusion prior. InEuropean Conference on Computer Vision, pages 430–448. Springer, 2024

work page 2024
[8]

Seesr: Towards semantics-aware real-world image super-resolution

Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, and Lei Zhang. Seesr: Towards semantics-aware real-world image super-resolution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25456–25467, 2024

work page 2024
[9]

One-step effective diffusion network for real-world image super-resolution.Advances in Neural Information Processing Systems, 37: 92529–92553, 2024

Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, and Lei Zhang. One-step effective diffusion network for real-world image super-resolution.Advances in Neural Information Processing Systems, 37: 92529–92553, 2024

work page 2024
[10]

Adversarial diffusion compression for real-world image super-resolution

Bin Chen, Gehui Li, Rongyuan Wu, Xindong Zhang, Jie Chen, Jian Zhang, and Lei Zhang. Adversarial diffusion compression for real-world image super-resolution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28208–28220, 2025

work page 2025
[11]

Tsd-sr: One-step diffusion with target score distillation for real-world image super-resolution

Linwei Dong, Qingnan Fan, Yihong Guo, Zhonghao Wang, Qi Zhang, Jinwei Chen, Yawei Luo, and Changqing Zou. Tsd-sr: One-step diffusion with target score distillation for real-world image super-resolution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23174–23184, 2025

work page 2025
[12]

Learning a deep convolutional network for image super-resolution

Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. InEuropean Conference on Computer Vision, pages 184–199. Springer, 2014

work page 2014
[13]

Enhanced deep residual networks for single image super-resolution

Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 136–144, 2017

work page 2017
[14]

Image super- resolution using very deep residual channel attention networks

Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super- resolution using very deep residual channel attention networks. InEuropean Conference on Computer Vision, pages 286–301, 2018. 10

work page 2018
[15]

Swinir: Image restoration using swin transformer

Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1833–1844, 2021

work page 2021
[16]

Activating more pixels in image super-resolution transformer

Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, and Chao Dong. Activating more pixels in image super-resolution transformer. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22367–22377, 2023

work page 2023
[17]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022

work page 2022
[18]

Flux.https://github.com/black-forest-labs/flux, 2024

Black Forest Labs. Flux.https://github.com/black-forest-labs/flux, 2024

work page 2024
[19]

FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context image ...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[20]

Transformers are rnns: Fast autoregressive transformers with linear attention

Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are rnns: Fast autoregressive transformers with linear attention. InInternational Conference on Machine Learning, pages 5156–5165. PMLR, 2020

work page 2020
[21]

Sana: Efficient high-resolution image synthesis with linear diffusion transformers

Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, et al. Sana: Efficient high-resolution image synthesis with linear diffusion transformers. InInternational Conference on Learning Representations, 2025

work page 2025
[22]

LinearSR: Unlocking linear attention for stable and efficient image super-resolution

Xiaohui Li, Shaobin Zhuang, Shuo Cao, Yang Yang, Yuandong Pu, Qi Qin, Siqi Luo, Bin Fu, and Yihao Liu. LinearSR: Unlocking linear attention for stable and efficient image super-resolution. InInternational Conference on Learning Representations, 2026

work page 2026
[23]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4195–4205, 2023

work page 2023
[24]

LoRA: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022

work page 2022
[25]

One diffusion step to real- world super-resolution via flow trajectory distillation

Jianze Li, Jiezhang Cao, Yong Guo, Wenbo Li, and Yulun Zhang. One diffusion step to real- world super-resolution via flow trajectory distillation. InInternational Conference on Machine Learning, pages 34044–34053. PMLR, 2025

work page 2025
[26]

Esrgan: Enhanced super-resolution generative adversarial networks

Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. Esrgan: Enhanced super-resolution generative adversarial networks. InEuropean Conference on Computer Vision Workshops, 2018

work page 2018
[27]

Designing a practical degradation model for deep blind image super-resolution

Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4791–4800, 2021

work page 2021
[28]

Real-esrgan: Training real-world blind super-resolution with pure synthetic data

Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1905–1914, 2021

work page 1905
[29]

Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild

Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, and Chao Dong. Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25669–25680, 2024

work page 2024
[30]

Dreamclear: High-capacity real-world image restoration with privacy-safe dataset curation.Advances in Neural Information Processing Systems, 37:55443–55469, 2024

Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Xiaotian Han, Zhengyu Chen, Quanzeng You, and Hongxia Yang. Dreamclear: High-capacity real-world image restoration with privacy-safe dataset curation.Advances in Neural Information Processing Systems, 37:55443–55469, 2024. 11

work page 2024
[31]

Dit4sr: Taming diffusion transformer for real-world image super-resolution

Zheng-Peng Duan, Jiawei Zhang, Xin Jin, Ziheng Zhang, Zheng Xiong, Dongqing Zou, Jimmy S Ren, Chunle Guo, and Chongyi Li. Dit4sr: Taming diffusion transformer for real-world image super-resolution. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 18948–18958, 2025

work page 2025
[32]

Sinsr: diffusion-based image super-resolution in a single step

Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C Kot, and Bihan Wen. Sinsr: diffusion-based image super-resolution in a single step. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25796–25805, 2024

work page 2024
[33]

Taming diffusion prior for image super-resolution with domain shift sdes.Advances in Neural Information Processing Systems, 37:42765–42797, 2024

Qinpeng Cui, Yixuan Liu, Xinyi Zhang, Qiqi Bao, Qingmin Liao, Li Wang, Tian Lu, Zhongdao Wang, Emad Barsoum, et al. Taming diffusion prior for image super-resolution with domain shift sdes.Advances in Neural Information Processing Systems, 37:42765–42797, 2024

work page 2024
[34]

Arbitrary-steps image super-resolution via diffusion inversion

Zongsheng Yue, Kang Liao, and Chen Change Loy. Arbitrary-steps image super-resolution via diffusion inversion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23153–23163, 2025

work page 2025
[35]

Pixel- level and semantic-level adjustable super-resolution: A dual-lora approach

Lingchen Sun, Rongyuan Wu, Zhiyuan Ma, Shuaizheng Liu, Qiaosi Yi, and Lei Zhang. Pixel- level and semantic-level adjustable super-resolution: A dual-lora approach. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2333–2343, 2025

work page 2025
[36]

Unleashing the power of one-step diffusion based image super-resolution via a large-scale diffusion discriminator

Jianze Li, Jiezhang Cao, Zichen Zou, Xiongfei Su, Xin Yuan, Yulun Zhang, Yong Guo, and Xiaokang Yang. Unleashing the power of one-step diffusion based image super-resolution via a large-scale diffusion discriminator. InAdvances in Neural Information Processing Systems, 2025

work page 2025
[37]

Q-DiT4SR: Exploration of Detail-Preserving Diffusion Transformer Quantization for Real-World Image Super-Resolution

Xun Zhang, Kaicheng Yang, Hongliang Lu, Haotong Qin, Yong Guo, and Yulun Zhang. Q- dit4sr: Exploration of detail-preserving diffusion transformer quantization for real-world image super-resolution.arXiv preprint arXiv:2602.01273, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[38]

Optimal brain damage.Advances in Neural Information Processing Systems, 2, 1989

Yann LeCun, John Denker, and Sara Solla. Optimal brain damage.Advances in Neural Information Processing Systems, 2, 1989

work page 1989
[39]

Learning both weights and connections for efficient neural network.Advances in Neural Information Processing Systems, 28, 2015

Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network.Advances in Neural Information Processing Systems, 28, 2015

work page 2015
[40]

Learning efficient convolutional networks through network slimming

Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2736–2744, 2017

work page 2017
[41]

The lottery ticket hypothesis: Finding sparse, trainable neural networks

Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. InInternational Conference on Learning Representations, 2019

work page 2019
[42]

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks. Journal of Machine Learning Research, 22(241):1–124, 2021

work page 2021
[43]

Depgraph: Towards any structural pruning

Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, and Xinchao Wang. Depgraph: Towards any structural pruning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16091–16101, 2023

work page 2023
[44]

Llm-pruner: On the structural pruning of large language models.Advances in Neural Information Processing Systems, 36:21702–21720, 2023

Xinyin Ma, Gongfan Fang, and Xinchao Wang. Llm-pruner: On the structural pruning of large language models.Advances in Neural Information Processing Systems, 36:21702–21720, 2023

work page 2023
[45]

Tinyfusion: Diffusion transformers learned shallow

Gongfan Fang, Kunjun Li, Xinyin Ma, and Xinchao Wang. Tinyfusion: Diffusion transformers learned shallow. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18144–18154, 2025

work page 2025
[46]

Ntire 2017 challenge on single image super-resolution: Dataset and study

Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 126–135, 2017. 12

work page 2017
[47]

Lsdir: A large scale dataset for image restoration

Yawei Li, Kai Zhang, Jingyun Liang, Jiezhang Cao, Ce Liu, Rui Gong, Yulun Zhang, Hao Tang, Yun Liu, Denis Demandolx, et al. Lsdir: A large scale dataset for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1775–1787, 2023

work page 2023
[48] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.

[49] Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, and Lei Zhang. Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization. In European Conference on Computer Vision, pages 74–91. Springer, 2024.

[50] Zongsheng Yue, Jianyi Wang, and Chen Change Loy. ResShift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems, 36:13294–13307, 2023.

[51] Aiping Zhang, Zongsheng Yue, Renjing Pei, Wenqi Ren, and Xiaochun Cao. Degradation-guided one-step image super-resolution with diffusion priors. arXiv preprint arXiv:2409.17058, 2024.

[52] Ying Tai, Rui Xie, Chen Zhao, Kai Zhang, Zhenyu Zhang, Jun Zhou, and Jian Yang. AddSR: Accelerating diffusion-based blind super-resolution with adversarial diffusion distillation. Pattern Recognition, page 113012, 2026.

[53] Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, Wangmeng Zuo, and Liang Lin. Component divide-and-conquer for real-world image super-resolution. In European Conference on Computer Vision, pages 101–117. Springer, 2020.

Appendix

A Additional Technical Details

A.1 Prompt Construction and Protocol

For every experimental run, the prompt source ...

[1] Zhihao Wang, Jian Chen, and Steven CH Hoi. Deep learning for image super-resolution: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3365–3387, 2020.

[2] Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3086–3095, 2019.

[3] Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin CK Chan, and Chen Change Loy. Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision, 132(12):5929–5949, 2024.

[4] Andreas Lugmayr, Martin Danelljan, and Radu Timofte. NTIRE 2020 challenge on real-world image super-resolution: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 494–495, 2020.

[5] Andrey Ignatov, Georgy Perevozchikov, Radu Timofte, Zhiyu Zhang, Tianxiao Gao, Yukun Yang, Shiai Zhu, Shihao Wang, Kihwan Yoon, Ganzorig Gankhuyag, et al. Quantized image super-resolution on mobile NPUs, Mobile AI 2025 challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 1908–1921, 2025.

[6] Canran Xiao, Tianxiang Xu, Siyuan Ma, Yiyang Jiang, Haoyu Gao, and Yuhan Wu. Reversible primitive–composition alignment for continual vision–language learning. In International Conference on Learning Representations, 2026.

[7] Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Yu Qiao, Wanli Ouyang, and Chao Dong. DiffBIR: Toward blind image restoration with generative diffusion prior. In European Conference on Computer Vision, pages 430–448. Springer, 2024.

[8] Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, and Lei Zhang. SeeSR: Towards semantics-aware real-world image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25456–25467, 2024.

[9] Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, and Lei Zhang. One-step effective diffusion network for real-world image super-resolution. Advances in Neural Information Processing Systems, 37:92529–92553, 2024.

[10] Bin Chen, Gehui Li, Rongyuan Wu, Xindong Zhang, Jie Chen, Jian Zhang, and Lei Zhang. Adversarial diffusion compression for real-world image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28208–28220, 2025.

[11] Linwei Dong, Qingnan Fan, Yihong Guo, Zhonghao Wang, Qi Zhang, Jinwei Chen, Yawei Luo, and Changqing Zou. TSD-SR: One-step diffusion with target score distillation for real-world image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23174–23184, 2025.

[12] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision, pages 184–199. Springer, 2014.

[13] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 136–144, 2017.

[14] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In European Conference on Computer Vision, pages 286–301, 2018.

[15] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. SwinIR: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1833–1844, 2021.

[16] Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, and Chao Dong. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22367–22377, 2023.

[17] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.

[18] Black Forest Labs. Flux. https://github.com/black-forest-labs/flux, 2024.

[19] Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. FLUX.1 Kontext: Flow matching for in-context image generation and editing in latent space ... arXiv, 2025.

[20] Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are RNNs: Fast autoregressive transformers with linear attention. In International Conference on Machine Learning, pages 5156–5165. PMLR, 2020.

[21] Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, et al. SANA: Efficient high-resolution image synthesis with linear diffusion transformers. In International Conference on Learning Representations, 2025.

[22] Xiaohui Li, Shaobin Zhuang, Shuo Cao, Yang Yang, Yuandong Pu, Qi Qin, Siqi Luo, Bin Fu, and Yihao Liu. LinearSR: Unlocking linear attention for stable and efficient image super-resolution. In International Conference on Learning Representations, 2026.

[23] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4195–4205, 2023.

[24] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.

[25] Jianze Li, Jiezhang Cao, Yong Guo, Wenbo Li, and Yulun Zhang. One diffusion step to real-world super-resolution via flow trajectory distillation. In International Conference on Machine Learning, pages 34044–34053. PMLR, 2025.

[26] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. ESRGAN: Enhanced super-resolution generative adversarial networks. In European Conference on Computer Vision Workshops, 2018.

[27] Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4791–4800, 2021.

[28] Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1905–1914, 2021.

[29] Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, and Chao Dong. Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25669–25680, 2024.

[30] Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Xiaotian Han, Zhengyu Chen, Quanzeng You, and Hongxia Yang. DreamClear: High-capacity real-world image restoration with privacy-safe dataset curation. Advances in Neural Information Processing Systems, 37:55443–55469, 2024.

[31] Zheng-Peng Duan, Jiawei Zhang, Xin Jin, Ziheng Zhang, Zheng Xiong, Dongqing Zou, Jimmy S Ren, Chunle Guo, and Chongyi Li. DiT4SR: Taming diffusion transformer for real-world image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18948–18958, 2025.

[32] Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C Kot, and Bihan Wen. SinSR: Diffusion-based image super-resolution in a single step. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25796–25805, 2024.

[33] Qinpeng Cui, Yixuan Liu, Xinyi Zhang, Qiqi Bao, Qingmin Liao, Li Wang, Tian Lu, Zhongdao Wang, Emad Barsoum, et al. Taming diffusion prior for image super-resolution with domain shift SDEs. Advances in Neural Information Processing Systems, 37:42765–42797, 2024.

[34] Zongsheng Yue, Kang Liao, and Chen Change Loy. Arbitrary-steps image super-resolution via diffusion inversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23153–23163, 2025.

[35] Lingchen Sun, Rongyuan Wu, Zhiyuan Ma, Shuaizheng Liu, Qiaosi Yi, and Lei Zhang. Pixel-level and semantic-level adjustable super-resolution: A dual-LoRA approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2333–2343, 2025.

[36] Jianze Li, Jiezhang Cao, Zichen Zou, Xiongfei Su, Xin Yuan, Yulun Zhang, Yong Guo, and Xiaokang Yang. Unleashing the power of one-step diffusion based image super-resolution via a large-scale diffusion discriminator. In Advances in Neural Information Processing Systems, 2025.

[37] Xun Zhang, Kaicheng Yang, Hongliang Lu, Haotong Qin, Yong Guo, and Yulun Zhang. Q-DiT4SR: Exploration of detail-preserving diffusion transformer quantization for real-world image super-resolution. arXiv preprint arXiv:2602.01273, 2026.

[38] Yann LeCun, John Denker, and Sara Solla. Optimal brain damage. Advances in Neural Information Processing Systems, 2, 1989.

[39] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. Advances in Neural Information Processing Systems, 28, 2015.

[40] Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2736–2744, 2017.

[41] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations, 2019.
