Efficient One-Step Diffusion Restoration Model with Compact Token Compression and Linear Attention
Pith reviewed 2026-05-25 04:39 UTC · model grok-4.3
The pith
SANA-SR restores real-world images via 32x token compression and linear-attention DiT in a single diffusion step.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SANA-SR is an efficient one-step restoration framework that employs a deep compression autoencoder with a 32x compression ratio to drastically reduce latent tokens while preserving restoration-relevant structures and textures. On top of this compact latent space, a linear-attention DiT with LoRA fine-tuning performs high-resolution restoration with linear-complexity token mixing. The paper reports that, across all benchmark datasets, SANA-SR achieves highly competitive and often superior quantitative performance against existing methods while restoring clearer and more realistic textures. After pruning, the deployed model runs in 0.019 s with 407.95G MACs and 344M parameters.
What carries the argument
Deep compression autoencoder at 32x ratio combined with linear-attention DiT for token mixing.
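To make the token savings concrete, the sketch below counts latent tokens for a given input size. It assumes that "32x" denotes the per-side spatial downsampling factor of the autoencoder and that each latent position becomes one token; neither detail is confirmed by the text above. The 8x and 16x factors are included only as illustrative comparison points, and `latent_tokens` is a hypothetical helper name.

```python
def latent_tokens(height: int, width: int, factor: int, patch: int = 1) -> int:
    """Count latent tokens for an image under a spatial downsampling factor.

    Assumption: `factor` is the per-side downsampling of the autoencoder and
    every `patch` x `patch` group of latent positions forms one token.
    """
    stride = factor * patch
    if height % stride or width % stride:
        raise ValueError("image sides must be divisible by factor * patch")
    return (height // stride) * (width // stride)


if __name__ == "__main__":
    # Compare token counts for a 1024 x 1024 input at several factors.
    for factor in (8, 16, 32):
        n = latent_tokens(1024, 1024, factor)
        print(f"{factor:>2}x autoencoder -> {n} tokens")
```

Under these assumptions, going from an 8x to a 32x factor cuts the token count by 16x for the same image, which is the redundancy reduction the argument relies on.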
If this is right
- The model matches or exceeds existing Real-ISR methods on quantitative metrics across all tested benchmarks.
- Restored images exhibit clearer and more realistic textures than prior generative approaches.
- After pruning, inference completes in 0.019 seconds using 407.95G MACs and 344M parameters.
- The linear-complexity design removes the unfavorable scaling of computation and memory with image resolution.
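The linear-complexity point in the last bullet can be illustrated with a small sketch of kernel attention. It assumes a ReLU feature map, which is a common choice for linear attention but is not confirmed here as the paper's exact formulation. Both functions compute the same result: the first builds every query-key similarity (cost grows with N²), the second first accumulates a fixed-size key-value summary (cost grows with N). All function names are hypothetical.

```python
import random

EPS = 1e-6  # avoids division by zero when every kernel similarity is 0


def relu(vec):
    return [max(x, 0.0) for x in vec]


def quadratic_attention(q, k, v):
    """ReLU-kernel attention via explicit per-query similarities: O(N^2)."""
    d_v = len(v[0])
    out = []
    for qi in q:
        fq = relu(qi)
        sims = [sum(a * b for a, b in zip(fq, relu(kj))) for kj in k]
        z = sum(sims) + EPS
        out.append([sum(s * vj[c] for s, vj in zip(sims, v)) / z
                    for c in range(d_v)])
    return out


def linear_attention(q, k, v):
    """Same attention, reordered so keys and values are summarized once: O(N)."""
    d, d_v = len(k[0]), len(v[0])
    kv = [[0.0] * d_v for _ in range(d)]  # sum over tokens of relu(k)^T v
    k_sum = [0.0] * d                     # sum over tokens of relu(k)
    for kj, vj in zip(k, v):
        fk = relu(kj)
        for a in range(d):
            k_sum[a] += fk[a]
            for c in range(d_v):
                kv[a][c] += fk[a] * vj[c]
    out = []
    for qi in q:
        fq = relu(qi)
        z = sum(fq[a] * k_sum[a] for a in range(d)) + EPS
        out.append([sum(fq[a] * kv[a][c] for a in range(d)) / z
                    for c in range(d_v)])
    return out


def random_tokens(rng, n, d):
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    q, k, v = (random_tokens(rng, 16, 4) for _ in range(3))
    a, b = quadratic_attention(q, k, v), linear_attention(q, k, v)
    gap = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    print(f"max abs difference: {gap:.2e}")
```

The reordering works only because the kernel similarity factorizes into separate query and key features; softmax attention has no such factorization, which is why it keeps the quadratic cost.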
Where Pith is reading between the lines
- The same compression-plus-linear-attention pattern could be tested on related tasks such as real-world denoising or deblurring.
- Further increases in compression ratio beyond 32x could be explored if the autoencoder continues to retain high-frequency texture cues.
- The LoRA fine-tuning step on the linear DiT suggests a route for adapting the model to new degradation distributions without full retraining.
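As a rough illustration of why LoRA makes adaptation cheap, the sketch below applies a low-rank update to a frozen linear layer. It follows the generic LoRA formulation, y = Wx + (alpha / r) · B(Ax); the rank, scaling, and layer sizes are illustrative assumptions and are not taken from the paper, and the helper names are hypothetical.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def lora_forward(x, w, a, b, alpha):
    """Frozen weight w (d_out x d_in) plus low-rank update b @ a.

    a has shape r x d_in and b has shape d_out x r; only a and b are trained.
    """
    r = len(a)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y + scale * u for y, u in zip(base, update)]


def trainable_params(d_in, d_out, r):
    """Parameters in the LoRA factors, versus d_in * d_out for full tuning."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    w = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen 2 x 3 weight
    a = [[0.5, 0.5, 0.5]]                    # rank-1 down-projection
    b = [[0.0], [0.0]]                       # zero init: update starts at 0
    x = [1.0, 2.0, 3.0]
    print(lora_forward(x, w, a, b, alpha=1.0))
    print(trainable_params(1024, 1024, 8), "vs", 1024 * 1024)
```

With the up-projection initialized to zero, the adapted layer starts out identical to the pretrained one, so adapting to a new degradation distribution would only require training the small factors.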
Load-bearing premise
The 32x compression autoencoder preserves all restoration-relevant structures and textures without introducing artifacts that later stages cannot correct.
What would settle it
Running SANA-SR on the standard Real-ISR benchmark suites and finding either lower perceptual quality scores or visible uncorrectable artifacts compared with quadratic-attention baselines would falsify the performance claim.
Original abstract
Real-world image super-resolution aims to recover high-quality images from complex and unknown real-world degradations. However, existing generative Real-ISR methods largely inherit the dense latent representations and quadratic-cost global modeling paradigm developed for high-resolution image synthesis, causing computation, memory usage, and inference latency to scale unfavorably with resolution and thus limiting practical deployment. We argue that the key bottleneck lies not in insufficient restoration priors, but in excessive token redundancy and costly token interactions during high-resolution restoration. Motivated by this observation, we revisit Real-ISR from the perspectives of compact latent representation and linear-complexity modeling, and propose SANA-SR, an efficient one-step restoration framework. Specifically, SANA-SR employs a deep compression autoencoder with a 32x compression ratio to drastically reduce latent tokens while preserving restoration-relevant structures and textures. On top of this compact latent space, we introduce a linear-attention DiT with LoRA fine-tuning, enabling efficient high-resolution restoration with linear-complexity token mixing. Extensive experiments on all benchmark datasets demonstrate that SANA-SR achieves highly competitive and often superior quantitative performance against existing methods, while restoring clearer and more realistic textures. Moreover, after pruning, the deployed model runs in 0.019s with 407.95G MACs and 344M parameters, highlighting its strong potential for practical mobile deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SANA-SR, an efficient one-step diffusion-based framework for real-world image super-resolution. It identifies token redundancy and quadratic attention costs as the primary bottlenecks in existing generative Real-ISR methods and addresses them via a deep compression autoencoder (32x ratio) to produce compact latent tokens while preserving structures and textures, followed by a linear-attention DiT backbone with LoRA fine-tuning for linear-complexity token mixing. The authors report that the resulting model achieves highly competitive or superior quantitative performance (PSNR/SSIM/perceptual metrics) on standard benchmarks, restores clearer textures, and after pruning runs at 0.019 s inference with 407.95 G MACs and 344 M parameters.
Significance. If the central claims hold, the work would be significant for enabling practical, mobile deployment of high-quality generative Real-ISR by demonstrating that extreme latent compression combined with linear attention can maintain restoration fidelity at dramatically reduced compute and latency. The explicit focus on token redundancy rather than prior insufficiency, together with the reported efficiency numbers, offers a concrete path toward scalable restoration models.
major comments (2)
- [Abstract, §3] Abstract and §3 (method): The claim that the 32x deep compression autoencoder 'preserves restoration-relevant structures and textures' is load-bearing for both the performance and efficiency assertions, yet no ablation quantifies information loss at this ratio or demonstrates that degradation-specific high-frequency cues remain recoverable by the linear-attention DiT (even with LoRA). If the autoencoder discards unrecoverable details, the reported competitive metrics and 0.019 s latency cannot simultaneously hold.
- [§4] §4 (experiments): The abstract asserts 'highly competitive and often superior quantitative performance' and 'clearer and more realistic textures' but supplies no tables, baselines, or error bars in the visible text; without these, the cross-method comparison and the claim that linear attention suffices cannot be evaluated.
minor comments (1)
- [§3] Notation for the linear-attention mechanism and the precise definition of the 32x compression ratio should be introduced with equations in the method section for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to strengthen the paper where the concerns are valid.
- Referee: [Abstract, §3] Abstract and §3 (method): The claim that the 32x deep compression autoencoder 'preserves restoration-relevant structures and textures' is load-bearing for both the performance and efficiency assertions, yet no ablation quantifies information loss at this ratio or demonstrates that degradation-specific high-frequency cues remain recoverable by the linear-attention DiT (even with LoRA). If the autoencoder discards unrecoverable details, the reported competitive metrics and 0.019 s latency cannot simultaneously hold.
Authors: We agree that an explicit ablation quantifying information loss at the 32x ratio and demonstrating recoverability of degradation-specific cues would strengthen the central claim. In the revised manuscript we will add: (1) a compression-ratio ablation (8x/16x/32x) reporting reconstruction PSNR/SSIM on both clean and degraded inputs, (2) latent-space visualizations and high-frequency energy spectra before/after encoding, and (3) a controlled study measuring how much of the final restoration quality is attributable to the autoencoder versus the DiT. These additions will directly address whether the linear-attention DiT can recover the necessary cues. revision: yes
- Referee: [§4] §4 (experiments): The abstract asserts 'highly competitive and often superior quantitative performance' and 'clearer and more realistic textures' but supplies no tables, baselines, or error bars in the visible text; without these, the cross-method comparison and the claim that linear attention suffices cannot be evaluated.
Authors: Section 4 of the full manuscript contains multiple tables comparing SANA-SR against recent Real-ISR baselines on standard benchmarks (PSNR, SSIM, LPIPS, MUSIQ, etc.), together with qualitative results. We will ensure all tables are clearly referenced from the abstract and §3, add standard-error bars from three independent runs where they were omitted, and include an additional table isolating the contribution of linear attention versus quadratic attention under identical latent tokens. If any tables were missing from the reviewed version due to rendering, we apologize and will correct the submission. revision: partial
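The reconstruction-fidelity part of the ablation promised in the first response could be measured with a standard PSNR computation. The sketch below is a generic implementation on flattened pixel lists, not the authors' evaluation code; the 8-bit peak value of 255 and the name `psnr` are assumptions.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction))
    mse /= len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [100.0, 120.0, 140.0, 160.0]
    decoded = [110.0, 130.0, 150.0, 170.0]  # uniform error of 10 gray levels
    print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

Running the same measurement on encode-decode round trips at 8x, 16x, and 32x would show how much fidelity is lost to compression alone, before the DiT is involved.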
Circularity Check
No significant circularity in derivation chain
The paper's central proposal rests on an architectural argument (token redundancy as the primary bottleneck) followed by a design choice (32x deep compression autoencoder + linear-attention DiT with LoRA) and empirical reporting on external benchmarks. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citation chains appear in the provided text that would reduce the performance claims to the inputs by construction. The autoencoder fidelity assumption is stated explicitly as a design premise rather than derived from prior self-work, and the reported metrics (PSNR/SSIM, latency, MACs) are positioned as measured outcomes on standard datasets. This satisfies the criteria for a self-contained derivation.
Axiom & Free-Parameter Ledger
free parameters (1)
- compression ratio