DreamSR: Towards Ultra-High-Resolution Image Super-Resolution via a Receptive-Field Enhanced Diffusion Transformer
Pith reviewed 2026-05-20 20:01 UTC · model grok-4.3
The pith
DreamSR pairs patch-level local prompts with global diffusion features to cut over-generation in ultra-high-resolution super-resolution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DreamSR suppresses local over-generation and improves fine-detail synthesis through a dual-branch MM-ControlNet. The ControlNet branch produces local textual features from patch-level prompts, while the pre-trained DiT supplies global textual features from the global prompt. A Receptive-Field Enhancement strategy and stage-specific data pipelines complete the design. Together, these components restore local textures and maintain semantic consistency across patches.
What carries the argument
Dual-branch MM-ControlNet that routes patch-level local prompts through ControlNet and global prompts through the pre-trained DiT, augmented by Receptive-Field Enhancement to strengthen local information capture.
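As a rough illustration of this routing, the sketch below pairs each tile of a large image with its own local prompt (for the ControlNet branch) and one shared global prompt (for the frozen DiT). The tiling rule, the `PatchCondition` record, and the `caption_patch` callback are hypothetical stand-ins. The summary does not specify how DreamSR produces patch prompts or how it lays out tiles.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right) in pixels


@dataclass
class PatchCondition:
    box: Box
    local_prompt: str   # hypothetical: consumed by the ControlNet branch
    global_prompt: str  # hypothetical: consumed by the frozen DiT branch


def tile_starts(size: int, patch: int, stride: int) -> List[int]:
    """Start offsets along one axis; the last tile is snapped to the border."""
    if size <= patch:
        return [0]
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)
    return starts


def route_prompts(height: int, width: int, patch: int, stride: int,
                  global_prompt: str,
                  caption_patch: Callable[[Box], str]) -> List[PatchCondition]:
    """Pair every tile with its own local caption and the shared global caption."""
    conds = []
    for top in tile_starts(height, patch, stride):
        for left in tile_starts(width, patch, stride):
            box = (top, left, min(top + patch, height), min(left + patch, width))
            conds.append(PatchCondition(box, caption_patch(box), global_prompt))
    return conds
```

For example, `route_prompts(2048, 2048, 1024, 768, "a city skyline", my_captioner)` returns a 3x3 grid of overlapping 1024-pixel tiles. Every tile carries the same global prompt and its own local prompt, so the local prompt can only describe what is actually inside that tile.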
If this is right
- Local over-generation is suppressed during each patch inference step.
- Fine local textures and details are synthesized more accurately.
- Semantic consistency holds across adjacent patches of the final image.
- Visually faithful results with ultra-high-quality details are obtained.
- Performance exceeds prior state-of-the-art methods on ultra-high-resolution inputs.
Where Pith is reading between the lines
- The same local-global prompt split could be tested on other tiled generative tasks such as large-scale image inpainting to reduce seam artifacts.
- Receptive-field enhancement might improve detail recovery in any diffusion pipeline that processes images larger than the model's native resolution.
- Staged training with patch-specific data could be reused to adapt existing DiT models for other resolution-sensitive restoration problems without full retraining.
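The summary does not say how DreamSR's Receptive-Field Enhancement works. One generic way to widen what a patch-wise model can see is to condition each patch on a larger context crop around it and later keep only the core region. This is offered purely as an assumption, not as the paper's strategy, and the helper names below are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right) in pixels


def context_box(box: Box, margin: int, height: int, width: int) -> Box:
    """Grow a box by `margin` pixels on every side, clamped to the image."""
    top, left, bottom, right = box
    return (max(0, top - margin), max(0, left - margin),
            min(height, bottom + margin), min(width, right + margin))


def crop(image: List[list], box: Box) -> List[list]:
    """Cut a box out of an image stored as a 2-D list of rows."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def core_in_context(box: Box, ctx: Box) -> Box:
    """Position of the original patch inside its context crop."""
    return (box[0] - ctx[0], box[1] - ctx[1], box[2] - ctx[0], box[3] - ctx[1])
```

Clamping keeps border patches valid at the cost of asymmetric context. Reflective padding would be an alternative design choice.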
Load-bearing premise
The method assumes that patch-level prompts from the ControlNet branch plus global prompts from the DiT will align semantics across patches during inference without creating fresh alignment problems or requiring per-image fixes.
What would settle it
Super-resolved outputs that show semantic mismatches or unnatural textures specifically at patch boundaries would indicate that the central claim does not hold.
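Part of that test can be made concrete with a simple diagnostic. It is hypothetical and is not a metric the paper reports: compare the mean pixel jump across tile boundaries with the mean jump between neighboring pixels elsewhere. The check assumes a grayscale image and a regular, non-overlapping tile grid of side `patch`. Semantic mismatches would still need human or model-based judgment.

```python
from typing import List


def seam_ratio(image: List[List[float]], patch: int) -> float:
    """Mean absolute jump across tile boundaries divided by the mean jump elsewhere.

    A ratio near 1 means boundaries look like any other pixel pair;
    a ratio well above 1 suggests visible seams on the tile grid.
    """
    h, w = len(image), len(image[0])
    seam, other = [], []
    for i in range(h):  # horizontal neighbors
        for j in range(1, w):
            d = abs(image[i][j] - image[i][j - 1])
            (seam if j % patch == 0 else other).append(d)
    for i in range(1, h):  # vertical neighbors
        for j in range(w):
            d = abs(image[i][j] - image[i - 1][j])
            (seam if i % patch == 0 else other).append(d)

    def mean(xs: List[float]) -> float:
        return sum(xs) / len(xs) if xs else 0.0

    return mean(seam) / max(mean(other), 1e-12)
```

A smooth gradient scores exactly 1.0. An image whose tiles each have a constant offset scores far higher, because every jump sits on a boundary.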
Original abstract
Large-scale pre-trained diffusion models have been extensively adopted for real-world image Super-Resolution because of their powerful generative priors through textual guidance. However, when super-resolving high-resolution images with a patch-wise inference strategy, most existing diffusion-based SR methods tend to suffer from over-generation, due to the misalignment between the global prompt from the LR image and the incomplete semantic information of local patches during each inference step. On the other hand, most existing methods also fail to generate detailed texture in local patches due to the overemphasis on global generation capabilities in network designs and training strategies. To address these issues, we present DreamSR, a novel SR model that suppresses local over-generation and improves fine-detail synthesis, thereby achieving visually faithful results with ultra-high-quality details. Specifically, we propose a dual-branch MM-ControlNet, where the ControlNet generates local textual features with patch-level prompts while the pre-trained DiT provides global textual features with global prompts, thereby mitigating over-generation and ensuring semantic consistency across patches. We also design a comprehensive training strategy with stage-specific data processing pipelines and a Receptive-Field Enhancement strategy, enhancing the model's capability to capture patch information and effectively restore local textures. Extensive experiments demonstrate that DreamSR outperforms state-of-the-art methods, providing high-quality SR results. Code and model are available at https://github.com/jerrydong0219/DreamSR.
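The abstract attributes over-generation to patch-wise inference. In a common tiled scheme, used for instance by MultiDiffusion, the denoiser runs on overlapping tiles and the predictions are averaged wherever tiles overlap. The sketch below shows only that averaging step on plain 2-D lists. Whether DreamSR fuses tiles this way is an assumption; the abstract does not say.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def blend_tiles(height: int, width: int,
                tiles: Sequence[Tuple[Tuple[int, int], Grid]]) -> Grid:
    """Average overlapping tile outputs into one canvas.

    tiles: ((top, left), patch) pairs, where patch is a 2-D list of floats.
    Each canvas pixel becomes the mean of all tile values covering it
    (uniform weights); pixels covered by no tile are left at 0.0.
    """
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for (top, left), patch in tiles:
        for i, row in enumerate(patch):
            for j, value in enumerate(row):
                acc[top + i][left + j] += value
                cnt[top + i][left + j] += 1
    return [[a / c if c else 0.0 for a, c in zip(acc_row, cnt_row)]
            for acc_row, cnt_row in zip(acc, cnt)]
```

In an actual sampler, this averaging would be applied to latents or model predictions at every denoising step. Tiles are also often weighted, for example tapering toward their edges, rather than averaged uniformly. Averaging smooths pixel-level seams but does nothing about each tile being steered by a prompt that describes content it does not contain, which is the misalignment the abstract identifies.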
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DreamSR, a diffusion transformer-based model for ultra-high-resolution image super-resolution. It identifies over-generation in patch-wise inference as arising from misalignment between global prompts (from LR images) and incomplete local patch semantics, plus insufficient local texture detail due to overemphasis on global generation. The proposed solution is a dual-branch MM-ControlNet in which one branch (ControlNet) supplies local textual features via patch-level prompts and the pre-trained DiT branch supplies global textual features, combined with a Receptive-Field Enhancement strategy and stage-specific data-processing pipelines during training. The authors claim this yields semantically consistent, high-detail SR outputs that outperform prior SOTA methods, with code and models released.
Significance. If the central architectural and training claims are substantiated by quantitative results and ablations, the work would offer a practical advance for real-world diffusion SR at ultra-high resolutions by directly targeting the local-global consistency problem that arises in patch-based inference. The public release of code and models strengthens reproducibility and potential impact.
major comments (2)
- Abstract and §3 (method description): the central claim that the dual-branch MM-ControlNet 'mitigates over-generation and ensures semantic consistency across patches' rests on the fusion of local patch-level prompts with global DiT features, yet no derivation, diagram, or specification is given for the fusion operator (cross-attention weights, concatenation point inside transformer blocks, or conditioning scale). Without this, it is impossible to verify that the same local-global misalignment the paper diagnoses does not reappear at patch boundaries.
- §4 (experiments): the abstract states 'extensive experiments demonstrate that DreamSR outperforms state-of-the-art methods' but provides no quantitative tables, PSNR/SSIM/LPIPS numbers, or ablation studies on the fusion mechanism or Receptive-Field Enhancement. This absence makes the performance claim load-bearing yet unverifiable from the given text.
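For reference, the simplest of the requested metrics is PSNR, defined as 10 * log10(MAX^2 / MSE) for peak value MAX and mean squared error MSE. A minimal pure-Python version follows. Reporting conventions vary between papers (RGB vs. luminance channel, border cropping, 8-bit vs. [0, 1] range), so this is not necessarily how DreamSR or its baselines are scored. SSIM needs local window statistics and LPIPS needs a pretrained network, so both are omitted here.

```python
import math
from typing import List


def psnr(reference: List[List[float]], estimate: List[List[float]],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    diffs = [(r - e) ** 2
             for ref_row, est_row in zip(reference, estimate)
             for r, e in zip(ref_row, est_row)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```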
minor comments (2)
- Notation: the acronym 'MM-ControlNet' is introduced without expansion or reference to prior ControlNet literature; a brief definition would improve clarity.
- The Receptive-Field Enhancement strategy is mentioned but not tied to a specific subsection or equation; adding a dedicated paragraph or figure would help readers trace its contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and recommendation for major revision. We address each point below with clarifications and commit to specific revisions that strengthen the methodological description and experimental validation without altering the core contributions.
Point-by-point responses
-
Referee: Abstract and §3 (method description): the central claim that the dual-branch MM-ControlNet 'mitigates over-generation and ensures semantic consistency across patches' rests on the fusion of local patch-level prompts with global DiT features, yet no derivation, diagram, or specification is given for the fusion operator (cross-attention weights, concatenation point inside transformer blocks, or conditioning scale). Without this, it is impossible to verify that the same local-global misalignment the paper diagnoses does not reappear at patch boundaries.
Authors: We agree that the current description of the fusion operator lacks sufficient technical detail. In the revised manuscript we will add an explicit mathematical formulation of the fusion step, specifying that patch-level features from the ControlNet branch are injected into the pre-trained DiT blocks via cross-attention with learnable conditioning scales. We will also insert a new figure that diagrams the exact insertion point inside each transformer block and the attention-weight computation. These additions will directly demonstrate how the architecture prevents re-introduction of local-global misalignment at patch boundaries. revision: yes
-
Referee: §4 (experiments): the abstract states 'extensive experiments demonstrate that DreamSR outperforms state-of-the-art methods' but provides no quantitative tables, PSNR/SSIM/LPIPS numbers, or ablation studies on the fusion mechanism or Receptive-Field Enhancement. This absence makes the performance claim load-bearing yet unverifiable from the given text.
Authors: We acknowledge that the experimental presentation would be strengthened by more prominent quantitative reporting. In the revision we will add a new table (Table 1) reporting PSNR, SSIM and LPIPS on standard benchmarks against recent SOTA diffusion SR methods, and expand Section 4.3 with dedicated ablations that isolate the contribution of the fusion operator and the Receptive-Field Enhancement strategy, including both quantitative metrics and qualitative examples. revision: yes
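The first response states that patch-level features are injected into the pre-trained DiT blocks "via cross-attention with learnable conditioning scales". The sketch below is one hypothetical reading of that sentence. It uses single-head attention with identity projections, a global-prompt term from the frozen DiT, and a local-prompt term multiplied by a scalar `scale`. Projections, head count, and the exact insertion point are all assumptions. Setting the scale to zero at initialization, in the spirit of ControlNet's zero-initialized layers, leaves the pre-trained behavior untouched at the start of training.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vec], keys: List[Vec], values: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product attention on plain lists of vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def fuse(hidden: List[Vec], global_tokens: List[Vec],
         local_tokens: List[Vec], scale: float) -> List[Vec]:
    """Hypothetical fusion: residual + global-prompt attention + scaled local-prompt attention."""
    g = cross_attention(hidden, global_tokens, global_tokens)
    loc = cross_attention(hidden, local_tokens, local_tokens)
    return [[h + gi + scale * li for h, gi, li in zip(h_row, g_row, l_row)]
            for h_row, g_row, l_row in zip(hidden, g, loc)]
```

With `scale=0.0` the output equals the global-only path. This is why a zero start is a common choice when a new conditioning branch is added to a frozen model.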
Circularity Check
No significant circularity detected in DreamSR derivation
Full rationale
The paper proposes DreamSR as a new architecture featuring a dual-branch MM-ControlNet (ControlNet for local patch-level prompts, pre-trained DiT for global prompts) plus a Receptive-Field Enhancement strategy and stage-specific training pipelines. These elements are presented as direct design responses to diagnosed issues of over-generation and texture loss in existing patch-wise diffusion SR methods. No equations, fitted parameters, or predictions are described that reduce by construction to the model's own inputs or outputs. The central claims rest on architectural and empirical innovations rather than self-referential derivations, self-citation chains, or renamed known results, rendering the approach self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- Stage-specific data processing parameters
axioms (1)
- Domain assumption: Large-scale pre-trained diffusion models provide powerful generative priors through textual guidance.
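The textual-guidance prior in this assumption is usually exercised at sampling time through classifier-free guidance. It is written below in its standard noise-prediction form, with guidance weight w, text condition c, and empty condition ∅; the same form applies to velocity-prediction models. This summary does not state whether or how DreamSR applies guidance to its global and local prompts, so the equation is background rather than the paper's formulation.

```latex
\hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing)
  + w \,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr)
```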