pith. machine review for the scientific record.

arxiv: 2603.21045 · v5 · submitted 2026-03-22 · 💻 cs.CV · cs.AI

Recognition: no theorem link

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 07:26 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords diffusion models · image super-resolution · noise prediction · optimal noise · maximum likelihood estimation · residual shifting · perceptual quality · few-step sampling

The pith

Diffusion super-resolution models achieve stable results by using a learned predictor for optimal noise instead of random sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the run-to-run variability of diffusion-based image super-resolution caused by random noise injection during sampling, which is most pronounced on short trajectories. It derives a closed-form solution for the optimal noise at each step from a maximum likelihood estimation perspective, revealing a conditional dependence on the low-resolution input that holds across different diffusion paradigms. This insight is instantiated in the residual-shifting paradigm as a noise predictor conditioned on multiple inputs, including the low-resolution image. Combined with pre-upsampling, the resulting compact four-step model can be trained end-to-end. The payoff is fast, repeatable super-resolution that does not depend on massive pre-trained text-to-image priors.
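The sampling loop described above can be sketched schematically. Everything below — the function names, the exact residual-shifting update, and the schedule `etas` — is an assumption for illustration, not the paper's code; the one structural point it carries is that the Gaussian draw is replaced by a learned, LR-conditioned prediction.

```python
import numpy as np

def reverse_chain(lr, predict_x0, predict_noise, etas, kappa=1.0, steps=4):
    """Sketch of a short residual-shifting reverse chain (assumed form).

    `predict_x0` stands in for the denoiser and `predict_noise` for the
    learned noise predictor; both names are hypothetical. `etas` is a
    schedule with etas[0] == 0 so the final step is deterministic.
    """
    # Residual shifting starts near the (pre-upsampled) LR image plus noise,
    # here already drawn from the predictor rather than np.random.
    x = lr + kappa * np.sqrt(etas[steps]) * predict_noise(lr, lr, steps)
    for t in range(steps, 0, -1):
        x0_hat = predict_x0(x, lr, t)            # denoiser's HR estimate
        eta_prev = etas[t - 1]
        mean = x0_hat + eta_prev * (lr - x0_hat)  # shift residual toward x0
        sigma = kappa * np.sqrt(eta_prev)
        # Key substitution: learned, LR-conditioned noise instead of a
        # random Gaussian sample at each intermediate step.
        eps = predict_noise(x, lr, t)
        x = mean + sigma * eps
    return x
```

With deterministic predictors, repeated calls return the same image, which is exactly the stability property the pith describes.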

Core claim

We establish a theoretical framework that derives the closed-form analytical solution for optimal intermediate noise in diffusion models from a maximum likelihood estimation perspective, revealing a consistent conditional dependence structure that generalizes across diffusion paradigms. We instantiate this framework under the residual-shifting diffusion paradigm and accordingly design an LR-guided multi-input-aware noise predictor to replace random Gaussian noise. We further mitigate initialization bias with a high-quality pre-upsampling network. The compact 4-step trajectory uniquely enables end-to-end optimization of the entire reverse chain.
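The claim asserts a closed-form MLE solution without reproducing the algebra. Under a generic Gaussian reverse transition — the symbols and objective below are assumptions for orientation, not the paper's equations — one schematic reading is:

```latex
% Generic Gaussian reverse step with injected noise \varepsilon_t,
% where y denotes the LR input:
%   x_{t-1} = \mu_\theta(x_t, y, t) + \sigma_t \,\varepsilon_t .
% Choosing \varepsilon_t to maximize the likelihood of the transition
% under the posterior given (x_0, y) yields
\varepsilon_t^{\star}
  = \arg\max_{\varepsilon}\;
    \log p\bigl(\mu_\theta(x_t, y, t) + \sigma_t \varepsilon \,\bigm|\, x_0, y\bigr)
  = \frac{\mathbb{E}\!\left[x_{t-1} \mid x_0, y\right]
          - \mu_\theta(x_t, y, t)}{\sigma_t}.
```

Whatever the paper's exact form, the right-hand side depends on the LR input y — the conditional dependence structure the claim highlights — and a learnable predictor then approximates it without access to x_0 at test time.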

What carries the argument

The LR-guided multi-input-aware noise predictor: it instantiates the derived conditional dependence structure to generate the (approximately) optimal noise at each diffusion step.
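As a rough sketch of what "multi-input-aware" means in practice, the predictor's interface might look like the following; the choice of inputs, the per-pixel linear mix, and all names are hypothetical stand-ins for the paper's (here unspecified) network.

```python
import numpy as np

def noise_predictor(x_t, lr, t, weights):
    """Toy stand-in for the LR-guided multi-input-aware noise predictor.

    Only the interface matters: the predictor sees the current state x_t,
    the LR image, and the timestep, so its output can depend on the LR
    input as the derivation requires. The linear mix below is an
    assumption, not the paper's architecture.
    """
    t_map = np.full_like(x_t, float(t))         # broadcast timestep to a map
    feats = np.stack([x_t, lr, t_map], axis=0)  # multi-input conditioning
    # Per-pixel linear combination: (3,) . (3, H, W) -> (H, W)
    return np.tensordot(weights, feats, axes=(0, 0))
```

A real instantiation would replace the linear mix with a conditioned network, but the signature — state, LR image, timestep in; noise map out — is the load-bearing part.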

Load-bearing premise

The derived conditional dependence structure for optimal noise generalizes across diffusion paradigms and can be instantiated in the residual-shifting setup without introducing fitting biases that undermine the solution.

What would settle it

Comparing perceptual quality metrics and output variance between the proposed model and a version using standard random Gaussian noise on the same datasets would show if the learned predictor provides the claimed improvement and stability.
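A minimal version of that stability check, under an assumed `sampler(lr, rng)` interface (not the paper's code), could be:

```python
import numpy as np

def mean_pixel_variance(sampler, lr, runs=8, seed=0):
    """Monte-Carlo check of run-to-run output variance.

    With random Gaussian injection, `sampler` should return a different
    image each call; with a deterministic learned noise predictor it
    should not. Perceptual metrics (e.g. LPIPS) would be compared
    alongside this variance estimate.
    """
    rng = np.random.default_rng(seed)
    outs = np.stack([sampler(lr, rng) for _ in range(runs)], axis=0)
    return float(outs.var(axis=0).mean())
```

A learned deterministic predictor should drive this quantity toward zero, while standard random Gaussian sampling keeps it strictly positive.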

Figures

Figures reproduced from arXiv: 2603.21045 by Shizhuo Liu, Shuwei Huang, Zijun Wei.

Figure 1. Qualitative comparison of our PreSet-A and PreSet-B methods under different sampling … [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]
Figure 2. Visualization of the intermediate noise maps generated by our proposed noise predictor … [PITH_FULL_IMAGE:figures/full_fig_p004_2.png]
Figure 3. Statistical distribution analysis of the outputs from our LR-guided noise predictor … [PITH_FULL_IMAGE:figures/full_fig_p004_3.png]
Figure 4. Visual results of different methods on two typical real-world examples. (Zoom in for best view) [PITH_FULL_IMAGE:figures/full_fig_p007_4.png]
Figure 5. Qualitative comparison of different noise injection strategies. (Zoom in for best view) [PITH_FULL_IMAGE:figures/full_fig_p016_5.png]
Figure 6. More visualization comparisons of different models. (Zoom in for best view) [PITH_FULL_IMAGE:figures/full_fig_p017_6.png]
Figure 7. More visualization comparisons of different models. (Zoom in for best view) [PITH_FULL_IMAGE:figures/full_fig_p018_7.png]
read the original abstract

Diffusion-based image super-resolution (SR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) observations. However, the inherent randomness injected during the reverse diffusion process causes the performance of diffusion-based SR models to vary significantly across different sampling runs, particularly when the sampling trajectory is compressed into a limited number of steps. A critical yet underexplored question is: what is the optimal noise to inject at each intermediate diffusion step? In this paper, we establish a theoretical framework that derives the closed-form analytical solution for optimal intermediate noise in diffusion models from a maximum likelihood estimation perspective, revealing a consistent conditional dependence structure that generalizes across diffusion paradigms. We instantiate this framework under the residual-shifting diffusion paradigm and accordingly design an LR-guided multi-input-aware noise predictor to replace random Gaussian noise. We further mitigate initialization bias with a high-quality pre-upsampling network. The compact 4-step trajectory uniquely enables end-to-end optimization of the entire reverse chain, which is computationally prohibitive for conventional long-trajectory diffusion models. Extensive experiments demonstrate that LPNSR achieves state-of-the-art perceptual performance on both synthetic and real-world datasets, without relying on any large-scale text-to-image priors. The source code of our method can be found at https://github.com/Faze-Hsw/LPNSR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to establish a theoretical framework deriving a closed-form analytical solution for optimal intermediate noise in diffusion models via maximum likelihood estimation, revealing a conditional dependence structure that generalizes across paradigms; it instantiates this under residual-shifting diffusion with an LR-guided multi-input-aware noise predictor, adds a high-quality pre-upsampling network to mitigate initialization bias, and demonstrates SOTA perceptual performance on synthetic and real-world SR datasets using a compact 4-step trajectory without large text-to-image priors.

Significance. If the MLE derivation is sound and the learnable predictor faithfully implements the claimed optimal noise without introducing unaccounted biases, the work would provide a principled way to reduce variance and sampling steps in diffusion SR while maintaining quality, offering an alternative to random Gaussian noise that could improve efficiency and consistency in low-step regimes.

major comments (2)
  1. [Abstract / Theoretical Framework] The claim of a closed-form MLE solution yielding an invariant conditional dependence structure requires explicit algebraic steps (e.g., the derivation from the likelihood objective through the residual-shifting forward process) to confirm that assumptions such as noise-LR independence survive instantiation; without these steps the generalization claim rests on an unverified transition from derivation to learnable predictor.
  2. [Experiments] SOTA perceptual claims are made for the 4-step trajectory, yet the abstract (and visible summary) provides no error bars, ablation results on the noise predictor components, or quantitative baseline tables; these omissions make it impossible to assess whether the reported gains are robust or attributable to the derived noise structure versus the pre-upsampler.
minor comments (1)
  1. [Method] The manuscript should clarify the exact architecture of the LR-guided multi-input-aware noise predictor (input channels, conditioning mechanism) and confirm that the 4-step end-to-end optimization does not inadvertently overfit to the training distribution used for the predictor parameters.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We will revise the manuscript to address the points on the theoretical derivation and experimental reporting, as detailed below.

read point-by-point responses
  1. Referee: [Abstract / Theoretical Framework] The claim of a closed-form MLE solution yielding an invariant conditional dependence structure requires explicit algebraic steps (e.g., the derivation from the likelihood objective through the residual-shifting forward process) to confirm that assumptions such as noise-LR independence survive instantiation; without these steps the generalization claim rests on an unverified transition from derivation to learnable predictor.

    Authors: We agree that explicit algebraic steps will strengthen the presentation. In the revised manuscript, we will expand the theoretical framework section with the full derivation: starting from the maximum likelihood estimation objective, proceeding step-by-step through the residual-shifting forward process, and explicitly verifying that the noise-LR independence assumption holds under the model. This will also clarify the transition from the closed-form optimal noise to the LR-guided multi-input-aware predictor and support the generalization claim across paradigms. revision: yes

  2. Referee: [Experiments] SOTA perceptual claims are made for the 4-step trajectory, yet the abstract (and visible summary) provides no error bars, ablation results on the noise predictor components, or quantitative baseline tables; these omissions make it impossible to assess whether the reported gains are robust or attributable to the derived noise structure versus the pre-upsampler.

    Authors: We acknowledge that the current experimental reporting lacks sufficient detail for full assessment. In the revised version, we will add error bars (computed over multiple independent runs) to all quantitative metrics, include ablation studies isolating the contributions of the noise predictor components (LR-guidance and multi-input awareness), and provide expanded quantitative baseline tables. These additions will demonstrate robustness and help attribute gains to the derived noise structure versus the pre-upsampler. revision: yes

Circularity Check

0 steps flagged

Theoretical MLE derivation of optimal noise stands independent of learnable instantiation

full rationale

The paper first presents a theoretical framework deriving a closed-form analytical solution for optimal intermediate noise via maximum likelihood estimation, revealing a conditional dependence structure claimed to generalize across paradigms. This derivation is positioned prior to and separate from the subsequent instantiation under the residual-shifting paradigm and the design of an LR-guided multi-input-aware noise predictor. No equations or steps reduce the MLE result to fitted parameters by construction, nor rely on self-citations, ansatz smuggling, or renaming of known results. The end-to-end optimization of the 4-step chain is a practical engineering choice enabled by the short trajectory, but does not make the initial analytical derivation circular. The central claim therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the validity of the MLE derivation for optimal noise and the assumption that the residual-shifting paradigm preserves the derived conditional structure; the learnable predictor introduces trainable parameters whose values are not supplied by the derivation.

free parameters (1)
  • parameters of the LR-guided noise predictor
    The multi-input-aware network that predicts noise is trained end-to-end; its weights are fitted to data and not fixed by the closed-form derivation.
axioms (1)
  • domain assumption: the conditional dependence structure derived from MLE generalizes across diffusion paradigms
    Invoked when the authors instantiate the framework under the residual-shifting paradigm.
invented entities (1)
  • LR-guided multi-input-aware noise predictor (no independent evidence)
    purpose: Replaces random Gaussian noise with predicted optimal noise at each diffusion step
    New network component introduced to realize the derived optimal noise; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5541 in / 1439 out tokens · 31868 ms · 2026-05-15T07:26:17.805694+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 7 internal anchors

  1. [1]

    Image super-resolution via iterative refinement

    Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4713–4726, 2023

  2. [2]

    Resshift: Efficient diffusion model for image super-resolution by residual shifting

    Zongsheng Yue, Jianyi Wang, and Chen Change Loy. Resshift: Efficient diffusion model for image super-resolution by residual shifting. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 13294–13307. Curran Associates, Inc., 2023

  3. [3]

    Exploiting diffusion prior for real-world image super-resolution

    Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin CK Chan, and Chen Change Loy. Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision, 132(12):5929–5949, 2024

  4. [4]

    Arbitrary-steps image super-resolution via diffusion inversion

    Zongsheng Yue, Kang Liao, and Chen Change Loy. Arbitrary-steps image super-resolution via diffusion inversion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23153–23163, June 2025

  5. [5]

    Denoising diffusion restoration models

    Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Information Processing Systems, volume 35, pages 23593–23606. Curran Associates, Inc., 2022

  6. [6]

    Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction

    Hyungjin Chung, Byeongsu Sim, and Jong Chul Ye. Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12413–12422, June 2022

  7. [7]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, June 2022

  8. [8]

    One-step effective diffusion network for real-world image super-resolution.Advances in Neural Information Processing Systems, 37:92529–92553, 2024

    Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, and Lei Zhang. One-step effective diffusion network for real-world image super-resolution.Advances in Neural Information Processing Systems, 37:92529–92553, 2024

  9. [9]

    Seesr: Towards semantics-aware real-world image super-resolution

    Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, and Lei Zhang. Seesr: Towards semantics-aware real-world image super-resolution. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 25456–25467, 2024

  10. [10]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors,Advances in Neural Information Processing Systems, volume 33, pages 6840–6851. Curran Associates, Inc., 2020

  11. [11]

    Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps.Advances in neural information processing systems, 35:5775–5787, 2022

    Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps.Advances in neural information processing systems, 35:5775–5787, 2022

  12. [12]

    Improved denoising diffusion probabilistic models

    Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International conference on machine learning, pages 8162–8171. PMLR, 2021

  13. [13]

    Denoising Diffusion Implicit Models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.arXiv preprint arXiv:2010.02502, 2020

  14. [14]

    Sinsr: diffusion-based image super-resolution in a single step

    Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C Kot, and Bihan Wen. Sinsr: diffusion-based image super-resolution in a single step. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 25796–25805, 2024

  15. [15]

    Improving diffusion models for inverse problems using manifold constraints.Advances in Neural Information Processing Systems, 35:25683– 25696, 2022

    Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, and Jong Chul Ye. Improving diffusion models for inverse problems using manifold constraints.Advances in Neural Information Processing Systems, 35:25683– 25696, 2022

  16. [16]

    Diffusion Posterior Sampling for General Noisy Inverse Problems

    Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems.arXiv preprint arXiv:2209.14687, 2022

  17. [17]

    Generative diffusion prior for unified image restoration and enhancement

    Ben Fei, Zhaoyang Lyu, Liang Pan, Junzhe Zhang, Weidong Yang, Tianyue Luo, Bo Zhang, and Bo Dai. Generative diffusion prior for unified image restoration and enhancement. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9935–9946, 2023

  18. [18]

    Pseudoinverse-guided diffusion models for inverse problems

    Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. InInternational Conference on Learning Representations, 2023

  19. [19]

    Dreamclean: Restoring clean image using deep diffusion prior

    Jie Xiao, Ruili Feng, Han Zhang, Zhiheng Liu, Zhantao Yang, Yurui Zhu, Xueyang Fu, Kai Zhu, Yu Liu, and Zheng-Jun Zha. Dreamclean: Restoring clean image using deep diffusion prior. InThe Twelfth International Conference on Learning Representations, 2024

  20. [20]

    Difface: Blind face restoration with diffused error contraction

    Zongsheng Yue and Chen Change Loy. Difface: Blind face restoration with diffused error contraction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):9991–10004, 2024

  21. [21]

    Image super-resolution using deep convolutional networks.IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307, 2015

    Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks.IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307, 2015

  22. [22]

    Deep learning for image super resolution

    Pablo Rojas Sedó. Deep learning for image super resolution. B.S. thesis, Universitat Politècnica de Catalunya, 2022

  23. [23]

    Image super-resolution via progressive cascading residual network

    Namhyuk Ahn, Byungkon Kang, and Kyung-Ah Sohn. Image super-resolution via progressive cascading residual network. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 791–799, 2018

  24. [24]

    Accurate image super-resolution using very deep convolutional networks

    Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1646–1654, 2016

  25. [25]

    Deep networks for image super-resolution with sparse prior

    Zhaowen Wang, Ding Liu, Jianchao Yang, Wei Han, and Thomas Huang. Deep networks for image super-resolution with sparse prior. InProceedings of the IEEE international conference on computer vision, pages 370–378, 2015

  26. [26]

    Photo-realistic single image super- resolution using a generative adversarial network

    Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super- resolution using a generative adversarial network. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4681–4690, 2017

  27. [27]

    Pulse: Self-supervised photo upsampling via latent space exploration of generative models

    Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, and Cynthia Rudin. Pulse: Self-supervised photo upsampling via latent space exploration of generative models. InProceedings of the ieee/cvf conference on computer vision and pattern recognition, pages 2437–2445, 2020

  28. [28]

    Enhancenet: Single image super-resolution through automated texture synthesis

    Mehdi SM Sajjadi, Bernhard Scholkopf, and Michael Hirsch. Enhancenet: Single image super-resolution through automated texture synthesis. InProceedings of the IEEE international conference on computer vision, pages 4491–4500, 2017

  29. [29]

    Pixel recursive super resolution

    Ryan Dahl, Mohammad Norouzi, and Jonathon Shlens. Pixel recursive super resolution. InProceedings of the IEEE international conference on computer vision, pages 5439–5448, 2017

  30. [30]

    Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling

    Jacob Menick and Nal Kalchbrenner. Generating high fidelity images with subscale pixel networks and multidimensional upscaling.arXiv preprint arXiv:1812.01608, 2018

  31. [31]

    Conditional image generation with pixelcnn decoders.Advances in neural information processing systems, 29, 2016

    Aaron Van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image generation with pixelcnn decoders.Advances in neural information processing systems, 29, 2016

  32. [32]

    Image transformer

    Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. Image transformer. InInternational conference on machine learning, pages 4055–4064. PMLR, 2018

  33. [33]

    Lar-sr: A local autoregressive model for image super-resolution

    Baisong Guo, Xiaoyun Zhang, Haoning Wu, Yu Wang, Ya Zhang, and Yan-Feng Wang. Lar-sr: A local autoregressive model for image super-resolution. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1909–1918, 2022

  34. [34]

    Progressive Growing of GANs for Improved Quality, Stability, and Variation

    Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation.arXiv preprint arXiv:1710.10196, 2017

  35. [35]

    Ilvr: Conditioning method for denoising diffusion probabilistic models.arXiv preprint arXiv:2108.02938, 2021

    Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. Ilvr: Conditioning method for denoising diffusion probabilistic models.arXiv preprint arXiv:2108.02938, 2021

  36. [36]

    An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

    Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618, 2022

  37. [37]

    Null-text inversion for editing real images using guided diffusion models

    Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6038–6047, 2023

  38. [38]

    Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models

    Daiki Miyake, Akihiro Iohara, Yu Saito, and Toshiyuki Tanaka. Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models. In2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2063–2072. IEEE, 2025

  39. [39]

    Visual instruction inversion: Image editing via image prompting.Advances in Neural Information Processing Systems, 36:9598–9613, 2023

    Thao Nguyen, Yuheng Li, Utkarsh Ojha, and Yong Jae Lee. Visual instruction inversion: Image editing via image prompting.Advances in Neural Information Processing Systems, 36:9598–9613, 2023

  40. [40]

    Direct inversion: Boosting diffusion- based editing with 3 lines of code.arXiv preprint arXiv:2310.01506, 2023

    Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Direct inversion: Boosting diffusion- based editing with 3 lines of code.arXiv preprint arXiv:2310.01506, 2023

  41. [41]

    Eta inversion: Designing an optimal eta function for diffusion-based real image editing

    Wonjun Kang, Kevin Galim, and Hyung Il Koo. Eta inversion: Designing an optimal eta function for diffusion-based real image editing. InEuropean Conference on Computer Vision, pages 90–106. Springer, 2024

  42. [42]

    Fixed-point inversion for text-to-image diffusion models.CoRR, 2023

    Barak Meiri, Dvir Samuel, Nir Darshan, Gal Chechik, Shai Avidan, and Rami Ben-Ari. Fixed-point inversion for text-to-image diffusion models.CoRR, 2023

  43. [43]

    Edict: Exact diffusion inversion via coupled transformations

    Bram Wallace, Akash Gokul, and Nikhil Naik. Edict: Exact diffusion inversion via coupled transformations. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22532– 22541, 2023

  44. [44]

    Solving diffusion odes with optimal boundary conditions for better image super-resolution.arXiv preprint arXiv:2305.15357, 2023

    Yiyang Ma, Huan Yang, Wenhan Yang, Jianlong Fu, and Jiaying Liu. Solving diffusion odes with optimal boundary conditions for better image super-resolution.arXiv preprint arXiv:2305.15357, 2023

  45. [45]

    Swinir: Image restoration using swin transformer

    Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 1833–1844, 2021

  46. [46]

    U-net: Convolutional networks for biomedical image segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. InInternational Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015

  47. [47]

    Taming transformers for high-resolution image synthesis

    Patrick Esser, Robin Rombach, and Bjorn Ommer. Taming transformers for high-resolution image synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12873–12883, June 2021

  48. [48]

    Real-esrgan: Training real-world blind super- resolution with pure synthetic data

    Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super- resolution with pure synthetic data. InProceedings of the IEEE/CVF international conference on computer vision, pages 1905–1914, 2021

  49. [49]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

  50. [50]

    Generative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InAdvances in Neural Information Processing Systems, pages 2672–2680, 2014

  51. [51]

    Designing a practical degradation model for deep blind image super-resolution

    Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. InProceedings of the IEEE/CVF international conference on computer vision, pages 4791–4800, 2021

  52. [52]

    Lsdir: A large scale dataset for image restoration

    Yawei Li, Kai Zhang, Jingyun Liang, Jiezhang Cao, Ce Liu, Rui Gong, Yulun Zhang, Hao Tang, Yun Liu, Denis Demandolx, et al. Lsdir: A large scale dataset for image restoration. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1775–1787, 2023

  53. [53]

    A style-based generator architecture for generative adversarial networks

    Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019

  54. [54]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017

  55. [55]

    SGDR: Stochastic Gradient Descent with Warm Restarts

    Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016

  56. [56]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009

  57. [57]

    Toward real-world single image super-resolution: A new benchmark and a new model

    Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. InProceedings of the IEEE/CVF international conference on computer vision, pages 3086–3095, 2019

  58. [58]

    Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

    Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

  59. [59]

    Making a "completely blind" image quality analyzer

    Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2012

  60. [60]

    The 2018 pirm challenge on perceptual image super-resolution

    Yochai Blau, Roey Mechrez, Radu Timofte, Tomer Michaeli, and Lihi Zelnik-Manor. The 2018 pirm challenge on perceptual image super-resolution. InProceedings of the European conference on computer vision (ECCV) workshops, pages 0–0, 2018

  61. [61]

    Musiq: Multi-scale image quality transformer

    Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 5148–5157, 2021

  62. [62]

    Exploring clip for assessing the look and feel of images

    Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 2555–2563, 2023
