pith. machine review for the scientific record.

arxiv: 2603.21045 · v5 · submitted 2026-03-22 · 💻 cs.CV · cs.AI

Recognition: no theorem link

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 07:26 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords diffusion models · image super-resolution · noise prediction · optimal noise · maximum likelihood estimation · residual shifting · perceptual quality · few-step sampling

The pith

Diffusion super-resolution models achieve stable results by using a learned predictor for optimal noise instead of random sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the run-to-run variability of diffusion-based image super-resolution caused by random noise injection during sampling, which is most pronounced on short trajectories. It derives a closed-form solution for the optimal noise at each step from a maximum likelihood estimation perspective, revealing a conditional dependence on the low-resolution input that holds across different diffusion paradigms. This insight is instantiated in the residual-shifting paradigm as a noise predictor conditioned on multiple inputs, including the low-resolution image. Combined with pre-upsampling, the resulting compact four-step model can be trained end-to-end. The payoff is fast, repeatable super-resolution that does not depend on massive pre-trained text-to-image priors.
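The sampling loop described above can be sketched schematically. Everything below — the function names, the exact residual-shifting update, and the schedule `etas` — is an assumption for illustration, not the paper's code; the one structural point it carries is that the Gaussian draw is replaced by a learned, LR-conditioned prediction.

```python
import numpy as np

def reverse_chain(lr, predict_x0, predict_noise, etas, kappa=1.0, steps=4):
    """Sketch of a short residual-shifting reverse chain (assumed form).

    `predict_x0` stands in for the denoiser and `predict_noise` for the
    learned noise predictor; both names are hypothetical. `etas` is a
    schedule with etas[0] == 0 so the final step is deterministic.
    """
    # Residual shifting starts near the (pre-upsampled) LR image plus noise,
    # here already drawn from the predictor rather than np.random.
    x = lr + kappa * np.sqrt(etas[steps]) * predict_noise(lr, lr, steps)
    for t in range(steps, 0, -1):
        x0_hat = predict_x0(x, lr, t)            # denoiser's HR estimate
        eta_prev = etas[t - 1]
        mean = x0_hat + eta_prev * (lr - x0_hat)  # shift residual toward x0
        sigma = kappa * np.sqrt(eta_prev)
        # Key substitution: learned, LR-conditioned noise instead of a
        # random Gaussian sample at each intermediate step.
        eps = predict_noise(x, lr, t)
        x = mean + sigma * eps
    return x
```

With deterministic predictors, repeated calls return the same image, which is exactly the stability property the pith describes.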

Core claim

We establish a theoretical framework that derives the closed-form analytical solution for optimal intermediate noise in diffusion models from a maximum likelihood estimation perspective, revealing a consistent conditional dependence structure that generalizes across diffusion paradigms. We instantiate this framework under the residual-shifting diffusion paradigm and accordingly design an LR-guided multi-input-aware noise predictor to replace random Gaussian noise. We further mitigate initialization bias with a high-quality pre-upsampling network. The compact 4-step trajectory uniquely enables end-to-end optimization of the entire reverse chain.
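The claim asserts a closed-form MLE solution without reproducing the algebra. Under a generic Gaussian reverse transition — the symbols and objective below are assumptions for orientation, not the paper's equations — one schematic reading is:

```latex
% Generic Gaussian reverse step with injected noise \varepsilon_t,
% where y denotes the LR input:
%   x_{t-1} = \mu_\theta(x_t, y, t) + \sigma_t \,\varepsilon_t .
% Choosing \varepsilon_t to maximize the likelihood of the transition
% under the posterior given (x_0, y) yields
\varepsilon_t^{\star}
  = \arg\max_{\varepsilon}\;
    \log p\bigl(\mu_\theta(x_t, y, t) + \sigma_t \varepsilon \,\bigm|\, x_0, y\bigr)
  = \frac{\mathbb{E}\!\left[x_{t-1} \mid x_0, y\right]
          - \mu_\theta(x_t, y, t)}{\sigma_t}.
```

Whatever the paper's exact form, the right-hand side depends on the LR input y — the conditional dependence structure the claim highlights — and a learnable predictor then approximates it without access to x_0 at test time.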

What carries the argument

The LR-guided multi-input-aware noise predictor: it instantiates the derived conditional dependence structure to generate the (approximately) optimal noise at each diffusion step.
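As a rough sketch of what "multi-input-aware" means in practice, the predictor's interface might look like the following; the choice of inputs, the per-pixel linear mix, and all names are hypothetical stand-ins for the paper's (here unspecified) network.

```python
import numpy as np

def noise_predictor(x_t, lr, t, weights):
    """Toy stand-in for the LR-guided multi-input-aware noise predictor.

    Only the interface matters: the predictor sees the current state x_t,
    the LR image, and the timestep, so its output can depend on the LR
    input as the derivation requires. The linear mix below is an
    assumption, not the paper's architecture.
    """
    t_map = np.full_like(x_t, float(t))         # broadcast timestep to a map
    feats = np.stack([x_t, lr, t_map], axis=0)  # multi-input conditioning
    # Per-pixel linear combination: (3,) . (3, H, W) -> (H, W)
    return np.tensordot(weights, feats, axes=(0, 0))
```

A real instantiation would replace the linear mix with a conditioned network, but the signature — state, LR image, timestep in; noise map out — is the load-bearing part.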

Load-bearing premise

The derived conditional dependence structure for optimal noise generalizes across diffusion paradigms and can be instantiated in the residual-shifting setup without introducing fitting biases that undermine the solution.

What would settle it

Comparing perceptual quality metrics and output variance between the proposed model and a version using standard random Gaussian noise on the same datasets would show if the learned predictor provides the claimed improvement and stability.
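A minimal version of that stability check, under an assumed `sampler(lr, rng)` interface (not the paper's code), could be:

```python
import numpy as np

def mean_pixel_variance(sampler, lr, runs=8, seed=0):
    """Monte-Carlo check of run-to-run output variance.

    With random Gaussian injection, `sampler` should return a different
    image each call; with a deterministic learned noise predictor it
    should not. Perceptual metrics (e.g. LPIPS) would be compared
    alongside this variance estimate.
    """
    rng = np.random.default_rng(seed)
    outs = np.stack([sampler(lr, rng) for _ in range(runs)], axis=0)
    return float(outs.var(axis=0).mean())
```

A learned deterministic predictor should drive this quantity toward zero, while standard random Gaussian sampling keeps it strictly positive.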

Figures

Figures reproduced from arXiv: 2603.21045 by Shizhuo Liu, Shuwei Huang, Zijun Wei.

Figure 1. Qualitative comparison of our PreSet-A and PreSet-B methods under different sampling … [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]
Figure 2. Visualization of the intermediate noise maps generated by our proposed noise predictor … [PITH_FULL_IMAGE:figures/full_fig_p004_2.png]
Figure 3. Statistical distribution analysis of the outputs from our LR-guided noise predictor … [PITH_FULL_IMAGE:figures/full_fig_p004_3.png]
Figure 4. Visual results of different methods on two typical real-world examples. (Zoom in for best view) [PITH_FULL_IMAGE:figures/full_fig_p007_4.png]
Figure 5. Qualitative comparison of different noise injection strategies. (Zoom in for best view) [PITH_FULL_IMAGE:figures/full_fig_p016_5.png]
Figure 6. More visualization comparisons of different models. (Zoom in for best view) [PITH_FULL_IMAGE:figures/full_fig_p017_6.png]
Figure 7. More visualization comparisons of different models. (Zoom in for best view) [PITH_FULL_IMAGE:figures/full_fig_p018_7.png]
read the original abstract

Diffusion-based image super-resolution (SR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) observations. However, the inherent randomness injected during the reverse diffusion process causes the performance of diffusion-based SR models to vary significantly across different sampling runs, particularly when the sampling trajectory is compressed into a limited number of steps. A critical yet underexplored question is: what is the optimal noise to inject at each intermediate diffusion step? In this paper, we establish a theoretical framework that derives the closed-form analytical solution for optimal intermediate noise in diffusion models from a maximum likelihood estimation perspective, revealing a consistent conditional dependence structure that generalizes across diffusion paradigms. We instantiate this framework under the residual-shifting diffusion paradigm and accordingly design an LR-guided multi-input-aware noise predictor to replace random Gaussian noise. We further mitigate initialization bias with a high-quality pre-upsampling network. The compact 4-step trajectory uniquely enables end-to-end optimization of the entire reverse chain, which is computationally prohibitive for conventional long-trajectory diffusion models. Extensive experiments demonstrate that LPNSR achieves state-of-the-art perceptual performance on both synthetic and real-world datasets, without relying on any large-scale text-to-image priors. The source code of our method can be found at https://github.com/Faze-Hsw/LPNSR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to establish a theoretical framework deriving a closed-form analytical solution for optimal intermediate noise in diffusion models via maximum likelihood estimation, revealing a conditional dependence structure that generalizes across paradigms; it instantiates this under residual-shifting diffusion with an LR-guided multi-input-aware noise predictor, adds a high-quality pre-upsampling network to mitigate initialization bias, and demonstrates SOTA perceptual performance on synthetic and real-world SR datasets using a compact 4-step trajectory without large text-to-image priors.

Significance. If the MLE derivation is sound and the learnable predictor faithfully implements the claimed optimal noise without introducing unaccounted biases, the work would provide a principled way to reduce variance and sampling steps in diffusion SR while maintaining quality, offering an alternative to random Gaussian noise that could improve efficiency and consistency in low-step regimes.

major comments (2)
  1. [Abstract / Theoretical Framework] The claim of a closed-form MLE solution yielding an invariant conditional dependence structure requires explicit algebraic steps (e.g., the derivation from the likelihood objective through the residual-shifting forward process) to confirm that assumptions such as noise-LR independence survive instantiation; without these steps the generalization claim rests on an unverified transition from derivation to learnable predictor.
  2. [Experiments] SOTA perceptual claims are made for the 4-step trajectory, yet the abstract (and visible summary) provides no error bars, ablation results on the noise predictor components, or quantitative baseline tables; these omissions make it impossible to assess whether the reported gains are robust or attributable to the derived noise structure versus the pre-upsampler.
minor comments (1)
  1. [Method] The manuscript should clarify the exact architecture of the LR-guided multi-input-aware noise predictor (input channels, conditioning mechanism) and confirm that the 4-step end-to-end optimization does not inadvertently overfit to the training distribution used for the predictor parameters.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We will revise the manuscript to address the points on the theoretical derivation and experimental reporting, as detailed below.

read point-by-point responses
  1. Referee: [Abstract / Theoretical Framework] The claim of a closed-form MLE solution yielding an invariant conditional dependence structure requires explicit algebraic steps (e.g., the derivation from the likelihood objective through the residual-shifting forward process) to confirm that assumptions such as noise-LR independence survive instantiation; without these steps the generalization claim rests on an unverified transition from derivation to learnable predictor.

    Authors: We agree that explicit algebraic steps will strengthen the presentation. In the revised manuscript, we will expand the theoretical framework section with the full derivation: starting from the maximum likelihood estimation objective, proceeding step-by-step through the residual-shifting forward process, and explicitly verifying that the noise-LR independence assumption holds under the model. This will also clarify the transition from the closed-form optimal noise to the LR-guided multi-input-aware predictor and support the generalization claim across paradigms. revision: yes

  2. Referee: [Experiments] SOTA perceptual claims are made for the 4-step trajectory, yet the abstract (and visible summary) provides no error bars, ablation results on the noise predictor components, or quantitative baseline tables; these omissions make it impossible to assess whether the reported gains are robust or attributable to the derived noise structure versus the pre-upsampler.

    Authors: We acknowledge that the current experimental reporting lacks sufficient detail for full assessment. In the revised version, we will add error bars (computed over multiple independent runs) to all quantitative metrics, include ablation studies isolating the contributions of the noise predictor components (LR-guidance and multi-input awareness), and provide expanded quantitative baseline tables. These additions will demonstrate robustness and help attribute gains to the derived noise structure versus the pre-upsampler. revision: yes

Circularity Check

0 steps flagged

Theoretical MLE derivation of optimal noise stands independent of learnable instantiation

full rationale

The paper first presents a theoretical framework deriving a closed-form analytical solution for optimal intermediate noise via maximum likelihood estimation, revealing a conditional dependence structure claimed to generalize across paradigms. This derivation is positioned prior to and separate from the subsequent instantiation under the residual-shifting paradigm and the design of an LR-guided multi-input-aware noise predictor. No equations or steps reduce the MLE result to fitted parameters by construction, nor rely on self-citations, ansatz smuggling, or renaming of known results. The end-to-end optimization of the 4-step chain is a practical engineering choice enabled by the short trajectory, but does not make the initial analytical derivation circular. The central claim therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the validity of the MLE derivation for optimal noise and the assumption that the residual-shifting paradigm preserves the derived conditional structure; the learnable predictor introduces trainable parameters whose values are not supplied by the derivation.

free parameters (1)
  • parameters of the LR-guided noise predictor
    The multi-input-aware network that predicts noise is trained end-to-end; its weights are fitted to data and not fixed by the closed-form derivation.
axioms (1)
  • domain assumption: the conditional dependence structure derived from MLE generalizes across diffusion paradigms
    Invoked when the authors instantiate the framework under the residual-shifting paradigm.
invented entities (1)
  • LR-guided multi-input-aware noise predictor (no independent evidence)
    purpose: Replaces random Gaussian noise with predicted optimal noise at each diffusion step
    New network component introduced to realize the derived optimal noise; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5541 in / 1439 out tokens · 31868 ms · 2026-05-15T07:26:17.805694+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 7 internal anchors

  1. [1]

    Image super-resolution via iterative refinement

    Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4713–4726, 2023

  2. [2]

    Resshift: Efficient diffusion model for image super-resolution by residual shifting

    Zongsheng Yue, Jianyi Wang, and Chen Change Loy. Resshift: Efficient diffusion model for image super-resolution by residual shifting. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 13294–13307. Curran Associates, Inc., 2023

  3. [3]

    Exploiting diffusion prior for real-world image super-resolution

    Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin CK Chan, and Chen Change Loy. Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision, 132(12):5929–5949, 2024

  4. [4]

    Arbitrary-steps image super-resolution via diffusion inversion

    Zongsheng Yue, Kang Liao, and Chen Change Loy. Arbitrary-steps image super-resolution via diffusion inversion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23153–23163, June 2025

  5. [5]

    Denoising diffusion restoration models

    Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Information Processing Systems, volume 35, pages 23593–23606. Curran Associates, Inc., 2022

  6. [6]

    Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction

    Hyungjin Chung, Byeongsu Sim, and Jong Chul Ye. Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12413–12422, June 2022

  7. [7]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, June 2022

  8. [8]

    One-step effective diffusion network for real-world image super-resolution.Advances in Neural Information Processing Systems, 37:92529–92553, 2024

    Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, and Lei Zhang. One-step effective diffusion network for real-world image super-resolution.Advances in Neural Information Processing Systems, 37:92529–92553, 2024

  9. [9]

    Seesr: Towards semantics-aware real-world image super-resolution

    Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, and Lei Zhang. Seesr: Towards semantics-aware real-world image super-resolution. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 25456–25467, 2024

  10. [10]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors,Advances in Neural Information Processing Systems, volume 33, pages 6840–6851. Curran Associates, Inc., 2020

  11. [11]

    Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps.Advances in neural information processing systems, 35:5775–5787, 2022

    Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps.Advances in neural information processing systems, 35:5775–5787, 2022

  12. [12]

    Improved denoising diffusion probabilistic models

    Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International conference on machine learning, pages 8162–8171. PMLR, 2021

  13. [13]

    Denoising Diffusion Implicit Models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.arXiv preprint arXiv:2010.02502, 2020

  14. [14]

    Sinsr: diffusion-based image super-resolution in a single step

    Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C Kot, and Bihan Wen. Sinsr: diffusion-based image super-resolution in a single step. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 25796–25805, 2024

  15. [15]

    Improving diffusion models for inverse problems using manifold constraints.Advances in Neural Information Processing Systems, 35:25683– 25696, 2022

    Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, and Jong Chul Ye. Improving diffusion models for inverse problems using manifold constraints.Advances in Neural Information Processing Systems, 35:25683– 25696, 2022

  16. [16]

    Diffusion Posterior Sampling for General Noisy Inverse Problems

    Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems.arXiv preprint arXiv:2209.14687, 2022

  17. [17]

    Generative diffusion prior for unified image restoration and enhancement

    Ben Fei, Zhaoyang Lyu, Liang Pan, Junzhe Zhang, Weidong Yang, Tianyue Luo, Bo Zhang, and Bo Dai. Generative diffusion prior for unified image restoration and enhancement. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9935–9946, 2023

  18. [18]

    Pseudoinverse-guided diffusion models for inverse problems

    Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. InInternational Conference on Learning Representations, 2023

  19. [19]

    Dreamclean: Restoring clean image using deep diffusion prior

    Jie Xiao, Ruili Feng, Han Zhang, Zhiheng Liu, Zhantao Yang, Yurui Zhu, Xueyang Fu, Kai Zhu, Yu Liu, and Zheng-Jun Zha. Dreamclean: Restoring clean image using deep diffusion prior. InThe Twelfth International Conference on Learning Representations, 2024

  20. [20]

    Difface: Blind face restoration with diffused error contraction

    Zongsheng Yue and Chen Change Loy. Difface: Blind face restoration with diffused error contraction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):9991–10004, 2024

  21. [21]

    Image super-resolution using deep convolutional networks.IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307, 2015

    Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks.IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307, 2015

  22. [22]

    Deep learning for image super resolution

    Pablo Rojas Sedó. Deep learning for image super resolution. B.S. thesis, Universitat Politècnica de Catalunya, 2022

  23. [23]

    Image super-resolution via progressive cascading residual network

    Namhyuk Ahn, Byungkon Kang, and Kyung-Ah Sohn. Image super-resolution via progressive cascading residual network. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 791–799, 2018

  24. [24]

    Accurate image super-resolution using very deep convolutional networks

    Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1646–1654, 2016

  25. [25]

    Deep networks for image super-resolution with sparse prior

    Zhaowen Wang, Ding Liu, Jianchao Yang, Wei Han, and Thomas Huang. Deep networks for image super-resolution with sparse prior. InProceedings of the IEEE international conference on computer vision, pages 370–378, 2015

  26. [26]

    Photo-realistic single image super- resolution using a generative adversarial network

    Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super- resolution using a generative adversarial network. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4681–4690, 2017

  27. [27]

    Pulse: Self-supervised photo upsampling via latent space exploration of generative models

    Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, and Cynthia Rudin. Pulse: Self-supervised photo upsampling via latent space exploration of generative models. InProceedings of the ieee/cvf conference on computer vision and pattern recognition, pages 2437–2445, 2020

  28. [28]

    Enhancenet: Single image super-resolution through automated texture synthesis

    Mehdi SM Sajjadi, Bernhard Scholkopf, and Michael Hirsch. Enhancenet: Single image super-resolution through automated texture synthesis. InProceedings of the IEEE international conference on computer vision, pages 4491–4500, 2017

  29. [29]

    Pixel recursive super resolution

    Ryan Dahl, Mohammad Norouzi, and Jonathon Shlens. Pixel recursive super resolution. InProceedings of the IEEE international conference on computer vision, pages 5439–5448, 2017

  30. [30]

    Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling

    Jacob Menick and Nal Kalchbrenner. Generating high fidelity images with subscale pixel networks and multidimensional upscaling.arXiv preprint arXiv:1812.01608, 2018

  31. [31]

    Conditional image generation with pixelcnn decoders.Advances in neural information processing systems, 29, 2016

    Aaron Van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image generation with pixelcnn decoders.Advances in neural information processing systems, 29, 2016

  32. [32]

    Image transformer

    Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. Image transformer. InInternational conference on machine learning, pages 4055–4064. PMLR, 2018

  33. [33]

    Lar-sr: A local autoregressive model for image super-resolution

    Baisong Guo, Xiaoyun Zhang, Haoning Wu, Yu Wang, Ya Zhang, and Yan-Feng Wang. Lar-sr: A local autoregressive model for image super-resolution. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1909–1918, 2022

  34. [34]

    Progressive Growing of GANs for Improved Quality, Stability, and Variation

    Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation.arXiv preprint arXiv:1710.10196, 2017

  35. [35]

    Ilvr: Conditioning method for denoising diffusion probabilistic models.arXiv preprint arXiv:2108.02938, 2021

    Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. Ilvr: Conditioning method for denoising diffusion probabilistic models.arXiv preprint arXiv:2108.02938, 2021

  36. [36]

    An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

    Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618, 2022

  37. [37]

    Null-text inversion for editing real images using guided diffusion models

    Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6038–6047, 2023

  38. [38]

    Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models

    Daiki Miyake, Akihiro Iohara, Yu Saito, and Toshiyuki Tanaka. Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models. In2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2063–2072. IEEE, 2025

  39. [39]

    Visual instruction inversion: Image editing via image prompting.Advances in Neural Information Processing Systems, 36:9598–9613, 2023

    Thao Nguyen, Yuheng Li, Utkarsh Ojha, and Yong Jae Lee. Visual instruction inversion: Image editing via image prompting.Advances in Neural Information Processing Systems, 36:9598–9613, 2023

  40. [40]

    Direct inversion: Boosting diffusion- based editing with 3 lines of code.arXiv preprint arXiv:2310.01506, 2023

    Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Direct inversion: Boosting diffusion- based editing with 3 lines of code.arXiv preprint arXiv:2310.01506, 2023

  41. [41]

    Eta inversion: Designing an optimal eta function for diffusion-based real image editing

    Wonjun Kang, Kevin Galim, and Hyung Il Koo. Eta inversion: Designing an optimal eta function for diffusion-based real image editing. InEuropean Conference on Computer Vision, pages 90–106. Springer, 2024

  42. [42]

    Fixed-point inversion for text-to-image diffusion models.CoRR, 2023

    Barak Meiri, Dvir Samuel, Nir Darshan, Gal Chechik, Shai Avidan, and Rami Ben-Ari. Fixed-point inversion for text-to-image diffusion models.CoRR, 2023

  43. [43]

    Edict: Exact diffusion inversion via coupled transformations

    Bram Wallace, Akash Gokul, and Nikhil Naik. Edict: Exact diffusion inversion via coupled transformations. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22532– 22541, 2023

  44. [44]

    Solving diffusion odes with optimal boundary conditions for better image super-resolution.arXiv preprint arXiv:2305.15357, 2023

    Yiyang Ma, Huan Yang, Wenhan Yang, Jianlong Fu, and Jiaying Liu. Solving diffusion odes with optimal boundary conditions for better image super-resolution.arXiv preprint arXiv:2305.15357, 2023

  45. [45]

    Swinir: Image restoration using swin transformer

    Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 1833–1844, 2021

  46. [46]

    U-net: Convolutional networks for biomedical image segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. InInternational Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015

  47. [47]

    Taming transformers for high-resolution image synthesis

    Patrick Esser, Robin Rombach, and Bjorn Ommer. Taming transformers for high-resolution image synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12873–12883, June 2021

  48. [48]

    Real-esrgan: Training real-world blind super- resolution with pure synthetic data

    Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super- resolution with pure synthetic data. InProceedings of the IEEE/CVF international conference on computer vision, pages 1905–1914, 2021

  49. [49]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

  50. [50]

    Generative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InAdvances in Neural Information Processing Systems, pages 2672–2680, 2014

  51. [51]

    Designing a practical degradation model for deep blind image super-resolution

    Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. InProceedings of the IEEE/CVF international conference on computer vision, pages 4791–4800, 2021

  52. [52]

    Lsdir: A large scale dataset for image restoration

    Yawei Li, Kai Zhang, Jingyun Liang, Jiezhang Cao, Ce Liu, Rui Gong, Yulun Zhang, Hao Tang, Yun Liu, Denis Demandolx, et al. Lsdir: A large scale dataset for image restoration. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1775–1787, 2023

  53. [53]

    A style-based generator architecture for generative adversarial networks

    Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019

  54. [54]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017

  55. [55]

    SGDR: Stochastic Gradient Descent with Warm Restarts

    Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016

  56. [56]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009

  57. [57]

    Toward real-world single image super-resolution: A new benchmark and a new model

    Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. InProceedings of the IEEE/CVF international conference on computer vision, pages 3086–3095, 2019

  58. [58]

    Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

    Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

  59. [59]

    Making a "completely blind" image quality analyzer

    Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2012

  60. [60]

    The 2018 pirm challenge on perceptual image super-resolution

    Yochai Blau, Roey Mechrez, Radu Timofte, Tomer Michaeli, and Lihi Zelnik-Manor. The 2018 pirm challenge on perceptual image super-resolution. InProceedings of the European conference on computer vision (ECCV) workshops, pages 0–0, 2018

  61. [61]

    Musiq: Multi-scale image quality transformer

    Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 5148–5157, 2021

  62. [62]

    Exploring clip for assessing the look and feel of images

    Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 2555–2563, 2023
