Disentangling Generation and Regression in Stochastic Interpolants for Controllable Image Restoration

Jia Ma; Jihong Guan; Shuigeng Zhou; Wengen Li; Yichao Zhang; Yi Liu

arxiv: 2605.21381 · v1 · pith:AW6ZRJ5A · submitted 2026-05-20 · 💻 cs.CV · cs.LG

Yi Liu, Jia Ma, Wengen Li, Jihong Guan, Shuigeng Zhou, Yichao Zhang

Pith reviewed 2026-05-21 04:37 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords: image restoration, stochastic interpolants, generative models, regression methods, controllable trade-off, dual-branch network, diffusion models, flow matching

The pith

Disentangling stochastic interpolants into independent generation and regression lets one model control the fidelity-realism trade-off in image restoration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DiSI, a framework that splits the stochastic interpolant process into separate generation and regression components. This split supports a continuous adjustment from pure regression, which delivers fast pixel-accurate outputs, to full generation, which produces more realistic textures. A unified sampler and dual-branch U-Net transformer handle the transition efficiently while preserving conditional guidance. The approach aims to combine the speed and precision of classical regression methods with the visual quality of generative models like diffusion without needing separate networks for each style. Experiments indicate competitive performance across image restoration tasks with the added benefit of inference-time control over output characteristics.

Core claim

The stochastic interpolant process can be decomposed into independent generation and regression trajectories that share a single network and sampler, allowing any mixture ratio to be selected at inference time for controllable image restoration.

What carries the argument

DiSI disentanglement of the stochastic interpolant process into independent generation and regression components, implemented via a dual-branch U-Net style transformer and a unified sampler for arbitrary trajectories.

If this is right

A single trained model can produce outputs anywhere along the continuum from high pixel fidelity to high perceptual realism.
Few-step sampling remains efficient for any chosen point on the regression-to-generation spectrum.
The same architecture works across multiple image restoration tasks without task-specific retraining.
Conditional guidance is strengthened by a dedicated network branch while overall throughput stays high.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The disentanglement approach could extend to other stochastic modeling domains such as video or 3D restoration where similar fidelity-creativity tensions exist.
Deployment pipelines might replace several specialized models with one flexible network that users tune per use case.
Further work could test whether the independence holds when the input degradations differ substantially from the training distribution.

Load-bearing premise

The stochastic interpolant process admits a clean decomposition into independent generation and regression components that maintain their strengths when recombined without introducing artifacts or efficiency loss.

What would settle it

Train the DiSI model and check whether, at the pure-regression end of its control range, it matches the pixel accuracy of a dedicated regression baseline and, at the pure-generation end, matches the perceptual quality of a dedicated generative baseline on the same restoration task; failure at either extreme would indicate the decomposition does not fully preserve the separate advantages.
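
The pixel-fidelity half of this endpoint test can be sketched as follows. This is a minimal illustration, not the paper's evaluation protocol: `psnr`, `endpoint_gap`, `matches_baseline`, and the 0.1 dB tolerance are hypothetical names and values chosen for the example, and images are assumed to be float arrays scaled to [0, 1]. The perceptual half of the test would need a separate metric such as LPIPS or FID and is not shown.

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB for arrays scaled to [0, data_range]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def endpoint_gap(disi_outputs, baseline_outputs, targets):
    """Mean PSNR of pure-regression DiSI outputs minus mean PSNR of a regression baseline."""
    disi = np.mean([psnr(p, t) for p, t in zip(disi_outputs, targets)])
    base = np.mean([psnr(p, t) for p, t in zip(baseline_outputs, targets)])
    return float(disi - base)

def matches_baseline(gap_db, tolerance_db=0.1):
    """Hypothetical pass criterion: DiSI is at most tolerance_db below the baseline."""
    return gap_db >= -tolerance_db
```

A negative gap larger than the chosen tolerance at the pure-regression end would count as evidence against the decomposition preserving regression-level fidelity.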

Figures

Figures reproduced from arXiv: 2605.21381 by Jia Ma, Jihong Guan, Shuigeng Zhou, Wengen Li, Yichao Zhang, Yi Liu.

**Figure 1.** (a-c) A conceptual comparison between our method DiSI and major existing generative frameworks (DMs, FMs, and SIs), illustrated by restoring a distorted S curve. (a) DMs/FMs: a path between data $x_0$ and noise $z$. (b) SIs: a path between two data points $(x_0, x_1)$ with intermediate noise. (c) DiSI: a decoupled framework with an independent Regression Time $r$ for the data-to-data path and a Generation Time $g$ fo…

**Figure 2.** Two inference trajectories. (a) The Elliptical Trajectory bridges noiseless $x_1$ and $x_0$ with noise $z$ in intermediate states. (b) The Linear Trajectory starts from a noisy mix of $x_1$ and $z$, and ends at noiseless $x_0$. For both, $\delta$ controls the noise level.

… hyperparameter tuning with minimal overhead [32, 47]. The loss function is: \mathcal{L}_{\text{DiSI}}(\theta, \phi) \coloneqq \mathbb{E}_{\mathbf{x}_{r,g},\, r,\, g} \big[ e^{w_{\phi}(…

**Figure 3.** DULiT architecture. Left: Backbone (feature resolutions at bottom). Right: Modules including (a) Two-Time Encoder and (b) DULiT Block, which comprises the (c) Linear Attention Module, (d) JLA layer, and (e) FFN. Dimensions b, n, c denote batch size, sequence length, and channels. Modules in dashed borders are optional.

**Figure 4.** Visual results on Rain100H test set. Panel labels recovered from the extraction: MAXIM, GOUB, FoD, DiSI-R, DiSI-G, Target, Degraded Image, MIRNet, URetinex-Net.

**Figure 7.** Visual results on CelebA-HQ test set.

4.1 Comparative Experiments. We compare DiSI against SOTA IR approaches; the results are in Tabs. 1 to 4 and visual comparisons are in Figs. 4 to 7. Best and second-best results are highlighted and underlined, with ↑/↓ indicating that higher/lower is better. DM/FM-based methods are marked in gray. We report DiSI using the proposed elliptical sampler in Algorith…

**Figure 8.** Illustration of V path.

Ablation on Trajectories. Evaluating trajectory continuity requires no retraining. We directly reuse the model trained via the lognorm2 generalist time sampler in Tab. 8. During inference, we deploy a parameterized V path family (see …

**Figure 9.** Visual comparisons for image deraining on the Rain100H benchmark. Panel labels recovered from the extraction: Degraded Image, GT, DeepDeblur, DeblurGANv2, DBGAN, MAXIM, IR-SDE, DiSI-R, DiSI-G.

**Figure 10.** Visual comparisons for image deblurring on the GoPro benchmark. Panel labels recovered from the extraction: GT, Degraded Image, MIRNet, URetinex-Net, MAXIM, GOUB, FoD, DiSI-R, DiSI-G.

**Figure 12.** Visual comparisons for image inpainting on the CelebA-HQ benchmark.

**Figure 13.** Zoomed-in image deraining results on the Rain100H test set. Panel labels recovered from the extraction: Degraded Image, DeepDeblur, DeblurGANv2, DBGAN, MAXIM, IR-SDE, DiSI-R, DiSI-G, GT.

**Figure 14.** Zoomed-in image deblurring results on the GoPro test set.

**Figure 15.** Zoomed-in low-light enhancement results on the LOL test set. Panel labels recovered from the extraction: Degraded Image, GOUB, FoD, DiSI-R, DiSI-G, GT.

**Figure 16.** Zoomed-in image inpainting results on the CelebA-HQ test set.

Original abstract

Recent advances in Image Restoration (IR) have been largely driven by generative methods such as Diffusion Models and Flow Matching, which excel in synthesizing realistic textures while suffering from slow multi-step inference and compromised pixel fidelity. In contrast, classical regression-based IR methods excel precisely in these aspects, offering single-step efficiency and high pixel-level reconstruction fidelity. To bridge this gap, we propose DiSI, a unified framework that Disentangles the underlying Stochastic Interpolant process into independent generation and regression components. This decoupling endows DiSI with remarkable versatility, enabling a continuous and controllable transition from a pure regression process to a fully generative one. Technically, we instantiate this framework with two specific sampling trajectories, accompanied by a unified sampler for high-quality, few-step inference on arbitrary trajectories. Furthermore, we design a dual-branch U-Net style transformer network in pixel space, using a dedicated branch to enhance conditional guidance while ensuring high throughput. Extensive experiments demonstrate that DiSI efficiently achieves competitive results on various IR tasks, while uniquely offering the inference-time flexibility to control the distortion-perception trade-off within a single model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

DiSI splits stochastic interpolants into separate generation and regression paths to give controllable trade-offs in image restoration, but the independence claim needs explicit math to confirm it works for intermediate points.

The main point is that the paper proposes DiSI to disentangle the stochastic interpolant process into independent generation and regression components. This split is meant to let users dial in a continuous transition from fast high-fidelity regression to slower but more realistic generative restoration inside one model and sampler. The dual-branch U-Net transformer in pixel space and the unified sampler are the concrete pieces they add to make it run efficiently. Experiments are reported to hit competitive numbers on standard IR tasks while adding the ability to tune the distortion-perception balance at inference time without retraining.

That flexibility addresses a practical need where pure regression or pure generative methods each fall short on their own. The framing is straightforward and the engineering choices look reasonable for keeping throughput high.

The softer spot is the core technical claim. The abstract describes the decomposition into independent trajectories, yet it does not show the SDE or ODE steps that would prove cross terms vanish at mixed ratios rather than being approximated. If residual coupling remains, the advertised continuous control could be limited to the endpoints or introduce artifacts in between. The stress-test note is on target here; the paper needs to put the derivations front and center so readers can check whether the independence is exact. Citation patterns look normal for the area, but tighter comparisons to existing flow-matching restoration work would help.

This paper is for computer vision researchers working on image restoration who want tunable outputs rather than fixed regression or diffusion pipelines. A reader focused on practical control in generative models would find the framework and reported results useful. It deserves a serious referee because the idea is coherent, the target problem is real, and the proposed solution has clear application value even if the math requires close checking.

I would send it for peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes DiSI, a unified framework that disentangles the underlying stochastic interpolant process into independent generation and regression components for image restoration. This decoupling is claimed to enable a continuous, controllable transition from pure regression to fully generative processes via two specific sampling trajectories, a unified sampler for few-step inference, and a dual-branch U-Net-style transformer network in pixel space that enhances conditional guidance. Experiments are reported to show competitive results on IR tasks while allowing inference-time control of the distortion-perception trade-off within a single model.

Significance. If the decomposition is shown to be exact and free of residual coupling, the work would meaningfully bridge generative methods (strong on textures but slow) and regression methods (efficient and pixel-accurate), providing practical inference-time flexibility that is currently unavailable in a single model. The unified sampler and dual-branch architecture are presented as efficiency-preserving innovations.

major comments (2)

[Framework description (abstract and §3)] The central claim requires that the stochastic interpolant decomposes into truly independent generation and regression trajectories whose linear combination yields artifact-free control at arbitrary mixing ratios. The abstract and framework description state two endpoint trajectories plus a unified sampler, but provide no derivation or SDE/ODE analysis demonstrating elimination of cross terms for intermediate ratios; if residual coupling remains, the continuous-control claim reduces to interpolation between the two endpoints rather than a true continuum.
[Network architecture (§4)] The dual-branch U-Net transformer is introduced to enhance conditional guidance while maintaining throughput, yet no ablation or analysis quantifies whether the dedicated branch preserves the claimed efficiency or introduces new artifacts at intermediate mixing ratios; this is load-bearing for the versatility claim.
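
As an editorial illustration of what the first major comment asks for (a sketch, not a derivation taken from the manuscript), one possible formalization of "no residual coupling" is route independence of the two-time dynamics. Here $v_r$ and $v_g$ are assumed to denote the learned regression and generation velocity fields evaluated at state $x$ and times $(r, g)$; the manuscript may define independence differently.

```latex
% Two-time dynamics and the compatibility condition under which the endpoint
% depends only on the start and end values of (r, g), not on the route taken
% between them (locally, for smooth fields).
\begin{aligned}
\mathrm{d}x &= v_r(x, r, g)\,\mathrm{d}r + v_g(x, r, g)\,\mathrm{d}g, \\
\partial_g v_r + (\nabla_x v_r)\, v_g &= \partial_r v_g + (\nabla_x v_g)\, v_r .
\end{aligned}
```

If the second line holds for the learned fields, different mixing schedules between the same endpoints agree; if it fails, the outputs can depend on the chosen route, which is the case the referee wants the authors to rule out or characterize.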

minor comments (2)

[Introduction and Experiments] The abstract refers to 'extensive experiments' demonstrating competitive results; the introduction or results section should explicitly tabulate comparisons against both pure regression baselines and recent generative IR methods (e.g., diffusion/flow-matching variants) with standard metrics and inference-step counts.
[Notation] Notation for the mixing parameter and the two trajectories should be defined once in a dedicated subsection rather than introduced piecemeal across the abstract and technical sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below with clarifications based on the framework and commit to revisions that strengthen the presentation without altering the core contributions.

Referee: [Framework description (abstract and §3)] The central claim requires that the stochastic interpolant decomposes into truly independent generation and regression trajectories whose linear combination yields artifact-free control at arbitrary mixing ratios. The abstract and framework description state two endpoint trajectories plus a unified sampler, but provide no derivation or SDE/ODE analysis demonstrating elimination of cross terms for intermediate ratios; if residual coupling remains, the continuous-control claim reduces to interpolation between the two endpoints rather than a true continuum.

Authors: We appreciate the referee highlighting the need for explicit analysis of independence. In §3 the disentanglement follows directly from the stochastic interpolant definition: the process is an affine combination of the clean image and noise, with the regression trajectory given by the deterministic conditional expectation and the generation trajectory incorporating the full stochastic forcing term. The unified sampler constructs intermediate trajectories by linear interpolation of the corresponding velocity fields. Because the underlying interpolant is linear, substitution into the SDE yields an interpolated process whose Fokker-Planck equation contains no residual cross-coupling terms between the regression and generation components. We will add a concise derivation together with the relevant SDE/ODE verification for arbitrary mixing ratios in the revised §3 to make this property fully explicit. revision: yes
Referee: [Network architecture (§4)] The dual-branch U-Net transformer is introduced to enhance conditional guidance while maintaining throughput, yet no ablation or analysis quantifies whether the dedicated branch preserves the claimed efficiency or introduces new artifacts at intermediate mixing ratios; this is load-bearing for the versatility claim.

Authors: We agree that targeted ablations at intermediate mixing ratios are important for substantiating the efficiency and artifact-free versatility claims. The dual-branch design isolates conditional guidance in a separate path to improve modulation while keeping the overall parameter count and forward-pass cost comparable to a single-branch baseline. The current experiments report aggregate throughput and quality, but do not isolate the branch contribution across mixing ratios. In the revision we will add ablation tables and figures that measure wall-clock time, FLOPs, PSNR, and LPIPS for a range of mixing ratios using both the dual-branch model and an ablated single-branch counterpart, confirming that efficiency is preserved and no additional artifacts appear at intermediate points. revision: yes
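
The sweep the authors commit to could be organized along the following lines. This is a hedged sketch only: `sweep_mixing_ratios` and `toy_restore` are hypothetical names, `toy_restore` is a stand-in for a trained model (it simply adds more synthetic noise as the ratio grows), and only wall-clock time and PSNR are measured. FLOPs and LPIPS, which the response also mentions, would need external tooling and are omitted.

```python
import time
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB for arrays scaled to [0, data_range]."""
    mse = np.mean((np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)) ** 2)
    return float("inf") if mse == 0.0 else float(10.0 * np.log10(data_range ** 2 / mse))

def toy_restore(degraded, ratio):
    """Stand-in for a trained model: injects more synthetic noise as ratio grows."""
    rng = np.random.default_rng(0)
    noise = rng.normal(size=degraded.shape)
    return np.clip(degraded + 0.05 * ratio * noise, 0.0, 1.0)

def sweep_mixing_ratios(restore, degraded, target, ratios, repeats=3):
    """Record best-of-N wall-clock time and PSNR for each mixing ratio."""
    rows = []
    for ratio in ratios:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            output = restore(degraded, ratio)
            timings.append(time.perf_counter() - start)
        rows.append({
            "ratio": float(ratio),
            "seconds": min(timings),
            "psnr_db": psnr(output, target),
        })
    return rows

target = np.full((32, 32), 0.5)
degraded = target + 0.02
rows = sweep_mixing_ratios(toy_restore, degraded, target, ratios=[0.0, 0.5, 1.0])
for row in rows:
    print(f"ratio={row['ratio']:.1f}  time={row['seconds'] * 1e3:.3f} ms  PSNR={row['psnr_db']:.2f} dB")
```

Running the same sweep with a dual-branch model and a single-branch counterpart passed as `restore` would give the side-by-side table the referee requests.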

Circularity Check

0 steps flagged

No circularity: DiSI proposes independent decomposition as novel framework without self-referential reduction

The paper introduces DiSI as a new framework that disentangles the stochastic interpolant process into independent generation and regression components, supported by a unified sampler and dual-branch network. No equations or derivations in the provided abstract reduce any claimed prediction or result to fitted inputs or prior self-citations by construction. The central claim of continuous controllable transition rests on the proposed decomposition and experimental validation rather than tautological redefinition or load-bearing self-citation chains. This is a standard case of a self-contained proposal where the derivation does not collapse to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract, the central claim rests on the decomposability of stochastic interpolants and the effectiveness of the proposed sampler and dual-branch architecture. No explicit free parameters, axioms, or invented entities are quantified.

axioms (1)

Domain assumption: Stochastic interpolant processes admit a meaningful decomposition into independent generation and regression components.
This decomposition is the foundational premise stated in the abstract for enabling controllable transitions.

invented entities (1)

DiSI framework with dual-branch U-Net transformer and unified sampler (no independent evidence)
Purpose: To realize the disentangled generation-regression control in pixel space.
Newly proposed architecture and sampler whose independent validation is not shown in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

DiSI process: $x(r,g) = \lambda_g(\alpha_r x_0 + \beta_r x_1) + \gamma_g z$ with GVP schedules $\alpha_r$, $\beta_r$, $\lambda_g = \cos g$, $\gamma_g = \sin g$, and $\varphi = \arcsin\sqrt{(1-\rho)/2}$
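
The quoted process can be evaluated numerically as in the sketch below. Only `lambda_g = cos g` and `gamma_g = sin g` come from the passage; the concrete choice `alpha_r = cos r`, `beta_r = sin r` on [0, π/2] is an assumption made for illustration (the passage only calls them GVP schedules), as is the convention that `x0` is the clean target and `x1` the degraded input. The name `disi_state` is hypothetical.

```python
import numpy as np

def disi_state(x0, x1, z, r, g):
    """Evaluate x(r, g) = lambda_g * (alpha_r * x0 + beta_r * x1) + gamma_g * z.

    lambda_g = cos g and gamma_g = sin g follow the quoted passage.
    alpha_r = cos r and beta_r = sin r are an ASSUMED concrete schedule.
    """
    alpha_r, beta_r = np.cos(r), np.sin(r)
    lambda_g, gamma_g = np.cos(g), np.sin(g)
    return lambda_g * (alpha_r * x0 + beta_r * x1) + gamma_g * z

rng = np.random.default_rng(0)
clean, degraded, noise = rng.normal(size=(3, 4, 4))

# Same regression time, two generation times: no noise vs. some injected noise.
noiseless_mid = disi_state(clean, degraded, noise, r=np.pi / 4, g=0.0)
noisy_mid = disi_state(clean, degraded, noise, r=np.pi / 4, g=np.pi / 6)
```

Under these assumed schedules, `g = 0` removes the noise term entirely, so moving along `r` alone traces a purely data-to-data path, while `g` controls how much noise enters at any fixed `r`.
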
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery · unclear

Two independent time variables $r$ (regression) and $g$ (generation) with PF-ODE $\mathrm{d}x = v_r\,\mathrm{d}r + v_g\,\mathrm{d}g$
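
A toy numerical check of the two-time PF-ODE form is sketched below. It uses oracle velocities (the partial derivatives of x(r, g) for a known triple of clean image, degraded image, and noise) rather than a trained network, and it assumes the schedules `alpha_r = cos r`, `beta_r = sin r`, which are not stated in the passage. All function names are hypothetical. It shows only that, in this oracle setting, integrating along two different routes in the (r, g) plane reaches the same endpoint; it says nothing about whether the same holds for learned fields.

```python
import numpy as np

def state(x0, x1, z, r, g):
    """x(r, g) under the assumed schedules alpha_r = cos r, beta_r = sin r."""
    return np.cos(g) * (np.cos(r) * x0 + np.sin(r) * x1) + np.sin(g) * z

def v_r(x0, x1, z, r, g):
    """Oracle regression velocity: partial derivative of x(r, g) with respect to r."""
    return np.cos(g) * (-np.sin(r) * x0 + np.cos(r) * x1)

def v_g(x0, x1, z, r, g):
    """Oracle generation velocity: partial derivative of x(r, g) with respect to g."""
    return -np.sin(g) * (np.cos(r) * x0 + np.sin(r) * x1) + np.cos(g) * z

def integrate_path(x_start, x0, x1, z, waypoints, steps=200):
    """Midpoint-rule integration of dx = v_r dr + v_g dg along straight segments."""
    x = np.array(x_start, dtype=np.float64)
    for (ra, ga), (rb, gb) in zip(waypoints[:-1], waypoints[1:]):
        dr, dg = (rb - ra) / steps, (gb - ga) / steps
        for k in range(steps):
            r, g = ra + (k + 0.5) * dr, ga + (k + 0.5) * dg
            x = x + v_r(x0, x1, z, r, g) * dr + v_g(x0, x1, z, r, g) * dg
    return x

rng = np.random.default_rng(0)
x0, x1, z = rng.normal(size=(3, 8))
start = state(x0, x1, z, np.pi / 2, 0.0)  # degraded input, no noise

# Route 1: pure regression (g stays at 0). Route 2: detour through a noisy state.
direct = integrate_path(start, x0, x1, z, [(np.pi / 2, 0.0), (0.0, 0.0)])
detour = integrate_path(start, x0, x1, z, [(np.pi / 2, 0.0), (np.pi / 4, 0.6), (0.0, 0.0)])
print(np.max(np.abs(direct - x0)), np.max(np.abs(detour - x0)))
```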

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

107 extracted references · 107 canonical work pages · 6 internal anchors

[1]

Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

Albergo, M.S., Boffi, N.M., Vanden-Eijnden, E.: Stochastic interpolants: A uni- fying framework for flows and diffusions. arXiv preprint arXiv:2303.08797 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

Albergo, M.S., Goldstein, M., Boffi, N.M., Ranganath, R., Vanden-Eijnden, E.: Stochasticinterpolantswithdata-dependentcouplings.In:Proceedingsofthe41st International Conference on Machine Learning. pp. 921–937 (2024)

work page 2024
[3]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Blau, Y., Michaeli, T.: The perception-distortion tradeoff. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 6228–6237 (2018)

work page 2018
[4]

IEEE transactions on image processing25(11), 5187–5198 (2016)

Cai, B., Xu, X., Jia, K., Qing, C., Tao, D.: Dehazenet: An end-to-end system for single image haze removal. IEEE transactions on image processing25(11), 5187–5198 (2016)

work page 2016
[5]

In: Proceedings of the IEEE/CVF international conference on computer vision

Cai, Y., Bian, H., Lin, J., Wang, H., Timofte, R., Zhang, Y.: Retinexformer: One- stage retinex-based transformer for low-light image enhancement. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 12504–12513 (2023)

work page 2023
[6]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Chen, H., Wang, Y., Guo, T., Xu, C., Deng, Y., Liu, Z., Ma, S., Xu, C., Xu, C., Gao, W.: Pre-trained image processing transformer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12299– 12310 (2021)

work page 2021
[7]

In: The Twelfth International Conference on Learning Representations (2024)

Chen, J., YU, J., GE, C., Yao, L., Xie, E., Wang, Z., Kwok, J., Luo, P., Lu, H., Li, Z.: Pixart-$\alpha$: Fast training of diffusion transformer for photorealistic text-to-image synthesis. In: The Twelfth International Conference on Learning Representations (2024)

work page 2024
[8]

Advances in neural information processing systems31(2018)

Chen, R.T., Rubanova, Y., Bettencourt, J., Duvenaud, D.K.: Neural ordinary dif- ferential equations. Advances in neural information processing systems31(2018)

work page 2018
[9]

IEEE transactions on pattern analysis and machine intelligence39(6), 1256–1272 (2016)

Chen, Y., Pock, T.: Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE transactions on pattern analysis and machine intelligence39(6), 1256–1272 (2016)

work page 2016
[10]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Choi, J., Kim, S., Jeong, Y., Gwon, Y., Yoon, S.: Ilvr: Conditioning method for denoising diffusion probabilistic models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 14367–14376 (2021) 16 Yi Liu et al

work page 2021
[11]

In: The Eleventh International Con- ference on Learning Representations (2023)

Chung, H., Kim, J., Mccann, M.T., Klasky, M.L., Ye, J.C.: Diffusion posterior sampling for general noisy inverse problems. In: The Eleventh International Con- ference on Learning Representations (2023)

work page 2023
[12]

IEEE transactions on pattern analysis and machine intelligence45(9), 10850–10869 (2023)

Croitoru, F.A., Hondru, V., Ionescu, R.T., Shah, M.: Diffusion models in vision: A survey. IEEE transactions on pattern analysis and machine intelligence45(9), 10850–10869 (2023)

work page 2023
[13]

In: Forty-first International Conference on Machine Learning (2024)

Crowson, K., Baumann, S.A., Birch, A., Abraham, T.M., Kaplan, D.Z., Shippole, E.: Scalable high-resolution pixel-space image synthesis with hourglass diffusion transformers. In: Forty-first International Conference on Machine Learning (2024)

work page 2024
[14]

IEEE Transactions on image processing 16(8), 2080–2095 (2007)

Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Transactions on image processing 16(8), 2080–2095 (2007)

work page 2080
[15]

Advances in neural information pro- cessing systems35, 16344–16359 (2022)

Dao, T., Fu, D., Ermon, S., Rudra, A., Ré, C.: Flashattention: Fast and memory- efficient exact attention with io-awareness. Advances in neural information pro- cessing systems35, 16344–16359 (2022)

work page 2022
[16]

Advances in neural information processing systems34, 8780–8794 (2021)

Dhariwal, P., Nichol, A.: Diffusion models beat gans on image synthesis. Advances in neural information processing systems34, 8780–8794 (2021)

work page 2021
[17]

In: Forty-first international conference on machine learning (2024)

Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz,D.,Sauer,A.,Boesel,F.,etal.:Scalingrectifiedflowtransformersforhigh- resolution image synthesis. In: Forty-first international conference on machine learning (2024)

work page 2024
[18]

In: The Thirteenth International Conference on Learning Representations (2025)

Frans, K., Hafner, D., Levine, S., Abbeel, P.: One step diffusion via shortcut models. In: The Thirteenth International Conference on Learning Representations (2025)

work page 2025
[19]

arXiv preprint arXiv:2305.05146 (2023)

Gao, H., Yang, J., Zhang, Y., Wang, N., Yang, J., Dang, D.: A mountain- shaped single-stage network for accurate image restoration. arXiv preprint arXiv:2305.05146 (2023)

work page arXiv 2023
[20]

Pattern Recognition161, 111313 (2025)

Gao, H., Zhang, Y., Yang, J., Dang, D.: Mixed hierarchy network for image restoration. Pattern Recognition161, 111313 (2025)

work page 2025
[21]

Mean Flows for One-step Generative Modeling

Geng, Z., Deng, M., Bai, X., Kolter, J.Z., He, K.: Mean flows for one-step gener- ative modeling. arXiv preprint arXiv:2505.13447 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[22]

Advances in neural information processing systems27(2014)

Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Advances in neural information processing systems27(2014)

work page 2014
[23]

IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

He, C., Shen, Y., Fang, C., Xiao, F., Tang, L., Zhang, Y., Zuo, W., Guo, Z., Li, X.: Diffusion models in low-level vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

work page 2025
[24]

Advances in neural information processing systems30(2017)

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems30(2017)

work page 2017
[25]

Advances in neural information processing systems33, 6840–6851 (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems33, 6840–6851 (2020)

work page 2020
[26]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[27]

IEEE transactions on pattern analysis and machine intelligence45(8), 10173–10196 (2023)

Huang, L., Qin, J., Zhou, Y., Zhu, F., Liu, L., Shao, L.: Normalization techniques in training dnns: Methodology, analysis and application. IEEE transactions on pattern analysis and machine intelligence45(8), 10173–10196 (2023)

work page 2023
[28]

Islam*, M.A., Jia*, S., Bruce, N.D.B.: How much position information do convo- lutional neural networks encode? In: International Conference on Learning Rep- resentations (2020) DiSI: Disentangled Stochastic Interpolant 17

work page 2020
[29]

IEEE transactions on image processing30, 2340–2349 (2021)

Jiang, Y., Gong, X., Liu, D., Cheng, Y., Fang, C., Shen, X., Yang, J., Zhou, P., Wang, Z.: Enlightengan: Deep light enhancement without paired supervision. IEEE transactions on image processing30, 2340–2349 (2021)

work page 2021
[30]

In: International Conference on Learning Representations (2018)

Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for im- proved quality, stability, and variation. In: International Conference on Learning Representations (2018)

work page 2018
[31]

Advances in neural information processing sys- tems35, 26565–26577 (2022)

Karras, T., Aittala, M., Aila, T., Laine, S.: Elucidating the design space of diffusion-based generative models. Advances in neural information processing sys- tems35, 26565–26577 (2022)

work page 2022
[32]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Karras, T., Aittala, M., Lehtinen, J., Hellsten, J., Aila, T., Laine, S.: Analyzing and improving the training dynamics of diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 24174– 24184 (2024)

work page 2024
[33]

In: International conference on machine learning

Katharopoulos, A., Vyas, A., Pappas, N., Fleuret, F.: Transformers are rnns: Fast autoregressive transformers with linear attention. In: International conference on machine learning. pp. 5156–5165. PMLR (2020)

work page 2020
[34]

In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K

Kawar, B., Elad, M., Ermon, S., Song, J.: Denoising diffusion restoration mod- els. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022)

work page 2022
[35]

Advances in neural information processing systems 25(2012)

Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25(2012)

work page 2012
[36]

In: Proceedings oftheIEEEconferenceoncomputervisionandpatternrecognition.pp.8183–8192 (2018)

Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion deblurring using conditional adversarial networks. In: Proceedings oftheIEEEconferenceoncomputervisionandpatternrecognition.pp.8183–8192 (2018)

work page 2018
[37]

In: Proceedings of the IEEE/CVF international conference on computer vision

Kupyn, O., Martyniuk, T., Wu, J., Wang, Z.: Deblurgan-v2: Deblurring (orders- of-magnitude) faster and better. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 8878–8887 (2019)

work page 2019
[38]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken,A.,Tejani,A.,Totz,J.,Wang,Z.,etal.:Photo-realisticsingleimagesuper- resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4681–4690 (2017)

[39]

Li, W., Lin, Z., Zhou, K., Qi, L., Wang, Y., Jia, J.: Mat: Mask-aware transformer for large hole image inpainting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10758–10768 (2022)

[40]

Li, X., Ren, Y., Jin, X., Lan, C., Wang, X., Zeng, W., Wang, X., Chen, Z.: Diffusion models for image restoration and enhancement: a comprehensive survey. International Journal of Computer Vision pp. 1–31 (2025)

[41]

Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Swinir: Image restoration using swin transformer. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 1833–1844 (2021)

[42]

Lipman, Y., Chen, R.T.Q., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. In: The Eleventh International Conference on Learning Representations (2023)

[43]

Lipman, Y., Havasi, M., Holderrieth, P., Shaul, N., Le, M., Karrer, B., Chen, R.T., Lopez-Paz, D., Ben-Hamu, H., Gat, I.: Flow matching guide and code. arXiv preprint arXiv:2412.06264 (2024)

[44]

Liu, G.H., Vahdat, A., Huang, D.A., Theodorou, E.A., Nie, W., Anandkumar, A.: I2sb: image-to-image schrödinger bridge. In: Proceedings of the 40th International Conference on Machine Learning. pp. 22042–22062 (2023)

[45]

Liu, J., Wang, Q., Fan, H., Wang, Y., Tang, Y., Qu, L.: Residual denoising diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2773–2783 (2024)

[46]

Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. In: The Eleventh International Conference on Learning Representations (2023)

[47]

Lu, C., Song, Y.: Simplifying, stabilizing and scaling continuous-time consistency models. In: The Thirteenth International Conference on Learning Representations (2025)

[48]

Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., Zhu, J.: Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. Advances in neural information processing systems 35, 5775–5787 (2022)

[49]

Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., Zhu, J.: Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. Machine Intelligence Research pp. 1–22 (2025)

[50]

Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., Van Gool, L.: Repaint: Inpainting using denoising diffusion probabilistic models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11461–11471 (2022)

[51]

Luo, Z., Gustafsson, F.K., Sjölund, J., Schön, T.B.: Forward-only diffusion probabilistic models. arXiv preprint arXiv:2505.16733 (2025)

[52]

Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Image restoration with mean-reverting stochastic differential equations. In: Proceedings of the 40th International Conference on Machine Learning. pp. 23045–23066 (2023)

[53]

Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Refusion: Enabling large-size realistic image restoration with latent-space diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1680–1691 (2023)

[54]

Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Controlling vision-language models for multi-task image restoration. In: The Twelfth International Conference on Learning Representations (2024)

[55]

Ma, N., Goldstein, M., Albergo, M.S., Boffi, N.M., Vanden-Eijnden, E., Xie, S.: Sit: Exploring flow and diffusion-based generative models with scalable interpolant transformers. In: European Conference on Computer Vision. pp. 23–40. Springer (2024)

[56]

Nah, S., Hyun Kim, T., Mu Lee, K.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3883–3891 (2017)

[57]

Nichol, A.Q., Dhariwal, P.: Improved denoising diffusion probabilistic models. In: International conference on machine learning. pp. 8162–8171. PMLR (2021)

[58]

Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: Feature learning by inpainting. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2536–2544 (2016)

[59]

Peebles, W., Xie, S.: Scalable diffusion models with transformers. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 4195–4205 (2023)

[60]

Qiu, Y., Zhang, K., Wang, C., Luo, W., Li, H., Jin, Z.: Mb-taylorformer: Multi-branch efficient transformer expanded by taylor formula for image dehazing. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 12802–12813 (2023)

[61]

Ren, D., Zuo, W., Hu, Q., Zhu, P., Meng, D.: Progressive image deraining networks: A better and simpler baseline. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 3937–3946 (2019)

[62]

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10684–10695 (2022)

[63]

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)

[64]

Saharia, C., Chan, W., Chang, H., Lee, C., Ho, J., Salimans, T., Fleet, D., Norouzi, M.: Palette: Image-to-image diffusion models. In: ACM SIGGRAPH 2022 conference proceedings. pp. 1–10 (2022)

[65]

Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D.J., Norouzi, M.: Image super-resolution via iterative refinement. IEEE transactions on pattern analysis and machine intelligence 45(4), 4713–4726 (2022)

[66]

Särkkä, S., Solin, A.: Applied stochastic differential equations, vol. 10. Cambridge University Press (2019)

[67]

Shazeer, N.: Glu variants improve transformer. arXiv preprint arXiv:2002.05202 (2020)

[68]

Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1874–1883 (2016)

[69]

Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. In: International Conference on Learning Representations (2021)

[70]

Song, Y., Dhariwal, P., Chen, M., Sutskever, I.: Consistency models. In: Proceedings of the 40th International Conference on Machine Learning. pp. 32211–32252 (2023)

[71]

Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems 32 (2019)

[72]

Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. In: International Conference on Learning Representations (2021)

[73]

Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., Liu, Y.: Roformer: Enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024)

[74]

Su, J., Xu, B., Yin, H.: A survey of deep learning approaches to image restoration. Neurocomputing 487, 46–65 (2022)

[75]

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2818–2826 (2016)

[76]

Theis, L., van den Oord, A., Bethge, M.: A note on the evaluation of generative models. In: International Conference on Learning Representations (ICLR 2016). pp. 1–10 (2016)

[77]

Tu, Z., Talebi, H., Zhang, H., Yang, F., Milanfar, P., Bovik, A., Li, Y.: Maxim: Multi-axis mlp for image processing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5769–5780 (2022)

[78]

Potlapalli, V., Zamir, S.W., Khan, S., Khan, F.S.: Promptir: Prompting for all-in-one blind image restoration. arXiv preprint arXiv:2306.13090 (2023)

[79]

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)

[80]

Wang, J., Yue, Z., Zhou, S., Chan, K.C., Loy, C.C.: Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision 132(12), 5929–5949 (2024)


Showing first 80 references.

[1] [1]

Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

Albergo, M.S., Boffi, N.M., Vanden-Eijnden, E.: Stochastic interpolants: A uni- fying framework for flows and diffusions. arXiv preprint arXiv:2303.08797 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[2] [2]

Albergo, M.S., Goldstein, M., Boffi, N.M., Ranganath, R., Vanden-Eijnden, E.: Stochasticinterpolantswithdata-dependentcouplings.In:Proceedingsofthe41st International Conference on Machine Learning. pp. 921–937 (2024)

work page 2024

[3] [3]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Blau, Y., Michaeli, T.: The perception-distortion tradeoff. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 6228–6237 (2018)

work page 2018

[4] [4]

IEEE transactions on image processing25(11), 5187–5198 (2016)

Cai, B., Xu, X., Jia, K., Qing, C., Tao, D.: Dehazenet: An end-to-end system for single image haze removal. IEEE transactions on image processing25(11), 5187–5198 (2016)

work page 2016

[5] [5]

In: Proceedings of the IEEE/CVF international conference on computer vision

Cai, Y., Bian, H., Lin, J., Wang, H., Timofte, R., Zhang, Y.: Retinexformer: One- stage retinex-based transformer for low-light image enhancement. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 12504–12513 (2023)

work page 2023

[6] [6]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Chen, H., Wang, Y., Guo, T., Xu, C., Deng, Y., Liu, Z., Ma, S., Xu, C., Xu, C., Gao, W.: Pre-trained image processing transformer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 12299– 12310 (2021)

work page 2021

[7] [7]

In: The Twelfth International Conference on Learning Representations (2024)

Chen, J., YU, J., GE, C., Yao, L., Xie, E., Wang, Z., Kwok, J., Luo, P., Lu, H., Li, Z.: Pixart-$\alpha$: Fast training of diffusion transformer for photorealistic text-to-image synthesis. In: The Twelfth International Conference on Learning Representations (2024)

work page 2024

[8] [8]

Advances in neural information processing systems31(2018)

Chen, R.T., Rubanova, Y., Bettencourt, J., Duvenaud, D.K.: Neural ordinary dif- ferential equations. Advances in neural information processing systems31(2018)

work page 2018

[9] [9]

IEEE transactions on pattern analysis and machine intelligence39(6), 1256–1272 (2016)

Chen, Y., Pock, T.: Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE transactions on pattern analysis and machine intelligence39(6), 1256–1272 (2016)

work page 2016

[10] [10]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Choi, J., Kim, S., Jeong, Y., Gwon, Y., Yoon, S.: Ilvr: Conditioning method for denoising diffusion probabilistic models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 14367–14376 (2021) 16 Yi Liu et al

work page 2021

[11] [11]

In: The Eleventh International Con- ference on Learning Representations (2023)

Chung, H., Kim, J., Mccann, M.T., Klasky, M.L., Ye, J.C.: Diffusion posterior sampling for general noisy inverse problems. In: The Eleventh International Con- ference on Learning Representations (2023)

work page 2023

[12] [12]

IEEE transactions on pattern analysis and machine intelligence45(9), 10850–10869 (2023)

Croitoru, F.A., Hondru, V., Ionescu, R.T., Shah, M.: Diffusion models in vision: A survey. IEEE transactions on pattern analysis and machine intelligence45(9), 10850–10869 (2023)

work page 2023

[13] [13]

In: Forty-first International Conference on Machine Learning (2024)

Crowson, K., Baumann, S.A., Birch, A., Abraham, T.M., Kaplan, D.Z., Shippole, E.: Scalable high-resolution pixel-space image synthesis with hourglass diffusion transformers. In: Forty-first International Conference on Machine Learning (2024)

work page 2024

[14] [14]

IEEE Transactions on image processing 16(8), 2080–2095 (2007)

Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Transactions on image processing 16(8), 2080–2095 (2007)

work page 2080

[15] [15]

Advances in neural information pro- cessing systems35, 16344–16359 (2022)

Dao, T., Fu, D., Ermon, S., Rudra, A., Ré, C.: Flashattention: Fast and memory- efficient exact attention with io-awareness. Advances in neural information pro- cessing systems35, 16344–16359 (2022)

work page 2022

[16] [16]

Advances in neural information processing systems34, 8780–8794 (2021)

Dhariwal, P., Nichol, A.: Diffusion models beat gans on image synthesis. Advances in neural information processing systems34, 8780–8794 (2021)

work page 2021

[17] [17]

In: Forty-first international conference on machine learning (2024)

Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz,D.,Sauer,A.,Boesel,F.,etal.:Scalingrectifiedflowtransformersforhigh- resolution image synthesis. In: Forty-first international conference on machine learning (2024)

work page 2024

[18] [18]

In: The Thirteenth International Conference on Learning Representations (2025)

Frans, K., Hafner, D., Levine, S., Abbeel, P.: One step diffusion via shortcut models. In: The Thirteenth International Conference on Learning Representations (2025)

work page 2025

[19] [19]

arXiv preprint arXiv:2305.05146 (2023)

Gao, H., Yang, J., Zhang, Y., Wang, N., Yang, J., Dang, D.: A mountain- shaped single-stage network for accurate image restoration. arXiv preprint arXiv:2305.05146 (2023)

work page arXiv 2023

[20] [20]

Pattern Recognition161, 111313 (2025)

Gao, H., Zhang, Y., Yang, J., Dang, D.: Mixed hierarchy network for image restoration. Pattern Recognition161, 111313 (2025)

work page 2025

[21] [21]

Mean Flows for One-step Generative Modeling

Geng, Z., Deng, M., Bai, X., Kolter, J.Z., He, K.: Mean flows for one-step gener- ative modeling. arXiv preprint arXiv:2505.13447 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[22] [22]

Advances in neural information processing systems27(2014)

Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Advances in neural information processing systems27(2014)

work page 2014

[23] [23]

IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

He, C., Shen, Y., Fang, C., Xiao, F., Tang, L., Zhang, Y., Zuo, W., Guo, Z., Li, X.: Diffusion models in low-level vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

work page 2025

[24] [24]

Advances in neural information processing systems30(2017)

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems30(2017)

work page 2017

[25] [25]

Advances in neural information processing systems33, 6840–6851 (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems33, 6840–6851 (2020)

work page 2020

[26] [26]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[27] [27]

IEEE transactions on pattern analysis and machine intelligence45(8), 10173–10196 (2023)

Huang, L., Qin, J., Zhou, Y., Zhu, F., Liu, L., Shao, L.: Normalization techniques in training dnns: Methodology, analysis and application. IEEE transactions on pattern analysis and machine intelligence45(8), 10173–10196 (2023)

work page 2023

[28] [28]

Islam*, M.A., Jia*, S., Bruce, N.D.B.: How much position information do convo- lutional neural networks encode? In: International Conference on Learning Rep- resentations (2020) DiSI: Disentangled Stochastic Interpolant 17

work page 2020

[29] [29]

IEEE transactions on image processing30, 2340–2349 (2021)

Jiang, Y., Gong, X., Liu, D., Cheng, Y., Fang, C., Shen, X., Yang, J., Zhou, P., Wang, Z.: Enlightengan: Deep light enhancement without paired supervision. IEEE transactions on image processing30, 2340–2349 (2021)

work page 2021

[30] [30]

In: International Conference on Learning Representations (2018)

Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for im- proved quality, stability, and variation. In: International Conference on Learning Representations (2018)

work page 2018

[31] [31]

Advances in neural information processing sys- tems35, 26565–26577 (2022)

Karras, T., Aittala, M., Aila, T., Laine, S.: Elucidating the design space of diffusion-based generative models. Advances in neural information processing sys- tems35, 26565–26577 (2022)

work page 2022

[32] [32]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Karras, T., Aittala, M., Lehtinen, J., Hellsten, J., Aila, T., Laine, S.: Analyzing and improving the training dynamics of diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 24174– 24184 (2024)

work page 2024

[33] [33]

In: International conference on machine learning

Katharopoulos, A., Vyas, A., Pappas, N., Fleuret, F.: Transformers are rnns: Fast autoregressive transformers with linear attention. In: International conference on machine learning. pp. 5156–5165. PMLR (2020)

work page 2020

[34] [34]

In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K

Kawar, B., Elad, M., Ermon, S., Song, J.: Denoising diffusion restoration mod- els. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022)

work page 2022

[35] [35]

Advances in neural information processing systems 25(2012)

Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25(2012)

work page 2012

[36] [36]

In: Proceedings oftheIEEEconferenceoncomputervisionandpatternrecognition.pp.8183–8192 (2018)

Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion deblurring using conditional adversarial networks. In: Proceedings oftheIEEEconferenceoncomputervisionandpatternrecognition.pp.8183–8192 (2018)

work page 2018

[37] [37]

In: Proceedings of the IEEE/CVF international conference on computer vision

Kupyn, O., Martyniuk, T., Wu, J., Wang, Z.: Deblurgan-v2: Deblurring (orders- of-magnitude) faster and better. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 8878–8887 (2019)

work page 2019

[38] [38]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken,A.,Tejani,A.,Totz,J.,Wang,Z.,etal.:Photo-realisticsingleimagesuper- resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4681–4690 (2017)

work page 2017

[39] [39]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Li, W., Lin, Z., Zhou, K., Qi, L., Wang, Y., Jia, J.: Mat: Mask-aware transformer for large hole image inpainting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10758–10768 (2022)

work page 2022

[40] [40]

International Journal of Computer Vision pp

Li, X., Ren, Y., Jin, X., Lan, C., Wang, X., Zeng, W., Wang, X., Chen, Z.: Diffusion models for image restoration and enhancement: a comprehensive survey. International Journal of Computer Vision pp. 1–31 (2025)

work page 2025

[41] [41]

In: Proceedings of the IEEE/CVF interna- tional conference on computer vision

Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Swinir: Image restoration using swin transformer. In: Proceedings of the IEEE/CVF interna- tional conference on computer vision. pp. 1833–1844 (2021)

work page 2021

[42] [42]

In: The Eleventh International Conference on Learning Representations (2023)

Lipman, Y., Chen, R.T.Q., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. In: The Eleventh International Conference on Learning Representations (2023)

work page 2023

[43] [43]

Flow Matching Guide and Code

Lipman, Y., Havasi, M., Holderrieth, P., Shaul, N., Le, M., Karrer, B., Chen, R.T., Lopez-Paz, D., Ben-Hamu, H., Gat, I.: Flow matching guide and code. arXiv preprint arXiv:2412.06264 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[44] [44]

In: Proceedings of the 40th International Conference on Machine Learning

Liu, G.H., Vahdat, A., Huang, D.A., Theodorou, E.A., Nie, W., Anandkumar, A.: I2sb: image-to-image schrödinger bridge. In: Proceedings of the 40th International Conference on Machine Learning. pp. 22042–22062 (2023) 18 Yi Liu et al

work page 2023

[45] [45]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Liu, J., Wang, Q., Fan, H., Wang, Y., Tang, Y., Qu, L.: Residual denoising diffu- sion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2773–2783 (2024)

work page 2024

[46] [46]

In: The Eleventh International Conference on Learning Representations (2023)

Liu, X., Gong, C., qiang liu: Flow straight and fast: Learning to generate and transfer data with rectified flow. In: The Eleventh International Conference on Learning Representations (2023)

work page 2023

[47] [47]

In: The Thirteenth International Conference on Learning Representations (2025)

Lu, C., Song, Y.: Simplifying, stabilizing and scaling continuous-time consistency models. In: The Thirteenth International Conference on Learning Representations (2025)

work page 2025

[48] [48]

Advances in neural information processing systems35, 5775–5787 (2022)

Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., Zhu, J.: Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. Advances in neural information processing systems35, 5775–5787 (2022)

work page 2022

[49] [49]

Machine Intelligence Research pp

Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., Zhu, J.: Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. Machine Intelligence Research pp. 1–22 (2025)

work page 2025

[50] [50]

Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., Van Gool, L.: Repaint:Inpaintingusingdenoisingdiffusionprobabilisticmodels.In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11461–11471 (2022)

work page 2022

[51] [51]

arXiv preprint arXiv:2505.16733 (2025)

Luo, Z., Gustafsson, F.K., Sjölund, J., Schön, T.B.: Forward-only diffusion prob- abilistic models. arXiv preprint arXiv:2505.16733 (2025)

work page arXiv 2025

[52] [52]

In: Proceedings of the 40th International Conference on Machine Learning

Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Image restoration with mean-reverting stochastic differential equations. In: Proceedings of the 40th International Conference on Machine Learning. pp. 23045–23066 (2023)

work page 2023

[53] [53]

In: Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recogni- tion

Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Refusion: Enabling large-size realistic image restoration with latent-space diffusion models. In: Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recogni- tion. pp. 1680–1691 (2023)

work page 2023

[54] [54]

In: The Twelfth International Conference on Learning Representations (2024)

Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Controlling vision- language models for multi-task image restoration. In: The Twelfth International Conference on Learning Representations (2024)

work page 2024

[55] [55]

In: European Conference on Computer Vision

Ma, N., Goldstein, M., Albergo, M.S., Boffi, N.M., Vanden-Eijnden, E., Xie, S.: Sit: Exploring flow and diffusion-based generative models with scalable inter- polant transformers. In: European Conference on Computer Vision. pp. 23–40. Springer (2024)

work page 2024

[56] [56]

Nah,S.,HyunKim,T.,MuLee,K.:Deepmulti-scaleconvolutionalneuralnetwork fordynamicscenedeblurring.In:ProceedingsoftheIEEEconferenceoncomputer vision and pattern recognition. pp. 3883–3891 (2017)

work page 2017

[57] [57]

In: International conference on machine learning

Nichol, A.Q., Dhariwal, P.: Improved denoising diffusion probabilistic models. In: International conference on machine learning. pp. 8162–8171. PMLR (2021)

work page 2021

[58] [58]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context en- coders: Feature learning by inpainting. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2536–2544 (2016)

work page 2016

[59] [59]

In: Proceedings of the IEEE/CVF international conference on computer vision

Peebles, W., Xie, S.: Scalable diffusion models with transformers. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 4195–4205 (2023)

work page 2023

[60] [60]

In: Proceedings of the IEEE/CVF international conference on computer vision

Qiu, Y., Zhang, K., Wang, C., Luo, W., Li, H., Jin, Z.: Mb-taylorformer: Multi- branch efficient transformer expanded by taylor formula for image dehazing. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 12802–12813 (2023) DiSI: Disentangled Stochastic Interpolant 19

work page 2023

[61] [61]

In: Proceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition

Ren, D., Zuo, W., Hu, Q., Zhu, P., Meng, D.: Progressive image deraining net- works: A better and simpler baseline. In: Proceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition. pp. 3937–3946 (2019)

work page 2019

[62] [62]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10684–10695 (2022)

work page 2022

[63] [63]

In: International Conference on Medical image comput- ing and computer-assisted intervention

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomed- ical image segmentation. In: International Conference on Medical image comput- ing and computer-assisted intervention. pp. 234–241. Springer (2015)

work page 2015

[64] [64]

In: ACM SIGGRAPH 2022 confer- ence proceedings

Saharia,C.,Chan,W.,Chang,H.,Lee,C.,Ho,J.,Salimans,T.,Fleet,D.,Norouzi, M.: Palette: Image-to-image diffusion models. In: ACM SIGGRAPH 2022 confer- ence proceedings. pp. 1–10 (2022)

work page 2022

[65] [65]

IEEE transactions on pattern analysis and machine intelligence45(4), 4713–4726 (2022)

Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D.J., Norouzi, M.: Image super-resolution via iterative refinement. IEEE transactions on pattern analysis and machine intelligence45(4), 4713–4726 (2022)

work page 2022

[66] [66]

Särkkä, S., Solin, A.: Applied stochastic differential equations, vol. 10. Cambridge University Press (2019)

work page 2019

[67] [67]

GLU Variants Improve Transformer

Shazeer, N.: Glu variants improve transformer. arXiv preprint arXiv:2002.05202 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2002

[68] Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1874–1883 (2016)

[69] Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. In: International Conference on Learning Representations (2021)

[70] Song, Y., Dhariwal, P., Chen, M., Sutskever, I.: Consistency models. In: Proceedings of the 40th International Conference on Machine Learning. pp. 32211–32252 (2023)

[71] Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems 32 (2019)

[72] Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. In: International Conference on Learning Representations (2021)

[73] Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., Liu, Y.: Roformer: Enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024)

[74] Su, J., Xu, B., Yin, H.: A survey of deep learning approaches to image restoration. Neurocomputing 487, 46–65 (2022)

[75] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2818–2826 (2016)

[76] Theis, L., van den Oord, A., Bethge, M.: A note on the evaluation of generative models. In: International Conference on Learning Representations (ICLR 2016). pp. 1–10 (2016)

[77] Tu, Z., Talebi, H., Zhang, H., Yang, F., Milanfar, P., Bovik, A., Li, Y.: Maxim: Multi-axis mlp for image processing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5769–5780 (2022)

[78] Potlapalli, V., Zamir, S.W., Khan, S., Khan, F.S.: Promptir: Prompting for all-in-one blind image restoration. arXiv preprint arXiv:2306.13090 (2023)

[79] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)

[80] Wang, J., Yue, Z., Zhou, S., Chan, K.C., Loy, C.C.: Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision 132(12), 5929–5949 (2024)