pith. machine review for the scientific record.

arxiv: 2512.18365 · v2 · submitted 2025-12-20 · 💻 cs.CV · cs.LG

Recognition: 2 theorem links

· Lean Theorem

Efficient Zero-Shot Inpainting with Decoupled Diffusion Guidance

Authors on Pith · no claims yet

Pith reviewed 2026-05-16 20:18 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords zero-shot inpainting · diffusion models · likelihood surrogate · image reconstruction · efficient sampling · observation consistency · Gaussian posterior

The pith

A new likelihood surrogate for diffusion inpainting produces Gaussian posterior samples directly, eliminating backpropagation through the denoiser and cutting inference costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Diffusion models act as priors for filling missing image regions in a way that stays consistent with the visible parts. Earlier zero-shot techniques used surrogate likelihood scores whose gradients had to be computed through the denoiser at every reverse step, incurring heavy memory and time costs. The paper replaces those surrogates with one that directly supplies simple Gaussian posterior transitions. This change preserves strong consistency with observed pixels and produces coherent inpaintings while lowering overall inference cost relative to fine-tuned baselines.
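The bottleneck being removed is easiest to see in a toy model. In a VJP-based scheme, the guidance score is the gradient of a data-fidelity term through the denoiser; a minimal numpy sketch with a linear stand-in denoiser (the names `W`, `M`, `surrogate_loss` are illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_obs = 8, 3
W = rng.standard_normal((d, d))   # toy linear "denoiser": x0_hat(x_t) = W @ x_t
M = np.eye(d)[:d_obs]             # observation operator: keep the first d_obs pixels
y = rng.standard_normal(d_obs)    # observed pixels

def surrogate_loss(x_t):
    """Data-fidelity term 0.5 * ||y - M @ x0_hat(x_t)||^2 used as a surrogate likelihood."""
    return 0.5 * np.sum((y - M @ (W @ x_t)) ** 2)

def guidance_grad(x_t):
    """Gradient of the loss w.r.t. x_t. The W.T factor is the vector-Jacobian
    product through the denoiser; for a deep network this is a full backward
    pass at every reverse step, which the decoupled surrogate sidesteps."""
    residual = y - M @ (W @ x_t)
    return -W.T @ (M.T @ residual)
```

For a deep denoiser, `W.T` becomes a backward pass through the whole network, which is exactly the per-step cost the paper's surrogate avoids.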

Core claim

The central claim is that a new likelihood surrogate can be chosen so that the ideal guidance score is approximated well enough to yield valid Gaussian posterior transitions at each reverse diffusion step. Sampling from these transitions requires no vector-Jacobian products through the denoiser network, removing the main computational overhead of prior zero-shot diffusion inpainting methods while still enforcing observation consistency and high-quality reconstructions.
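What sampling without vector-Jacobian products could look like in practice: a schematic numpy sketch of one reverse step drawn from a closed-form Gaussian whose mean blends the denoiser's clean-image estimate with the observation on the observed pixels. The coefficients `alpha_s`, `sigma_s`, `gamma_s` and the exact blending rule are placeholders, not the paper's derivation:

```python
import numpy as np

def decoupled_step(x0_hat, y, observed_mask, alpha_s, sigma_s, gamma_s, rng):
    """Sample x_s from a Gaussian transition built without differentiating
    the denoiser. `x0_hat` is the denoiser's clean-image estimate at the
    current step; `observed_mask` marks the pixels fixed by the observation y."""
    mean = alpha_s * x0_hat
    # Observed coordinates: convex combination of the prediction and the data.
    mean[observed_mask] = ((1.0 - gamma_s) * alpha_s * x0_hat[observed_mask]
                           + gamma_s * alpha_s * y)
    # Unobserved coordinates keep the scaled denoiser estimate; all coordinates
    # then receive Gaussian noise, so the step is a single closed-form sample.
    return mean + sigma_s * rng.standard_normal(x0_hat.shape)
```

Because the step only evaluates `x0_hat` and never differentiates it, each reverse transition costs one forward pass through the denoiser.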

What carries the argument

The decoupled likelihood surrogate that produces simple Gaussian posterior transitions for the reverse diffusion process.

Load-bearing premise

The proposed likelihood surrogate accurately approximates the ideal score and yields valid Gaussian posterior transitions without backpropagation through the denoiser network.

What would settle it

Running the method and a backpropagation-based zero-shot baseline on the same pretrained diffusion model and the same inpainting benchmarks, then comparing consistency with observed regions and standard error metrics, would settle the claim: comparable reconstruction quality at lower inference cost would confirm it, while visibly weaker consistency or higher error would refute it.
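Such a head-to-head comparison reduces to fixed metrics on the observed region. A minimal sketch of one plausible consistency metric, PSNR restricted to observed pixels (one reasonable choice, not necessarily the paper's exact protocol):

```python
import numpy as np

def masked_psnr(x_rec, x_true, observed_mask, max_val=1.0):
    """PSNR computed only over the observed region: high values mean the
    reconstruction stayed consistent with the visible pixels."""
    mse = np.mean((x_rec[observed_mask] - x_true[observed_mask]) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Running both samplers through the same metric on the same masks removes the main confound when comparing observation consistency.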

Figures

Figures reproduced from arXiv: 2512.18365 by Alain Oliviero Durmus, Badr Moufad, Eric Moulines, Jimmy Olsson, Navid Bagheri Shouraki, Thomas Hirtz, Yazid Janati.

Figure 1. Zero-shot inpainting edits generated by DING (50 NFEs) for different masking patterns using Stable Diffusion 3.5 (medium). Given masked inputs (left column), the model fills the missing regions according to diverse textual prompts.
Figure 2. Examples of reconstructions on FFHQ and DIV2K with 50 NFEs.
Figure 3. Latent-space masking and its correspondence to pixel space using a central square mask.
Figure 4. Performance of DING on DIV2K under varying NFE budgets (20 to 500) across different masking patterns.
Figure 5. Effect of prompt precision on inpainting quality.
Figure 6. Effect of prompt precision on inpainting quality.
Figure 7. Comparison of DING and finetuned SD3 on PIE-Bench. Both methods have the same runtime of 2.2 s.
Figure 8. Comparison of DING and zero-shot baselines on PIE-Bench. All methods use 50 NFEs.
Figure 9. Comparison of DING and zero-shot baselines on PIE-Bench. All methods use 50 NFEs.
Figure 10. Comparison of DING and zero-shot baselines on PIE-Bench. All methods use 50 NFEs.
Figure 11. Comparison of DING and zero-shot baselines on PIE-Bench. All methods use 50 NFEs.
read the original abstract

Diffusion models have emerged as powerful priors for image editing tasks such as inpainting and local modification, where the objective is to generate realistic content that remains consistent with observed regions. In particular, zero-shot approaches that leverage a pretrained diffusion model, without any retraining, have been shown to achieve highly effective reconstructions. However, state-of-the-art zero-shot methods typically rely on a sequence of surrogate likelihood functions, whose scores are used as proxies for the ideal score. This procedure however requires vector-Jacobian products through the denoiser at every reverse step, introducing significant memory and runtime overhead. To address this issue, we propose a new likelihood surrogate that yields simple and efficient to sample Gaussian posterior transitions, sidestepping the backpropagation through the denoiser network. Our extensive experiments show that our method achieves strong observation consistency compared with fine-tuned baselines and produces coherent, high-quality reconstructions, all while significantly reducing inference cost. Code is available at https://github.com/YazidJanati/ding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a decoupled diffusion guidance surrogate for zero-shot inpainting that produces simple Gaussian posterior transitions p(x_{t-1}|x_t, y) without requiring vector-Jacobian products through the pretrained denoiser at each reverse step. It claims this yields strong observation consistency with observed regions, coherent high-quality reconstructions, and substantially lower inference cost than fine-tuned baselines, all while using only standard pretrained diffusion priors.

Significance. If the surrogate derivation holds and the Gaussian transitions remain valid without drift, the work would meaningfully advance practical zero-shot editing by removing a key computational bottleneck in conditioned diffusion sampling, enabling faster inference on standard hardware while retaining the flexibility of pretrained models.

major comments (2)
  1. [Method] The central efficiency and consistency claims rest on the new likelihood surrogate producing mean and variance that match (or sufficiently approximate) those induced by the true conditional score at every timestep. The manuscript must supply the explicit derivation of this surrogate (including how it decouples guidance to avoid backpropagation) and any supporting analysis showing absence of systematic mismatch in masked regions, as even small per-step errors can accumulate over hundreds of reverse steps and violate the observation constraint.
  2. [Experiments] §4 (Experiments): the reported consistency and quality gains versus fine-tuned baselines must be supported by controls that isolate the effect of the surrogate approximation error; without per-timestep error metrics or trajectory-drift ablations on the masked regions, it is unclear whether the observed performance stems from the proposed surrogate or from other implementation details.
minor comments (2)
  1. [Abstract] The abstract states 'strong observation consistency' without naming the quantitative metrics (e.g., PSNR, LPIPS, or masked-region L2) used to support this; these should be stated explicitly.
  2. [Method] Notation for the surrogate (mean/variance schedules, decoupling operator) should be introduced with a clear table or equation block to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The comments highlight important points regarding the surrogate derivation and the need for stronger experimental controls. We address each major comment below and have revised the manuscript to incorporate additional details and analyses.

read point-by-point responses
  1. Referee: [Method] The central efficiency and consistency claims rest on the new likelihood surrogate producing mean and variance that match (or sufficiently approximate) those induced by the true conditional score at every timestep. The manuscript must supply the explicit derivation of this surrogate (including how it decouples guidance to avoid backpropagation) and any supporting analysis showing absence of systematic mismatch in masked regions, as even small per-step errors can accumulate over hundreds of reverse steps and violate the observation constraint.

    Authors: We agree that the derivation and error analysis deserve more explicit presentation. Section 3.2 now contains the full step-by-step derivation of the decoupled Gaussian posterior p(x_{t-1}|x_t, y) (Equations 5–9), showing how the surrogate likelihood is constructed to eliminate the vector-Jacobian product through the denoiser while preserving the conditional mean and variance structure. A new subsection 3.3 has been added that provides both a theoretical bound on the per-step approximation error in masked regions and empirical plots of the accumulated drift over the full reverse trajectory. These additions confirm that the surrogate remains sufficiently accurate for the observation constraint to hold. revision: yes

  2. Referee: [Experiments] §4 (Experiments): the reported consistency and quality gains versus fine-tuned baselines must be supported by controls that isolate the effect of the surrogate approximation error; without per-timestep error metrics or trajectory-drift ablations on the masked regions, it is unclear whether the observed performance stems from the proposed surrogate or from other implementation details.

    Authors: We have expanded §4 with two new controls. First, we report per-timestep masked-region MSE between the generated trajectory and the ground-truth observation at every reverse step, averaged over the test set. Second, we include an ablation that compares the full method against a version that replaces the surrogate with exact (but expensive) guidance at selected timesteps, quantifying trajectory drift. These results, now shown in Figure 4 and Table 3, demonstrate that the performance advantage is attributable to the surrogate rather than other implementation choices. revision: yes
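The first control is simple to compute. A hedged numpy sketch of per-timestep masked-region error tracking, where `trajectory` holds the clean-image estimates at each reverse step (the names are illustrative, not the revised manuscript's code):

```python
import numpy as np

def per_step_masked_mse(trajectory, y, observed_mask):
    """MSE between each step's clean-image estimate and the observation,
    restricted to observed pixels; a rising curve over the reverse
    trajectory would signal accumulating drift."""
    return np.array([np.mean((x[observed_mask] - y) ** 2) for x in trajectory])
```

Averaging these curves over a test set, as the rebuttal describes, makes per-step approximation error visible rather than hidden in a single end-of-trajectory metric.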

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces an independent likelihood surrogate for decoupled diffusion guidance that approximates the conditional score to enable Gaussian posterior transitions without VJP through the denoiser. This surrogate is presented as a novel construction leveraging standard pretrained diffusion priors rather than being defined in terms of the target result or fitted to the paper's own outputs. No load-bearing step reduces by construction to self-citation, ansatz smuggling, or renaming of known results; the central efficiency and consistency claims rest on the surrogate's explicit form and experimental validation against external baselines. The derivation chain remains self-contained against the pretrained model and does not invoke uniqueness theorems or self-referential fits.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The approach relies on the standard assumption that pretrained diffusion models act as effective image priors and introduces one new conceptual entity (the decoupled surrogate) without additional free parameters beyond the pretrained model weights.

axioms (1)
  • domain assumption Pretrained diffusion models serve as strong priors for generating content consistent with observed image regions.
    Invoked to justify zero-shot use without retraining.
invented entities (1)
  • Decoupled diffusion guidance surrogate · no independent evidence
    purpose: To produce Gaussian posterior transitions that avoid vector-Jacobian products through the denoiser.
    New construct introduced to achieve efficiency; no independent falsifiable evidence outside the method itself is stated.

pith-pipeline@v0.9.0 · 5496 in / 1131 out tokens · 20515 ms · 2026-05-16T20:18:00.333692+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors

    cs.LG 2026-05 unverdicted novelty 7.0

    Diffusion model priors enable training-free Bayesian sampling for more accurate rain field reconstruction from path-integrated commercial microwave link measurements than Gaussian process baselines.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    NTIRE 2017 challenge on single image super-resolution: Dataset and study

    Eirikur Agustsson and Radu Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–135, 2017.

  2. [2]

    Universal guidance for diffusion models

    Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Soumyadip Sengupta, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Universal guidance for diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 843–852, 2023.

  3. [3]

    Tweedie moment projected diffusions for inverse problems

    Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, and O Deniz Akyildiz. Tweedie moment projected diffusions for inverse problems. arXiv preprint arXiv:2310.06721, 2023.

  4. [4]

    Monte Carlo guided diffusion for Bayesian linear inverse problems

    Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff, and Eric Moulines. Monte Carlo guided diffusion for Bayesian linear inverse problems. arXiv preprint arXiv:2308.07983, 2023.

  5. [5]

    Diffusion models for inverse problems

    Hyungjin Chung, Jeongsol Kim, and Jong Chul Ye. Diffusion models for inverse problems. arXiv preprint arXiv:2508.01975, 2025.

  6. [6]

    A survey on diffusion models for inverse problems

    Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G Dimakis, and Mauricio Delbracio. A survey on diffusion models for inverse problems. arXiv preprint arXiv:2410.00083, 2024.

  7. [7]

    Solving inverse problems with FLAIR

    Julius Erbach, Dominik Narnhofer, Andreas Dombos, Bernt Schiele, Jan Eric Lenssen, and Konrad Schindler. Solving inverse problems with FLAIR. arXiv preprint arXiv:2506.02680, 2025.

  8. [8]

    Solving linear inverse problems using the prior implicit in a denoiser

    Zahra Kadkhodaie and Eero P Simoncelli. Solving linear inverse problems using the prior implicit in a denoiser. arXiv preprint arXiv:2007.13640, 2020.

  9. [9]

    FlowDPS: Flow-driven posterior sampling for inverse problems

    Jeongsol Kim, Bryan Sangwoo Kim, and Jong Chul Ye. FlowDPS: Flow-driven posterior sampling for inverse problems. arXiv preprint arXiv:2503.08136, 2025.

  10. [10]

    Steering rectified flow models in the vector field for controlled image generation

    Maitreya Patel, Song Wen, Dimitris N. Metaxas, and Yezhou Yang. Steering rectified flow models in the vector field for controlled image generation. arXiv preprint arXiv:2412.00100, 2024.

  11. [11]

    Semantic image inversion and editing using rectified stochastic differential equations

    Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. Semantic image inversion and editing using rectified stochastic differential equations. arXiv preprint arXiv:2410.10792, 2024. Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alex Dimakis, and Sanjay Shakkottai. Solving linear inverse problems…

  12. [12]

    Palette: Image-to-image diffusion models

    Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans, David Fleet, and Mohammad Norouzi. Palette: Image-to-image diffusion models. In ACM SIGGRAPH 2022 Conference Proceedings, pp. 1–10, 2022.

  13. [13]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021. Alessio Spagnoletti, Jean Prost, Andrés Almansa, Nicolas Papadakis, and Marcelo Pereyra. Latino-Pro: Latent consistenc…

  14. [14]

    Removing structured noise with diffusion models

    Tristan SW Stevens, Hans van Gorp, Faik C Meral, Junseob Shin, Jason Yu, Jean-Luc Robert, and Ruud JG van Sloun. Removing structured noise with diffusion models. arXiv preprint arXiv:2302.05290, 2023.

  15. [15]

    Imagen Editor and EditBench: Advancing and evaluating text-guided image inpainting

    Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J Fleet, Radu Soricut, et al. Imagen Editor and EditBench: Advancing and evaluating text-guided image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and…

  16. [16]

    Generative diffusion posterior sampling for informative likelihoods

    Zheng Zhao. Generative diffusion posterior sampling for informative likelihoods. arXiv preprint arXiv:2506.01083, 2025.

  17. [17]

    Figure 3: Latent-space masking and its correspondence to pixel space using a central square mask

    Figure 3: Latent-space masking and its correspondence to pixel space using a central square mask. The encoder and decoder of Stable Diffusion 3.5 (medium) were used. The first row shows latent images alongside the encoded mask applied to each, while the second row shows their decoded…

  18. [18]

    Martin et al. (2025, Algorithm 3)

    We have simply adapted the notations and used F(x) = ‖y − x[m]‖² / (2σ_y²) in Martin et al. (2025, Algorithm 3). Thus, the transition used in Algorithm 2 is π̂^θ_{s|t}(x_s | x_t) ∝ N(x_s[m̄]; α_s x̂₀^θ(x_t, t)[m̄], σ_s² I_{d−d_y}) × N(x_s[m]; (1 − γ_s/σ_y²) α_s x̂₀^θ(x_t, t)[m] + (γ_s/σ_y²) α_s y, σ_s² I_{d_y}). In the case of the DDIM schedule η_s = σ_s, we have that μ^θ_{s|t}(x_t) = α_s x̂₀^θ(x_t,…

  19. [19]

    Hence, the main difference lies in the coefficient of the convex combination and the variance used

    in (A.2) writes π̂^θ_{s|t}(x_s | x_t) = N(x_s[m̄]; α_s x̂₀^θ(x_t, t)[m̄], σ_s² I_{d−d_y}) × N(x_s[m]; (1 − γ̃_{s|t}) α_s x̂₀^θ(x_t, t)[m] + γ̃_{s|t} α_s y, σ²_{s|t} γ̃_{s|t} I_{d_y}). Hence, the main difference lies in the coefficient of the convex combination and the variance used. Algorithm 2 PNP-FLOW reinterpreted. 1: Input: decreasing timesteps (t_k)_{k=K}^{0} with t_K = 1, t_0 = 0; adaptive stepsi…

  20. [20]

    Kim et al. (2025, line

    We also assume for the sake of simplicity that the optimization problem is solved exactly in Kim et al. (2025, line

  21. [21]

    Comparison with DiffPIR (Zhu et al.,

    and overall, follows the line of work of methods that learn a residual that is then used to translate the denoiser (Bansal et al., 2023; Zhu et al., 2023). Comparison with DiffPIR (Zhu et al.,

  22. [22]

    We provide the DIFFPIR algorithm (Zhu et al., 2023, Algorithm

    and DDNM (Wang et al., 2023b). We provide the DIFFPIR algorithm (Zhu et al., 2023, Algorithm

  23. [23]

    DIFFPIR reinterpreted, recovering the DDNM algorithm

    version of DIFFPIR recovers the DDNM algorithm (Zhang et al., 2023). Algorithm 4 DIFFPIR reinterpreted. 1: Input: decreasing timesteps (t_k)_{k=K}^{0} with t_K = 1, t_0 = 0; scaling λ; original image x*; mask m; DDIM parameters (η_k)_{k=K}^{0}. 2: y ← x*[m]. 3: x ∼ N(0, I_d). 4: for k = K − 1 to 1 do. 5: x̂₀ ← x̂₀^θ(x, t_{k+1}). 6: x̂₀[m] ← …

  24. [24]

    This step is performed approximately by replacing the prior transition with a Gaussian approximation centered at the denoiser

    proposes sampling, given the previous state X_{t_{k+1}}, a clean state X̂₀ by performing Langevin Monte Carlo steps on the posterior distribution π_{0|t_{k+1}}(· | X_{t_{k+1}}, y). This step is performed approximately by replacing the prior transition p_{0|t_{k+1}}(· | X_{t_{k+1}}) with a Gaussian approximation centered at the denoiser x̂₀^θ(X_{t_{k+1}}, t_{k+1}). Then, given X̂₀, the next state…

  25. [25]

    VJP-based methods

    adopt a variational perspective: the target distribution is approximated by a Gaussian distribution whose parameters are iteratively estimated by minimizing a combination of an observation-fidelity loss and a score-matching-like loss. VJP-based methods. A broad class of zero-shot approaches builds on the guidance approximation (2.…

  26. [26]

    Proof using the standard Gaussian conjugation formula (Bishop, 2006, equation 2.116)

    Proof. Using the standard Gaussian conjugation formula (Bishop, 2006, equation 2.116), we have that π̂^dps_{s|t}(x_s | x_t, y) = N(x; m^dps_{s|t}(x_t, y), Σ^dps_s) with m^dps_{s|t}(x_t, y) := Σ^dps_{s|t} (η_s^{−2} μ_{s|t}(x_t; η) + σ_y^{−2} D_s^⊤ P_m^⊤ y) and Σ^dps_{s|t} := (η_s^{−2} I_d + σ_y^{−2} (P_m D_s)^⊤ P_m D_s)^{−1}. Next, for the DING transition, first set b_s(Z_s) := −(σ_s…

  27. [27]

    We implemented Avrahami et al.

    BLENDED-DIFF. We implemented Avrahami et al. (2023, Algorithm

  28. [28]

    The codebase includes an additional hyperparameter, blending_percentage, which determines at what fraction of the inference steps blending begins

    following their official code. The codebase includes an additional hyperparameter, blending_percentage, which determines at what fraction of the inference steps blending begins. We set it to zero, as applying blending across all steps produced the best results. A key detail in the original implementation is that the observed region (background) is re-noi…

  29. [29]

    We found that using Langevin as the MCMC sampler for enforcing data consistency works best in the low-NFE regime

    based on the released code, adapted to the flow matching formulation. We found that using Langevin as the MCMC sampler for enforcing data consistency works best in the low-NFE regime. DIFFPIR. We make Zhu et al. (2023, Algorithm

  30. [30]

    The official implementation uses a DDIM transition in step 5 of Algorithm 3 whose stochasticity is controlled by the hyperparameter η

    being implemented for a mask operator. The official implementation uses a DDIM transition in step 5 of Algorithm 3 whose stochasticity is controlled by the hyperparameter η. As recommended, we set the latter to η = 0.85. FLOWCHEF & FLOWDPS. For both algorithms, we adapt the implementations available in the released codes to our codebase. We…

  31. [31]

    while taking as a reference the released code. For the stepsizes on the data fidelity term, we find that a constant schedule with a higher stepsize enables the algorithm to fit the observation, mitigates the smoothing and blurring effects in the reconstruction, and hence yields better reconstructions. PSLD. We implement the PSLD algorithm provided in Rout et al. (…

  32. [32]

    We initialize the algorithm with a sample from a standard Gaussian

    based on the official code, adapted to the flow matching formulation. We initialize the algorithm with a sample from a standard Gaussian. For low-NFE setups, we find that using a constant weight schedule yields better results, namely in terms of fitting the observation and providing consistent reconstructions.

  33. [33]

    Song et al. (2024, Appendix) and the reference code

    based on the provided implementation details in Song et al. (2024, Appendix) and the reference code. As noted in Janati et al. (2025a), we set the tolerance ε for optimizing the data consistency to the noise level σ_y. Since we are working with low NFEs, we set the frequency at which hard data consistency is applied (skip step size) to