pith.

arXiv: 2605.21661 · v1 · pith:DGX3YQPZ · submitted 2026-05-20 · 💻 cs.LG · cs.AI · cs.CV

Hierarchical Variational Policies for Reward-Guided Diffusion

Pith reviewed 2026-05-22 08:44 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords diffusion models · variational policies · amortized inference · reward-guided generation · inverse problems · few-step sampling · super-resolution · generative models

The pith

A hierarchical variational stochastic policy amortizes test-time guidance for diffusion models, enabling fast reward-aligned sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for adapting pretrained diffusion models to downstream tasks such as inverse problems without costly per-sample test-time optimization. It models the adaptation as a hierarchical variational process in which a lightweight stochastic policy is learned to control the sampling steps. This supports few-step sampling: large step sizes provide speed, while the policy maintains quality through structured per-step control. A sympathetic reader would care because it promises high-quality generative outputs at much lower computational cost, making advanced diffusion techniques more practical for real-world use.
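The rollout described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_noise_predictor` stands in for the pretrained noise predictor ε_ϕ (it is the exact predictor only when the data are standard normal), `toy_noise_policy` and `toy_policy` are hypothetical stand-ins for the learned policy π_ψ, and adding the control u_t directly to the state before each deterministic DDIM step is an assumed injection point.

```python
import numpy as np

# Toy linear noise schedule; alpha_bar[t] is the cumulative signal fraction at step t.
T_TRAIN = 1000
betas = np.linspace(1e-4, 0.02, T_TRAIN)
alpha_bar = np.cumprod(1.0 - betas)


def toy_noise_predictor(x_t, t):
    """Hypothetical stand-in for eps_phi(x_t, t); exact only if the data are N(0, I)."""
    return np.sqrt(1.0 - alpha_bar[t]) * x_t


def toy_noise_policy(y, rng):
    """Hypothetical initial-noise policy: a Gaussian over x_T whose mean depends on y."""
    return 0.1 * y + rng.standard_normal(y.shape)


def toy_policy(x_t, t, y, rng):
    """Hypothetical per-step policy pi_psi(u_t | x_t, t, y): Gaussian control pulling toward y."""
    mean = 0.01 * (y - x_t)
    return mean + 0.001 * rng.standard_normal(x_t.shape)


def ddim_step(x_t, eps, t, t_prev):
    """Deterministic DDIM update (eta = 0) from step t to t_prev; t_prev = -1 means fully denoised."""
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps


def amortized_rollout(y, num_steps=10, seed=0):
    """Fully amortized sampling: one policy call and one predictor call per step, no inner optimization."""
    rng = np.random.default_rng(seed)
    # Few-step schedule: large jumps through the training timesteps.
    ts = np.linspace(T_TRAIN - 1, 0, num_steps).astype(int)
    x = toy_noise_policy(y, rng)
    for i, t in enumerate(ts):
        x = x + toy_policy(x, t, y, rng)  # assumed additive per-step control
        eps = toy_noise_predictor(x, t)
        t_prev = ts[i + 1] if i + 1 < len(ts) else -1
        x = ddim_step(x, eps, t, t_prev)
    return x


sample = amortized_rollout(np.ones(4))
```

In this sketch the cost of a sample is fixed by `num_steps`; there is no per-sample gradient loop, which is the sense in which the sampler is fully amortized.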

Core claim

Formulating test-time adaptation as a hierarchical variational model with an amortized stochastic policy yields a fully amortized sampler with a strong quality-speed tradeoff: it matches or exceeds test-time scaling baselines with significantly less compute. For example, on 4x super-resolution it achieves better perceptual quality with more than 5x faster inference than the best-performing baseline.

What carries the argument

The hierarchical variational model that amortizes control into a lightweight yet expressive stochastic policy for per-step guidance in diffusion sampling.

If this is right

  • The method supports efficient few-step diffusion sampling while preserving sample quality.
  • It can be extended to a semi-amortized setting combining cheap proposals with limited test-time optimization for state-of-the-art perceptual quality.
  • The fully amortized sampler requires significantly less compute than recent baselines.
  • Applications to challenging inverse problems benefit from improved quality-speed tradeoffs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This amortization could make diffusion models viable for interactive or real-time applications where inference speed is critical.
  • The structured control from the policy might inspire similar approaches in other generative modeling frameworks.
  • Exploring the policy's expressiveness could allow even fewer steps without quality loss.

Load-bearing premise

The lightweight stochastic policy can provide structured per-step control sufficient to maintain sample quality when using large step sizes for fast few-step diffusion sampling.

What would settle it

Running the method on 4x super-resolution with very large step sizes and observing that perceptual quality falls below the best-performing baseline would falsify the claim of maintaining quality through the policy.

Figures

Figures reproduced from arXiv: 2605.21661 by Farrin Marouf Sofian, Felix Draxler, Jan Niklas Groeneveld, Kushagra Pandey, Stephan Mandt.

Figure 1: Our methods (AHVP, SHVP) produce high-quality samples satisfying measurement constraints at reduced inference cost. Baselines often show artifacts (red boxes), while ours preserve fine details (green boxes). AHVP offers strong perceptual quality with fast inference; SHVP further improves quality at a moderate additional cost. See …
Figure 2: We compare diffusion-based inverse problem solvers in terms of perceptual quality (LPIPS, …
Figure 3: Overview of Amortized Hierarchical Policies. Given a pretrained noise predictor εϕ and a per-step policy πψ, we can sample the initial noise and per-step controls and perform a simple rollout to obtain conditional samples x0 without any expensive optimizations. … terms: a reward term and three regularizers that ensure samples are likely under the unguided diffusion model, log p(y) ≥ E_{x0∼q} …
Figure 4: Per-step controls refine reconstruction quality. Stage-1 training learns a noise policy that recovers the coarse image structure (A0), whereas the second stage introduces per-step controls that refine fine-grained details (A1). The residual map |A1 − A0| highlights the additional textures and high-frequency information recovered by the second stage. Examples shown for FFHQ (top row) and ImageNet (bottom ro…
Figure 5: Qualitative comparisons between different methods on the Random Inpainting task. Top panel: FFHQ-256, bottom panel: ImageNet-256. Our methods capture fine-grained details better than competing baselines.
Figure 6: Qualitative comparisons between different methods on the HDR task. Top panel: FFHQ-256, bottom panel: ImageNet-256. Our methods best preserve the overall color profile among competing baselines.
Figure 7: Qualitative comparisons between different methods on the SR×8 task. Top panel: FFHQ-256, bottom panel: ImageNet-256. Our method captures details better than competing baselines.
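The residual map |A1 − A0| in Figure 4 can be reproduced for any pair of stage-1 and stage-2 reconstructions with a few lines of NumPy. This is a sketch: averaging over color channels and max-normalizing for display are assumptions, and the paper's exact visualization may differ.

```python
import numpy as np


def residual_map(a0, a1):
    """Per-pixel |A1 - A0| between two reconstructions, rescaled to [0, 1] for display.

    a0, a1: arrays of shape (H, W) or (H, W, C); color channels are averaged.
    """
    diff = np.abs(a1.astype(np.float64) - a0.astype(np.float64))
    if diff.ndim == 3:
        diff = diff.mean(axis=-1)
    peak = diff.max()
    return diff / peak if peak > 0 else diff


# Hypothetical 2x2 RGB example: only the top-left pixel changes between stages.
stage1 = np.zeros((2, 2, 3))
stage2 = stage1.copy()
stage2[0, 0] = 0.5
heat = residual_map(stage1, stage2)
```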
Original abstract

Adapting pretrained diffusion models to downstream objectives such as inverse problems often requires expensive test-time guidance or optimization. We propose a principled framework for generating high-quality reward-aligned samples at substantially reduced inference cost. Our approach formulates test-time adaptation as a hierarchical variational model, where control is amortized into a lightweight yet expressive stochastic policy. This formulation naturally supports few-step diffusion sampling: large step sizes enable fast inference, while the learned policy maintains sample quality by providing structured per-step control. The resulting fully amortized sampler achieves a strong quality-speed tradeoff, matching or exceeding recent test-time scaling baselines while requiring significantly less compute. For example, on 4x super-resolution, our method achieves better perceptual quality with more than 5x faster inference compared to the best-performing baseline. We further extend our approach to a semi-amortized regime that combines cheap amortized proposals with limited test-time optimization, achieving state-of-the-art perceptual quality across several challenging inverse problems.
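For inverse problems, the reward that the policy is aligned to is typically the measurement log-likelihood log p(y | x0). A minimal sketch for 4x super-resolution follows, assuming Gaussian measurement noise with standard deviation σ and an average-pooling downsampler; both are assumptions (many benchmarks use bicubic downsampling), and the paper's operator and noise level are not stated here.

```python
import numpy as np


def downsample(x, factor=4):
    """Average-pool an (H, W, C) image by `factor` (assumed SR forward operator A)."""
    h, w, c = x.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def sr_reward(x0, y, sigma=0.05, factor=4):
    """Gaussian log-likelihood log p(y | x0) = -||y - A(x0)||^2 / (2 sigma^2) + const."""
    residual = y - downsample(x0, factor)
    return -np.sum(residual ** 2) / (2.0 * sigma ** 2)


hr = np.random.default_rng(0).random((16, 16, 3))  # hypothetical high-res estimate x0
lr = downsample(hr)                                 # synthetic noise-free measurement y
r = sr_reward(hr, lr)                               # maximal reward: x0 is consistent with y
```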

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a hierarchical variational framework to amortize test-time adaptation of pretrained diffusion models for reward-guided tasks such as inverse problems. Control is amortized into a lightweight stochastic policy that supports few-step sampling: large step sizes enable fast inference, while the policy supplies structured per-step corrections to preserve sample quality. The fully amortized sampler is claimed to match or exceed test-time scaling baselines while requiring significantly less compute (e.g., better perceptual quality with more than 5x faster inference than the best-performing baseline on 4x super-resolution), with an optional semi-amortized extension that adds limited test-time optimization to reach state-of-the-art perceptual quality.

Significance. If the central empirical claims hold after addressing the noted concerns, the work would provide a practical and principled route to efficient reward-aligned diffusion sampling, reducing reliance on expensive per-sample optimization. The variational amortization of per-step control is a natural fit for the few-step regime and could influence downstream applications in image restoration and conditional generation.

major comments (2)
  1. [Abstract] Abstract: The headline quality-speed tradeoff (better perceptual quality with >5x faster inference on 4x super-resolution) rests on the assumption that the lightweight stochastic policy supplies sufficient structured per-step control to offset error accumulation at large step sizes. The manuscript should include a dedicated ablation (e.g., in the experiments section) that isolates the policy's contribution versus plain few-step sampling without the learned policy, together with quantitative measures of how the variational objective aligns the policy to the reward under large Δt.
  2. [Methods] Methods (variational objective and policy parameterization): The claim that the hierarchical variational model yields an expressive yet lightweight policy capable of maintaining quality requires explicit discussion of the tightness of the ELBO or surrogate reward used for training. Without this, it remains unclear whether the policy can reliably counteract the increased variance of the reverse process in the few-step regime, as raised by the stress-test note.
minor comments (2)
  1. [Experiments] Experiments: Tables reporting perceptual metrics on super-resolution and inverse problems should include standard deviations or confidence intervals across multiple runs to substantiate the claimed improvements over baselines.
  2. [Preliminaries] Notation: The distinction between the hierarchical policy parameters and the diffusion timestep conditioning should be clarified in the first appearance of the variational objective to avoid ambiguity for readers.
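The first minor comment asks for spread across runs. A minimal sketch of such a summary follows. It uses a normal-approximation interval (z = 1.96), which is optimistic for the handful of seeds typical in this setting; a Student-t critical value would be more appropriate for small n. The LPIPS values in the usage line are made-up placeholders, not results from the paper.

```python
import math
import statistics


def summarize_runs(values, z=1.96):
    """Return (mean, sample std, (lo, hi)) with a normal-approximation CI for the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two independent runs")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)           # sample std, n - 1 denominator
    half_width = z * std / math.sqrt(n)      # z times the standard error of the mean
    return mean, std, (mean - half_width, mean + half_width)


# Placeholder LPIPS scores from three hypothetical seeds (not results from the paper).
mean, std, (lo, hi) = summarize_runs([0.20, 0.22, 0.21])
print(f"LPIPS {mean:.3f} ± {std:.3f} (95% CI [{lo:.3f}, {hi:.3f}])")
```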

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the potential impact of our hierarchical variational framework for amortizing test-time guidance in diffusion models. We address each major comment below and have incorporated revisions to strengthen the empirical and methodological presentation.

  1. Referee: [Abstract] Abstract: The headline quality-speed tradeoff (better perceptual quality with >5x faster inference on 4x super-resolution) rests on the assumption that the lightweight stochastic policy supplies sufficient structured per-step control to offset error accumulation at large step sizes. The manuscript should include a dedicated ablation (e.g., in the experiments section) that isolates the policy's contribution versus plain few-step sampling without the learned policy, together with quantitative measures of how the variational objective aligns the policy to the reward under large Δt.

    Authors: We agree that isolating the policy's contribution through a dedicated ablation would strengthen the claims. In the revised manuscript, we have added a new ablation study in the experiments section comparing the full hierarchical variational policy against plain few-step sampling (standard DDPM/DDIM at identical large step sizes, without the learned policy). We report perceptual metrics including FID and LPIPS, along with quantitative measures of variational alignment such as expected reward under the policy versus the unguided baseline and estimates of reverse-process variance at large Δt. These results show that the policy supplies structured corrections that reduce error accumulation. revision: yes

  2. Referee: [Methods] Methods (variational objective and policy parameterization): The claim that the hierarchical variational model yields an expressive yet lightweight policy capable of maintaining quality requires explicit discussion of the tightness of the ELBO or surrogate reward used for training. Without this, it remains unclear whether the policy can reliably counteract the increased variance of the reverse process in the few-step regime, as raised by the stress-test note.

    Authors: We appreciate the suggestion to clarify the ELBO tightness. The revised methods section now includes an expanded discussion of the ELBO gap in the hierarchical variational setting, noting that the per-step policy amortization yields a tighter effective bound than standard variational inference for this task. We report empirical training diagnostics showing close alignment between the surrogate reward and the true objective, and we connect this to the stress-test results to demonstrate that the stochastic policy's expressiveness reliably offsets increased variance in the few-step regime. revision: yes
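The "plain few-step sampling" ablation in the first response compares against standard DDIM at the same large step sizes without the learned policy. A sketch of one DDIM update with the stochasticity parameter η follows (η = 0 is deterministic DDIM; η = 1 matches the DDPM posterior variance). It is written against cumulative signal levels ᾱ so any step size works; the function name and signature are illustrative, not taken from the paper's code.

```python
import numpy as np


def ddim_step(x_t, eps, a_t, a_prev, eta, rng):
    """One DDIM update from cumulative signal level a_t to a_prev (a_prev > a_t)."""
    # Predicted clean sample from the noise estimate.
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    # Per-step noise scale; zero when eta = 0 or when stepping to a_prev = 1.
    sigma = eta * np.sqrt((1.0 - a_prev) / (1.0 - a_t)) * np.sqrt(1.0 - a_t / a_prev)
    direction = np.sqrt(1.0 - a_prev - sigma ** 2) * eps
    noise = sigma * rng.standard_normal(x_t.shape) if eta > 0 else 0.0
    return np.sqrt(a_prev) * x0_hat + direction + noise
```

With η = 0 and the true noise, one large step maps a noisy sample exactly onto the forward-process sample at the target level; the learned policy in the paper is meant to correct for the fact that a real noise predictor is not exact at such large jumps.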

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained


The paper formulates test-time adaptation as a hierarchical variational model and learns a stochastic policy to amortize control for reward-aligned diffusion sampling. The central quality-speed tradeoff claim rests on the empirical performance of this learned policy under the variational objective, not on any reduction of outputs to inputs by construction. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations that collapse the argument are present. The framework uses standard variational inference and amortization techniques whose justification is independent of the target results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard variational inference assumptions and the existence of a lightweight policy that can approximate optimal per-step guidance; no new physical entities or ad-hoc constants are introduced in the abstract.

axioms (1)
  • domain assumption: The variational approximation to the hierarchical policy can be optimized to provide effective structured control.
    Invoked when stating that the learned policy maintains sample quality.

pith-pipeline@v0.9.0 · 5709 in / 1150 out tokens · 25730 ms · 2026-05-22T08:44:18.957349+00:00 · methodology


Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 4 internal anchors

  1. [1]

    Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

    Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. arXiv preprint arXiv:2303.08797, 2023

  2. [2]

    Universal guidance for diffusion models

    Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Roni Sengupta, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Universal guidance for diffusion models. In The Twelfth International Conference on Learning Representations, 2024

  3. [3]

    Training Diffusion Models with Reinforcement Learning

    Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301, 2023

  4. [4]

    Variational inference: A review for statisticians

    David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, April 2017

  5. [5]

    Flow map matching with stochastic interpolants: A mathematical framework for consistency models

    Nicholas Matthew Boffi, Michael Samuel Albergo, and Eric Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models. Transactions on Machine Learning Research, 2025

  6. [6]

    Tweedie moment projected diffusions for inverse problems

    Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, and Omer Deniz Akyildiz. Tweedie moment projected diffusions for inverse problems. Transactions on Machine Learning Research, 2024. Featured Certification

  7. [7]

    Diffusion posterior sampling for general noisy inverse problems

    Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. In The Eleventh International Conference on Learning Representations, 2022

  8. [8]

    Directly fine-tuning diffusion models on differentiable rewards

    Kevin Clark, Paul Vicol, Kevin Swersky, and David J. Fleet. Directly fine-tuning diffusion models on differentiable rewards. In The Twelfth International Conference on Learning Representations, 2024

  9. [9]

    A survey on diffusion models for inverse problems

    Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G. Dimakis, and Mauricio Delbracio. A survey on diffusion models for inverse problems, 2024

  10. [10]

    ImageNet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009

  11. [11]

    Diffusion models beat GANs on image synthesis

    Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021

  12. [12]

    Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control

    Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, and Ricky T. Q. Chen. Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control. In The Thirteenth International Conference on Learning Representations, 2025

  13. [13]

    Noise hypernetworks: Amortizing test-time compute in diffusion models

    Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, and Zeynep Akata. Noise hypernetworks: Amortizing test-time compute in diffusion models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  14. [14]

    ReNO: Enhancing one-step text-to-image models through reward-based noise optimization

    Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, and Zeynep Akata. ReNO: Enhancing one-step text-to-image models through reward-based noise optimization. Advances in Neural Information Processing Systems, 37:125487–125519, 2024

  15. [15]

    Optimizing DDPM sampling with shortcut fine-tuning

    Ying Fan and Kangwook Lee. Optimizing DDPM sampling with shortcut fine-tuning. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023

  16. [16]

    Mean flows for one-step generative modeling

    Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  17. [17]

    Calibrated test-time guidance for Bayesian inference

    Daniel Geyfman, Felix Draxler, Jan Groeneveld, Hyunsoo Lee, Theofanis Karaletsos, and Stephan Mandt. Calibrated test-time guidance for Bayesian inference, 2026

  18. [18]

    Manifold preserving guided diffusion

    Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J Zico Kolter, Ruslan Salakhutdinov, and Stefano Ermon. Manifold preserving guided diffusion. In The Twelfth International Conference on Learning Representations, 2024

  19. [19]

    GANs trained by a two time-scale update rule converge to a local Nash equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017

  20. [20]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  21. [21]

    Symbolic music generation with non-differentiable rule guided diffusion

    Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli Shama Sastry, Siddharth Gururani, Sageev Oore, and Yisong Yue. Symbolic music generation with non-differentiable rule guided diffusion. In ICML, 2024

  22. [22]

    Divide-and-conquer posterior sampling for denoising diffusion priors

    Yazid Janati, Badr MOUFAD, Alain Oliviero Durmus, Eric Moulines, and Jimmy Olsson. Divide-and-conquer posterior sampling for denoising diffusion priors. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

  23. [23]

    Progressive growing of GANs for improved quality, stability, and variation

    Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations, 2018

  24. [24]

    Denoising diffusion restoration models

    Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models. Advances in Neural Information Processing Systems, 35:23593–23606, 2022

  25. [25]

    Semi-amortized variational autoencoders

    Yoon Kim, Sam Wiseman, Andrew Miller, David Sontag, and Alexander Rush. Semi-amortized variational autoencoders. In International Conference on Machine Learning, pages 2678–2687. PMLR, 2018

  26. [26]

    Adam: A method for stochastic optimization

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017

  27. [27]

    Pick-a-pic: An open dataset of user preferences for text-to-image generation

    Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:36652–36663, 2023

  28. [28]

    Flow matching for generative modeling

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In International Conference on Learning Representations, 2023

  29. [29]

    Inference-time scaling for diffusion models beyond scaling denoising steps

    Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, and Saining Xie. Inference-time scaling for diffusion models beyond scaling denoising steps, 2025

  30. [30]

    A variational perspective on solving inverse problems with diffusion models

    Morteza Mardani, Jiaming Song, Jan Kautz, and Arash Vahdat. A variational perspective on solving inverse problems with diffusion models. In The Twelfth International Conference on Learning Representations, 2024

  31. [31]

    Iterative amortized inference

    Joe Marino, Yisong Yue, and Stephan Mandt. Iterative amortized inference. In International Conference on Machine Learning, pages 3403–3412. PMLR, 2018

  32. [32]

    Variational control for guidance in diffusion models

    Kushagra Pandey, Farrin Marouf Sofian, Felix Draxler, Theofanis Karaletsos, and Stephan Mandt. Variational control for guidance in diffusion models. In Forty-second International Conference on Machine Learning, 2025

  33. [33]

    Fast samplers for inverse problems in iterative refinement models

    Kushagra Pandey, Ruihan Yang, and Stephan Mandt. Fast samplers for inverse problems in iterative refinement models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

  34. [34]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4195–4205, October 2023

  35. [35]

    Training-free linear image inverses via flows

    Ashwini Pokle, Matthew J. Muckley, Ricky T. Q. Chen, and Brian Karrer. Training-free linear image inverses via flows. Transactions on Machine Learning Research, 2024

  36. [36]

    Direct preference optimization: Your language model is secretly a reward model

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36:53728–53741, 2023

  37. [37]

    Hierarchical variational models

    Rajesh Ranganath, Dustin Tran, and David Blei. Hierarchical variational models. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 324–333, New York, New York, USA, 20–22 Jun 2016. PMLR

  38. [38]

    RB-modulation: Training-free stylization using reference-based modulation

    Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. RB-modulation: Training-free stylization using reference-based modulation. In The Thirteenth International Conference on Learning Representations, 2025

  39. [39]

    Progressive distillation for fast sampling of diffusion models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022

  40. [40]

    A general framework for inference-time scaling and steering of diffusion models

    Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A general framework for inference-time scaling and steering of diffusion models. In Forty-second International Conference on Machine Learning, 2025

  41. [41]

    Deep unsupervised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015

  42. [42]

    Pseudoinverse-guided diffusion models for inverse problems

    Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. In International Conference on Learning Representations, 2023

  43. [43]

    Consistency models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, pages 32211–32252, 2023

  44. [44]

    Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond

    Wenpin Tang and Fuzhong Zhou. Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond. arXiv preprint arXiv:2403.06279, 2024

  45. [45]

    Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review

    Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review, 2024

  46. [46]

    Fine-tuning of continuous-time diffusion models as entropy-regularized control

    Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, and Sergey Levine. Fine-tuning of continuous-time diffusion models as entropy-regularized control. arXiv preprint arXiv:2402.15194, 2024

  47. [47]

    Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review

    Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, and Tommaso Biancalani. Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review, 2025

  48. [48]

    Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models

    Siddarth Venkatraman, Mohsin Hasan, Minsu Kim, Luca Scimeca, Marcin Sendera, Yoshua Bengio, Glen Berseth, and Nikolay Malkin. Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, edi-...

  49. [49]

    DMPlug: A plug-in method for solving inverse problems with diffusion models

    Hengkang Wang, Xu Zhang, Taihui Li, Yuxiang Wan, Tiancong Chen, and Ju Sun. DMPlug: A plug-in method for solving inverse problems with diffusion models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

  50. [50]

    Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

    Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis. arXiv preprint arXiv:2306.09341, 2023

  51. [51]

    ImageReward: Learning and evaluating human preferences for text-to-image generation

    Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:15903–15935, 2023

  52. [52]

    One-step diffusion with distribution matching distillation

    Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6613–6623, 2024

  53. [53]

    FreeDoM: Training-free energy-guided conditional diffusion model

    Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, and Jian Zhang. FreeDoM: Training-free energy-guided conditional diffusion model. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 23174–23184, October 2023

  54. [54]

    Improving diffusion inverse problem solving with decoupled noise annealing

    Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, and Yang Song. Improving diffusion inverse problem solving with decoupled noise annealing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20895–20905, 2025

  55. [55]

    Advances in variational inference

    Cheng Zhang, Judith Bütepage, Hedvig Kjellström, and Stephan Mandt. Advances in variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):2008–2026, 2018

  56. [56]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018

  57. [57]

    InverseBench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences

    Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy Feng, Caifeng Zou, Yu Sun, Nikola Borislavov Kovachki, Zachary E Ross, Katherine Bouman, and Yisong Yue. InverseBench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences. In The Thirteenth International Conference on Learning Representations, 2025

  58. [58]

    Inductive moment matching

    Linqi Zhou, Stefano Ermon, and Jiaming Song. Inductive moment matching. In Forty-second International Conference on Machine Learning, 2025

  59. [59]

    Fine-Tuning Language Models from Human Preferences

    Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593, 2019.

    Appendix A: Hierarchical Variational Policies. Here we present the proof of the bound in Eq. 7 in the main text. Proof overview. We derive a variation...

  60. [60]

    We write the standard ELBO, decomposing it into an expected log-joint (energy) term and an entropy term

  61. [61]

    We lower-bound the entropy by introducing a factored approximate posterior $\bar{r}(u_{1:T} \mid x_{0:T}, y)$ over the controls, exploiting the non-negativity of the KL divergence

  62. [62]

    We substitute the factored parameterizations of both the generative and variational processes to arrive at a per-timestep bound.

    Proof. Recall the generative process,

    $$p(x_{0:T}, y) = p(x_T)\left[\prod_{t=1}^{T} p(x_{t-1} \mid x_t)\right] p(y \mid x_0). \tag{12}$$

    We introduce a variational distribution $q(x_{0:T} \mid y)$ and apply Jensen's inequality to obtain the standard ELBO, $\log p(y) = \log \int p(x_{0:T} \ldots$

  63. [63]

    C-ΠGDM [33]. For all tasks and datasets, we fix the number of diffusion steps to 10 using the noise-conditioned C-ΠGDM sampler

    DDRM can only be applied to linear inverse problems. C-ΠGDM [33]. For all tasks and datasets, we fix the number of diffusion steps to 10 using the noise-conditioned C-ΠGDM sampler. Here w and τ represent the hyperparameters for projection to a conjugate space while τ represents the start-time for reverse diffusion sampling. We find that C-ΠGDM is unstable ...

  64. [64]

    Limitations

    We set DDIM η = 0.5 and the scale parameter to 20.0, which we found to work best across tasks. RED-Diff [30]. For all tasks and datasets, we reduce the number of steps for RED-Diff to 10. We use the DDIM sampler with η = 0.0. We highlight other important hyperparameters, such as the learning rate and gradient-term weight, in Table 5. NDTM. We reduce the numb...

  65. [65]

    Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

    Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...
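The proof steps extracted above as items [60] to [62] rest on two standard inequalities: the ELBO written as an expected log-joint (energy) term plus an entropy term, and the auxiliary-variable entropy bound used for hierarchical variational models [37]. The toy discrete check below illustrates both. It is a sketch, not a reproduction of the paper's Eq. 7: the factored posterior r̄(u_{1:T} | x_{0:T}, y) is replaced by a single discrete auxiliary variable u, and all distributions are random.

```python
import numpy as np

rng = np.random.default_rng(0)


def entropy(p):
    return -np.sum(p * np.log(p))


# Toy slice of a joint p(x, y) at a fixed observation y, with 4 latent states x.
p_xy = 0.2 * rng.random(4) + 1e-3
log_evidence = np.log(p_xy.sum())                  # log p(y)


def elbo(q):
    """Energy term E_q[log p(x, y)] plus entropy H[q]; never exceeds log p(y)."""
    return np.sum(q * np.log(p_xy)) + entropy(q)


q_random = rng.dirichlet(np.ones(4))               # arbitrary variational q(x)
q_post = p_xy / p_xy.sum()                         # exact posterior p(x | y): bound is tight

# Hierarchical q(x, u) = q(u) q(x | u) with 3 auxiliary states u (standing in for controls).
q_u = rng.dirichlet(np.ones(3))                    # shape (3,)
q_x_given_u = rng.dirichlet(np.ones(4), size=3)    # shape (3, 4), rows sum to 1
q_xu = q_u[:, None] * q_x_given_u                  # joint q(x, u), shape (3, 4)
q_x = q_xu.sum(axis=0)                             # marginal q(x)


def entropy_lower_bound(r_u_given_x):
    """E_q(x,u)[log r(u | x) - log q(x, u)] <= H[q(x)]; r has shape (3, 4), columns sum to 1."""
    return np.sum(q_xu * (np.log(r_u_given_x) - np.log(q_xu)))


r_arbitrary = rng.dirichlet(np.ones(3), size=4).T  # any auxiliary posterior r(u | x)
r_exact = q_xu / q_x[None, :]                      # true q(u | x): bound is tight
```

The gap in the second bound is E_q(x)[KL(q(u | x) || r(u | x))], which is why learning a good auxiliary posterior tightens the overall objective.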