Hierarchical Variational Policies for Reward-Guided Diffusion

Farrin Marouf Sofian; Felix Draxler; Jan Niklas Groeneveld; Kushagra Pandey; Stephan Mandt

arxiv: 2605.21661 · v1 · pith:DGX3YQPZnew · submitted 2026-05-20 · 💻 cs.LG · cs.AI· cs.CV

Hierarchical Variational Policies for Reward-Guided Diffusion

Kushagra Pandey , Farrin Marouf Sofian , Jan Niklas Groeneveld , Felix Draxler , Stephan Mandt This is my paper

Pith reviewed 2026-05-22 08:44 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CV

keywords diffusion modelsvariational policiesamortized inferencereward-guided generationinverse problemsfew-step samplingsuper-resolutiongenerative models

0 comments

The pith

A hierarchical variational stochastic policy amortizes test-time guidance for diffusion models, enabling fast reward-aligned sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework to adapt pretrained diffusion models to downstream tasks like inverse problems without costly test-time optimization for each sample. It models the adaptation as a hierarchical variational process where a lightweight stochastic policy is learned to control the sampling steps. This allows for few-step sampling with large steps for speed, while the policy ensures quality through structured control. A sympathetic reader would care because it promises high-quality generative outputs at much lower computational cost, making advanced diffusion techniques more practical for real-world use.

Core claim

Formulating test-time adaptation as a hierarchical variational model with an amortized stochastic policy yields a fully amortized sampler that achieves a strong quality-speed tradeoff, matching or exceeding test-time scaling baselines with significantly less compute, such as better perceptual quality and more than 5x faster inference on 4x super-resolution.

What carries the argument

The hierarchical variational model that amortizes control into a lightweight yet expressive stochastic policy for per-step guidance in diffusion sampling.

If this is right

The method supports efficient few-step diffusion sampling while preserving sample quality.
It can be extended to a semi-amortized setting combining cheap proposals with limited test-time optimization for state-of-the-art perceptual quality.
The fully amortized sampler requires significantly less compute than recent baselines.
Applications to challenging inverse problems benefit from improved quality-speed tradeoffs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This amortization could make diffusion models viable for interactive or real-time applications where inference speed is critical.
The structured control from the policy might inspire similar approaches in other generative modeling frameworks.
Exploring the policy's expressiveness could allow even fewer steps without quality loss.

Load-bearing premise

The lightweight stochastic policy can provide structured per-step control sufficient to maintain sample quality when using large step sizes for fast few-step diffusion sampling.

What would settle it

Running the method on 4x super-resolution with very large step sizes and observing that perceptual quality falls below the best-performing baseline would falsify the claim of maintaining quality through the policy.

Figures

Figures reproduced from arXiv: 2605.21661 by Farrin Marouf Sofian, Felix Draxler, Jan Niklas Groeneveld, Kushagra Pandey, Stephan Mandt.

**Figure 1.** Figure 1: Our methods (AHVP, SHVP) produce high-quality samples satisfying measurement constraints at reduced inference cost. Baselines often show artifacts (red boxes), while ours preserve fine details (green boxes). AHVP offers strong perceptual quality with fast inference; SHVP further improves quality at a moderate additional cost. See [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: We compare diffusion-based inverse problem solvers in terms of perceptual quality (LPIPS, [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of Amortized Hierarchical Policies. Given a pretrained noise predictor Eϕ and a per-step policy πψ, we can sample the initial noise and per-step controls and perform a simple rollout to obtain conditional samples x0 without any expensive optimizations. terms: a reward term and three regularizers that ensure samples are likely under the unguided diffusion model, log p(y) ≥ Ex0∼q [PITH_FULL_IMAGE:f… view at source ↗

**Figure 4.** Figure 4: Per-step controls refine reconstruction quality. Stage-1 training learns a noise policy that recovers the coarse image structure (A0), whereas the second stage introduces per-step controls that refine fine-grained details (A1). The residual map |A1 − A0| highlights the additional textures and high-frequency information recovered by the second stage. Examples shown for FFHQ (top row) and ImageNet (bottom ro… view at source ↗

**Figure 5.** Figure 5: Qualitative comparisons between different methods on the Random Inpainting task. Top panel: FFHQ-256, bottom panel: ImageNet-256. Our methods capture fine-grained details better than competing baselines. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_5.png] view at source ↗

**Figure 6.** Figure 6: Qualitative comparisons between different methods on the HDR task. Top panel: FFHQ-256, bottom panel: ImageNet-256. Our methods best preserve the overall color profile among competing baselines. 21 [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

**Figure 7.** Figure 7: Qualitative comparisons between different methods on the SR×8 task. Top panel: FFHQ-256, bottom panel: ImageNet-256. Our method captures details better than competing baselines. 22 [PITH_FULL_IMAGE:figures/full_fig_p022_7.png] view at source ↗

read the original abstract

Adapting pretrained diffusion models to downstream objectives such as inverse problems often requires expensive test-time guidance or optimization. We propose a principled framework for generating high-quality reward-aligned samples at substantially reduced inference cost. Our approach formulates test-time adaptation as a hierarchical variational model, where control is amortized into a lightweight yet expressive stochastic policy. This formulation naturally supports few-step diffusion sampling: large step sizes enable fast inference, while the learned policy maintains sample quality by providing structured per-step control. The resulting fully amortized sampler achieves a strong quality--speed tradeoff, matching or exceeding recent test-time scaling baselines while requiring significantly less compute. For example, on 4x super-resolution, our method achieves better perceptual quality with more than 5x faster inference compared to the best-performing baseline. We further extend our approach to a semi-amortized regime that combines cheap amortized proposals with limited test-time optimization, achieving state-of-the-art perceptual quality across several challenging inverse problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper amortizes test-time reward guidance in diffusion via a hierarchical variational stochastic policy, enabling faster few-step sampling on inverse problems with reported quality gains.

read the letter

The main thing to know is that this work turns expensive per-sample optimization for guiding pretrained diffusion models into a learned hierarchical variational policy. The policy is amortized and stochastic, meant to supply structured corrections at each step so that large step sizes still produce decent samples on tasks like super-resolution and other inverse problems. They report better perceptual quality than recent test-time scaling baselines while using more than 5x less compute on 4x super-resolution, plus a semi-amortized variant that mixes the cheap policy with limited further optimization.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a hierarchical variational framework to amortize test-time adaptation of pretrained diffusion models for reward-guided tasks such as inverse problems. Control is learned into a lightweight stochastic policy that supports few-step sampling, where large step sizes enable fast inference while the policy supplies structured per-step corrections to preserve sample quality. The fully amortized sampler is claimed to match or exceed test-time scaling baselines on perceptual quality with >5x lower compute (e.g., 4x super-resolution), with an optional semi-amortized extension that adds limited test-time optimization to reach state-of-the-art results.

Significance. If the central empirical claims hold after addressing the noted concerns, the work would provide a practical and principled route to efficient reward-aligned diffusion sampling, reducing reliance on expensive per-sample optimization. The variational amortization of per-step control is a natural fit for the few-step regime and could influence downstream applications in image restoration and conditional generation.

major comments (2)

[Abstract] Abstract: The headline quality-speed tradeoff (better perceptual quality with >5x faster inference on 4x super-resolution) rests on the assumption that the lightweight stochastic policy supplies sufficient structured per-step control to offset error accumulation at large step sizes. The manuscript should include a dedicated ablation (e.g., in the experiments section) that isolates the policy's contribution versus plain few-step sampling without the learned policy, together with quantitative measures of how the variational objective aligns the policy to the reward under large Δt.
[Methods] Methods (variational objective and policy parameterization): The claim that the hierarchical variational model yields an expressive yet lightweight policy capable of maintaining quality requires explicit discussion of the tightness of the ELBO or surrogate reward used for training. Without this, it remains unclear whether the policy can reliably counteract the increased variance of the reverse process in the few-step regime, as raised by the stress-test note.

minor comments (2)

[Experiments] Experiments: Tables reporting perceptual metrics on super-resolution and inverse problems should include standard deviations or confidence intervals across multiple runs to substantiate the claimed improvements over baselines.
[Preliminaries] Notation: The distinction between the hierarchical policy parameters and the diffusion timestep conditioning should be clarified in the first appearance of the variational objective to avoid ambiguity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the potential impact of our hierarchical variational framework for amortizing test-time guidance in diffusion models. We address each major comment below and have incorporated revisions to strengthen the empirical and methodological presentation.

read point-by-point responses

Referee: [Abstract] Abstract: The headline quality-speed tradeoff (better perceptual quality with >5x faster inference on 4x super-resolution) rests on the assumption that the lightweight stochastic policy supplies sufficient structured per-step control to offset error accumulation at large step sizes. The manuscript should include a dedicated ablation (e.g., in the experiments section) that isolates the policy's contribution versus plain few-step sampling without the learned policy, together with quantitative measures of how the variational objective aligns the policy to the reward under large Δt.

Authors: We agree that isolating the policy's contribution through a dedicated ablation would strengthen the claims. In the revised manuscript, we have added a new ablation study in the experiments section comparing the full hierarchical variational policy against plain few-step sampling (standard DDPM/DDIM at identical large step sizes, without the learned policy). We report perceptual metrics including FID and LPIPS, along with quantitative measures of variational alignment such as expected reward under the policy versus the unguided baseline and estimates of reverse-process variance at large Δt. These results show that the policy supplies structured corrections that reduce error accumulation. revision: yes
Referee: [Methods] Methods (variational objective and policy parameterization): The claim that the hierarchical variational model yields an expressive yet lightweight policy capable of maintaining quality requires explicit discussion of the tightness of the ELBO or surrogate reward used for training. Without this, it remains unclear whether the policy can reliably counteract the increased variance of the reverse process in the few-step regime, as raised by the stress-test note.

Authors: We appreciate the suggestion to clarify the ELBO tightness. The revised methods section now includes an expanded discussion of the ELBO gap in the hierarchical variational setting, noting that the per-step policy amortization yields a tighter effective bound than standard variational inference for this task. We report empirical training diagnostics showing close alignment between the surrogate reward and the true objective, and we connect this to the stress-test results to demonstrate that the stochastic policy's expressiveness reliably offsets increased variance in the few-step regime. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper formulates test-time adaptation as a hierarchical variational model and learns a stochastic policy to amortize control for reward-aligned diffusion sampling. The central quality-speed tradeoff claim rests on the empirical performance of this learned policy under the variational objective, not on any reduction of outputs to inputs by construction. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations that collapse the argument are present. The framework uses standard variational inference and amortization techniques whose justification is independent of the target results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard variational inference assumptions and the existence of a lightweight policy that can approximate optimal per-step guidance; no new physical entities or ad-hoc constants are introduced in the abstract.

axioms (1)

domain assumption Variational approximation to the hierarchical policy can be optimized to provide effective structured control.
Invoked when stating that the learned policy maintains sample quality.

pith-pipeline@v0.9.0 · 5709 in / 1150 out tokens · 25730 ms · 2026-05-22T08:44:18.957349+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce a unified framework that recasts test-time guidance in diffusion models as inference over hierarchical variational policies... q(x0:T,u1:T|y) = q(xT|y) ∏ q(ut|xt,u>t,y) q(xt−1|xt,ut)
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The resulting fully amortized sampler achieves a strong quality–speed tradeoff... on 4× super-resolution, our method achieves better perceptual quality with more than 5× faster inference

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 4 internal anchors

[1]

Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions.arXiv preprint arXiv:2303.08797, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

Universal guidance for diffusion models

Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Roni Sengupta, Micah Goldblum, Jonas Geip- ing, and Tom Goldstein. Universal guidance for diffusion models. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[3]

Training Diffusion Models with Reinforcement Learning

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning.arXiv preprint arXiv:2305.13301, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Blei, Alp Kucukelbir, and Jon D

David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: A review for statisticians.Journal of the American Statistical Association, 112(518):859–877, April 2017

work page 2017
[5]

Flow map matching with stochastic interpolants: A mathematical framework for consistency models

Nicholas Matthew Boffi, Michael Samuel Albergo, and Eric Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models. Transactions on Machine Learning Research, 2025

work page 2025
[6]

Tweedie moment projected diffusions for inverse problems.Transactions on Machine Learning Research, 2024

Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, and Omer Deniz Akyildiz. Tweedie moment projected diffusions for inverse problems.Transactions on Machine Learning Research, 2024. Featured Certification

work page 2024
[7]

Diffusion posterior sampling for general noisy inverse problems

Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. InThe Eleventh Interna- tional Conference on Learning Representations, 2022

work page 2022
[8]

Kevin Clark, Paul Vicol, Kevin Swersky, and David J. Fleet. Directly fine-tuning diffusion models on differentiable rewards. InThe Twelfth International Conference on Learning Repre- sentations, 2024

work page 2024
[9]

Dimakis, and Mauricio Delbracio

Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G. Dimakis, and Mauricio Delbracio. A survey on diffusion models for inverse problems, 2024

work page 2024
[10]

Imagenet: A large- scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large- scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009

work page 2009
[11]

Diffusion models beat gans on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021. 10

work page 2021
[12]

Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, and Ricky T. Q. Chen. Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[13]

Noise hypernetworks: Amortizing test-time compute in diffusion models

Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, and Zeynep Akata. Noise hypernetworks: Amortizing test-time compute in diffusion models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

work page 2025
[14]

Reno: Enhancing one-step text-to-image models through reward-based noise optimization.Advances in Neural Information Processing Systems, 37:125487–125519, 2024

Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, and Zeynep Akata. Reno: Enhancing one-step text-to-image models through reward-based noise optimization.Advances in Neural Information Processing Systems, 37:125487–125519, 2024

work page 2024
[15]

Optimizing ddpm sampling with shortcut fine-tuning

Ying Fan and Kangwook Lee. Optimizing ddpm sampling with shortcut fine-tuning. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023

work page 2023
[16]

Mean flows for one-step generative modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

work page 2025
[17]

Calibrated test-time guidance for bayesian inference, 2026

Daniel Geyfman, Felix Draxler, Jan Groeneveld, Hyunsoo Lee, Theofanis Karaletsos, and Stephan Mandt. Calibrated test-time guidance for bayesian inference, 2026

work page 2026
[18]

Manifold preserving guided diffusion

Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J Zico Kolter, Ruslan Salakhutdinov, and Stefano Ermon. Manifold preserving guided diffusion. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[19]

Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017

work page 2017
[20]

Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

work page 2020
[21]

Symbolic music generation with non-differentiable rule guided diffusion

Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli Shama Sastry, Siddharth Gururani, Sageev Oore, and Yisong Yue. Symbolic music generation with non-differentiable rule guided diffusion. InICML, 2024

work page 2024
[22]

Divide-and-conquer posterior sampling for denoising diffusion priors

Yazid Janati, Badr MOUFAD, Alain Oliviero Durmus, Eric Moulines, and Jimmy Olsson. Divide-and-conquer posterior sampling for denoising diffusion priors. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024
[23]

Progressive growing of gans for im- proved quality, stability, and variation

Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for im- proved quality, stability, and variation. InInternational Conference on Learning Representations, 2018

work page 2018
[24]

Denoising diffusion restoration models.Advances in Neural Information Processing Systems, 35:23593–23606, 2022

Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models.Advances in Neural Information Processing Systems, 35:23593–23606, 2022

work page 2022
[25]

Semi-amortized variational autoencoders

Yoon Kim, Sam Wiseman, Andrew Miller, David Sontag, and Alexander Rush. Semi-amortized variational autoencoders. InInternational Conference on Machine Learning, pages 2678–2687. PMLR, 2018

work page 2018
[26]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017

work page 2017
[27]

Pick-a-pic: An open dataset of user preferences for text-to-image generation.Advances in neural information processing systems, 36:36652–36663, 2023

Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation.Advances in neural information processing systems, 36:36652–36663, 2023

work page 2023
[28]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InInternational Conference on Learning Representations, 2023. 11

work page 2023
[29]

Inference-time scaling for diffusion models beyond scaling denoising steps, 2025

Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, and Saining Xie. Inference-time scaling for diffusion models beyond scaling denoising steps, 2025

work page 2025
[30]

A variational perspective on solving inverse problems with diffusion models

Morteza Mardani, Jiaming Song, Jan Kautz, and Arash Vahdat. A variational perspective on solving inverse problems with diffusion models. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[31]

Iterative amortized inference

Joe Marino, Yisong Yue, and Stephan Mandt. Iterative amortized inference. InInternational Conference on Machine Learning, pages 3403–3412. PMLR, 2018

work page 2018
[32]

Variational control for guidance in diffusion models

Kushagra Pandey, Farrin Marouf Sofian, Felix Draxler, Theofanis Karaletsos, and Stephan Mandt. Variational control for guidance in diffusion models. InF orty-second International Conference on Machine Learning, 2025

work page 2025
[33]

Fast samplers for inverse problems in iterative refinement models

Kushagra Pandey, Ruihan Yang, and Stephan Mandt. Fast samplers for inverse problems in iterative refinement models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024
[34]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4195–4205, October 2023

work page 2023
[35]

Muckley, Ricky T

Ashwini Pokle, Matthew J. Muckley, Ricky T. Q. Chen, and Brian Karrer. Training-free linear image inverses via flows.Transactions on Machine Learning Research, 2024

work page 2024
[36]

Direct preference optimization: Your language model is secretly a reward model

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in neural information processing systems, 36:53728–53741, 2023

work page 2023
[37]

Hierarchical variational models

Rajesh Ranganath, Dustin Tran, and David Blei. Hierarchical variational models. In Maria Flo- rina Balcan and Kilian Q. Weinberger, editors,Proceedings of The 33rd International Confer- ence on Machine Learning, volume 48 ofProceedings of Machine Learning Research, pages 324–333, New York, New York, USA, 20–22 Jun 2016. PMLR

work page 2016
[38]

RB-modulation: Training-free stylization using reference-based modulation

Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkot- tai, and Wen-Sheng Chu. RB-modulation: Training-free stylization using reference-based modulation. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[39]

Progressive distillation for fast sampling of diffusion models

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. InInternational Conference on Learning Representations, 2022

work page 2022
[40]

A general framework for inference-time scaling and steering of diffusion models

Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A general framework for inference-time scaling and steering of diffusion models. InF orty-second International Conference on Machine Learning, 2025

work page 2025
[41]

Deep unsuper- vised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational Conference on Machine Learning, pages 2256–2265. PMLR, 2015

work page 2015
[42]

Pseudoinverse-guided diffusion models for inverse problems

Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. InInternational Conference on Learning Representations, 2023

work page 2023
[43]

Consistency models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, pages 32211–32252, 2023

work page 2023
[44]

Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond.arXiv preprint arXiv:2403.06279, 2024

Wenpin Tang and Fuzhong Zhou. Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond.arXiv preprint arXiv:2403.06279, 2024

work page arXiv 2024
[45]

Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review, 2024

Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review, 2024. 12

work page 2024
[46]

Fine- tuning of continuous-time diffusion models as entropy-regularized control.arXiv preprint arXiv:2402.15194, 2024

Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, and Sergey Levine. Fine- tuning of continuous-time diffusion models as entropy-regularized control.arXiv preprint arXiv:2402.15194, 2024

work page arXiv 2024
[47]

Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review, 2025

Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, and Tom- maso Biancalani. Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review, 2025

work page 2025
[48]

Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models

Siddarth Venkatraman, Mohsin Hasan, Minsu Kim, Luca Scimeca, Marcin Sendera, Yoshua Bengio, Glen Berseth, and Nikolay Malkin. Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, edi-...

work page 2025
[49]

DMPlug: A plug-in method for solving inverse problems with diffusion models

Hengkang Wang, Xu Zhang, Taihui Li, Yuxiang Wan, Tiancong Chen, and Ju Sun. DMPlug: A plug-in method for solving inverse problems with diffusion models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024
[50]

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis.arXiv preprint arXiv:2306.09341, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[51]

Imagereward: Learning and evaluating human preferences for text-to-image generation

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:15903–15935, 2023

work page 2023
[52]

One-step diffusion with distribution matching distillation

Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6613–6623, 2024

work page 2024
[53]

Freedom: Training- free energy-guided conditional diffusion model

Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, and Jian Zhang. Freedom: Training- free energy-guided conditional diffusion model. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 23174–23184, October 2023

work page 2023
[54]

Improving diffusion inverse problem solving with decoupled noise annealing

Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, and Yang Song. Improving diffusion inverse problem solving with decoupled noise annealing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20895–20905, 2025

work page 2025
[55]

Advances in variational inference.IEEE transactions on pattern analysis and machine intelligence, 41(8):2008–2026, 2018

Cheng Zhang, Judith Bütepage, Hedvig Kjellström, and Stephan Mandt. Advances in variational inference.IEEE transactions on pattern analysis and machine intelligence, 41(8):2008–2026, 2018

work page 2008
[56]

The unrea- sonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unrea- sonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

work page 2018
[57]

Inversebench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences

Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy Feng, Caifeng Zou, Yu Sun, Nikola Borislavov Kovachki, Zachary E Ross, Katherine Bouman, and Yisong Yue. Inversebench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[58]

Inductive moment matching

Linqi Zhou, Stefano Ermon, and Jiaming Song. Inductive moment matching. InF orty-second International Conference on Machine Learning, 2025

work page 2025
[59]

Fine-Tuning Language Models from Human Preferences

Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593, 2019. 13 A Hierarchical Variational Policies Here we present the proof of the bound in Eq. 7 in the main text. Proof overview.We derive a variation...

work page internal anchor Pith review Pith/arXiv arXiv 1909
[60]

We write the standard ELBO, decomposing it into an expected log-joint (energy) term and an entropy term

work page
[61]

We lower-bound the entropy by introducing a factored approximate posterior ¯r(u1:T | x0:T ,y)over the controls, exploiting the non-negativity of the KL divergence

work page
[62]

We substitute the factored parameterizations of both the generative and variational processes to arrive at a per-timestep bound. Proof.Recall the generative process, p(x0:T ,y) =p(x T ) " TY t=1 p(xt−1 |x t) # p(y|x 0).(12) We introduce a variational distribution q(x0:T |y) and apply Jensen’s inequality to obtain the standard ELBO, logp(y) = log Z p(x0:T ...

work page
[63]

C-ΠGDM [33].For all tasks and datasets, we fix the number of diffusion steps to 10 using the noise conditioned C-ΠGDM sampler

DDRM can only be applied to linear inverse problems. C-ΠGDM [33].For all tasks and datasets, we fix the number of diffusion steps to 10 using the noise conditioned C-ΠGDM sampler. Here w and τ represent the hyperparameters for projection to a conjugate space while τ represents the start-time for reverse diffusion sampling. We find that C-ΠGDM is unstable ...

work page
[64]

Limitations

We set DDIM η= 0.5 and the scale parameter to 20.0, which we found to work best across tasks. RED-Diff [30].For all tasks and datasets, we reduce the number of steps for RED-Diff to 10. We use the DDIM sampler with η= 0.0 . We highlight other important hyperparameters, such as the learning rate and gradient-term weight, in Table 5. NDTM.We reduce the numb...

work page
[65]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

work page

[1] [1]

Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions.arXiv preprint arXiv:2303.08797, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[2] [2]

Universal guidance for diffusion models

Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Roni Sengupta, Micah Goldblum, Jonas Geip- ing, and Tom Goldstein. Universal guidance for diffusion models. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024

[3] [3]

Training Diffusion Models with Reinforcement Learning

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning.arXiv preprint arXiv:2305.13301, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[4] [4]

Blei, Alp Kucukelbir, and Jon D

David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: A review for statisticians.Journal of the American Statistical Association, 112(518):859–877, April 2017

work page 2017

[5] [5]

Flow map matching with stochastic interpolants: A mathematical framework for consistency models

Nicholas Matthew Boffi, Michael Samuel Albergo, and Eric Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models. Transactions on Machine Learning Research, 2025

work page 2025

[6] [6]

Tweedie moment projected diffusions for inverse problems.Transactions on Machine Learning Research, 2024

Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, and Omer Deniz Akyildiz. Tweedie moment projected diffusions for inverse problems.Transactions on Machine Learning Research, 2024. Featured Certification

work page 2024

[7] [7]

Diffusion posterior sampling for general noisy inverse problems

Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. InThe Eleventh Interna- tional Conference on Learning Representations, 2022

work page 2022

[8] [8]

Kevin Clark, Paul Vicol, Kevin Swersky, and David J. Fleet. Directly fine-tuning diffusion models on differentiable rewards. InThe Twelfth International Conference on Learning Repre- sentations, 2024

work page 2024

[9] [9]

Dimakis, and Mauricio Delbracio

Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G. Dimakis, and Mauricio Delbracio. A survey on diffusion models for inverse problems, 2024

work page 2024

[10] [10]

Imagenet: A large- scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large- scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009

work page 2009

[11] [11]

Diffusion models beat gans on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021. 10

work page 2021

[12] [12]

Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, and Ricky T. Q. Chen. Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[13] [13]

Noise hypernetworks: Amortizing test-time compute in diffusion models

Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, and Zeynep Akata. Noise hypernetworks: Amortizing test-time compute in diffusion models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

work page 2025

[14] [14]

Reno: Enhancing one-step text-to-image models through reward-based noise optimization.Advances in Neural Information Processing Systems, 37:125487–125519, 2024

Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, and Zeynep Akata. Reno: Enhancing one-step text-to-image models through reward-based noise optimization.Advances in Neural Information Processing Systems, 37:125487–125519, 2024

work page 2024

[15] [15]

Optimizing ddpm sampling with shortcut fine-tuning

Ying Fan and Kangwook Lee. Optimizing ddpm sampling with shortcut fine-tuning. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023

work page 2023

[16] [16]

Mean flows for one-step generative modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

work page 2025

[17] [17]

Calibrated test-time guidance for bayesian inference, 2026

Daniel Geyfman, Felix Draxler, Jan Groeneveld, Hyunsoo Lee, Theofanis Karaletsos, and Stephan Mandt. Calibrated test-time guidance for bayesian inference, 2026

work page 2026

[18] [18]

Manifold preserving guided diffusion

Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J Zico Kolter, Ruslan Salakhutdinov, and Stefano Ermon. Manifold preserving guided diffusion. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024

[19] [19]

Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017

work page 2017

[20] [20]

Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

work page 2020

[21] [21]

Symbolic music generation with non-differentiable rule guided diffusion

Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli Shama Sastry, Siddharth Gururani, Sageev Oore, and Yisong Yue. Symbolic music generation with non-differentiable rule guided diffusion. InICML, 2024

work page 2024

[22] [22]

Divide-and-conquer posterior sampling for denoising diffusion priors

Yazid Janati, Badr MOUFAD, Alain Oliviero Durmus, Eric Moulines, and Jimmy Olsson. Divide-and-conquer posterior sampling for denoising diffusion priors. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024

[23] [23]

Progressive growing of gans for im- proved quality, stability, and variation

Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for im- proved quality, stability, and variation. InInternational Conference on Learning Representations, 2018

work page 2018

[24] [24]

Denoising diffusion restoration models.Advances in Neural Information Processing Systems, 35:23593–23606, 2022

Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models.Advances in Neural Information Processing Systems, 35:23593–23606, 2022

work page 2022

[25] [25]

Semi-amortized variational autoencoders

Yoon Kim, Sam Wiseman, Andrew Miller, David Sontag, and Alexander Rush. Semi-amortized variational autoencoders. InInternational Conference on Machine Learning, pages 2678–2687. PMLR, 2018

work page 2018

[26] [26]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017

work page 2017

[27] [27]

Pick-a-pic: An open dataset of user preferences for text-to-image generation.Advances in neural information processing systems, 36:36652–36663, 2023

Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation.Advances in neural information processing systems, 36:36652–36663, 2023

work page 2023

[28] [28]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InInternational Conference on Learning Representations, 2023. 11

work page 2023

[29] [29]

Inference-time scaling for diffusion models beyond scaling denoising steps, 2025

Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, and Saining Xie. Inference-time scaling for diffusion models beyond scaling denoising steps, 2025

work page 2025

[30] [30]

A variational perspective on solving inverse problems with diffusion models

Morteza Mardani, Jiaming Song, Jan Kautz, and Arash Vahdat. A variational perspective on solving inverse problems with diffusion models. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024

[31] [31]

Iterative amortized inference

Joe Marino, Yisong Yue, and Stephan Mandt. Iterative amortized inference. InInternational Conference on Machine Learning, pages 3403–3412. PMLR, 2018

work page 2018

[32] [32]

Variational control for guidance in diffusion models

Kushagra Pandey, Farrin Marouf Sofian, Felix Draxler, Theofanis Karaletsos, and Stephan Mandt. Variational control for guidance in diffusion models. InF orty-second International Conference on Machine Learning, 2025

work page 2025

[33] [33]

Fast samplers for inverse problems in iterative refinement models

Kushagra Pandey, Ruihan Yang, and Stephan Mandt. Fast samplers for inverse problems in iterative refinement models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024

[34] [34]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4195–4205, October 2023

work page 2023

[35] [35]

Muckley, Ricky T

Ashwini Pokle, Matthew J. Muckley, Ricky T. Q. Chen, and Brian Karrer. Training-free linear image inverses via flows.Transactions on Machine Learning Research, 2024

work page 2024

[36] [36]

Direct preference optimization: Your language model is secretly a reward model

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in neural information processing systems, 36:53728–53741, 2023

work page 2023

[37] [37]

Hierarchical variational models

Rajesh Ranganath, Dustin Tran, and David Blei. Hierarchical variational models. In Maria Flo- rina Balcan and Kilian Q. Weinberger, editors,Proceedings of The 33rd International Confer- ence on Machine Learning, volume 48 ofProceedings of Machine Learning Research, pages 324–333, New York, New York, USA, 20–22 Jun 2016. PMLR

work page 2016

[38] [38]

RB-modulation: Training-free stylization using reference-based modulation

Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkot- tai, and Wen-Sheng Chu. RB-modulation: Training-free stylization using reference-based modulation. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[39] [39]

Progressive distillation for fast sampling of diffusion models

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. InInternational Conference on Learning Representations, 2022

work page 2022

[40] [40]

A general framework for inference-time scaling and steering of diffusion models

Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A general framework for inference-time scaling and steering of diffusion models. InF orty-second International Conference on Machine Learning, 2025

work page 2025

[41] [41]

Deep unsuper- vised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational Conference on Machine Learning, pages 2256–2265. PMLR, 2015

work page 2015

[42] [42]

Pseudoinverse-guided diffusion models for inverse problems

Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. InInternational Conference on Learning Representations, 2023

work page 2023

[43] [43]

Consistency models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, pages 32211–32252, 2023

work page 2023

[44] [44]

Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond.arXiv preprint arXiv:2403.06279, 2024

Wenpin Tang and Fuzhong Zhou. Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond.arXiv preprint arXiv:2403.06279, 2024

work page arXiv 2024

[45] [45]

Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review, 2024

Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review, 2024. 12

work page 2024

[46] [46]

Fine- tuning of continuous-time diffusion models as entropy-regularized control.arXiv preprint arXiv:2402.15194, 2024

Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, and Sergey Levine. Fine- tuning of continuous-time diffusion models as entropy-regularized control.arXiv preprint arXiv:2402.15194, 2024

work page arXiv 2024

[47] [47]

Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review, 2025

Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, and Tom- maso Biancalani. Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review, 2025

work page 2025

[48] [48]

Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models

Siddarth Venkatraman, Mohsin Hasan, Minsu Kim, Luca Scimeca, Marcin Sendera, Yoshua Bengio, Glen Berseth, and Nikolay Malkin. Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, edi-...

work page 2025

[49] [49]

DMPlug: A plug-in method for solving inverse problems with diffusion models

Hengkang Wang, Xu Zhang, Taihui Li, Yuxiang Wan, Tiancong Chen, and Ju Sun. DMPlug: A plug-in method for solving inverse problems with diffusion models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024

[50] [50]

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis.arXiv preprint arXiv:2306.09341, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[51] [51]

Imagereward: Learning and evaluating human preferences for text-to-image generation

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:15903–15935, 2023

work page 2023

[52] [52]

One-step diffusion with distribution matching distillation

Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6613–6623, 2024

work page 2024

[53] [53]

Freedom: Training- free energy-guided conditional diffusion model

Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, and Jian Zhang. Freedom: Training- free energy-guided conditional diffusion model. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 23174–23184, October 2023

work page 2023

[54] [54]

Improving diffusion inverse problem solving with decoupled noise annealing

Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, and Yang Song. Improving diffusion inverse problem solving with decoupled noise annealing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20895–20905, 2025

work page 2025

[55] [55]

Advances in variational inference.IEEE transactions on pattern analysis and machine intelligence, 41(8):2008–2026, 2018

Cheng Zhang, Judith Bütepage, Hedvig Kjellström, and Stephan Mandt. Advances in variational inference.IEEE transactions on pattern analysis and machine intelligence, 41(8):2008–2026, 2018

work page 2008

[56] [56]

The unrea- sonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unrea- sonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

work page 2018

[57] [57]

Inversebench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences

Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy Feng, Caifeng Zou, Yu Sun, Nikola Borislavov Kovachki, Zachary E Ross, Katherine Bouman, and Yisong Yue. Inversebench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[58] [58]

Inductive moment matching

Linqi Zhou, Stefano Ermon, and Jiaming Song. Inductive moment matching. InF orty-second International Conference on Machine Learning, 2025

work page 2025

[59] [59]

Fine-Tuning Language Models from Human Preferences

Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593, 2019. 13 A Hierarchical Variational Policies Here we present the proof of the bound in Eq. 7 in the main text. Proof overview.We derive a variation...

work page internal anchor Pith review Pith/arXiv arXiv 1909

[60] [60]

We write the standard ELBO, decomposing it into an expected log-joint (energy) term and an entropy term

work page

[61] [61]

We lower-bound the entropy by introducing a factored approximate posterior ¯r(u1:T | x0:T ,y)over the controls, exploiting the non-negativity of the KL divergence

work page

[62] [62]

We substitute the factored parameterizations of both the generative and variational processes to arrive at a per-timestep bound. Proof.Recall the generative process, p(x0:T ,y) =p(x T ) " TY t=1 p(xt−1 |x t) # p(y|x 0).(12) We introduce a variational distribution q(x0:T |y) and apply Jensen’s inequality to obtain the standard ELBO, logp(y) = log Z p(x0:T ...

work page

[63] [63]

C-ΠGDM [33].For all tasks and datasets, we fix the number of diffusion steps to 10 using the noise conditioned C-ΠGDM sampler

DDRM can only be applied to linear inverse problems. C-ΠGDM [33].For all tasks and datasets, we fix the number of diffusion steps to 10 using the noise conditioned C-ΠGDM sampler. Here w and τ represent the hyperparameters for projection to a conjugate space while τ represents the start-time for reverse diffusion sampling. We find that C-ΠGDM is unstable ...

work page

[64] [64]

Limitations

We set DDIM η= 0.5 and the scale parameter to 20.0, which we found to work best across tasks. RED-Diff [30].For all tasks and datasets, we reduce the number of steps for RED-Diff to 10. We use the DDIM sampler with η= 0.0 . We highlight other important hyperparameters, such as the learning rate and gradient-term weight, in Table 5. NDTM.We reduce the numb...

work page

[65] [65]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

work page