Hierarchical Variational Policies for Reward-Guided Diffusion
Pith reviewed 2026-05-22 08:44 UTC · model grok-4.3
The pith
A hierarchical variational stochastic policy amortizes test-time guidance for diffusion models, enabling fast reward-aligned sampling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Formulating test-time adaptation as a hierarchical variational model with an amortized stochastic policy yields a fully amortized sampler that achieves a strong quality-speed tradeoff, matching or exceeding test-time scaling baselines with significantly less compute, such as better perceptual quality and more than 5x faster inference on 4x super-resolution.
What carries the argument
The hierarchical variational model that amortizes control into a lightweight yet expressive stochastic policy for per-step guidance in diffusion sampling.
If this is right
- The method supports efficient few-step diffusion sampling while preserving sample quality.
- It can be extended to a semi-amortized setting combining cheap proposals with limited test-time optimization for state-of-the-art perceptual quality.
- The fully amortized sampler requires significantly less compute than recent baselines.
- Applications to challenging inverse problems benefit from improved quality-speed tradeoffs.
Where Pith is reading between the lines
- This amortization could make diffusion models viable for interactive or real-time applications where inference speed is critical.
- The structured control from the policy might inspire similar approaches in other generative modeling frameworks.
- Exploring the policy's expressiveness could allow even fewer steps without quality loss.
Load-bearing premise
The lightweight stochastic policy can provide structured per-step control sufficient to maintain sample quality when using large step sizes for fast few-step diffusion sampling.
What would settle it
Running the method on 4x super-resolution with very large step sizes and observing that perceptual quality falls below the best-performing baseline would falsify the claim of maintaining quality through the policy.
Figures
read the original abstract
Adapting pretrained diffusion models to downstream objectives such as inverse problems often requires expensive test-time guidance or optimization. We propose a principled framework for generating high-quality reward-aligned samples at substantially reduced inference cost. Our approach formulates test-time adaptation as a hierarchical variational model, where control is amortized into a lightweight yet expressive stochastic policy. This formulation naturally supports few-step diffusion sampling: large step sizes enable fast inference, while the learned policy maintains sample quality by providing structured per-step control. The resulting fully amortized sampler achieves a strong quality--speed tradeoff, matching or exceeding recent test-time scaling baselines while requiring significantly less compute. For example, on 4x super-resolution, our method achieves better perceptual quality with more than 5x faster inference compared to the best-performing baseline. We further extend our approach to a semi-amortized regime that combines cheap amortized proposals with limited test-time optimization, achieving state-of-the-art perceptual quality across several challenging inverse problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a hierarchical variational framework to amortize test-time adaptation of pretrained diffusion models for reward-guided tasks such as inverse problems. Control is learned into a lightweight stochastic policy that supports few-step sampling, where large step sizes enable fast inference while the policy supplies structured per-step corrections to preserve sample quality. The fully amortized sampler is claimed to match or exceed test-time scaling baselines on perceptual quality with >5x lower compute (e.g., 4x super-resolution), with an optional semi-amortized extension that adds limited test-time optimization to reach state-of-the-art results.
Significance. If the central empirical claims hold after addressing the noted concerns, the work would provide a practical and principled route to efficient reward-aligned diffusion sampling, reducing reliance on expensive per-sample optimization. The variational amortization of per-step control is a natural fit for the few-step regime and could influence downstream applications in image restoration and conditional generation.
major comments (2)
- [Abstract] Abstract: The headline quality-speed tradeoff (better perceptual quality with >5x faster inference on 4x super-resolution) rests on the assumption that the lightweight stochastic policy supplies sufficient structured per-step control to offset error accumulation at large step sizes. The manuscript should include a dedicated ablation (e.g., in the experiments section) that isolates the policy's contribution versus plain few-step sampling without the learned policy, together with quantitative measures of how the variational objective aligns the policy to the reward under large Δt.
- [Methods] Methods (variational objective and policy parameterization): The claim that the hierarchical variational model yields an expressive yet lightweight policy capable of maintaining quality requires explicit discussion of the tightness of the ELBO or surrogate reward used for training. Without this, it remains unclear whether the policy can reliably counteract the increased variance of the reverse process in the few-step regime, as raised by the stress-test note.
minor comments (2)
- [Experiments] Experiments: Tables reporting perceptual metrics on super-resolution and inverse problems should include standard deviations or confidence intervals across multiple runs to substantiate the claimed improvements over baselines.
- [Preliminaries] Notation: The distinction between the hierarchical policy parameters and the diffusion timestep conditioning should be clarified in the first appearance of the variational objective to avoid ambiguity for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the potential impact of our hierarchical variational framework for amortizing test-time guidance in diffusion models. We address each major comment below and have incorporated revisions to strengthen the empirical and methodological presentation.
read point-by-point responses
-
Referee: [Abstract] Abstract: The headline quality-speed tradeoff (better perceptual quality with >5x faster inference on 4x super-resolution) rests on the assumption that the lightweight stochastic policy supplies sufficient structured per-step control to offset error accumulation at large step sizes. The manuscript should include a dedicated ablation (e.g., in the experiments section) that isolates the policy's contribution versus plain few-step sampling without the learned policy, together with quantitative measures of how the variational objective aligns the policy to the reward under large Δt.
Authors: We agree that isolating the policy's contribution through a dedicated ablation would strengthen the claims. In the revised manuscript, we have added a new ablation study in the experiments section comparing the full hierarchical variational policy against plain few-step sampling (standard DDPM/DDIM at identical large step sizes, without the learned policy). We report perceptual metrics including FID and LPIPS, along with quantitative measures of variational alignment such as expected reward under the policy versus the unguided baseline and estimates of reverse-process variance at large Δt. These results show that the policy supplies structured corrections that reduce error accumulation. revision: yes
-
Referee: [Methods] Methods (variational objective and policy parameterization): The claim that the hierarchical variational model yields an expressive yet lightweight policy capable of maintaining quality requires explicit discussion of the tightness of the ELBO or surrogate reward used for training. Without this, it remains unclear whether the policy can reliably counteract the increased variance of the reverse process in the few-step regime, as raised by the stress-test note.
Authors: We appreciate the suggestion to clarify the ELBO tightness. The revised methods section now includes an expanded discussion of the ELBO gap in the hierarchical variational setting, noting that the per-step policy amortization yields a tighter effective bound than standard variational inference for this task. We report empirical training diagnostics showing close alignment between the surrogate reward and the true objective, and we connect this to the stress-test results to demonstrate that the stochastic policy's expressiveness reliably offsets increased variance in the few-step regime. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper formulates test-time adaptation as a hierarchical variational model and learns a stochastic policy to amortize control for reward-aligned diffusion sampling. The central quality-speed tradeoff claim rests on the empirical performance of this learned policy under the variational objective, not on any reduction of outputs to inputs by construction. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations that collapse the argument are present. The framework uses standard variational inference and amortization techniques whose justification is independent of the target results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Variational approximation to the hierarchical policy can be optimized to provide effective structured control.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We introduce a unified framework that recasts test-time guidance in diffusion models as inference over hierarchical variational policies... q(x0:T,u1:T|y) = q(xT|y) ∏ q(ut|xt,u>t,y) q(xt−1|xt,ut)
-
IndisputableMonolith/Foundation/DimensionForcing.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The resulting fully amortized sampler achieves a strong quality–speed tradeoff... on 4× super-resolution, our method achieves better perceptual quality with more than 5× faster inference
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Stochastic Interpolants: A Unifying Framework for Flows and Diffusions
Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions.arXiv preprint arXiv:2303.08797, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[2]
Universal guidance for diffusion models
Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Roni Sengupta, Micah Goldblum, Jonas Geip- ing, and Tom Goldstein. Universal guidance for diffusion models. InThe Twelfth International Conference on Learning Representations, 2024
work page 2024
-
[3]
Training Diffusion Models with Reinforcement Learning
Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning.arXiv preprint arXiv:2305.13301, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[4]
Blei, Alp Kucukelbir, and Jon D
David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: A review for statisticians.Journal of the American Statistical Association, 112(518):859–877, April 2017
work page 2017
-
[5]
Flow map matching with stochastic interpolants: A mathematical framework for consistency models
Nicholas Matthew Boffi, Michael Samuel Albergo, and Eric Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models. Transactions on Machine Learning Research, 2025
work page 2025
-
[6]
Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, and Omer Deniz Akyildiz. Tweedie moment projected diffusions for inverse problems.Transactions on Machine Learning Research, 2024. Featured Certification
work page 2024
-
[7]
Diffusion posterior sampling for general noisy inverse problems
Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. InThe Eleventh Interna- tional Conference on Learning Representations, 2022
work page 2022
-
[8]
Kevin Clark, Paul Vicol, Kevin Swersky, and David J. Fleet. Directly fine-tuning diffusion models on differentiable rewards. InThe Twelfth International Conference on Learning Repre- sentations, 2024
work page 2024
-
[9]
Dimakis, and Mauricio Delbracio
Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G. Dimakis, and Mauricio Delbracio. A survey on diffusion models for inverse problems, 2024
work page 2024
-
[10]
Imagenet: A large- scale hierarchical image database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large- scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009
work page 2009
-
[11]
Diffusion models beat gans on image synthesis
Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021. 10
work page 2021
-
[12]
Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, and Ricky T. Q. Chen. Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control. InThe Thirteenth International Conference on Learning Representations, 2025
work page 2025
-
[13]
Noise hypernetworks: Amortizing test-time compute in diffusion models
Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, and Zeynep Akata. Noise hypernetworks: Amortizing test-time compute in diffusion models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
work page 2025
-
[14]
Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, and Zeynep Akata. Reno: Enhancing one-step text-to-image models through reward-based noise optimization.Advances in Neural Information Processing Systems, 37:125487–125519, 2024
work page 2024
-
[15]
Optimizing ddpm sampling with shortcut fine-tuning
Ying Fan and Kangwook Lee. Optimizing ddpm sampling with shortcut fine-tuning. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023
work page 2023
-
[16]
Mean flows for one-step generative modeling
Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
work page 2025
-
[17]
Calibrated test-time guidance for bayesian inference, 2026
Daniel Geyfman, Felix Draxler, Jan Groeneveld, Hyunsoo Lee, Theofanis Karaletsos, and Stephan Mandt. Calibrated test-time guidance for bayesian inference, 2026
work page 2026
-
[18]
Manifold preserving guided diffusion
Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J Zico Kolter, Ruslan Salakhutdinov, and Stefano Ermon. Manifold preserving guided diffusion. InThe Twelfth International Conference on Learning Representations, 2024
work page 2024
-
[19]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017
work page 2017
-
[20]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020
work page 2020
-
[21]
Symbolic music generation with non-differentiable rule guided diffusion
Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli Shama Sastry, Siddharth Gururani, Sageev Oore, and Yisong Yue. Symbolic music generation with non-differentiable rule guided diffusion. InICML, 2024
work page 2024
-
[22]
Divide-and-conquer posterior sampling for denoising diffusion priors
Yazid Janati, Badr MOUFAD, Alain Oliviero Durmus, Eric Moulines, and Jimmy Olsson. Divide-and-conquer posterior sampling for denoising diffusion priors. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
work page 2024
-
[23]
Progressive growing of gans for im- proved quality, stability, and variation
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for im- proved quality, stability, and variation. InInternational Conference on Learning Representations, 2018
work page 2018
-
[24]
Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models.Advances in Neural Information Processing Systems, 35:23593–23606, 2022
work page 2022
-
[25]
Semi-amortized variational autoencoders
Yoon Kim, Sam Wiseman, Andrew Miller, David Sontag, and Alexander Rush. Semi-amortized variational autoencoders. InInternational Conference on Machine Learning, pages 2678–2687. PMLR, 2018
work page 2018
-
[26]
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017
work page 2017
-
[27]
Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation.Advances in neural information processing systems, 36:36652–36663, 2023
work page 2023
-
[28]
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InInternational Conference on Learning Representations, 2023. 11
work page 2023
-
[29]
Inference-time scaling for diffusion models beyond scaling denoising steps, 2025
Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, and Saining Xie. Inference-time scaling for diffusion models beyond scaling denoising steps, 2025
work page 2025
-
[30]
A variational perspective on solving inverse problems with diffusion models
Morteza Mardani, Jiaming Song, Jan Kautz, and Arash Vahdat. A variational perspective on solving inverse problems with diffusion models. InThe Twelfth International Conference on Learning Representations, 2024
work page 2024
-
[31]
Joe Marino, Yisong Yue, and Stephan Mandt. Iterative amortized inference. InInternational Conference on Machine Learning, pages 3403–3412. PMLR, 2018
work page 2018
-
[32]
Variational control for guidance in diffusion models
Kushagra Pandey, Farrin Marouf Sofian, Felix Draxler, Theofanis Karaletsos, and Stephan Mandt. Variational control for guidance in diffusion models. InF orty-second International Conference on Machine Learning, 2025
work page 2025
-
[33]
Fast samplers for inverse problems in iterative refinement models
Kushagra Pandey, Ruihan Yang, and Stephan Mandt. Fast samplers for inverse problems in iterative refinement models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
work page 2024
-
[34]
Scalable diffusion models with transformers
William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4195–4205, October 2023
work page 2023
-
[35]
Ashwini Pokle, Matthew J. Muckley, Ricky T. Q. Chen, and Brian Karrer. Training-free linear image inverses via flows.Transactions on Machine Learning Research, 2024
work page 2024
-
[36]
Direct preference optimization: Your language model is secretly a reward model
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in neural information processing systems, 36:53728–53741, 2023
work page 2023
-
[37]
Hierarchical variational models
Rajesh Ranganath, Dustin Tran, and David Blei. Hierarchical variational models. In Maria Flo- rina Balcan and Kilian Q. Weinberger, editors,Proceedings of The 33rd International Confer- ence on Machine Learning, volume 48 ofProceedings of Machine Learning Research, pages 324–333, New York, New York, USA, 20–22 Jun 2016. PMLR
work page 2016
-
[38]
RB-modulation: Training-free stylization using reference-based modulation
Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkot- tai, and Wen-Sheng Chu. RB-modulation: Training-free stylization using reference-based modulation. InThe Thirteenth International Conference on Learning Representations, 2025
work page 2025
-
[39]
Progressive distillation for fast sampling of diffusion models
Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. InInternational Conference on Learning Representations, 2022
work page 2022
-
[40]
A general framework for inference-time scaling and steering of diffusion models
Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A general framework for inference-time scaling and steering of diffusion models. InF orty-second International Conference on Machine Learning, 2025
work page 2025
-
[41]
Deep unsuper- vised learning using nonequilibrium thermodynamics
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational Conference on Machine Learning, pages 2256–2265. PMLR, 2015
work page 2015
-
[42]
Pseudoinverse-guided diffusion models for inverse problems
Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. InInternational Conference on Learning Representations, 2023
work page 2023
-
[43]
Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, pages 32211–32252, 2023
work page 2023
-
[44]
Wenpin Tang and Fuzhong Zhou. Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond.arXiv preprint arXiv:2403.06279, 2024
-
[45]
Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review, 2024. 12
work page 2024
-
[46]
Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, and Sergey Levine. Fine- tuning of continuous-time diffusion models as entropy-regularized control.arXiv preprint arXiv:2402.15194, 2024
-
[47]
Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, and Tom- maso Biancalani. Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review, 2025
work page 2025
-
[48]
Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
Siddarth Venkatraman, Mohsin Hasan, Minsu Kim, Luca Scimeca, Marcin Sendera, Yoshua Bengio, Glen Berseth, and Nikolay Malkin. Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, edi-...
work page 2025
-
[49]
DMPlug: A plug-in method for solving inverse problems with diffusion models
Hengkang Wang, Xu Zhang, Taihui Li, Yuxiang Wan, Tiancong Chen, and Ju Sun. DMPlug: A plug-in method for solving inverse problems with diffusion models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
work page 2024
-
[50]
Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis.arXiv preprint arXiv:2306.09341, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[51]
Imagereward: Learning and evaluating human preferences for text-to-image generation
Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:15903–15935, 2023
work page 2023
-
[52]
One-step diffusion with distribution matching distillation
Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6613–6623, 2024
work page 2024
-
[53]
Freedom: Training- free energy-guided conditional diffusion model
Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, and Jian Zhang. Freedom: Training- free energy-guided conditional diffusion model. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 23174–23184, October 2023
work page 2023
-
[54]
Improving diffusion inverse problem solving with decoupled noise annealing
Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, and Yang Song. Improving diffusion inverse problem solving with decoupled noise annealing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20895–20905, 2025
work page 2025
-
[55]
Cheng Zhang, Judith Bütepage, Hedvig Kjellström, and Stephan Mandt. Advances in variational inference.IEEE transactions on pattern analysis and machine intelligence, 41(8):2008–2026, 2018
work page 2008
-
[56]
The unrea- sonable effectiveness of deep features as a perceptual metric
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unrea- sonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018
work page 2018
-
[57]
Inversebench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences
Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy Feng, Caifeng Zou, Yu Sun, Nikola Borislavov Kovachki, Zachary E Ross, Katherine Bouman, and Yisong Yue. Inversebench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences. InThe Thirteenth International Conference on Learning Representations, 2025
work page 2025
-
[58]
Linqi Zhou, Stefano Ermon, and Jiaming Song. Inductive moment matching. InF orty-second International Conference on Machine Learning, 2025
work page 2025
-
[59]
Fine-Tuning Language Models from Human Preferences
Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593, 2019. 13 A Hierarchical Variational Policies Here we present the proof of the bound in Eq. 7 in the main text. Proof overview.We derive a variation...
work page internal anchor Pith review Pith/arXiv arXiv 1909
-
[60]
We write the standard ELBO, decomposing it into an expected log-joint (energy) term and an entropy term
-
[61]
We lower-bound the entropy by introducing a factored approximate posterior ¯r(u1:T | x0:T ,y)over the controls, exploiting the non-negativity of the KL divergence
-
[62]
We substitute the factored parameterizations of both the generative and variational processes to arrive at a per-timestep bound. Proof.Recall the generative process, p(x0:T ,y) =p(x T ) " TY t=1 p(xt−1 |x t) # p(y|x 0).(12) We introduce a variational distribution q(x0:T |y) and apply Jensen’s inequality to obtain the standard ELBO, logp(y) = log Z p(x0:T ...
-
[63]
DDRM can only be applied to linear inverse problems. C-ΠGDM [33].For all tasks and datasets, we fix the number of diffusion steps to 10 using the noise conditioned C-ΠGDM sampler. Here w and τ represent the hyperparameters for projection to a conjugate space while τ represents the start-time for reverse diffusion sampling. We find that C-ΠGDM is unstable ...
-
[64]
We set DDIM η= 0.5 and the scale parameter to 20.0, which we found to work best across tasks. RED-Diff [30].For all tasks and datasets, we reduce the number of steps for RED-Diff to 10. We use the DDIM sampler with η= 0.0 . We highlight other important hyperparameters, such as the learning rate and gradient-term weight, in Table 5. NDTM.We reduce the numb...
-
[65]
Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.