Recognition: no theorem link
Discrete Langevin-Inspired Posterior Sampling
Pith reviewed 2026-05-12 03:16 UTC · model grok-4.3
The pith
A gradient-guided sampler selects discrete state changes to approximate posteriors using discrete diffusion priors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that ΔLPS, a Discrete Langevin-Inspired Posterior Sampler, can approximate the posterior over discrete states by using gradients from the discrete diffusion prior to identify high-value discrete moves, without leaving the discrete state space and without depending on the prior's training paradigm. This enables efficient parallel sampling and delivers competitive performance on linear, nonlinear, and blind inverse problems across image and mapping benchmarks.
What carries the argument
ΔLPS, the Discrete Langevin-Inspired Posterior Sampler that uses gradient information from the discrete diffusion prior to select promising discrete state transitions while staying inside the discrete domain.
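The selection rule can be pictured with a toy sketch: estimate, from a single gradient evaluation, how much the log-posterior would change under each candidate token flip, then sample every dimension in parallel from a softmax over those estimates. This is a generic gradient-informed discrete proposal in the spirit of Grathwohl et al. [7] and Zhang et al. [49], not the paper's Algorithm 1; the linear toy target and the helper names (`log_p`, `parallel_proposal`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 4, 6          # vocab size, number of token dimensions

# Toy log-density over token sequences: log p(x) = sum_d theta[d, x_d].
# A real discrete diffusion prior would supply this (and its gradient
# w.r.t. the one-hot encoding) via a neural network and backprop.
theta = rng.normal(size=(D, V))

def log_p(x):                       # x: (D,) integer tokens
    return theta[np.arange(D), x].sum()

def grad_log_p(x):
    # For this linear toy model the gradient w.r.t. the one-hot
    # encoding is just theta, independent of x.
    return theta

def parallel_proposal(x, step=1.0):
    """Gradient-informed categorical proposal, one softmax per dimension."""
    g = grad_log_p(x)                              # (D, V)
    # First-order estimate of log p(token d -> v) - log p(x)
    delta = g - g[np.arange(D), x][:, None]
    probs = np.exp(step * delta)
    probs /= probs.sum(axis=1, keepdims=True)
    # Sample all D dimensions in parallel via inverse-CDF per row
    u = rng.random(D)
    return (probs.cumsum(axis=1) > u[:, None]).argmax(axis=1)

x = rng.integers(0, V, size=D)
for _ in range(100):
    x = parallel_proposal(x)
print(x, log_p(x))
```

Because every dimension is updated from one gradient evaluation, the cost per step is independent of the number of single-token moves considered, which is the source of the parallelism claim.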
If this is right
- Enables parallel updates across all token dimensions for faster sampling.
- Remains agnostic to the discrete diffusion prior's training paradigm, covering masked and uniform-state cases.
- Outperforms recent discrete diffusion posterior samplers on image restoration tasks.
- Achieves results competitive with strong continuous diffusion inverse solvers on linear, nonlinear, and blind problems.
Where Pith is reading between the lines
- The method could extend to other discrete inverse problems such as text inpainting or combinatorial structure recovery where continuous relaxations distort the output space.
- Avoiding continuous variables may help preserve exact discrete constraints in applications like symbolic reasoning or code generation.
- Integration with accelerated sampling schedules or learned score functions could further improve speed without leaving the discrete setting.
Load-bearing premise
Gradient signals from the discrete diffusion prior can be used to pick discrete state transitions that approximate the true posterior without large bias or problem-specific adjustments.
What would settle it
If samples produced by the method show systematically higher reconstruction error or lower posterior likelihood than strong baselines on a standard inverse problem benchmark, the gradient-based discrete selection would be shown insufficient.
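On a problem small enough to enumerate, this falsification test can be run exactly: compute the true posterior by brute force, histogram a sampler's draws, and report the total variation distance. The sketch below is a hypothetical harness (the noisy-channel likelihood and the stand-in exact draws are assumptions, not from the paper); replacing the stand-in draws with any candidate sampler's output turns it into a direct check.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
V, D = 3, 3                        # small enough to enumerate all V**D states

prior_logits = rng.normal(size=(D, V))   # stand-in discrete "prior"
y = rng.integers(0, V, size=D)           # observation

def log_post(x):                         # log p(x|y) up to a constant
    prior = prior_logits[np.arange(D), x].sum()
    lik = -2.0 * np.sum(x != y)          # toy noisy-channel likelihood
    return prior + lik

# Exact posterior by enumeration
states = np.array(list(product(range(V), repeat=D)))
logp = np.array([log_post(s) for s in states])
exact = np.exp(logp - logp.max())
exact /= exact.sum()

# Empirical distribution from independent exact draws; swap in a
# candidate sampler's output here to measure its posterior error.
idx = rng.choice(len(states), size=20000, p=exact)
emp = np.bincount(idx, minlength=len(states)) / 20000
tv = 0.5 * np.abs(exact - emp).sum()
print(f"total variation distance: {tv:.3f}")
```

A sampler whose TV distance stays far above this sampling-noise floor as the number of draws grows would demonstrate the systematic bias the criterion describes.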
read the original abstract
We study posterior sampling for inverse problems in discrete state spaces using discrete diffusion models as generative priors. While continuous diffusion models have become widely used for inverse problems, their discrete counterparts remain comparatively underexplored. Existing discrete posterior samplers often rely on continuous relaxations of discrete variables, Gibbs-style updates, or mechanisms specialized to particular corruption processes, which can limit scalability or generality. We propose $\Delta$LPS, a Discrete Langevin-Inspired Posterior Sampler that uses gradient information to identify promising discrete moves without leaving the discrete state space. The resulting approach enables efficient parallel updates across all token dimensions and is agnostic to the training paradigm of the discrete diffusion prior, including masked and uniform-state diffusion. We evaluate our method on image restoration tasks across MNIST, CIFAR, and FFHQ, as well as spatial mapping, covering linear, nonlinear, and blind inverse problems. Across these settings, we improve over recent discrete diffusion posterior samplers and are competitive with strong continuous diffusion-based inverse solvers. Our results suggest that fully discrete, gradient-informed posterior samplers offer a scalable and general path toward solving inverse problems over discrete representations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ΔLPS, a Discrete Langevin-Inspired Posterior Sampler for inverse problems in discrete state spaces that employs gradient information from a discrete diffusion prior to select promising token-level state transitions while remaining entirely in the discrete domain. The approach supports efficient parallel updates across all dimensions and is presented as agnostic to the prior's training paradigm (masked or uniform-state diffusion). Empirical evaluations on linear, nonlinear, and blind inverse problems, spanning image restoration on MNIST, CIFAR, and FFHQ as well as spatial mapping tasks, show improvements over prior discrete diffusion samplers and competitiveness with continuous diffusion-based solvers.
Significance. If the central claim that ΔLPS produces samples from the correct posterior holds, the work provides a useful advance for posterior sampling with discrete generative priors. It avoids continuous relaxations and problem-specific mechanisms, enabling scalable parallel sampling that could extend to other discrete domains. The reported competitiveness with strong continuous baselines on multiple datasets and problem types indicates practical relevance, and the generality across diffusion training paradigms is a positive feature.
major comments (2)
- [Section 3.2, Algorithm 1] The gradient-based rule for proposing and accepting discrete state transitions is motivated by analogy to continuous Langevin dynamics, but no derivation or proof is given that the resulting Markov chain satisfies detailed balance (or an equivalent reversibility condition) with respect to the target posterior p(x|y) for arbitrary corruption processes. Without this, it is unclear whether the stationary distribution matches the desired posterior or deviates systematically, especially under strong conditioning or noisy score estimates.
- [Section 4, experimental setup] No analysis or bounds are provided on the bias introduced by the gradient approximation or by the finite number of sampling steps relative to the true posterior; the empirical gains could therefore reflect optimization heuristics rather than accurate posterior sampling. This directly affects the interpretation of the reported improvements over baselines.
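The detailed-balance concern can be made concrete on a five-state toy target: a proposal that weights moves by exp((log p_j − log p_i)/2) but omits a Metropolis correction violates reversibility because the row normalizers differ across states, so its stationary distribution need not equal the target. A minimal numerical check (the toy target and all names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
logp = rng.normal(size=n)                 # toy target over n discrete states
p = np.exp(logp - logp.max())
p /= p.sum()

# Gradient-informed proposal WITHOUT a Metropolis correction:
# move i -> j with probability proportional to exp((logp_j - logp_i)/2)
P = np.exp((logp[None, :] - logp[:, None]) / 2.0)
P /= P.sum(axis=1, keepdims=True)

# Detailed balance would require p_i P_ij == p_j P_ji for the target p
flow = p[:, None] * P
db_gap = np.max(np.abs(flow - flow.T))

# Actual stationary distribution of the uncorrected chain
pi = np.linalg.matrix_power(P, 200)[0]
tv = 0.5 * np.abs(p - pi).sum()
print(f"max detailed-balance violation: {db_gap:.3f}")
print(f"TV(target, stationary): {tv:.3f}")
```

The nonzero violation comes entirely from the state-dependent normalizers, which is exactly the term a reversibility proof (or an accept/reject step) would have to account for.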
minor comments (3)
- The abstract states that the method 'improves over recent discrete diffusion posterior samplers' but does not include any quantitative metrics or dataset-specific numbers; adding one or two key performance figures would strengthen the summary.
- Notation for the discrete state space, corruption kernel, and gradient computation is introduced inline without a consolidated preliminary section, which can make the methods harder to follow on first reading.
- Figure captions for the qualitative results on FFHQ and spatial mapping tasks would benefit from explicit mention of the conditioning signal y and the number of sampling steps used.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the practical relevance of ΔLPS. We address the two major comments point by point below.
read point-by-point responses
-
Referee: [Section 3.2, Algorithm 1] The gradient-based rule for proposing and accepting discrete state transitions is motivated by analogy to continuous Langevin dynamics, but no derivation or proof is given that the resulting Markov chain satisfies detailed balance (or an equivalent reversibility condition) with respect to the target posterior p(x|y) for arbitrary corruption processes. Without this, it is unclear whether the stationary distribution matches the desired posterior or deviates systematically, especially under strong conditioning or noisy score estimates.
Authors: ΔLPS is presented as a Langevin-inspired heuristic rather than an exact sampler. The transition rule uses the gradient of the log-posterior (combining the discrete diffusion score and the data likelihood term) to bias selection of discrete token changes toward higher-probability states while remaining fully discrete and supporting parallel updates. We do not claim or derive that the resulting chain satisfies detailed balance with respect to p(x|y) for arbitrary corruption processes; such a guarantee would require additional assumptions on the prior and corruption that are not generally available for masked or uniform discrete diffusion models. The method instead prioritizes computational efficiency and generality across training paradigms. Empirical results on image restoration and spatial mapping tasks show that the produced samples are competitive with continuous diffusion solvers and superior to prior discrete methods, indicating that the approximation is effective for the targeted applications. revision: no
-
Referee: [Section 4, experimental setup] No analysis or bounds are provided on the bias introduced by the gradient approximation or by the finite number of sampling steps relative to the true posterior; the empirical gains could therefore reflect optimization heuristics rather than accurate posterior sampling. This directly affects the interpretation of the reported improvements over baselines.
Authors: We did not provide theoretical bounds on approximation bias or finite-step error, as the validation strategy is empirical. Section 4 evaluates ΔLPS on linear, nonlinear, and blind inverse problems across MNIST, CIFAR, FFHQ, and spatial mapping tasks, demonstrating consistent outperformance over recent discrete diffusion samplers and competitiveness with strong continuous baselines. These results support the practical utility of the gradient-guided discrete updates even without exact posterior guarantees. While bounds would strengthen the theoretical interpretation, their absence does not change the reported empirical findings or the claim that fully discrete gradient-informed sampling offers a scalable alternative for discrete inverse problems. revision: no
- Unresolved after rebuttal: a derivation or proof that the discrete Markov chain satisfies detailed balance with respect to the target posterior p(x|y) for arbitrary corruption processes.
Circularity Check
No circularity: new algorithmic construction with independent empirical validation
full rationale
The paper proposes ΔLPS as a novel discrete sampler that adapts gradient guidance from continuous Langevin dynamics to remain fully in discrete state space, enabling parallel token updates agnostic to the diffusion training paradigm. No equations or claims in the abstract reduce the sampler's validity, proposal distribution, or reported performance gains to a parameter fitted against the target posterior or to a self-citation chain. The method is presented as a heuristic-motivated construction whose correctness is assessed via empirical comparisons on MNIST, CIFAR, FFHQ, and spatial mapping tasks rather than by deriving the stationary distribution from the inputs by construction. This is the common case of an honest algorithmic contribution whose central claim does not collapse to tautology.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Discrete diffusion models trained on clean data can serve as useful generative priors for posterior sampling in inverse problems.
Reference graph
Works this paper leans on
- [1] Jacob Austin, Daniel Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
- [2] Sattwik Basu, Chaitanya Amballa, Zhongweiyang Xu, Jorge Vančo Sampedro, Srihari Nelakuditi, and Romit Roy Choudhury. Contrastive diffusion guidance for spatial inverse problems. arXiv preprint arXiv:2509.26489, 2025.
- [3] Wenda Chu, Zihui Wu, Yifan Chen, Yang Song, and Yisong Yue. Split Gibbs discrete diffusion posterior sampling. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
- [4] Hyungjin Chung, Jeongsol Kim, Michael T. McCann, Marc L. Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. arXiv preprint arXiv:2209.14687, 2022.
- [5] Heinz W. Engl, Martin Hanke, and Andreas Neubauer. Regularization of Inverse Problems, volume 375 of Mathematics and Its Applications. Springer, Dordrecht, 1996.
- [6] Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, et al. Scaling diffusion language models via adaptation from autoregressive models. arXiv preprint arXiv:2410.17891, 2024.
- [7] Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, and Chris Maddison. Oops I took a gradient: Scalable sampling for discrete distributions. In International Conference on Machine Learning, pages 3831–3841. PMLR, 2021.
- [8] Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, 1968.
- [9] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [10] Inception Labs. Introducing Mercury: The first commercial diffusion-based language model. https://www.inceptionlabs.ai/blog/introducing-mercury, 2025. Accessed: 2026-05-02.
- [11] Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, and Fillia Makedon. A survey on contrastive self-supervised learning, 2021.
- [12] Ruoxi Jiang, Peter Y. Lu, and Rebecca Willett. Embed and emulate: Contrastive representations for simulation-based inference. arXiv preprint arXiv:2409.18402, 2024.
- [13] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
- [14] Samar Khanna, Siddhant Kharbanda, Shufan Li, Harshit Varma, Eric Wang, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha, Stefano Ermon, et al. Mercury: Ultra-fast language models based on diffusion. arXiv e-prints, pages arXiv–2506, 2025.
- [15] Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. In Advances in Neural Information Processing Systems, volume 33, pages 18661–18673. Curran Associates, Inc., 2020.
- [16] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017.
- [17] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- [18] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- [19] Shufan Li, Konstantinos Kallidromitis, Hritik Bansal, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Jason Kuen, Zhe Lin, Kai-Wei Chang, and Aditya Grover. LaViDa: A large diffusion language model for multimodal understanding. arXiv preprint arXiv:2505.16839, 2025.
- [20] Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Liuao Pei, Xiaokang Yang, Jiangmiao Pang, Yao Mu, and Ping Luo. Discrete diffusion VLA: Bringing discrete diffusion to action decoding in vision-language-action policies. arXiv preprint arXiv:2508.20072, 2025.
- [21] Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Bac Nguyen, Stefano Ermon, and Yuki Mitsufuji. G2D2: Gradient-guided discrete diffusion for inverse problem solving. Transactions on Machine Learning Research.
- [22] Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, and Chongxuan Li. Scaling up masked diffusion models on text. arXiv preprint arXiv:2410.18514, 2024.
- [23] Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models. arXiv preprint arXiv:2502.09992, 2025.
- [25] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
- [26] Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky. Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis. In Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 1674–1703. PMLR, 2017.
- [27] Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer New York, NY, 2nd edition, 2004.
- [28] Gareth O. Roberts and Richard L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996.
- [29] Litu Rout, Andreas Lugmayr, Yasamin Jafarian, Srivatsan Varadharajan, Constantine Caramanis, Sanjay Shakkottai, and Ira Kemelmacher-Shlizerman. Test-time anchoring for discrete diffusion posterior sampling. arXiv preprint arXiv:2510.02291, 2025.
- [30] Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alex Dimakis, and Sanjay Shakkottai. Solving linear inverse problems provably via posterior sampling with latent diffusion models. Advances in Neural Information Processing Systems, 36:49960–49990, 2023.
- [31] Subham Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models. Advances in Neural Information Processing Systems, 37:130136–130184, 2024.
- [32] Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, and Volodymyr Kuleshov. The diffusion duality. arXiv preprint arXiv:2506.10892, 2025.
- [33] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2256–2265, Lille, France, 2015. PMLR.
- [34] Bowen Song, Soo Min Kwon, Zecheng Zhang, Xinyu Hu, Qing Qu, and Liyue Shen. Solving inverse problems with latent diffusion models via hard data consistency. arXiv preprint arXiv:2307.08123, 2023.
- [35] Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. In International Conference on Learning Representations, 2023.
- [36] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
- [37] Joshua S. Speagle. A conceptual introduction to Markov chain Monte Carlo methods, 2020.
- [38] Sophia Tang, Yinuo Zhang, and Pranam Chatterjee. PepTune: De novo generation of therapeutic peptides with multi-objective-guided discrete diffusion. ArXiv, pages arXiv–2412, 2025.
- [40] Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
- [41] Maxime Vono, Nicolas Dobigeon, and Pierre Chainais. Split-and-augmented Gibbs sampler—application to large-scale inference problems. IEEE Transactions on Signal Processing, 67(6):1648–1661, 2019.
- [42] Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. DPLM-2: A multimodal diffusion protein language model. arXiv preprint arXiv:2410.13782, 2024.
- [43] Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, and Dong Yu. Diffsound: Discrete diffusion model for text-to-sound generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:1720–1733, 2023.
- [44] Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, and Mengdi Wang. MMaDA: Multimodal large diffusion language models. arXiv preprint arXiv:2505.15809, 2025.
- [45] Linfeng Ye, Shayan Mohajer Hamidi, Mert Pilanci, and Konstantinos N. Plataniotis. CL-DPS: A contrastive learning approach to blind nonlinear inverse problem solving via diffusion posterior sampling.
- [46] Junbo Yin, Chao Zha, Wenjia He, Chencheng Xu, and Xin Gao. CFP-Gen: Combinatorial functional protein generation via diffusion language models. arXiv preprint arXiv:2505.22869, 2025.
- [47] Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, and Yang Song. Improving diffusion inverse problem solving with decoupled noise annealing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20895–20905, 2025.
- [48] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
- [49] Ruqi Zhang, Xingchao Liu, and Qiang Liu. A Langevin-like sampler for discrete distributions. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 26375–26396. PMLR, 2022.
- [50] Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy T. Feng, Caifeng Zou, Yu Sun, Nikola Kovachki, Zachary E. Ross, Katherine L. Bouman, and Yisong Yue. InverseBench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences, 2025.
- [51] Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, Jun Hu, Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, et al. LLaDA 1.5: Variance-reduced preference optimization for large language diffusion models. arXiv preprint arXiv:2505.19223, 2025.