pith. machine review for the scientific record. sign in

arxiv: 2605.09302 · v1 · submitted 2026-05-10 · 💻 cs.LG · cs.CV

Recognition: no theorem link

Discrete Langevin-Inspired Posterior Sampling

Authors on Pith no claims yet

Pith reviewed 2026-05-12 03:16 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords discrete diffusionposterior samplinginverse problemslangevin dynamicsimage restorationdiscrete state spacegenerative priors
0
0 comments X

The pith

A gradient-guided sampler selects discrete state changes to approximate posteriors using discrete diffusion priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for posterior sampling in discrete state spaces where discrete diffusion models serve as generative priors for inverse problems. Existing discrete samplers typically rely on continuous relaxations, Gibbs updates, or corruption-specific mechanisms that restrict scalability or generality. The new approach adapts Langevin-style dynamics to operate fully in the discrete domain by using gradient signals from the prior to choose promising token transitions. This design supports parallel updates over all dimensions and works with any training regime for the diffusion prior, such as masked or uniform-state variants. Experiments on image restoration and spatial mapping tasks show gains over prior discrete methods and parity with strong continuous solvers, indicating that fully discrete gradient-informed sampling offers a viable path for inverse problems on discrete representations.

Core claim

The central claim is that a Discrete Langevin-Inspired Posterior Sampler can approximate the posterior over discrete states by employing gradients from the diffusion prior to identify high-value discrete moves, without leaving the discrete state space or depending on the prior's training paradigm. This enables efficient parallel sampling and delivers competitive performance on linear, nonlinear, and blind inverse problems across image and mapping benchmarks.

What carries the argument

ΔLPS, the Discrete Langevin-Inspired Posterior Sampler that uses gradient information from the discrete diffusion prior to select promising discrete state transitions while staying inside the discrete domain.

If this is right

  • Enables parallel updates across all token dimensions for faster sampling.
  • Remains agnostic to the discrete diffusion prior's training paradigm, covering masked and uniform-state cases.
  • Outperforms recent discrete diffusion posterior samplers on image restoration tasks.
  • Achieves results competitive with strong continuous diffusion inverse solvers on linear, nonlinear, and blind problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could extend to other discrete inverse problems such as text inpainting or combinatorial structure recovery where continuous relaxations distort the output space.
  • Avoiding continuous variables may help preserve exact discrete constraints in applications like symbolic reasoning or code generation.
  • Integration with accelerated sampling schedules or learned score functions could further improve speed without leaving the discrete setting.

Load-bearing premise

Gradient signals from the discrete diffusion prior can be used to pick discrete state transitions that approximate the true posterior without large bias or problem-specific adjustments.

What would settle it

If samples produced by the method show systematically higher reconstruction error or lower posterior likelihood than strong baselines on a standard inverse problem benchmark, the gradient-based discrete selection would be shown insufficient.

Figures

Figures reproduced from arXiv: 2605.09302 by Chaitanya Amballa, Jorge Van\v{c}o Sampedro, Romit Roy Choudhury, Sattwik Basu.

Figure 1
Figure 1. Figure 1: This paper introduces ∆LPS, a training-free approach for discrete posterior sampling using discrete diffusion priors. Shown here are results from various image restoration tasks across the MNIST, CIFAR, FFHQ datasets along with and spatial mapping. Project website: https://discretelps.github.io method, is specially designed around masked-diffusion models and anchoring. These limitations motivate posterior … view at source ↗
Figure 2
Figure 2. Figure 2: Comparison on FFHQ across multiple inverse problems, including HDR reconstruction, [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Qualitative results showing ∆LPS’s performance across various settings. On top, we show motion deblurring on FFHQ, outperforming the discrete APS baseline. In the middle, we evaluate on three different tasks on MNIST. The last row shows performance on the blind inverse setting, where the floorplan is estimated from walking trajectories [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative results for the HDR recon￾struction task on FFHQ. Each example contains the ground truth image, corrupted measurement, and reconstruction from ∆LPS [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative results for random inpaint￾ing on FFHQ. Even under large missing regions, ∆LPS generates plausible and visually faithful completions [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Qualitative results for 4× super￾resolution on FFHQ. ∆LPS recovers fine facial details and realistic textures from low-resolution measurements [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
read the original abstract

We study posterior sampling for inverse problems in discrete state spaces using discrete diffusion models as generative priors. While continuous diffusion models have become widely used for inverse problems, their discrete counterparts remain comparatively underexplored. Existing discrete posterior samplers often rely on continuous relaxations of discrete variables, Gibbs-style updates, or mechanisms specialized to particular corruption processes, which can limit scalability or generality. We propose $\Delta$LPS, a Discrete Langevin-Inspired Posterior Sampler that uses gradient information to identify promising discrete moves without leaving the discrete state space. The resulting approach enables efficient parallel updates across all token dimensions and is agnostic to the training paradigm of the discrete diffusion prior, including masked and uniform-state diffusion. We evaluate our method on image restoration tasks across MNIST, CIFAR, and FFHQ, as well as spatial mapping, covering linear, nonlinear, and blind inverse problems. Across these settings, we improve over recent discrete diffusion posterior samplers and are competitive with strong continuous diffusion-based inverse solvers. Our results suggest that fully discrete, gradient-informed posterior samplers offer a scalable and general path toward solving inverse problems over discrete representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes ΔLPS, a Discrete Langevin-Inspired Posterior Sampler for inverse problems in discrete state spaces that employs gradient information from a discrete diffusion prior to select promising token-level state transitions while remaining entirely in the discrete domain. The approach supports efficient parallel updates across all dimensions and is presented as agnostic to the prior's training paradigm (masked or uniform-state diffusion). Empirical evaluations on linear, nonlinear, and blind inverse problems across MNIST, CIFAR, FFHQ image restoration and spatial mapping tasks show improvements over prior discrete diffusion samplers and competitiveness with continuous diffusion-based solvers.

Significance. If the central claim that ΔLPS produces samples from the correct posterior holds, the work provides a useful advance for posterior sampling with discrete generative priors. It avoids continuous relaxations and problem-specific mechanisms, enabling scalable parallel sampling that could extend to other discrete domains. The reported competitiveness with strong continuous baselines on multiple datasets and problem types indicates practical relevance, and the generality across diffusion training paradigms is a positive feature.

major comments (2)
  1. [Section 3.2, Algorithm 1] Section 3.2 and Algorithm 1: The gradient-based rule for proposing and accepting discrete state transitions is motivated by analogy to continuous Langevin dynamics but lacks a derivation or proof that the resulting Markov chain satisfies detailed balance (or an equivalent reversibility condition) with respect to the target posterior p(x|y) for arbitrary corruption processes. Without this, it is unclear whether the stationary distribution matches the desired posterior or deviates systematically, especially under strong conditioning or noisy score estimates.
  2. [Section 4] Section 4, experimental setup: No analysis or bounds are provided on the bias introduced by the gradient approximation or the finite number of sampling steps relative to the true posterior; the empirical gains could therefore reflect optimization heuristics rather than accurate posterior sampling. This directly affects the interpretation of the reported improvements over baselines.
minor comments (3)
  1. The abstract states that the method 'improves over recent discrete diffusion posterior samplers' but does not include any quantitative metrics or dataset-specific numbers; adding one or two key performance figures would strengthen the summary.
  2. Notation for the discrete state space, corruption kernel, and gradient computation is introduced inline without a consolidated preliminary section, which can make the methods harder to follow on first reading.
  3. Figure captions for the qualitative results on FFHQ and spatial mapping tasks would benefit from explicit mention of the conditioning signal y and the number of sampling steps used.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback and positive assessment of the practical relevance of ΔLPS. We address the two major comments point by point below.

read point-by-point responses
  1. Referee: [Section 3.2, Algorithm 1] Section 3.2 and Algorithm 1: The gradient-based rule for proposing and accepting discrete state transitions is motivated by analogy to continuous Langevin dynamics but lacks a derivation or proof that the resulting Markov chain satisfies detailed balance (or an equivalent reversibility condition) with respect to the target posterior p(x|y) for arbitrary corruption processes. Without this, it is unclear whether the stationary distribution matches the desired posterior or deviates systematically, especially under strong conditioning or noisy score estimates.

    Authors: ΔLPS is presented as a Langevin-inspired heuristic rather than an exact sampler. The transition rule uses the gradient of the log-posterior (combining the discrete diffusion score and the data likelihood term) to bias selection of discrete token changes toward higher-probability states while remaining fully discrete and supporting parallel updates. We do not claim or derive that the resulting chain satisfies detailed balance with respect to p(x|y) for arbitrary corruption processes; such a guarantee would require additional assumptions on the prior and corruption that are not generally available for masked or uniform discrete diffusion models. The method instead prioritizes computational efficiency and generality across training paradigms. Empirical results on image restoration and spatial mapping tasks show that the produced samples are competitive with continuous diffusion solvers and superior to prior discrete methods, indicating that the approximation is effective for the targeted applications. revision: no

  2. Referee: [Section 4] Section 4, experimental setup: No analysis or bounds are provided on the bias introduced by the gradient approximation or the finite number of sampling steps relative to the true posterior; the empirical gains could therefore reflect optimization heuristics rather than accurate posterior sampling. This directly affects the interpretation of the reported improvements over baselines.

    Authors: We did not provide theoretical bounds on approximation bias or finite-step error, as the validation strategy is empirical. Section 4 evaluates ΔLPS on linear, nonlinear, and blind inverse problems across MNIST, CIFAR, FFHQ, and spatial mapping tasks, demonstrating consistent outperformance over recent discrete diffusion samplers and competitiveness with strong continuous baselines. These results support the practical utility of the gradient-guided discrete updates even without exact posterior guarantees. While bounds would strengthen the theoretical interpretation, their absence does not change the reported empirical findings or the claim that fully discrete gradient-informed sampling offers a scalable alternative for discrete inverse problems. revision: no

standing simulated objections not resolved
  • Derivation or proof that the discrete Markov chain satisfies detailed balance with respect to the target posterior p(x|y) for arbitrary corruption processes.

Circularity Check

0 steps flagged

No circularity: new algorithmic construction with independent empirical validation

full rationale

The paper proposes ΔLPS as a novel discrete sampler that adapts gradient guidance from continuous Langevin dynamics to remain fully in discrete state space, enabling parallel token updates agnostic to the diffusion training paradigm. No equations or claims in the abstract reduce the sampler's validity, proposal distribution, or reported performance gains to a parameter fitted against the target posterior or to a self-citation chain. The method is presented as a heuristic-motivated construction whose correctness is assessed via empirical comparisons on MNIST, CIFAR, FFHQ, and spatial mapping tasks rather than by deriving the stationary distribution from the inputs by construction. This is the common case of an honest algorithmic contribution whose central claim does not collapse to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; no explicit free parameters, new entities, or ad-hoc axioms are named.

axioms (1)
  • domain assumption Discrete diffusion models trained on clean data can serve as useful generative priors for posterior sampling in inverse problems.
    This is the foundational premise for using the diffusion model inside the proposed sampler.

pith-pipeline@v0.9.0 · 5506 in / 1209 out tokens · 36900 ms · 2026-05-12T03:16:35.842206+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 5 internal anchors

  1. [1]

    Struc- tured denoising diffusion models in discrete state-spaces

    Jacob Austin, Daniel Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Struc- tured denoising diffusion models in discrete state-spaces. InAdvances in Neural Information Processing Systems (NeurIPS), 2021

  2. [2]

    Contrastive diffusion guidance for spatial inverse problems

    Sattwik Basu, Chaitanya Amballa, Zhongweiyang Xu, Jorge Vanˇco Sampedro, Srihari Nelaku- diti, and Romit Roy Choudhury. Contrastive diffusion guidance for spatial inverse problems. arXiv preprint arXiv:2509.26489, 2025

  3. [3]

    Split gibbs discrete diffusion posterior sampling

    Wenda Chu, Zihui Wu, Yifan Chen, Yang Song, and Yisong Yue. Split gibbs discrete diffusion posterior sampling. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

  4. [4]

    Diffusion Posterior Sampling for General Noisy Inverse Problems

    Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffu- sion posterior sampling for general noisy inverse problems.arXiv preprint arXiv:2209.14687, 2022

  5. [5]

    Engl, Martin Hanke, and Andreas Neubauer.Regularization of Inverse Problems, volume 375 ofMathematics and Its Applications

    Heinz W. Engl, Martin Hanke, and Andreas Neubauer.Regularization of Inverse Problems, volume 375 ofMathematics and Its Applications. Springer, Dordrecht, 1996

  6. [6]

    Scaling diffusion language models via adaptation from autoregressive models

    Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, et al. Scaling diffusion language models via adaptation from autoregressive models.arXiv preprint arXiv:2410.17891, 2024

  7. [7]

    Oops i took a gradient: Scalable sampling for discrete distributions

    Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, and Chris Maddison. Oops i took a gradient: Scalable sampling for discrete distributions. InInternational Conference on Machine Learning, pages 3831–3841. PMLR, 2021

  8. [8]

    A formal basis for the heuristic determination of minimum cost paths.IEEE transactions on Systems Science and Cybernetics, 4(2):100–107, 1968

    Peter E Hart, Nils J Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths.IEEE transactions on Systems Science and Cybernetics, 4(2):100–107, 1968

  9. [9]

    Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

  10. [10]

    Introducing mercury: The first commercial diffusion-based language model

    Inception Labs. Introducing mercury: The first commercial diffusion-based language model. https://www.inceptionlabs.ai/blog/introducing-mercury, 2025. Accessed: 2026- 05-02

  11. [11]

    A survey on contrastive self-supervised learning, 2021

    Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, and Fillia Makedon. A survey on contrastive self-supervised learning, 2021

  12. [12]

    Embed and emulate: Contrastive representations for simulation-based inference.arXiv preprint arXiv:2409.18402, 2024

    Ruoxi Jiang, Peter Y Lu, and Rebecca Willett. Embed and emulate: Contrastive representations for simulation-based inference.arXiv preprint arXiv:2409.18402, 2024

  13. [13]

    A style-based generator architecture for generative adversarial networks

    Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019

  14. [14]

    Mercury: Ultra-fast language models based on diffusion.arXiv e-prints, pages arXiv–2506, 2025

    Samar Khanna, Siddhant Kharbanda, Shufan Li, Harshit Varma, Eric Wang, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha, Stefano Ermon, et al. Mercury: Ultra-fast language models based on diffusion.arXiv e-prints, pages arXiv–2506, 2025

  15. [15]

    Supervised contrastive learning

    Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors,Advances in Neural Information Processing Systems, volume 33, pages 18661–18673. Curran Associates, Inc., 2020

  16. [16]

    Kingma and Jimmy Ba

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017

  17. [17]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. 10

  18. [18]

    Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 2002

    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 2002

  19. [19]

    Bill Yuchen Lin, Ronan Le Bras, Kyle Richardson, Ashish Sabharwal, Radha Poovendran, Peter Clark, and Yejin Choi

    Shufan Li, Konstantinos Kallidromitis, Hritik Bansal, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Jason Kuen, Zhe Lin, Kai-Wei Chang, and Aditya Grover. Lavida: A large diffusion language model for multimodal understanding.arXiv preprint arXiv:2505.16839, 2025

  20. [20]

    Discrete diffusion vla: Bring- ing discrete diffusion to action decoding in vision-language-action policies.arXiv preprint arXiv:2508.20072, 2025

    Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Liuao Pei, Xiaokang Yang, Jiangmiao Pang, Yao Mu, and Ping Luo. Discrete diffusion vla: Bringing discrete diffusion to action decoding in vision-language-action policies.arXiv preprint arXiv:2508.20072, 2025

  21. [21]

    G2d2: Gradient-guided discrete diffusion for inverse problem solving

    Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Bac Nguyen, Stefano Ermon, and Yuki Mitsufuji. G2d2: Gradient-guided discrete diffusion for inverse problem solving. Transactions on Machine Learning Research

  22. [22]

    Scaling up masked diffusion models on text.arXiv preprint arXiv:2410.18514,

    Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, and Chongxuan Li. Scaling up masked diffusion models on text.arXiv preprint arXiv:2410.18514, 2024

  23. [23]

    Large Language Diffusion Models

    Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models.arXiv preprint arXiv:2502.09992, 2025

  24. [25]

    Representation Learning with Contrastive Predictive Coding

    Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748, 2018

  25. [26]

    Non-convex learning via stochastic gradient langevin dynamics: a nonasymptotic analysis

    Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky. Non-convex learning via stochastic gradient langevin dynamics: a nonasymptotic analysis. In Satyen Kale and Ohad Shamir, editors,Proceedings of the 2017 Conference on Learning Theory, volume 65 ofProceedings of Machine Learning Research, pages 1674–1703. PMLR, 07–10 Jul 2017

  26. [27]

    Robert and George Casella.Monte Carlo Statistical Methods

    Christian P. Robert and George Casella.Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer New York, NY , 2 edition, 2004

  27. [28]

    Roberts and Richard L

    Gareth O. Roberts and Richard L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations.Bernoulli, 2(4):341 – 363, 1996

  28. [29]

    Test-time anchoring for discrete diffusion posterior sampling.arXiv preprint arXiv:2510.02291, 2025

    Litu Rout, Andreas Lugmayr, Yasamin Jafarian, Srivatsan Varadharajan, Constantine Caramanis, Sanjay Shakkottai, and Ira Kemelmacher-Shlizerman. Test-time anchoring for discrete diffusion posterior sampling.arXiv preprint arXiv:2510.02291, 2025

  29. [30]

    Solving linear inverse problems provably via posterior sampling with latent diffusion models.Advances in Neural Information Processing Systems, 36:49960–49990, 2023

    Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alex Dimakis, and Sanjay Shakkottai. Solving linear inverse problems provably via posterior sampling with latent diffusion models.Advances in Neural Information Processing Systems, 36:49960–49990, 2023

  30. [31]

    Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

    Subham Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin Chiu, Alexander Rush, and V olodymyr Kuleshov. Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

  31. [32]

    The diffusion duality.arXiv preprint arXiv:2506.10892, 2025

    Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, and V olodymyr Kuleshov. The diffusion duality.arXiv preprint arXiv:2506.10892, 2025

  32. [33]

    Deep unsu- pervised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsu- pervised learning using nonequilibrium thermodynamics. In Francis Bach and David Blei, editors,Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2256–2265, Lille, France, 07–09 Jul 2015. PMLR

  33. [34]

    arXiv preprint arXiv:2307.08123 , year=

    Bowen Song, Soo Min Kwon, Zecheng Zhang, Xinyu Hu, Qing Qu, and Liyue Shen. Solv- ing inverse problems with latent diffusion models via hard data consistency.arXiv preprint arXiv:2307.08123, 2023. 11

  34. [35]

    Pseudoinverse-guided diffusion models for inverse problems

    Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. InInternational Conference on Learning Representations, 2023

  35. [36]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations.arXiv preprint arXiv:2011.13456, 2020

  36. [37]

    Joshua S. Speagle. A conceptual introduction to markov chain monte carlo methods, 2020

  37. [38]

    Peptune: De novo generation of therapeutic peptides with multi-objective-guided discrete diffusion.ArXiv, pages arXiv–2412, 2025

    Sophia Tang, Yinuo Zhang, and Pranam Chatterjee. Peptune: De novo generation of therapeutic peptides with multi-objective-guided discrete diffusion.ArXiv, pages arXiv–2412, 2025

  38. [39]

    Li Tingguang, Ho Danny, Li Chenming, Zhu Delong, Wang Chaoqun, and Max Q.-H. Meng. Houseexpo: A large-scale 2d indoor layout dataset for learning-based algorithms on mobile robots.arXiv preprint arXiv:1903.09845, 2019

  39. [40]

    Neural discrete representation learning

    Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2017

  40. [41]

    Split-and-augmented gibbs sam- pler—application to large-scale inference problems.IEEE Transactions on Signal Processing, 67(6):1648–1661, 2019

    Maxime V ono, Nicolas Dobigeon, and Pierre Chainais. Split-and-augmented gibbs sam- pler—application to large-scale inference problems.IEEE Transactions on Signal Processing, 67(6):1648–1661, 2019

  41. [42]

    Dplm-2: A multimodal diffusion protein language model.arXiv preprint arXiv:2410.13782, 2024

    Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. Dplm-2: A multimodal diffusion protein language model.arXiv preprint arXiv:2410.13782, 2024

  42. [43]

    Diffsound: Discrete diffusion model for text-to-sound generation.IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:1720–1733, 2023

    Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, and Dong Yu. Diffsound: Discrete diffusion model for text-to-sound generation.IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:1720–1733, 2023

  43. [44]

    Mmada: Multimodal large diffusion language models

    Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, and Mengdi Wang. Mmada: Multimodal large diffusion language models.arXiv preprint arXiv:2505.15809, 2025

  44. [45]

    Cl-dps: Acontrastive learning approach to blind nonlinear inverse problem solving via diffusion posterior sampling

    Linfeng Ye, Shayan Mohajer Hamidi, Mert Pilanci, and Konstantinos N Plataniotis. Cl-dps: Acontrastive learning approach to blind nonlinear inverse problem solving via diffusion posterior sampling

  45. [46]

    Cfp-gen: Combinatorial functional protein generation via diffusion language models.arXiv preprint arXiv:2505.22869, 2025

    Junbo Yin, Chao Zha, Wenjia He, Chencheng Xu, and Xin Gao. Cfp-gen: Combinatorial functional protein generation via diffusion language models.arXiv preprint arXiv:2505.22869, 2025

  46. [47]

    Improving diffusion inverse problem solving with decoupled noise annealing

    Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, and Yang Song. Improving diffusion inverse problem solving with decoupled noise annealing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20895–20905, 2025

  47. [48]

    The unrea- sonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unrea- sonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

  48. [49]

    A Langevin-like sampler for discrete distributions

    Ruqi Zhang, Xingchao Liu, and Qiang Liu. A Langevin-like sampler for discrete distributions. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 26375–26396. PMLR, 17–23 Jul 2022

  49. [50]

    Feng, Caifeng Zou, Yu Sun, Nikola Kovachki, Zachary E

    Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy T. Feng, Caifeng Zou, Yu Sun, Nikola Kovachki, Zachary E. Ross, Katherine L. Bouman, and Yisong Yue. Inversebench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences, 2025

  50. [51]

    LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models

    Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, Jun Hu, Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, et al. Llada 1.5: Variance-reduced preference optimization for large language diffusion models.arXiv preprint arXiv:2505.19223, 2025. 12 A Algorithm Below, we describe our∆LPS’s algorithm. Algorithm 1∆LPS: Discrete Langevin-Inspired Post...

  51. [52]

    constant

    The visible alphabet has two symbols; masked diffusion augments the model vocabulary with a dedicated mask token for absorbing dynamics. DUO (best MNIST checkpoint).Training uses AdamW with zero weight decay, learning rate 3×10 −4, default AdamW momentum parameters, 2500 warmup steps to a constant LR, per-step batch size 128 on one GPU, mixed bfloat16, gr...

  52. [53]

    we were unable to find the license for the dataset we used

    In our discrete setting, candidate embeddings must come from codebook entries. Therefore, for a proposed token sequence z′ 0, we set C′ 0 =c(z ′ 0).The likelihood part of the directional derivative is available by backpropagation: ⟨∇C0 Uy(C0;y),c(z ′ 0)−c(z 0)⟩.(24) However, the prior term Uθ(z0;z t) is defined only on discrete token sequences, so its gra...

  53. [54]

    Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

    Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...