Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

Benjamin J. Zhang; Kelvin Kan; Markos A. Katsoulakis; Stanley Osher; Tuhin Sahai; Xingjian Li

arxiv: 2605.17232 · v3 · pith:SQ62TMIUnew · submitted 2026-05-17 · 💻 cs.LG · math.ST · stat.ML · stat.TH

Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

Pith reviewed 2026-06-30 19:11 UTC · model grok-4.3

classification 💻 cs.LG · math.ST · stat.ML · stat.TH

keywords discrete diffusion · convergence bounds · adjoint equations · dimension-free · integral probability metrics · masked diffusion · generative modeling

The pith

Adjoint equations establish dimension-free convergence bounds for discrete diffusion models in any integral probability metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an adjoint-equation framework to prove convergence of discrete diffusion models without dependence on state space size S. Existing KL analyses fail for singular priors like masked distributions, while total variation bounds grow with S and become useless for large vocabularies in language modeling. The new approach works in the space of observables, uses coupling and cancellation techniques to eliminate S-dependence, and holds under one rate-matrix regularity condition for general priors. It covers both masked and uniform transitions in any integral probability metric. The framework also supplies tools for loss function design and step complexity analysis.
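To make the metric concrete: an integral probability metric (IPM) compares two distributions by the largest gap in expectation over a class of test functions, which play the role of observables. The sketch below works on a finite state space and is illustrative only. The function names, the random distributions, and the choice S = 1000 are assumptions made for this example and are not taken from the paper. It checks the standard fact that total variation equals one half of the IPM over functions bounded by 1 in absolute value, with the supremum attained at f = sign(p - q).

```python
import numpy as np


def ipm(p, q, test_functions):
    """Largest |E_p[f] - E_q[f]| over the rows f of `test_functions`."""
    F = np.asarray(test_functions, dtype=float)
    return float(np.max(np.abs(F @ (p - q))))


def total_variation(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * float(np.sum(np.abs(p - q)))


rng = np.random.default_rng(0)
S = 1000  # hypothetical state space size, chosen for illustration
p = rng.dirichlet(np.ones(S))
q = rng.dirichlet(np.ones(S))

# The optimal bounded test function for TV is the sign of the difference.
f_star = np.sign(p - q)
tv_via_ipm = 0.5 * ipm(p, q, f_star[None, :])
tv_direct = total_variation(p, q)
```

Restricting the rows of `test_functions` to a smaller class (for example, functions that depend on only a few coordinates) gives a weaker metric that is bounded above by the same supremum; this is the general sense in which a bound stated for an arbitrary IPM covers many metrics at once.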

Core claim

We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric. To the best of our knowledge, our bounds are the first to be entirely free of S and applicable to both masked and uniform priors. Our theory relies only on a single standard rate-matrix regularity assumption and applies to general priors.

What carries the argument

Adjoint equations that operate on observables rather than on probability measures directly, combined with coupling, score-marginal cancellation, and exit-routing arguments.
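The duality behind "working with observables" can be sketched numerically for a generic finite-state Markov chain. This is a minimal illustration under stated assumptions, not the paper's construction: the rate matrices are random, the time stepping is a plain Euler scheme, and all names are hypothetical. A distribution is pushed forward by p_{k+1} = p_k (I + h Q_k), an observable is pulled backward by u_k = (I + h Q_k) u_{k+1} starting from u_N = f, and the pairing <p_k, u_k> stays constant in k, so the expectation of f at the final time can be read off at time zero.

```python
import numpy as np


def random_rate_matrix(S, rng):
    """Random rate matrix: nonnegative off-diagonal entries, rows summing to zero."""
    Q = rng.random((S, S))
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q


rng = np.random.default_rng(0)
S, N, h = 6, 200, 1e-3  # hypothetical sizes; h is small enough that I + h*Q is stochastic
Qs = [random_rate_matrix(S, rng) for _ in range(N)]  # time-inhomogeneous rates
p0 = rng.dirichlet(np.ones(S))  # initial distribution
f = rng.standard_normal(S)      # observable evaluated at the final time

# Forward (measure) evolution: p_{k+1} = p_k (I + h Q_k).
ps = [p0]
for Q in Qs:
    ps.append(ps[-1] @ (np.eye(S) + h * Q))

# Backward (adjoint, observable) evolution: u_k = (I + h Q_k) u_{k+1}, u_N = f.
us = [f]
for Q in reversed(Qs):
    us.append((np.eye(S) + h * Q) @ us[-1])
us.reverse()

# The pairing <p_k, u_k> is the same at every step.
pairings = np.array([pk @ uk for pk, uk in zip(ps, us)])
spread = float(pairings.max() - pairings.min())
```

Because the pairing is conserved, a statement about the gap in E[f] between two final distributions can be transferred to a statement about the backward-evolved observable, which is the kind of object an adjoint-equation analysis studies.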

If this is right

Convergence bounds hold without any dependence on state space size S.
Analysis covers both masked and uniform priors under the same framework.
Guarantees apply in every integral probability metric rather than only KL or TV.
The same machinery yields principled loss functions and dimension-free step counts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The observable-space view may simplify analysis of other discrete generative processes that currently rely on path-space divergences.
Practitioners gain a concrete way to select sampling schedules whose error scales independently of vocabulary size.
The single-assumption structure suggests the bounds could transfer to diffusion models on graphs or other discrete structures.

Load-bearing premise

The rate matrix satisfies one standard regularity condition.

What would settle it

Numerical verification that the derived convergence bound stays constant as vocabulary size S increases from 10^3 to 10^5 under masked transitions.

Original abstract

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and applies to general priors. Five novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and score-marginal cancellation and exit-routing techniques that remove $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models, including principled choices of loss functions and dimension-free step complexity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper gives the first S-free IPM convergence bounds for discrete diffusion on both masked and uniform priors by shifting to adjoint equations and a few targeted cancellation tricks.

The main thing to know is that the authors claim dimension-free bounds in any integral probability metric that hold for general priors and only need one standard rate-matrix assumption. Previous TV bounds scaled with vocabulary size S and became useless for language-scale models, while KL analyses broke on masked distributions; this work says it sidesteps both problems.
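The KL failure mode has an elementary finite-state illustration, sketched here under assumptions of our own (a single token, a point-mass prior on a hypothetical mask state, and a uniform data distribution). It is not the paper's pathspace argument. When one distribution puts mass where the other has none, the KL divergence is infinite, while total variation stays bounded by 1.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) on a finite space; infinite if p has mass where q has none."""
    support = p > 0
    if np.any(q[support] == 0):
        return float("inf")
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def total_variation(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * float(np.sum(np.abs(p - q)))


S = 1000  # hypothetical vocabulary size; index S is the extra mask state

# Singular prior: all mass on the mask state.
mask_prior = np.zeros(S + 1)
mask_prior[-1] = 1.0

# Toy data distribution: uniform over the S real tokens, no mass on the mask.
data = np.full(S + 1, 1.0 / S)
data[-1] = 0.0

kl = kl_divergence(data, mask_prior)
tv = total_variation(data, mask_prior)
```

In this toy case `kl` is infinite and `tv` equals 1 up to rounding, which is why a divergence that requires overlapping support is a poor fit for a prior concentrated on a single state.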

What they actually do is move the analysis to the space of observables via adjoint equations rather than working directly with measures. From there they add a regularity result that works for any IPM, a coupling argument that kills S-dependence under uniform transitions, and two masked-specific moves (score-marginal cancellation and exit-routing) that do the same. The abstract positions these as five distinct departures from pathspace-KL and existing TV arguments. If the proofs deliver on the cancellation steps without hidden S factors, the framework is genuinely new and gives a reusable toolkit for loss design and step counts.

The soft spot is that everything rests on that single regularity assumption being sufficient in practice; the abstract calls it standard, but it would be useful to see how restrictive it is once the rate matrices come from real trained models. No other gaps jump out from the high-level argument.

This is for people doing theory on discrete generative models who need scaling results that survive large vocabularies. A reader working on language or biological sequence models would get concrete value from the bounds and the adjoint toolkit. It deserves a serious referee because the central claim targets a real barrier and the technical moves look distinct enough to warrant checking the details.

Referee Report

0 major / 0 minor

Summary. The manuscript develops a unified adjoint-equation-based framework for discrete diffusion models that establishes dimension-free convergence guarantees in any integral probability metric (IPM). The central claims are that the bounds are the first to be entirely free of the state space size S, apply to both masked and uniform priors, rely only on a single standard rate-matrix regularity assumption, and apply to general priors. Five novel techniques are introduced: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes S-dependence under uniform transitions, and score-marginal cancellation and exit-routing techniques that remove S-dependence under masked transitions. The framework is presented as departing from pathspace-KL and existing TV-based approaches while providing a versatile toolkit for further theoretical study, including principled loss function choices and dimension-free step complexity.

Significance. If the results hold, this would represent a significant advance in the theoretical analysis of discrete diffusion models, which are leading frameworks for generative modeling in language, vision, and biology. By delivering the first S-free IPM bounds that remain valid for singular priors (avoiding KL divergence issues) and large vocabularies (avoiding vacuous TV bounds), the work directly addresses key limitations of prior analyses. The adjoint-equation approach and the listed techniques for dimension removal constitute a promising shift, and the single-assumption reliance is a strength if the assumption is indeed standard and sufficient.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary of the manuscript and for noting the potential significance of the adjoint-equation framework for dimension-free convergence in discrete diffusion models. No specific major comments were listed in the report, so we provide no point-by-point responses below. We remain available to clarify any aspects of the results, including the single rate-matrix assumption or the techniques for removing S-dependence.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained under stated standard assumption

The paper develops an adjoint-equation framework for IPM convergence bounds that are S-free for general priors, relying explicitly on one standard rate-matrix regularity assumption rather than any fitted parameters, self-citations, or ansatzes that reduce to the target result. The listed techniques (coupling, score-marginal cancellation, exit-routing) are presented as novel contributions that remove dimension dependence without the derivation chain collapsing to a redefinition or prior self-citation load-bearing step. No equations or claims in the provided text exhibit self-definitional equivalence, fitted-input-as-prediction, or uniqueness imported via author overlap. The central result is therefore independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on one domain assumption about rate-matrix regularity; no free parameters or invented entities are mentioned.

axioms (1)

domain assumption: Single standard rate-matrix regularity assumption
Abstract states the theory relies only on this assumption for general priors.

pith-pipeline@v0.9.1-grok · 5806 in / 1102 out tokens · 25494 ms · 2026-06-30T19:11:29.152342+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models
cs.LG · 2026-05 · unverdicted · novelty 6.0

GADD achieves O(polylog(ε^{-1})) sampling complexity for uniform-rate discrete diffusion models via Gibbs correctors derived from the score function, with supporting experiments on text and music.

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

[2] [2]

On the numerical analysis of inhomogeneous continuous-time markov chains

Markus Arns, Peter Buchholz, and Andriy Panchenko. On the numerical analysis of inhomogeneous continuous-time markov chains. INFORMS Journal on Computing, 22 0 (3): 0 416--432, 2010

2010

[3] [3]

Structured denoising diffusion models in discrete state-spaces

Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces. Advances in neural information processing systems, 34: 0 17981--17993, 2021

2021

[4] [4]

Katsoulakis, Yannis Pantazis, and Luc Rey-Bellet

Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis, Yannis Pantazis, and Luc Rey-Bellet. (f,gamma)-divergences: Interpolating between f-divergences and integral probability metrics. Journal of Machine Learning Research, 23 0 (39): 0 1--70, 2022. URL http://jmlr.org/papers/v23/21-0100.html

2022

[5] [5]

A continuous time framework for discrete denoising models

Andrew Campbell, Joe Benton, Valentin De Bortoli, Thomas Rainforth, George Deligiannidis, and Arnaud Doucet. A continuous time framework for discrete denoising models. Advances in Neural Information Processing Systems, 35: 0 28266--28279, 2022

2022

[6] [6]

Generative flows on discrete state-spaces: Enabling multimodal flows with applications to protein co-design

Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, and Tommi Jaakkola. Generative flows on discrete state-spaces: Enabling multimodal flows with applications to protein co-design. In Forty-first International Conference on Machine Learning, 2024. URL https://openreview.net/forum?id=kQwSbv0BR4

2024

[7] [7]

The numerical stability of leaping methods for stochastic simulation of chemically reacting systems

Yang Cao, Linda R Petzold, Muruhan Rathinam, and Daniel T Gillespie. The numerical stability of leaping methods for stochastic simulation of chemically reacting systems. Journal of Chemical Physics, 121 0 (24): 0 12169--12178, 2004

2004

[8] [8]

Efficient step size selection for the tau-leaping simulation method

Yang Cao, Daniel T Gillespie, and Linda R Petzold. Efficient step size selection for the tau-leaping simulation method. The Journal of chemical physics, 124 0 (4), 2006

2006

[9] [9]

Maskgit: Masked generative image transformer

Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, and William T Freeman. Maskgit: Masked generative image transformer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.\ 11315--11325, 2022

2022

[10] [10]

arXiv preprint arXiv:2301.00704 , year=

Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T Freeman, Michael Rubinstein, et al. Muse: Text-to-image generation via masked generative transformers. arXiv preprint arXiv:2301.00704, 2023

work page arXiv 2023

[11] [11]

Convergence analysis of discrete diffusion model: Exact implementation through uniformization

Hongrui Chen and Lexing Ying. Convergence analysis of discrete diffusion model: Exact implementation through uniformization. Journal of Machine Learning, 4 0 (2): 0 108--127, June 2025. doi:10.4208/jml.240812. URL https://www.global-sci.com/index.php/jml/article/view/13211

work page doi:10.4208/jml.240812 2025

[12] [12]

Optimal inference schedules for masked diffusion models

Sitan Chen, Kevin Cong, and Jerry Li. Optimal inference schedules for masked diffusion models. arXiv preprint arXiv:2511.04647, 2025

work page arXiv 2025

[13] [13]

Fast sampling via discrete non-markov diffusion models with predetermined transition time

Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, and Quanquan Gu. Fast sampling via discrete non-markov diffusion models with predetermined transition time. Advances in Neural Information Processing Systems, 37: 0 106870--106905, 2024

2024

[14] Giovanni Conforti, Alain Durmus, Le-Tuyet-Nhi Pham, and Gael Raoul. Non-asymptotic convergence of discrete diffusion models: Masked and random walk dynamics. arXiv preprint arXiv:2512.00580, 2025

[15] Daniil Dmitriev, Zhihan Huang, and Yuting Wei. Efficient sampling with discrete diffusion models: Sharp and adaptive guarantees. arXiv preprint arXiv:2602.15008, 2026

[16] Daniel T Gillespie. Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics, 115(4): 1716--1733, 2001

[17] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33: 6840--6851, 2020

[18] Asger Hobolth and Eric A Stone. Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution. The Annals of Applied Statistics, 3(3): 1204, 2009

[19] Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions. Advances in Neural Information Processing Systems, 34: 12454--12465, 2021

[20] Frank P Kelly. Reversibility and stochastic networks. J. Wiley, 1979

[21] Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, and Artem Babenko. TabDDPM: Modelling tabular data with diffusion models. In International Conference on Machine Learning, pp. 17564--17579. PMLR, 2023

[22] Le-Tuyet-Nhi Pham, Dario Shariatian, Antonio Ocello, Giovanni Conforti, and Alain Oliviero Durmus. Discrete Markov probabilistic models: An improved discrete score-based framework with sharp convergence bounds under minimal assumptions. In Forty-second International Conference on Machine Learning, 2025

[23] David A Levin and Yuval Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017

[24] Gen Li and Changxiao Cai. Breaking AR’s sampling bottleneck: Provable acceleration via diffusion language models. Advances in Neural Information Processing Systems, 38: 11700--11725, 2026

[25] Jingyuan Li, Xiaoyi Jiang, Fukang Wen, Wei Liu, Renqian Luo, Yi Zhu, Zuoqiang Shi, and Pipi Hu. Neural continuous-time Markov chain: Discrete diffusion via decoupled jump timing and direction. arXiv preprint arXiv:2604.15694, 2026

[26] Yuchen Liang, Renxiang Huang, Lifeng Lai, Ness Shroff, and Yingbin Liang. Absorb and converge: Provable convergence guarantee for absorbing discrete diffusion models. Advances in Neural Information Processing Systems, 38: 20283--20318, 2025

[27] Yuchen Liang, Yingbin Liang, Lifeng Lai, and Ness Shroff. Discrete diffusion models: Novel analysis and new sampler guarantees. Advances in Neural Information Processing Systems, 38: 165511--165548, 2026a

[28] Yuchen Liang, Zhiheng Tan, Ness Shroff, and Yingbin Liang. Sharp convergence rates for masked diffusion models, 2026b. URL https://arxiv.org/abs/2602.22505

[29] Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution. In Forty-first International Conference on Machine Learning, 2024. URL https://openreview.net/forum?id=CNicRIVIPA

[30] Chenlin Meng, Kristy Choi, Jiaming Song, and Stefano Ermon. Concrete score matching: Generalized score matching for discrete data. Advances in Neural Information Processing Systems, 35: 34532--34545, 2022

[31] Nikiforos Mimikos-Stamatopoulos, Benjamin J Zhang, and Markos A Katsoulakis. Score-based generative models are provably robust: an uncertainty quantification perspective. Advances in Neural Information Processing Systems, 37: 63154--63183, 2024

[32] Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models. Advances in Neural Information Processing Systems, 38: 50608--50646, 2026

[33] Yong-Hyun Park, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, and Yuki Mitsufuji. Jump your steps: Optimizing sampling schedule of discrete diffusion models. In International Conference on Learning Representations, volume 2025, pp. 96272--96300, 2025

[34] Gabriel Peyré and Marco Cuturi. Computational optimal transport: With applications to data science. Now Foundations and Trends, 2019

[35] Yinuo Ren, Haoxuan Chen, Grant Rotskoff, and Lexing Ying. How discrete and continuous diffusion meet: Comprehensive analysis of discrete diffusion models via a stochastic integral framework. In International Conference on Learning Representations, volume 2025, pp. 42904--42941, 2025

[36] Yinuo Ren, Haoxuan Chen, Yuchen Zhu, Wei Guo, Yongxin Chen, Grant Rotskoff, Molei Tao, and Lexing Ying. Fast solvers for discrete diffusion models: Theory and applications of high-order algorithms. Advances in Neural Information Processing Systems, 38: 167228--167282, 2026

[37] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684--10695, 2022

[38] Subham S Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models. Advances in Neural Information Processing Systems, 37: 130136--130184, 2024

[39] Yair Schiff, Subham Sahoo, Hao Phung, Guanghan Wang, Sam Boshar, Hugo Dalla-torre, Bernardo Almeida, Alexander Rush, Thomas Pierrot, and Volodymyr Kuleshov. Simple guidance mechanisms for discrete diffusion models. In International Conference on Learning Representations, volume 2025, pp. 43776--43821, 2025

[40] Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, and Michalis Titsias. Simplified and generalized masked diffusion for discrete data. Advances in Neural Information Processing Systems, 37: 103131--103167, 2024

[41] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256--2265. PMLR, 2015

[42] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a. URL https://openreview.net/forum?id=St1giarCHLP

[43] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021b. URL https://openreview.net/forum?id=PxTIG12RRHS

[44] Bharath K Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, and Gert RG Lanckriet. On integral probability metrics, φ-divergences and binary classification. arXiv preprint arXiv:0901.2698, 2009

[45] Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, and Hanjun Dai. Score-based continuous-time discrete diffusion models. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=BYWWwSY2G5s

[46] Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard. DiGress: Discrete denoising diffusion for graph generation. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=UaAD-Nu86WX

[47] Zikun Zhang, Zixiang Chen, and Quanquan Gu. Convergence of score-based discrete diffusion models: A discrete-time analysis. In International Conference on Learning Representations, volume 2025, pp. 34747--34772, 2025

[48] Lingxiao Zhao, Xueying Ding, Lijun Yu, and Leman Akoglu. Unified discrete diffusion for categorical data. Journal of Machine Learning Research, 26(215): 1--49, 2025

[49] Yixiu Zhao, Jiaxin Shi, Feng Chen, Shaul Druckmann, Lester Mackey, and Scott Linderman. Informed correctors for discrete diffusion models. Advances in Neural Information Processing Systems, 38: 125510--125538, 2026

[50] Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling. In International Conference on Learning Representations, volume 2025, pp. 63186--63227, 2025

[51] Yuchen Zhu, Wei Guo, Jaemoo Choi, Guan-Horng Liu, Yongxin Chen, and Molei Tao. MDNS: Masked diffusion neural sampler via stochastic optimal control. Advances in Neural Information Processing Systems, 38: 35260--35308, 2026