arXiv: 2605.17232 · v1 · pith:SQ62TMIUnew · submitted 2026-05-17 · 💻 cs.LG · math.ST · stat.ML · stat.TH

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

Pith reviewed 2026-05-20 14:43 UTC · model grok-4.3

classification 💻 cs.LG · math.ST · stat.ML · stat.TH
keywords discrete diffusion · convergence analysis · adjoint equations · integral probability metrics · dimension-free bounds · masked diffusion · rate matrix

The pith

Adjoint equations let discrete diffusion converge in any integral probability metric without depending on state space size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified framework that proves dimension-free convergence rates for discrete diffusion models in any integral probability metric. Existing approaches either diverge for singular priors such as the masked distribution or produce bounds that grow with the state space size S, rendering them useless for vocabularies with hundreds of thousands of tokens. The new analysis works directly with observables through adjoint equations, applies a regularity argument to control any IPM, and removes S-dependence via a coupling construction for uniform transitions and a score-marginal cancellation for masked transitions. These steps rest on one standard assumption about the rate matrix and remain valid for time-inhomogeneous schedules. A sympathetic reader cares because the result supplies the first practical theoretical guarantees for generative modeling in language and other large discrete domains.
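
For orientation, the standard definition of an integral probability metric on a finite state space can be written as below. The notation ($\mathcal{X}$ for the state space, $\mathcal{F}$ for the class of test functions, $\mu$ and $\nu$ for the two distributions) is generic and assumed here; the paper may use different symbols or a specific function class.

```latex
% Generic IPM on a finite state space \mathcal{X} (assumed notation, not necessarily the paper's).
d_{\mathcal{F}}(\mu, \nu)
  = \sup_{f \in \mathcal{F}}
    \left| \sum_{x \in \mathcal{X}} f(x)\,\bigl(\mu(x) - \nu(x)\bigr) \right|

% Standard special case: bounded test functions recover total variation up to a factor of 2.
\mathcal{F} = \{ f : \|f\|_\infty \le 1 \}
  \quad\Longrightarrow\quad
  d_{\mathcal{F}}(\mu, \nu) = \|\mu - \nu\|_1 = 2\,\mathrm{TV}(\mu, \nu)
```

Because the supremum runs over observables $f$, a bound on how each $f$ evolves translates into a bound on the metric, which is presumably why working in the space of observables is natural for IPMs.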

Core claim

We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric. To the best of our knowledge, our bounds are the first to be entirely free of S and applicable to both masked and uniform priors. The theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive the improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes S-dependence under uniform transitions, and a score-marginal cancellation technique that removes S-dependence under masked transitions.

What carries the argument

The adjoint-equation framework, which analyzes the evolution of observables instead of probability measures to obtain contraction in any integral probability metric.
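
The duality behind this idea can be illustrated on a toy chain. The sketch below is not the paper's construction: it assumes a single-site uniform rate matrix Q = (1/S) 11^T - I with a constant schedule, for which the semigroup has the closed form e^{tQ} = e^{-t} I + (1 - e^{-t}) (1/S) 11^T. All function names (`uniform_forward`, `uniform_backward`, `pairing`, `demo`) are hypothetical. It checks that evolving the measure forward and pairing with f gives the same number as evolving the observable f and pairing with the initial measure, and that the total variation between two point masses decays as e^{-t} whatever the value of S in this toy case.

```python
import math

def uniform_forward(p0, t):
    """Evolve a probability vector p0 (assumed to sum to 1) for time t under
    the toy uniform rate matrix Q = (1/S) 11^T - I, using the closed form."""
    S = len(p0)
    decay = math.exp(-t)
    return [decay * p + (1.0 - decay) / S for p in p0]

def uniform_backward(f, t):
    """Evolve an observable f under the same semigroup:
    (P_t f)(x) = E[f(X_t) | X_0 = x] = e^{-t} f(x) + (1 - e^{-t}) mean(f)."""
    mean_f = sum(f) / len(f)
    decay = math.exp(-t)
    return [decay * fx + (1.0 - decay) * mean_f for fx in f]

def pairing(p, f):
    """Expectation of the observable f under the distribution p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def demo(S, t=1.5):
    p0 = [1.0 if i == 0 else 0.0 for i in range(S)]  # point mass on state 0
    q0 = [1.0 if i == 1 else 0.0 for i in range(S)]  # point mass on state 1
    f = [math.sin(i) for i in range(S)]              # arbitrary bounded observable
    lhs = pairing(uniform_forward(p0, t), f)         # evolve the measure
    rhs = pairing(p0, uniform_backward(f, t))        # evolve the observable
    tv_t = total_variation(uniform_forward(p0, t), uniform_forward(q0, t))
    return lhs, rhs, tv_t

if __name__ == "__main__":
    for S in (4, 1000):
        lhs, rhs, tv_t = demo(S)
        print(S, lhs, rhs, tv_t, math.exp(-1.5))
```

The toy case only shows the mechanism; the paper's setting (sequences of tokens, learned scores, discretization error) is considerably more involved.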

If this is right

  • Convergence guarantees remain useful even when the discrete state space reaches hundreds of thousands of tokens.
  • The same analysis covers both uniform and masked transition kernels without modification.
  • Bounds continue to hold when the diffusion schedule varies with time.
  • The framework supplies a general toolkit for studying additional properties of discrete diffusion beyond convergence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar adjoint techniques might remove dimension dependence when analyzing other discrete generative processes such as autoregressive models.
  • The cancellation identity used for masked transitions could be adapted to derive bounds under additional structured priors.
  • The regularity assumption may be verified or relaxed for common rate matrices arising in biological sequence generation.

Load-bearing premise

The rate matrix must obey a standard regularity condition that permits bounding the generator and establishing contraction in the chosen integral probability metric.

What would settle it

A concrete counter-example in which a practical rate matrix for masked diffusion on a large state space yields an IPM bound that grows with S or fails to contract at the claimed rate.

Original abstract

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and a score-marginal cancellation technique that removes $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models.
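
The abstract's remark about singular priors can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not the paper's analysis: it uses a single token, a uniform data distribution, a hypothetical survival probability `alpha` (the chance the token is still unmasked at the terminal time), and the direction KL(p_T || prior). Since the masked prior is a point mass on the mask state, this KL is infinite whenever `alpha > 0`, while total variation stays finite and equals `alpha`.

```python
import math

def masked_marginal(p0, alpha):
    """Marginal of a single-token absorbing (masked) diffusion: the token
    survives with probability alpha, otherwise it is replaced by the mask.
    The mask state is appended as the last entry of the returned vector."""
    return [alpha * p for p in p0] + [1.0 - alpha]

def kl(p, q):
    """KL(p || q) with the convention 0 * log(0 / q) = 0, and +inf as soon as
    p puts mass on a state where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def compare(S, alpha):
    p0 = [1.0 / S] * S               # assumed uniform data distribution
    p_T = masked_marginal(p0, alpha)  # terminal marginal of the forward process
    prior = [0.0] * S + [1.0]        # point mass on the mask state
    return kl(p_T, prior), total_variation(p_T, prior)

if __name__ == "__main__":
    for S in (10, 100_000):
        print(S, compare(S, alpha=0.05))
```

In this toy setting the total variation to the prior does not depend on S; the S-dependence that the abstract attributes to existing TV bounds arises elsewhere in those analyses and is not reproduced here.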

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a unified adjoint-equation-based framework for proving dimension-free convergence of discrete diffusion models in arbitrary integral probability metrics (IPMs). It claims the first S-free bounds that apply to both masked and uniform priors, relying on a single standard rate-matrix regularity assumption and remaining compatible with time-inhomogeneous schedules. The improvements are attributed to four techniques: working in the space of observables via adjoints, a regularity analysis yielding IPM bounds, a coupling argument removing S-dependence for uniform transitions, and a score-marginal cancellation technique for masked transitions.

Significance. If the dimension-free IPM bounds hold under the stated regularity assumption, the result would be significant for the theoretical analysis of discrete diffusion models in high-dimensional discrete spaces such as large-vocabulary language modeling, where prior KL and TV analyses either diverge or become vacuous. The adjoint perspective and the explicit handling of both prior types constitute a versatile toolkit that could support further study of time-inhomogeneous schedules.

major comments (2)
  1. [masked transition analysis] The central claim that the score-marginal cancellation technique removes all S-dependence for masked transitions (Abstract and the corresponding derivation) rests on the rate-matrix regularity assumption producing S-independent constants in the generator bound. The manuscript should explicitly verify that the assumption holds with S-independent constants for standard masked rate matrices under typical time-inhomogeneous schedules; otherwise the IPM contraction retains an implicit dimension factor.
  2. [uniform vs. masked comparison] § on coupling argument: the uniform-transition coupling is stated to be robust, but the manuscript should clarify whether the same regularity assumption is used uniformly for both priors or whether separate constants appear; any S-dependent factor in the masked case would undermine the unified dimension-free claim.
minor comments (2)
  1. [Introduction] The abstract lists four techniques; ensure each is given a dedicated subsection with a clear statement of the technical novelty relative to prior path-space KL and TV analyses.
  2. [Preliminaries] Notation for the IPM and the adjoint operator should be introduced once and used consistently; currently the transition from probability-measure to observable space is described at a high level.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive feedback. We address each major comment below and have revised the manuscript to incorporate clarifications and explicit verifications where needed.

Point-by-point responses
  1. Referee: [masked transition analysis] The central claim that the score-marginal cancellation technique removes all S-dependence for masked transitions (Abstract and the corresponding derivation) rests on the rate-matrix regularity assumption producing S-independent constants in the generator bound. The manuscript should explicitly verify that the assumption holds with S-independent constants for standard masked rate matrices under typical time-inhomogeneous schedules; otherwise the IPM contraction retains an implicit dimension factor.

    Authors: We agree that an explicit verification is valuable for clarity. In the revised manuscript we have added a new subsection that directly verifies the rate-matrix regularity assumption for standard masked rate matrices (including the common linear and cosine time-inhomogeneous schedules). The verification shows that the resulting constants in the generator bound remain independent of S, confirming that the score-marginal cancellation removes all dimension dependence without introducing implicit factors. revision: yes

  2. Referee: [uniform vs. masked comparison] § on coupling argument: the uniform-transition coupling is stated to be robust, but the manuscript should clarify whether the same regularity assumption is used uniformly for both priors or whether separate constants appear; any S-dependent factor in the masked case would undermine the unified dimension-free claim.

    Authors: The manuscript employs the identical rate-matrix regularity assumption for both priors. We have revised the coupling-argument section to state explicitly that the constants obtained from this assumption are the same for the uniform and masked cases, with the score-marginal cancellation ensuring that no S-dependent factors appear in the masked setting. This preserves the unified dimension-free claim. revision: partial

Circularity Check

0 steps flagged

No circularity: derivation relies on external regularity assumption and independent techniques

Full rationale

The paper presents a unified adjoint-equation framework deriving dimension-free IPM convergence bounds for discrete diffusion under masked and uniform priors. It explicitly invokes one standard rate-matrix regularity assumption as the sole modeling hypothesis, with the four listed techniques (adjoint observables, regularity analysis for IPMs, coupling for uniform transitions, score-marginal cancellation for masked) operating on top of that assumption. No equations or claims in the abstract reduce a prediction to a fitted input, rename a known result, or depend on self-citation chains for load-bearing steps. The derivation chain is therefore self-contained against external benchmarks and does not collapse to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on one standard rate-matrix regularity assumption whose precise statement is not given in the abstract; no free parameters or invented entities are mentioned.

axioms (1)
  • [domain assumption] A single standard rate-matrix regularity assumption, sufficient for IPM contraction under both masked and uniform transitions.
    Invoked to obtain bounds free of the state-space size S; its location is implied to be the convergence analysis section referenced by the abstract.
