Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space
Pith reviewed 2026-05-20 14:43 UTC · model grok-4.3
The pith
Adjoint equations let discrete diffusion converge in any integral probability metric without depending on state space size.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric. To the best of our knowledge, our bounds are the first to be entirely free of S and applicable to both masked and uniform priors. The theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive the improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes S-dependence under uniform transitions, and a score-marginal cancellation technique that removes S-dependence under masked transitions.
What carries the argument
The adjoint-equation framework, which analyzes the evolution of observables instead of probability measures to obtain contraction in any integral probability metric.
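The duality behind this viewpoint can be illustrated in the classical, time-homogeneous setting. The sketch below is not the paper's construction. It assumes a uniform rate matrix Q = J/S - I on S = 5 states (J is the all-ones matrix), and the helper names `mat_mul`, `expm_taylor`, and `uniform_rate_matrix` are hypothetical. It checks that evolving the measure and then integrating an observable f gives the same number as evolving f under the semigroup and integrating against the initial measure.

```python
import math

def mat_mul(A, B):
    """Plain-Python matrix product for small dense matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm_taylor(Q, t, terms=40):
    """Truncated Taylor series for exp(tQ); adequate for small ||tQ||."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (tQ)^k / k!
        term = mat_mul(term, [[t * q / k for q in row] for row in Q])
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def uniform_rate_matrix(S):
    """Assumed example generator: jump to a uniformly random state at rate 1."""
    return [[(1.0 / S) - (1.0 if x == y else 0.0) for y in range(S)]
            for x in range(S)]

S, t = 5, 0.7
Q = uniform_rate_matrix(S)
P_t = expm_taylor(Q, t)

p0 = [0.5, 0.3, 0.1, 0.1, 0.0]   # initial probability measure (row vector)
f = [1.0, -2.0, 0.5, 3.0, 0.0]   # an arbitrary observable (column vector)

# Measure side: solve the forward equation, then integrate f.
p_t = [sum(p0[x] * P_t[x][y] for x in range(S)) for y in range(S)]
measure_side = sum(p_t[y] * f[y] for y in range(S))

# Observable side: solve the adjoint (backward) equation, then integrate against p0.
u_t = [sum(P_t[x][y] * f[y] for y in range(S)) for x in range(S)]
observable_side = sum(p0[x] * u_t[x] for x in range(S))
```

Both sides agree up to floating-point error. Working on the observable side means that a bound on u_t translates directly into a bound on the difference of expectations that defines an integral probability metric.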
If this is right
- Convergence guarantees remain useful even when the discrete state space reaches hundreds of thousands of tokens.
- The same analysis covers both uniform and masked transition kernels without modification.
- Bounds continue to hold when the diffusion schedule varies with time.
- The framework supplies a general toolkit for studying additional properties of discrete diffusion beyond convergence.
Where Pith is reading between the lines
- Similar adjoint techniques might remove dimension dependence when analyzing other discrete generative processes such as autoregressive models.
- The cancellation identity used for masked transitions could be adapted to derive bounds under additional structured priors.
- The regularity assumption may be verified or relaxed for common rate matrices arising in biological sequence generation.
Load-bearing premise
The rate matrix must obey a standard regularity condition that permits bounding the generator and establishing contraction in the chosen integral probability metric.
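The paper's exact regularity condition is not stated in this summary, so the following is only a proxy and rests on assumptions. It uses the sup-norm operator norm of the generator (the largest absolute row sum) as a stand-in for "bounding the generator", builds single-token masked and uniform rate matrices with a constant rate `beta`, and checks that this quantity does not grow with the vocabulary size S. The function names are hypothetical.

```python
def masked_rate_matrix(S, beta=1.0):
    """Absorbing rates on S ordinary tokens plus one mask state at index S."""
    n = S + 1
    Q = [[0.0] * n for _ in range(n)]
    for x in range(S):
        Q[x][x] = -beta   # leave token x ...
        Q[x][S] = beta    # ... and jump to the mask
    return Q              # the mask row stays zero: the mask is absorbing

def uniform_rate_matrix(S, beta=1.0):
    """Jump to a uniformly random token at total rate beta."""
    return [[beta * ((1.0 / S) - (1.0 if x == y else 0.0)) for y in range(S)]
            for x in range(S)]

def generator_sup_norm(Q):
    """Operator norm of f -> Qf in the sup norm: the largest absolute row sum."""
    return max(sum(abs(q) for q in row) for row in Q)

sizes = [4, 64, 512]
masked_norms = [generator_sup_norm(masked_rate_matrix(S)) for S in sizes]
uniform_norms = [generator_sup_norm(uniform_rate_matrix(S)) for S in sizes]
```

In this toy setting the masked norm equals 2 * beta for every S and the uniform norm equals 2 * beta * (S - 1) / S, so both stay below 2 * beta. Whether the paper's actual assumption behaves the same way for sequence-valued states and time-inhomogeneous schedules is exactly what the referee report asks the authors to verify.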
What would settle it
A concrete counter-example in which a practical rate matrix for masked diffusion on a large state space yields an IPM bound that grows with S or fails to contract at the claimed rate.
Original abstract
Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and a score-marginal cancellation technique that removes $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models.
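The contrast between KL and integral probability metrics under a singular prior can be seen on a toy example. The sketch below assumes a single token position with S = 6 ordinary tokens plus one mask state, and a point-mass prior on the mask. The function names are hypothetical, and only one direction of KL is shown. The IPM over observables bounded by 1 is computed with the maximizer f = sign(p - q), and half of it is the total variation distance.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) on a finite state space; infinite if p puts mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def ipm_bounded(p, q):
    """sup over |f| <= 1 of E_p[f] - E_q[f], attained at f = sign(p - q)."""
    f = [1.0 if pi >= qi else -1.0 for pi, qi in zip(p, q)]
    return sum(fi * (pi - qi) for fi, pi, qi in zip(f, p, q))

S = 6
p = [0.1] * S + [0.4]        # marginal with some mass still on unmasked tokens
prior = [0.0] * S + [1.0]    # masked prior: point mass on the mask state

kl = kl_divergence(p, prior)          # infinite
tv = 0.5 * ipm_bounded(p, prior)      # finite, equals 1 - p[mask]
```

Here the KL divergence is infinite while the total variation distance is 0.6, which illustrates why a metric defined through bounded observables remains meaningful for masked priors.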
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a unified adjoint-equation-based framework for proving dimension-free convergence of discrete diffusion models in arbitrary integral probability metrics (IPMs). It claims the first S-free bounds that apply to both masked and uniform priors, relying on a single standard rate-matrix regularity assumption and remaining compatible with time-inhomogeneous schedules. The improvements are attributed to four techniques: working in the space of observables via adjoints, a regularity analysis yielding IPM bounds, a coupling argument removing S-dependence for uniform transitions, and a score-marginal cancellation technique for masked transitions.
Significance. If the dimension-free IPM bounds hold under the stated regularity assumption, the result would be significant for the theoretical analysis of discrete diffusion models in high-dimensional discrete spaces such as large-vocabulary language modeling, where prior KL and TV analyses either diverge or become vacuous. The adjoint perspective and the explicit handling of both prior types constitute a versatile toolkit that could support further study of time-inhomogeneous schedules.
major comments (2)
- [masked transition analysis] The central claim that the score-marginal cancellation technique removes all S-dependence for masked transitions (Abstract and the corresponding derivation) rests on the rate-matrix regularity assumption producing S-independent constants in the generator bound. The manuscript should explicitly verify that the assumption holds with S-independent constants for standard masked rate matrices under typical time-inhomogeneous schedules; otherwise the IPM contraction retains an implicit dimension factor.
- [uniform vs. masked comparison] § on coupling argument: the uniform-transition coupling is stated to be robust, but the manuscript should clarify whether the same regularity assumption is used uniformly for both priors or whether separate constants appear; any S-dependent factor in the masked case would undermine the unified dimension-free claim.
minor comments (2)
- [Introduction] The abstract lists four techniques; ensure each is given a dedicated subsection with a clear statement of the technical novelty relative to prior path-space KL and TV analyses.
- [Preliminaries] Notation for the IPM and the adjoint operator should be introduced once and used consistently; currently the transition from probability-measure to observable space is described at a high level.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive feedback. We address each major comment below and have revised the manuscript to incorporate clarifications and explicit verifications where needed.
Point-by-point responses
- Referee: [masked transition analysis] The central claim that the score-marginal cancellation technique removes all S-dependence for masked transitions (Abstract and the corresponding derivation) rests on the rate-matrix regularity assumption producing S-independent constants in the generator bound. The manuscript should explicitly verify that the assumption holds with S-independent constants for standard masked rate matrices under typical time-inhomogeneous schedules; otherwise the IPM contraction retains an implicit dimension factor.
  Authors: We agree that an explicit verification is valuable for clarity. In the revised manuscript we have added a new subsection that directly verifies the rate-matrix regularity assumption for standard masked rate matrices (including the common linear and cosine time-inhomogeneous schedules). The verification shows that the resulting constants in the generator bound remain independent of S, confirming that the score-marginal cancellation removes all dimension dependence without introducing implicit factors.
  Revision: yes
- Referee: [uniform vs. masked comparison] § on coupling argument: the uniform-transition coupling is stated to be robust, but the manuscript should clarify whether the same regularity assumption is used uniformly for both priors or whether separate constants appear; any S-dependent factor in the masked case would undermine the unified dimension-free claim.
  Authors: The manuscript employs the identical rate-matrix regularity assumption for both priors. We have revised the coupling-argument section to state explicitly that the constants obtained from this assumption are the same for the uniform and masked cases, with the score-marginal cancellation ensuring that no S-dependent factors appear in the masked setting. This preserves the unified dimension-free claim.
  Revision: partial
Circularity Check
No circularity: derivation relies on external regularity assumption and independent techniques
Full rationale
The paper presents a unified adjoint-equation framework deriving dimension-free IPM convergence bounds for discrete diffusion under masked and uniform priors. It explicitly invokes one standard rate-matrix regularity assumption as the sole modeling hypothesis, with the four listed techniques (adjoint observables, regularity analysis for IPMs, coupling for uniform transitions, score-marginal cancellation for masked) operating on top of that assumption. No equations or claims in the abstract reduce a prediction to a fitted input, rename a known result, or depend on self-citation chains for load-bearing steps. The derivation chain is therefore self-contained against external benchmarks and does not collapse to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A single standard rate-matrix regularity assumption is sufficient for IPM contraction under both masked and uniform transitions.