Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling

Andi Han; Huanjian Zhou; Masashi Sugiyama; Wei Huang; Yu Yao

arxiv: 2607.00773 · v1 · pith:3K6WDTWBnew · submitted 2026-07-01 · 💻 cs.LG · cs.DC· cs.DS· cs.NA· math.NA

Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling

Yu Yao , Huanjian Zhou , Andi Han , Wei Huang , Masashi Sugiyama This is my paper

Pith reviewed 2026-07-02 16:07 UTC · model grok-4.3

classification 💻 cs.LG cs.DCcs.DScs.NAmath.NA

keywords discrete diffusion modelsparallel-in-time samplingtau-leapingcontinuous-time Markov chainPicard iterationsampling accelerationabsorbing diffusion

0 comments

The pith

Parallel-in-time sampling via Picard iteration accelerates τ-leaping in discrete diffusion models, reducing complexity from O(d log S) to O(log(d log S) log d).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to speed up the sequential sampling process in discrete diffusion models for generating discrete data distributions. It reformulates the mainstream τ-leaping algorithm in continuous-time stochastic integral form and applies Picard iteration to enable parallel computation across time steps. The work includes a convergence proof and shows improved time complexity with respect to number of function evaluations. A sympathetic reader would care because faster parallel sampling would make these models more usable for tasks like text or molecular generation without sacrificing output quality. The approach is tested on both synthetic distributions and real image and text data.

Core claim

By leveraging the continuous-time stochastic integral form of the τ-leaping algorithm and the Picard iteration method, we achieve parallel-in-time sampling acceleration and provide a proof of exponential-factorial convergence for our algorithm. We improve the overall time complexity of τ-leaping under absorbing settings from O(d log S) to O(log (d log S) · log d) with respect to NFE.

What carries the argument

Picard iteration applied to the continuous-time stochastic integral form of the τ-leaping algorithm in an absorbing CTMC setting, enabling parallel evaluation across time steps.

If this is right

Time complexity with respect to NFE drops from linear in dimension d to logarithmic in d and log S.
The sampler achieves at most 7-9x runtime speedup on synthetic distributions.
Sample quality is preserved with 50 percent fewer NFE and 1.45-1.86x runtime speedups on image and text tasks.
Acceleration remains consistent across synthetic and real-data settings on a single GPU.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same Picard-based parallelization could be tested on other CTMC sampling procedures that are not diffusion models.
If the exponential-factorial convergence rate holds in practice, it may allow further reduction of iterations for very high-dimensional discrete spaces.
Hardware that supports fine-grained parallelism might see additional wall-clock gains beyond the reported single-GPU numbers.

Load-bearing premise

The continuous-time stochastic integral form of τ-leaping admits Picard iterations that can be run in parallel across time without changing the statistical properties of the samples produced by the original sequential process.

What would settle it

An experiment that runs both the sequential τ-leaping sampler and the proposed parallel version on identical models and finds that the output sample distributions differ in measurable statistics such as per-coordinate marginals or pairwise correlations.

Figures

Figures reproduced from arXiv: 2607.00773 by Andi Han, Huanjian Zhou, Masashi Sugiyama, Wei Huang, Yu Yao.

**Figure 2.** Figure 2: Visualization of L⋆. F.2 Lscore Estimation In this experiment, we measure the unweighted score sensitivity along RADD text generation at sequence length d = 512. The sampler uses 64 Picard blocks, M = 4 fine microsteps per block, and Picard depth Kp = 2, corresponding to 256 underlying fine-grid microsteps and 128 critical-path model evaluations. For a trajectory state z, we construct a perturbed state z ′… view at source ↗

**Figure 3.** Figure 3: Visualization of Lscore. 29 [PITH_FULL_IMAGE:figures/full_fig_p029_3.png] view at source ↗

**Figure 4.** Figure 4: Visualization of Bn. F.4 Synthetic data In this section we provide more details about synthetic experiments. F.4.1 2D Toy Model Chessboard The chessboard distribution has several key characteristics that make it an excellent test benchmark: (1) Discrete: The distribution is defined on a finite set of points on a two-dimensional grid; (2) Sparse: Nearly half of the grid points have a probability of exactly … view at source ↗

**Figure 5.** Figure 5: Visualization of the chessboard experiments. (Left) The target distribution. (Middle) The [PITH_FULL_IMAGE:figures/full_fig_p031_5.png] view at source ↗

**Figure 6.** Figure 6: Visualization of the circle experiments. (Left) The target distribution. (Middle) The Picard [PITH_FULL_IMAGE:figures/full_fig_p031_6.png] view at source ↗

**Figure 7.** Figure 7: Oracle synthetic quality-matched scaling experiment. Panel (a) compares the per-coordinate [PITH_FULL_IMAGE:figures/full_fig_p032_7.png] view at source ↗

**Figure 8.** Figure 8: Generated samples from the Imagenet experiments. [PITH_FULL_IMAGE:figures/full_fig_p032_8.png] view at source ↗

read the original abstract

Discrete diffusion models are widely used for learning and generating discrete distributions. As the generation process is inherently sequential, the acceleration of sampling is of significant importance. In this work, we parallelize the mainstream $\tau$-leaping algorithm for absorbing discrete diffusion in a Continuous-Time Markov Chain (CTMC) framework. By leveraging the continuous-time stochastic integral form of the $\tau$-leaping algorithm and the Picard iteration method, we achieve parallel-in-time sampling acceleration and provide a proof of exponential-factorial convergence for our algorithm. We improve the overall time complexity of $\tau$-leaping under absorbing settings from ${\mathcal{O}}(d \log S)$ to ${\mathcal{O}}(\log (d\log S)\cdot \log d)$ with respect to NFE. Empirically, our method shows consistent acceleration across synthetic and real-data settings. The new sampler achieves at most $7$--$9\times$ runtime speedup for synthetic distribution, and maintains the same quality with $50\%$ fewer NFE and $1.45$--$1.86\times$ runtime speedups in image/text tasks on a single GPU. Our research expands the potential of discrete diffusion models for efficient parallel inference, with broader implications for applications such as molecular structure and language generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a parallel-in-time τ-leaping sampler for absorbing discrete diffusion via Picard iteration on the continuous-time integral, with a claimed convergence proof and measured speedups, but the exact-distribution claim needs checking.

read the letter

Colleague,

The main point is that this work parallelizes the τ-leaping algorithm for discrete diffusion models inside a CTMC framework. They rewrite the process in continuous-time stochastic integral form, apply Picard iteration to enable time-parallel execution, and state an exponential-factorial convergence result. This is said to drop the NFE complexity from O(d log S) to O(log(d log S) · log d) while keeping the same distribution.

What stands out as new is the direct use of Picard iteration for parallel-in-time sampling specifically under absorbing discrete diffusion; prior work on parallel methods has not been applied this way in the references given. The empirical side is straightforward: synthetic cases reach 7–9× runtime gains, and image/text tasks show 1.45–1.86× speedups at 50 % fewer NFEs with no quality drop. That gives a usable data point for anyone measuring wall-clock cost on a single GPU.

The soft spot is the load-bearing assumption that the parallel Picard version exactly reproduces the law of the original sequential CTMC rather than converging only in a weaker sense. The abstract asserts a proof, yet the provided text supplies no derivation steps or explicit verification that the iteration preserves the jump-process distribution at finite steps. If that step holds, the complexity claim is interesting; if it introduces bias or requires many serial iterations, the practical gain shrinks. The experiments report consistent acceleration but would be stronger with more detail on iteration count, parallel overhead, and exact baseline implementations.

This paper is for researchers who already work on sampling efficiency in discrete diffusion for language, images, or molecules. A reader who needs a concrete algorithm plus runtime numbers will find something to test. It is coherent enough on its own terms to merit a serious referee who can examine the proof and the CTMC equivalence claim.

I would send it to peer review.

Referee Report

2 major / 1 minor

Summary. The paper claims to accelerate sampling for discrete diffusion models by parallelizing the mainstream τ-leaping algorithm for absorbing discrete diffusion within a CTMC framework. It does so by recasting τ-leaping in continuous-time stochastic integral form, applying Picard iteration to enable parallel-in-time execution, and supplying a proof of exponential-factorial convergence. The claimed complexity improvement is from O(d log S) to O(log (d log S) · log d) with respect to NFE; empirical results report up to 7–9× runtime speedup on synthetic distributions and 1.45–1.86× speedups on image/text tasks with 50 % fewer NFE while preserving sample quality.

Significance. If the convergence proof is correct and the parallel sampler exactly preserves the law of the original absorbing CTMC, the result would be significant for efficient inference in discrete diffusion models used in language and molecular generation. The reported complexity reduction and single-GPU speedups would be practically relevant, and the explicit convergence guarantee would be a strength relative to purely empirical acceleration methods.

major comments (2)

[Proof of convergence] The central claim rests on the assertion that the continuous-time stochastic integral form of τ-leaping admits a Picard iteration that can be parallelized across time steps while exactly preserving the statistical properties of the sequential CTMC (rather than converging only in a weaker sense). The proof section must supply the explicit steps and error analysis showing that the parallel execution yields samples with identical distribution to the original absorbing jump process; without these steps the complexity and speedup claims cannot be verified.
[§3] Abstract and §3: the stated complexity reduction from O(d log S) to O(log (d log S) · log d) w.r.t. NFE is load-bearing for the contribution. The derivation must be shown explicitly, including how the number of Picard iterations and the parallel schedule produce the logarithmic dependence on d and log S while maintaining the exact marginals of the τ-leaping process.

minor comments (1)

[Abstract] The phrase 'exponential-factorial convergence' is non-standard; the proof section should define the precise rate (e.g., factorial decay in the error bound or exponential in the number of iterations) with the relevant constants.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review. The two major comments both concern the need for greater explicitness in the convergence proof and complexity analysis. We address each below and will revise the manuscript to incorporate the requested details.

read point-by-point responses

Referee: [Proof of convergence] The central claim rests on the assertion that the continuous-time stochastic integral form of τ-leaping admits a Picard iteration that can be parallelized across time steps while exactly preserving the statistical properties of the sequential CTMC (rather than converging only in a weaker sense). The proof section must supply the explicit steps and error analysis showing that the parallel execution yields samples with identical distribution to the original absorbing jump process; without these steps the complexity and speedup claims cannot be verified.

Authors: We agree that the proof requires additional explicit steps to make the exact distributional equivalence fully transparent. In the revised manuscript we will expand the relevant section to include: (i) the precise mapping from the τ-leaping CTMC to its continuous-time stochastic integral representation, (ii) the Picard iteration operator applied to this integral equation, (iii) an inductive argument showing that each parallel iteration preserves the exact finite-dimensional marginals of the original absorbing jump process, and (iv) the error analysis establishing the exponential-factorial convergence rate. These additions will confirm that the parallel sampler produces samples from the identical law, not merely a weaker approximation. revision: yes
Referee: [§3] Abstract and §3: the stated complexity reduction from O(d log S) to O(log (d log S) · log d) w.r.t. NFE is load-bearing for the contribution. The derivation must be shown explicitly, including how the number of Picard iterations and the parallel schedule produce the logarithmic dependence on d and log S while maintaining the exact marginals of the τ-leaping process.

Authors: We concur that the complexity claim must be derived in full. The revised §3 will contain a dedicated derivation subsection that: (i) recalls the sequential τ-leaping cost O(d log S), (ii) shows how the chosen parallel schedule together with the number of Picard iterations needed for the target accuracy reduces the depth to O(log(d log S)), and (iii) multiplies by the per-iteration parallel cost O(log d) to obtain the overall O(log(d log S) · log d) bound with respect to NFE, all while the exact marginal preservation established in the expanded proof guarantees that the complexity improvement does not alter the target distribution. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained algorithmic construction

full rationale

The paper presents an independent algorithmic construction that parallelizes the τ-leaping algorithm via its continuous-time stochastic integral representation and Picard iteration, followed by a claimed convergence proof and complexity analysis. No load-bearing step reduces by definition, fitted input, or self-citation chain to the target result itself; the abstract and described method treat the parallelization and proof as external to the input CTMC process rather than tautological. The central claims rest on the mathematical properties of the integral form and iteration, which are presented as verifiable independently rather than presupposed. This is the normal case of a non-circular derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.1-grok · 5772 in / 1220 out tokens · 26336 ms · 2026-07-02T16:07:25.328177+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 26 canonical work pages · 6 internal anchors

[1]

A pytorch reproduction of masked generative image transformer.arXiv preprint arXiv:2310.14400,

[BC23] Victor Besnier and Mickael Chen. A pytorch reproduction of masked generative image transformer.arXiv preprint arXiv:2310.14400,

work page arXiv
[2]

Non- asymptotic convergence of discrete diffusion models: Masked and random walk dy- namics.arXiv preprint arXiv:2512.00580,

[CDPR25] Giovanni Conforti, Alain Durmus, Le-Tuyet-Nhi Pham, and Gael Raoul. Non- asymptotic convergence of discrete diffusion models: Masked and random walk dy- namics.arXiv preprint arXiv:2512.00580,

work page arXiv
[3]

Beyond masked and unmasked: Discrete diffusion models via partial masking.arXiv preprint arXiv:2505.18495,

[CSL+25] Chen-Hao Chao, Wei-Fang Sun, Hanwen Liang, Chun-Yi Lee, and Rahul G Krishnan. Beyond masked and unmasked: Discrete diffusion models via partial masking.arXiv preprint arXiv:2505.18495,

work page arXiv
[4]

Convergence analysis of discrete diffusion model: Exact implementation through uniformization.arXiv preprint arXiv:2402.08095,

[CY24] Hongrui Chen and Lexing Ying. Convergence analysis of discrete diffusion model: Exact implementation through uniformization.arXiv preprint arXiv:2402.08095,

work page arXiv
[5]

Efficient Sampling with Discrete Diffusion Models: Sharp and Adaptive Guarantees

[DHW26] Daniil Dmitriev, Zhihan Huang, and Yuting Wei. Efficient sampling with discrete diffusion models: Sharp and adaptive guarantees.arXiv preprint arXiv:2602.15008,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

On the complexity theory of masked discrete diffusion: Frompoly(1/ϵ) to nearlyϵ-free.arXiv preprint arXiv:2509.21835,

[HLJ+25] Xunpeng Huang, Yingyu Lin, Nishant Jain, Kaibo Wang, Difan Zou, Yian Ma, and Tong Zhang. On the complexity theory of masked discrete diffusion: Frompoly(1/ϵ) to nearlyϵ-free.arXiv preprint arXiv:2509.21835,

work page arXiv
[7]

Discrete diffusion models: Novel analysis and new sampler guarantees.arXiv preprint arXiv:2509.16756,

[LLLS25] Yuchen Liang, Yingbin Liang, Lifeng Lai, and Ness Shroff. Discrete diffusion models: Novel analysis and new sampler guarantees.arXiv preprint arXiv:2509.16756,

work page arXiv
[8]

Think while you generate: Discrete diffusion with planned denoising.arXiv preprint arXiv:2410.06264,

[LNC+24] Sulin Liu, Juno Nam, Andrew Campbell, Hannes Stärk, Yilun Xu, Tommi Jaakkola, and Rafael Gómez-Bombarelli. Think while you generate: Discrete diffusion with planned denoising.arXiv preprint arXiv:2410.06264,

work page arXiv
[9]

Sharp convergence rates for masked diffusion models.arXiv preprint arXiv:2602.22505,

[LTSL26] Yuchen Liang, Zhiheng Tan, Ness Shroff, and Yingbin Liang. Sharp convergence rates for masked diffusion models.arXiv preprint arXiv:2602.22505,

work page arXiv
[10]

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

[ONX+24] Jingyang Ou, Shen Nie, Kaiwen Xue, Fengqi Zhu, Jiacheng Sun, Zhenguo Li, and Chongxuan Li. Your absorbing discrete diffusion secretly models the conditional distributions of clean data.arXiv preprint arXiv:2406.03736,

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Pillutla, K., Swayamdipta, S., Zellers, R., Thickstun, J., Welleck, S., Choi, Y ., and Harchaoui, Z

[PBP+25] Fred Zhangzhi Peng, Zachary Bezemek, Sawan Patel, Jarrid Rector-Brooks, Sherwood Yao, Avishek Joey Bose, Alexander Tong, and Pranam Chatterjee. Path planning for masked diffusion model sampling.arXiv preprint arXiv:2502.03540,

work page arXiv
[12]

Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

[RAR26] Liran Ringel, Ameen Ali, and Yaniv Romano. Dependency-guided parallel decoding in discrete diffusion language models.arXiv preprint arXiv:2604.02560,

work page internal anchor Pith review Pith/arXiv arXiv
[13]

Fast solvers for discrete diffusion models: Theory and applications of high-order algorithms.arXiv preprint arXiv:2502.00234,

[RCZ+25] Yinuo Ren, Haoxuan Chen, Yuchen Zhu, Wei Guo, Yongxin Chen, Grant M Rotskoff, Molei Tao, and Lexing Ying. Fast solvers for discrete diffusion models: Theory and applications of high-order algorithms.arXiv preprint arXiv:2502.00234,

work page arXiv
[14]

S., Deschenaux, J., Gokaslan, A., Wang, G., Chiu, J

[SDG+25] Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, and V olodymyr Kuleshov. The diffusion duality.arXiv preprint arXiv:2506.10892,

work page arXiv
[15]

Flow matching with general discrete paths: A kinetic-optimal perspective.arXiv preprint arXiv:2412.03487,

[SGH+24] Neta Shaul, Itai Gat, Marton Havasi, Daniel Severo, Anuroop Sriram, Peter Holderrieth, Brian Karrer, Yaron Lipman, and Ricky TQ Chen. Flow matching with general discrete paths: A kinetic-optimal perspective.arXiv preprint arXiv:2412.03487,

work page arXiv
[16]

Designing dna with tunable regulatory activity using score-entropy discrete diffusion.bioRxiv, pages 2024–05,

[SKS+24] Anirban Sarkar, Yijie Kang, Nirali Somia, Pablo Mantilla, Jessica Lu Zhou, Masayuki Nagai, Ziqi Tang, Chris Zhao, and Peter Koo. Designing dna with tunable regulatory activity using score-entropy discrete diffusion.bioRxiv, pages 2024–05,

2024
[17]

Denoising Diffusion Implicit Models

[SME20] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502,

work page internal anchor Pith review Pith/arXiv arXiv 2010
[18]

Digress: Discrete denoising diffusion for graph generation.arXiv preprint arXiv:2209.14734, 2022

[VKS+22] Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, V olkan Cevher, and Pascal Frossard. Digress: Discrete denoising diffusion for graph generation.arXiv preprint arXiv:2209.14734,

work page arXiv
[19]

Fine- tuning discrete diffusion models via reward optimization with applications to dna and protein design.arXiv preprint arXiv:2410.13643,

[WUH+24] Chenyu Wang, Masatoshi Uehara, Yichun He, Amy Wang, Tommaso Biancalani, Avantika Lal, Tommi Jaakkola, Sergey Levine, Hanchen Wang, and Aviv Regev. Fine- tuning discrete diffusion models via reward optimization with applications to dna and protein design.arXiv preprint arXiv:2410.13643,

work page arXiv
[20]

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

[WZX+25] Chengyue Wu, Hao Zhang, Shuchen Xue, Zhijian Liu, Shizhe Diao, Ligeng Zhu, Ping Luo, Song Han, and Enze Xie. Fast-dllm: Training-free acceleration of diffusion llm by enabling kv cache and parallel decoding.arXiv preprint arXiv:2505.22618,

work page internal anchor Pith review Pith/arXiv arXiv
[21]

Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

[XXF+25] Tianyu Xie, Shuchen Xue, Zijin Feng, Tianyang Hu, Jiacheng Sun, Zhenguo Li, and Cheng Zhang. Variational autoencoding discrete diffusion with enhanced dimensional correlations modeling.arXiv preprint arXiv:2505.17384,

work page internal anchor Pith review Pith/arXiv arXiv
[22]

Dimple: Discrete diffusion multimodal large language model with parallel decoding.arXiv preprint arXiv:2505.16990,

[YMW25] Runpeng Yu, Xinyin Ma, and Xinchao Wang. Dimple: Discrete diffusion multimodal large language model with parallel decoding.arXiv preprint arXiv:2505.16990,

work page arXiv
[23]

Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling.arXiv preprint arXiv:2409.02908,

[ZCM+24] Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling.arXiv preprint arXiv:2409.02908,

work page arXiv
[24]

Unified discrete diffusion for categorical data.arXiv preprint arXiv:2402.03701,

[ZDY A24]Lingxiao Zhao, Xueying Ding, Lijun Yu, and Leman Akoglu. Unified discrete diffusion for categorical data.arXiv preprint arXiv:2402.03701,

work page arXiv
[25]

The cosine schedule is fisher-rao-optimal for masked discrete diffusion models.arXiv preprint arXiv:2508.04884,

[Zha25] Leo Zhang. The cosine schedule is fisher-rao-optimal for masked discrete diffusion models.arXiv preprint arXiv:2508.04884,

work page arXiv
[26]

Dimo: Distilling masked diffusion models into one-step generator.arXiv preprint arXiv:2503.15457,

[ZWLK25] Yuanzhi Zhu, Xi Wang, Stephane Lathuiliere, and Vicky Kalogeiton. Dimo: Distilling masked diffusion models into one-step generator.arXiv preprint arXiv:2503.15457,

work page arXiv
[27]

arXiv preprint arXiv:2302.05737 , year=

[ZYYK23] Lin Zheng, Jianbo Yuan, Lei Yu, and Lingpeng Kong. A reparameterized discrete diffusion model for text generation.arXiv preprint arXiv:2302.05737,

work page arXiv

[1] [1]

A pytorch reproduction of masked generative image transformer.arXiv preprint arXiv:2310.14400,

[BC23] Victor Besnier and Mickael Chen. A pytorch reproduction of masked generative image transformer.arXiv preprint arXiv:2310.14400,

work page arXiv

[2] [2]

Non- asymptotic convergence of discrete diffusion models: Masked and random walk dy- namics.arXiv preprint arXiv:2512.00580,

[CDPR25] Giovanni Conforti, Alain Durmus, Le-Tuyet-Nhi Pham, and Gael Raoul. Non- asymptotic convergence of discrete diffusion models: Masked and random walk dy- namics.arXiv preprint arXiv:2512.00580,

work page arXiv

[3] [3]

Beyond masked and unmasked: Discrete diffusion models via partial masking.arXiv preprint arXiv:2505.18495,

[CSL+25] Chen-Hao Chao, Wei-Fang Sun, Hanwen Liang, Chun-Yi Lee, and Rahul G Krishnan. Beyond masked and unmasked: Discrete diffusion models via partial masking.arXiv preprint arXiv:2505.18495,

work page arXiv

[4] [4]

Convergence analysis of discrete diffusion model: Exact implementation through uniformization.arXiv preprint arXiv:2402.08095,

[CY24] Hongrui Chen and Lexing Ying. Convergence analysis of discrete diffusion model: Exact implementation through uniformization.arXiv preprint arXiv:2402.08095,

work page arXiv

[5] [5]

Efficient Sampling with Discrete Diffusion Models: Sharp and Adaptive Guarantees

[DHW26] Daniil Dmitriev, Zhihan Huang, and Yuting Wei. Efficient sampling with discrete diffusion models: Sharp and adaptive guarantees.arXiv preprint arXiv:2602.15008,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

On the complexity theory of masked discrete diffusion: Frompoly(1/ϵ) to nearlyϵ-free.arXiv preprint arXiv:2509.21835,

[HLJ+25] Xunpeng Huang, Yingyu Lin, Nishant Jain, Kaibo Wang, Difan Zou, Yian Ma, and Tong Zhang. On the complexity theory of masked discrete diffusion: Frompoly(1/ϵ) to nearlyϵ-free.arXiv preprint arXiv:2509.21835,

work page arXiv

[7] [7]

Discrete diffusion models: Novel analysis and new sampler guarantees.arXiv preprint arXiv:2509.16756,

[LLLS25] Yuchen Liang, Yingbin Liang, Lifeng Lai, and Ness Shroff. Discrete diffusion models: Novel analysis and new sampler guarantees.arXiv preprint arXiv:2509.16756,

work page arXiv

[8] [8]

Think while you generate: Discrete diffusion with planned denoising.arXiv preprint arXiv:2410.06264,

[LNC+24] Sulin Liu, Juno Nam, Andrew Campbell, Hannes Stärk, Yilun Xu, Tommi Jaakkola, and Rafael Gómez-Bombarelli. Think while you generate: Discrete diffusion with planned denoising.arXiv preprint arXiv:2410.06264,

work page arXiv

[9] [9]

Sharp convergence rates for masked diffusion models.arXiv preprint arXiv:2602.22505,

[LTSL26] Yuchen Liang, Zhiheng Tan, Ness Shroff, and Yingbin Liang. Sharp convergence rates for masked diffusion models.arXiv preprint arXiv:2602.22505,

work page arXiv

[10] [10]

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

[ONX+24] Jingyang Ou, Shen Nie, Kaiwen Xue, Fengqi Zhu, Jiacheng Sun, Zhenguo Li, and Chongxuan Li. Your absorbing discrete diffusion secretly models the conditional distributions of clean data.arXiv preprint arXiv:2406.03736,

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

Pillutla, K., Swayamdipta, S., Zellers, R., Thickstun, J., Welleck, S., Choi, Y ., and Harchaoui, Z

[PBP+25] Fred Zhangzhi Peng, Zachary Bezemek, Sawan Patel, Jarrid Rector-Brooks, Sherwood Yao, Avishek Joey Bose, Alexander Tong, and Pranam Chatterjee. Path planning for masked diffusion model sampling.arXiv preprint arXiv:2502.03540,

work page arXiv

[12] [12]

Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

[RAR26] Liran Ringel, Ameen Ali, and Yaniv Romano. Dependency-guided parallel decoding in discrete diffusion language models.arXiv preprint arXiv:2604.02560,

work page internal anchor Pith review Pith/arXiv arXiv

[13] [13]

Fast solvers for discrete diffusion models: Theory and applications of high-order algorithms.arXiv preprint arXiv:2502.00234,

[RCZ+25] Yinuo Ren, Haoxuan Chen, Yuchen Zhu, Wei Guo, Yongxin Chen, Grant M Rotskoff, Molei Tao, and Lexing Ying. Fast solvers for discrete diffusion models: Theory and applications of high-order algorithms.arXiv preprint arXiv:2502.00234,

work page arXiv

[14] [14]

S., Deschenaux, J., Gokaslan, A., Wang, G., Chiu, J

[SDG+25] Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, and V olodymyr Kuleshov. The diffusion duality.arXiv preprint arXiv:2506.10892,

work page arXiv

[15] [15]

Flow matching with general discrete paths: A kinetic-optimal perspective.arXiv preprint arXiv:2412.03487,

[SGH+24] Neta Shaul, Itai Gat, Marton Havasi, Daniel Severo, Anuroop Sriram, Peter Holderrieth, Brian Karrer, Yaron Lipman, and Ricky TQ Chen. Flow matching with general discrete paths: A kinetic-optimal perspective.arXiv preprint arXiv:2412.03487,

work page arXiv

[16] [16]

Designing dna with tunable regulatory activity using score-entropy discrete diffusion.bioRxiv, pages 2024–05,

[SKS+24] Anirban Sarkar, Yijie Kang, Nirali Somia, Pablo Mantilla, Jessica Lu Zhou, Masayuki Nagai, Ziqi Tang, Chris Zhao, and Peter Koo. Designing dna with tunable regulatory activity using score-entropy discrete diffusion.bioRxiv, pages 2024–05,

2024

[17] [17]

Denoising Diffusion Implicit Models

[SME20] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502,

work page internal anchor Pith review Pith/arXiv arXiv 2010

[18] [18]

Digress: Discrete denoising diffusion for graph generation.arXiv preprint arXiv:2209.14734, 2022

[VKS+22] Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, V olkan Cevher, and Pascal Frossard. Digress: Discrete denoising diffusion for graph generation.arXiv preprint arXiv:2209.14734,

work page arXiv

[19] [19]

Fine- tuning discrete diffusion models via reward optimization with applications to dna and protein design.arXiv preprint arXiv:2410.13643,

[WUH+24] Chenyu Wang, Masatoshi Uehara, Yichun He, Amy Wang, Tommaso Biancalani, Avantika Lal, Tommi Jaakkola, Sergey Levine, Hanchen Wang, and Aviv Regev. Fine- tuning discrete diffusion models via reward optimization with applications to dna and protein design.arXiv preprint arXiv:2410.13643,

work page arXiv

[20] [20]

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

[WZX+25] Chengyue Wu, Hao Zhang, Shuchen Xue, Zhijian Liu, Shizhe Diao, Ligeng Zhu, Ping Luo, Song Han, and Enze Xie. Fast-dllm: Training-free acceleration of diffusion llm by enabling kv cache and parallel decoding.arXiv preprint arXiv:2505.22618,

work page internal anchor Pith review Pith/arXiv arXiv

[21] [21]

Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

[XXF+25] Tianyu Xie, Shuchen Xue, Zijin Feng, Tianyang Hu, Jiacheng Sun, Zhenguo Li, and Cheng Zhang. Variational autoencoding discrete diffusion with enhanced dimensional correlations modeling.arXiv preprint arXiv:2505.17384,

work page internal anchor Pith review Pith/arXiv arXiv

[22] [22]

Dimple: Discrete diffusion multimodal large language model with parallel decoding.arXiv preprint arXiv:2505.16990,

[YMW25] Runpeng Yu, Xinyin Ma, and Xinchao Wang. Dimple: Discrete diffusion multimodal large language model with parallel decoding.arXiv preprint arXiv:2505.16990,

work page arXiv

[23] [23]

Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling.arXiv preprint arXiv:2409.02908,

[ZCM+24] Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling.arXiv preprint arXiv:2409.02908,

work page arXiv

[24] [24]

Unified discrete diffusion for categorical data.arXiv preprint arXiv:2402.03701,

[ZDY A24]Lingxiao Zhao, Xueying Ding, Lijun Yu, and Leman Akoglu. Unified discrete diffusion for categorical data.arXiv preprint arXiv:2402.03701,

work page arXiv

[25] [25]

The cosine schedule is fisher-rao-optimal for masked discrete diffusion models.arXiv preprint arXiv:2508.04884,

[Zha25] Leo Zhang. The cosine schedule is fisher-rao-optimal for masked discrete diffusion models.arXiv preprint arXiv:2508.04884,

work page arXiv

[26] [26]

Dimo: Distilling masked diffusion models into one-step generator.arXiv preprint arXiv:2503.15457,

[ZWLK25] Yuanzhi Zhu, Xi Wang, Stephane Lathuiliere, and Vicky Kalogeiton. Dimo: Distilling masked diffusion models into one-step generator.arXiv preprint arXiv:2503.15457,

work page arXiv

[27] [27]

arXiv preprint arXiv:2302.05737 , year=

[ZYYK23] Lin Zheng, Jianbo Yuan, Lei Yu, and Lingpeng Kong. A reparameterized discrete diffusion model for text generation.arXiv preprint arXiv:2302.05737,

work page arXiv