arxiv: 2512.15923 · v2 · submitted 2025-12-17 · 💻 cs.LG

A Unification of Discrete, Gaussian, and Simplicial Diffusion

Pith reviewed 2026-05-16 21:16 UTC · model grok-4.3

classification 💻 cs.LG
keywords diffusion models · discrete diffusion · simplicial diffusion · Wright-Fisher model · population genetics · unification · DNA generation · multi-domain training

The pith

Discrete, Gaussian, and simplicial diffusion arise as different parameterizations and large-population limits of the Wright-Fisher population genetics model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the three main diffusion approaches for discrete sequences such as DNA or language tokens are not independent techniques but instead correspond to distinct choices of parameterization within the same Wright-Fisher stochastic process. Simplicial diffusion and Gaussian diffusion appear specifically as limiting cases when the effective population size becomes large. This shared foundation makes it possible to translate results from mathematical genetics into stable numerical methods for simplicial diffusion and to train a single model whose test-time behavior can be switched among the three domains. Experiments confirm that the resulting Wright-Fisher simplicial diffusion is more stable than earlier simplex-based methods and that multi-domain training yields performance competitive with models trained on any single domain alone.

Core claim

All three major methods of diffusion for discrete sequences—discrete diffusion, Gaussian diffusion in Euclidean space, and diffusion on the simplex—are different parameterizations of the Wright-Fisher population genetics model. Simplicial and Gaussian diffusion emerge as two large-population limits of this process. The resulting theory formally connects the likelihoods and hyperparameters across the three families and supplies stable stochastic processes for simplicial diffusion drawn from the genetics literature. A single trained model can then perform diffusion in any of the three domains at test time.

What carries the argument

The Wright-Fisher model: a finite-population stochastic process from population genetics that tracks how allele frequencies change across generations under mutation and random reproduction (genetic drift). It supplies the common dynamics whose specific parameterizations and scaling limits recover the three diffusion schemes.
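
As a rough illustration of these dynamics, the sketch below simulates Wright-Fisher generations in plain Python: every child copies a uniformly chosen parent (drift) and then mutates. The function names, the row-stochastic `mutation_probs` matrix, the 5% mutation probability, and the four-letter alphabet are assumptions made for this example, not the paper's implementation. With a population of size 1 the resampling step is trivial and the process reduces to a single mutating token, which is the ζ = 1 case that Figure 1 associates with discrete diffusion.

```python
import random

def wright_fisher_step(population, mutation_probs, rng):
    """Advance a population (list of allele indices) by one generation."""
    zeta = len(population)
    # Reproduction: each child copies a uniformly chosen parent (drift).
    children = [population[rng.randrange(zeta)] for _ in range(zeta)]
    # Mutation: each child jumps according to its allele's row of probabilities.
    alleles = range(len(mutation_probs))
    return [rng.choices(alleles, weights=mutation_probs[a])[0] for a in children]

def allele_frequencies(population, num_alleles):
    """Fraction of the population carrying each allele (a point on the simplex)."""
    return [population.count(b) / len(population) for b in range(num_alleles)]

# Hypothetical setup: 4 alleles (e.g. A, C, G, T), 5% chance to resample uniformly.
B, eps = 4, 0.05
mutation_probs = [[(1 - eps) + eps / B if i == j else eps / B for j in range(B)]
                  for i in range(B)]
rng = random.Random(0)
population = [0] * 6          # zeta = 6, everyone starts at allele 0
for _ in range(20):
    population = wright_fisher_step(population, mutation_probs, rng)
freqs = allele_frequencies(population, B)
```

The frequency vector `freqs` always lies on the probability simplex; with a small population it moves in coarse steps of 1/ζ, and with a large one it moves almost continuously.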

If this is right

  • Likelihoods and hyperparameters of discrete, Gaussian, and simplicial diffusion are formally connected through their shared Wright-Fisher parameterization.
  • Stable numerical schemes for simplicial diffusion follow directly from existing mathematical genetics results.
  • A single model can be trained once and then deployed for diffusion in any of the three domains at test time.
  • Wright-Fisher simplicial diffusion achieves higher stability and better performance than prior simplicial methods on conditional DNA generation tasks.
  • Models trained jointly across domains remain competitive with models trained on any one domain separately.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Varying the effective population size parameter inside the Wright-Fisher framework could yield new families of diffusion schedules that interpolate continuously between the three regimes.
  • The large-population limits may clarify when Gaussian approximations remain accurate for discrete data and when they break down.
  • Tools developed for analyzing convergence rates in population-genetics models could be repurposed to study sampling efficiency and mixing times in diffusion generative models.
  • The unification suggests testing whether other discrete generative processes outside diffusion, such as certain autoregressive or flow models, also admit Wright-Fisher interpretations.
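
The first two points rest on a standard fact: when ζ individuals are drawn independently from fixed allele probabilities, each frequency fluctuates with standard deviation sqrt(p(1 − p)/ζ), so larger populations concentrate around their mean. The sketch below checks this by simulation. The function names and the chosen probability are illustrative assumptions, and a single independent draw stands in for one resampling step rather than for the full process studied in the paper.

```python
import math
import random

def sampled_frequency(p, zeta, rng):
    """Frequency of allele 0 among zeta individuals drawn i.i.d. with P(allele 0) = p."""
    return sum(rng.random() < p for _ in range(zeta)) / zeta

def empirical_std(p, zeta, num_trials, rng):
    """Standard deviation of the sampled frequency over repeated draws."""
    draws = [sampled_frequency(p, zeta, rng) for _ in range(num_trials)]
    mean = sum(draws) / num_trials
    return math.sqrt(sum((d - mean) ** 2 for d in draws) / num_trials)

def predicted_std(p, zeta):
    """Binomial standard deviation of a frequency: sqrt(p (1 - p) / zeta)."""
    return math.sqrt(p * (1 - p) / zeta)

rng = random.Random(0)
p = 0.3
for zeta in (1, 10, 100, 1000):
    # Print population size, simulated spread, and predicted spread side by side.
    print(zeta, round(empirical_std(p, zeta, 2000, rng), 4),
          round(predicted_std(p, zeta), 4))
```

Comparing the two printed columns across population sizes gives a quick feel for how large ζ must be before a Gaussian description of the frequencies is reasonable.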

Load-bearing premise

The specific discretizations and stochastic processes used in existing discrete, Gaussian, and simplicial diffusion models match instances or limits of Wright-Fisher dynamics exactly, without extra approximations that would break equivalence of likelihoods and hyperparameters.

What would settle it

A side-by-side computation of exact transition probabilities or marginal likelihoods between a Wright-Fisher-derived simplicial process and a standard simplex diffusion implementation that shows systematic, non-negligible differences persisting even after population-size scaling is accounted for.
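
A minimal version of such a side-by-side check, under simplifying assumptions, is sketched below. It compares n generations of a single-individual chain that resamples its state uniformly with probability t/n per generation against the continuous-time kernel with uniform rate matrix L = 11ᵀ/B − I, for which exp(tL) = e^(−t) I + (1 − e^(−t)) 11ᵀ/B. The uniform mutation model, the alphabet size, and all function names are assumptions chosen for illustration; the paper's processes and any existing simplex implementation would need their own kernels plugged in.

```python
import math

def uniform_step_kernel(num_states, jump_prob):
    """One generation: with probability jump_prob, resample the state uniformly."""
    return [[(1 - jump_prob) * (i == j) + jump_prob / num_states
             for j in range(num_states)] for i in range(num_states)]

def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(kernel, steps):
    """kernel raised to the power `steps` by repeated squaring."""
    n = len(kernel)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    while steps > 0:
        if steps & 1:
            result = matmul(result, kernel)
        kernel = matmul(kernel, kernel)
        steps >>= 1
    return result

def continuous_uniform_kernel(num_states, t):
    """Closed form of exp(t L) for the uniform rate matrix L = 11^T / B - I."""
    stay = math.exp(-t)
    return [[stay * (i == j) + (1 - stay) / num_states
             for j in range(num_states)] for i in range(num_states)]

def max_row_tv(k1, k2):
    """Largest total-variation distance between corresponding rows."""
    return max(0.5 * sum(abs(x - y) for x, y in zip(r1, r2))
               for r1, r2 in zip(k1, k2))

B, t = 4, 1.0
target = continuous_uniform_kernel(B, t)
gaps = {n: max_row_tv(matrix_power(uniform_step_kernel(B, t / n), n), target)
        for n in (10, 100, 1000)}
```

A gap that shrinks toward zero as n grows is what agreement in the limit looks like; a gap that plateaus would indicate a systematic mismatch of the kind described above.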

Figures

Figures reproduced from arXiv: 2512.15923 by Alan N. Amin, Alex Ali, Andrew Gordon Wilson, Aniruddh Raghu, Joshua Rollins, Nuria Alina Chandra, Sebastian W. Ober, Yucen Lily Li.

Figure 1. Discrete, Gaussian, and Simplicial diffusion for discrete data are unified by Wright-Fisher diffusion. (a) Wright-Fisher diffusion with population size $\zeta = 6$, showing mutation and reproduction processes across generations. (b) The three diffusion methods emerge as different limits of Wright-Fisher: discrete diffusion corresponds to $\zeta = 1$, while Gaussian and simplicial diffusion arise as $\zeta \to \infty$ with zero and…
Figure 2. Discrete diffusion with a large population converges to Gaussian diffusion. With $\zeta = 1000$, we show example trajectories $(\vec{x}_t)_t$ that converge to approximate Gaussians near $\vec{\pi}$. Proof idea: As $\zeta \to \infty$, by the law of large numbers, $\vec{x}_t$ approaches $\vec{x}_0^T e^{\tau_t L}$, which itself goes to the stationary distribution of $L$. We can therefore decompose $\vec{x}_t - \vec{\pi} = \underbrace{\vec{x}_0^T e^{\tau^\zeta_t L} - \vec{\pi}}_{\text{signal}} + \vec{x}_t - \vec{x}_0^T e^{\tau^\zeta \cdots}$…
Figure 3. The hollow parameterization leads to realistic reverse path samples. $\zeta = 300$. Loss comparison: Thm. 4.1 suggests that there is virtually no difference to training a discrete diffusion model with $\zeta = 10^{100}$ and training Gaussian diffusion with Alg. 2 on a computer, suggesting their ELBOs are comparable. Yet the limiting Gaussian ELBO is infinite!
Figure 4. emb of amino acids from BLOSUM $L$. $\mathrm{emb}(x_0)$ from Thm. 4.1 for $L$ from Amin et al. [2]. Hyperparameter comparison: Thm. 4.1 gives us a formula for emb determined by the slowest-decaying directions in $L$. App. E.4 also shows that every emb can be induced from some $L$. Remarkably, this connection accommodates embeddings in different dimensions $\mathbb{R}^r$: $r$ is determined by the dimension of the dominant eigenspace of $L$. …
Figure 5. Improved simplicial diffusion performs accurate conditional DNA generation. We generate DNA samples of length 500 conditioned on accessibility with a classifier. (a) For an example target, we plot predicted accessibility profiles at the centre 150 positions of 5 example samples from each model. We smooth profiles with a bandwidth of 2. (b) For 1000 targets and 10 samples from each model, we plot the error …
Figure 6. The sufficient statistic parameterization represents $\vec{x}_t$ from all diffusion models in the same space.
Figure 7. The sufficient statistic parametrization enables a single model to perform competitive discrete, Gaussian, and simplicial diffusion. We compare individual models for each modality with a single unified model using the SSP. (a) We train on proteins and measure sample quality by predicted protein fold-ability (pLDDT). Each model was trained for the same amount of time. (b) We train on language and measure sa…
Figure 8. The argmax of Gaussian diffusion appears different from discrete diffusion in simulation, despite having the same marginals. We compare example paths of $(\mathrm{argmax}(w_t))_t$ (left, red; we show Gaussian diffusion $w_t$ in grey), $(\tilde{z}_t)_t$ for uniform discrete diffusion (centre, blue), and their empirical marginals over 10,000 simulations (right); we simulate using a grid size of 0.0001. Note the two processes ha…
Figure 9. Leveraging mathematical genetics literature, we build fast and stable simplicial diffusion. (a) We plot the time it takes to sample a sequence of $D = 500$ using an SDE, versus our exact sampling for various values of $t$ on an A100 80GB GPU. We threshold switching to the Griffiths approximation at $\tau_t = 0.1$. (b) For $\tau = 0.1$ and $B = 3$ we sample $3 \times 10^7$ points from the exact sampling method, Griffith’s approxima…
Figure 10. The sufficient statistic parametrization enables a single model to perform competitive discrete, Gaussian, and simplicial optimization of antibodies. Using our protein models from…
Figure 11. The SSP enables a single model to fit image data across 3 modalities. We perform the analysis of…
Figure 12. The SSP results in no noticeable drop in generation quality for image models. We plot samples from models trained on MNIST.
Original abstract

To model discrete sequences such as DNA, proteins, and language using diffusion, practitioners must choose between three major methods: diffusion in discrete space, Gaussian diffusion in Euclidean space, or diffusion on the simplex. Despite their shared goal, these models have disparate algorithms, theoretical structures, and tradeoffs: discrete diffusion has the most natural domain, Gaussian diffusion has more mature algorithms, and diffusion on the simplex in principle combines the strengths of the other two but in practice suffers from numerically unstable stochastic processes. Ideally we could see each of these models as instances of the same underlying framework, and enable practitioners to switch between models for downstream applications. However, previous theories have only considered connections in special cases. Here we build a theory unifying all three methods of discrete diffusion as different parameterizations of the same underlying process: the Wright-Fisher population genetics model. In particular, we find simplicial and Gaussian diffusion as two large-population limits. Our theory formally connects the likelihoods and hyperparameters of these models and leverages decades of mathematical genetics literature to unlock stable simplicial diffusion. Finally, we relieve the practitioner of balancing model trade-offs by demonstrating it is possible to train a single model that can perform diffusion in any of these three domains at test time. Our experiments show that Wright-Fisher simplicial diffusion is more stable and outperforms previous simplicial diffusion models on conditional DNA generation. We also show that we can train models on multiple domains at once that are competitive with models trained on any individual domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims to unify discrete, Gaussian, and simplicial diffusion models for discrete sequences by framing them as different parameterizations of the Wright-Fisher population genetics model. Simplicial and Gaussian diffusion are derived as two large-population limits of this process. The theory formally connects the likelihoods and hyperparameters across the three approaches, leverages mathematical genetics results to stabilize simplicial diffusion, and demonstrates that a single model can be trained to perform diffusion in any of the three domains at test time. Experiments report that the Wright-Fisher simplicial variant is more stable and outperforms prior simplicial models on conditional DNA generation, while multi-domain models remain competitive with single-domain baselines.

Significance. If the claimed exact equivalences hold, the unification would let practitioners interchange diffusion paradigms without retraining and borrow numerical-stability techniques from the mathematical-genetics literature. The multi-domain training result is practically attractive for applications involving DNA, proteins, or language. The work’s main contribution is the theoretical linkage rather than new algorithms, so its significance rests on the tightness of the Wright-Fisher correspondence and the reproducibility of the reported performance gains.

major comments (2)
  1. [Unification theory] Abstract and unification theory: the claim that standard discrete diffusion transition kernels match Wright-Fisher multinomial sampling exactly (required for identical marginal likelihoods and interchangeable hyperparameters) must be verified by direct comparison of the forward noising kernels, rate matrices, and absorbing-state handling. Any discretization mismatch would break the single-model multi-domain training justification.
  2. [Large-population limits] Large-population limits derivations: while the Gaussian and simplicial limits are standard diffusion approximations, the manuscript must show that the specific discretizations and stochastic processes used in existing models correspond exactly to instances of Wright-Fisher dynamics without extra approximations that would invalidate the claimed equivalence of likelihoods and hyperparameters.
minor comments (1)
  1. [Experiments] The abstract states that Wright-Fisher simplicial diffusion outperforms prior simplicial models on DNA generation, but the experimental section should include explicit protocol details, baseline implementations, and statistical significance tests to allow independent verification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important points about the rigor of the claimed equivalences, which we address by strengthening explicit comparisons and derivations in the revised manuscript. We believe these changes preserve the core contribution while improving clarity and verifiability.

  1. Referee: [Unification theory] Abstract and unification theory: the claim that standard discrete diffusion transition kernels match Wright-Fisher multinomial sampling exactly (required for identical marginal likelihoods and interchangeable hyperparameters) must be verified by direct comparison of the forward noising kernels, rate matrices, and absorbing-state handling. Any discretization mismatch would break the single-model multi-domain training justification.

    Authors: We appreciate the referee's emphasis on explicit verification. The original manuscript derives the equivalence by showing that the discrete diffusion forward process is exactly the multinomial sampling step of the Wright-Fisher model under the chosen parameterization (with mutation rates governing the absorbing-state behavior). In the revision we have added a dedicated subsection (Section 3.2) that tabulates the forward noising kernels side-by-side, compares the infinitesimal rate matrices, and confirms that the absorbing-state handling is identical via the standard population-genetics mutation operator. These direct comparisons establish that the marginal likelihoods coincide exactly and that hyperparameters transfer without adjustment, thereby justifying the multi-domain training result. No discretization mismatch exists under the model definitions used. revision: yes

  2. Referee: [Large-population limits] Large-population limits derivations: while the Gaussian and simplicial limits are standard diffusion approximations, the manuscript must show that the specific discretizations and stochastic processes used in existing models correspond exactly to instances of Wright-Fisher dynamics without extra approximations that would invalidate the claimed equivalence of likelihoods and hyperparameters.

    Authors: We agree that the large-population limits must be shown to align precisely with the discretizations employed in prior Gaussian and simplicial diffusion models. The revised manuscript augments the derivations in Section 4 with explicit statements that the time-discretized Ornstein-Uhlenbeck process recovered in the Gaussian limit and the Dirichlet-multinomial process recovered in the simplicial limit are obtained directly from the Wright-Fisher generator without additional approximations beyond the classical large-N diffusion limit (citing the standard convergence theorems from mathematical genetics). We further verify that the noise schedules and step sizes used in the literature correspond one-to-one to the Wright-Fisher time parameterization, preserving the exact equivalence of likelihoods and hyperparameters. These additions eliminate any ambiguity about extraneous approximations. revision: yes

Circularity Check

0 steps flagged

No circularity: unification rests on external Wright-Fisher model from mathematical genetics

The paper presents discrete, Gaussian, and simplicial diffusion as different parameterizations of the pre-existing Wright-Fisher population genetics process, with the latter two arising as large-population limits. This connection is explicitly grounded in decades of external mathematical genetics literature rather than any internal fitting, self-definition, or self-citation chain. The abstract states the theory 'formally connects the likelihoods and hyperparameters' by leveraging that literature to stabilize simplicial diffusion, and demonstrates multi-domain training without reducing any claimed equivalence to a tautology or fitted input. No load-bearing step reduces by construction to the paper's own inputs; the derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

With only the abstract available, the ledger is necessarily incomplete. The central claim rests on the domain assumption that discrete diffusion processes can be exactly reparameterized as Wright-Fisher dynamics and that the Gaussian and simplicial forms emerge cleanly as large-population limits. No new invented entities are introduced. No specific free parameters are named in the abstract.

axioms (1)
  • domain assumption: Discrete, Gaussian, and simplicial diffusion processes correspond to instances or large-population limits of the Wright-Fisher population genetics model.
    This mapping is the load-bearing step that allows the claimed unification of likelihoods and hyperparameters.

pith-pipeline@v0.9.0 · 5591 in / 1453 out tokens · 39954 ms · 2026-05-16T21:16:06.204701+00:00 · methodology

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 1 internal anchor

  1. [1]

    S. Alamdari, N. Thakkar, R. van den Berg, A. X. Lu, N. Fusi, A. P. Amini, and K. K. Yang. Protein generation with evolutionary diffusion: sequence is all you need. bioRxiv, Sept. 2023

  2. [2]

    A. N. Amin, N. Gruver, and A. G. Wilson. Why masking diffusion works: Condition on the jump schedule for improved discrete diffusion. In Frontiers in Probabilistic Inference: Learning meets Sampling, Apr. 2025

  3. [3]

    B. D. O. Anderson. Reverse-time diffusion equation models. Stoch. Process. Their Appl., 12(3):313–326, May 1982

  4. [4]

    J. Austin, D. D. Johnson, J. Ho, D. Tarlow, and R. Van Den Berg. Structured denoising diffusion models in discrete state-spaces. Adv. Neural Inf. Process. Syst., 34:17981–17993, 2021

  5. [5]

    P. Avdeyev, C. Shi, Y. Tan, K. Dudnyk, and J. Zhou. Dirichlet diffusion score model for biological sequence generation. arXiv [cs.LG], May 2023

  6. [6]

    E. Baron, A. N. Amin, R. Weitzman, D. S. Marks, and A. G. Wilson. A diffusion model to shrink proteins while maintaining their function. In The Exploration in AI Today Workshop at ICML 2025, June 2025

  7. [7]

    R. F. Bass. Stochastic Processes. Cambridge University Press, Oct. 2011

  8. [8]

    J. Benton, Y. Shi, V. De Bortoli, G. Deligiannidis, and A. Doucet. From denoising diffusions to denoising markov models. J. R. Stat. Soc. Series B Stat. Methodol., 86(2):286–301, Apr. 2024

  9. [9]

    D. Calderon, R. Blecher-Gonen, X. Huang, S. Secchia, J. Kentro, R. M. Daza, B. Martin, A. Dulja, C. Schaub, C. Trapnell, E. Larschan, K. M. O’Connor-Giles, E. E. M. Furlong, and J. Shendure. The continuum of Drosophila embryonic development at single-cell resolution. Science, 377(6606):eabn5800, 2022. doi: 10.1126/science.abn5800. URL https://www.s...

  10. [10]

    A. Campbell, J. Benton, V. De Bortoli, T. Rainforth, G. Deligiannidis, and A. Doucet. A continuous time framework for discrete denoising models. In Advances in Neural Information Processing Systems, Oct. 2022

  11. [11]

    N. A. Chandra, Y. Hu, J. D. Buenrostro, S. Mostafavi, and A. Sasse. Refining sequence-to-activity models by increasing model resolution. bioRxiv, 2025. doi: 10.1101/2025.01.24.634804

  12. [12]

    O. Davis, S. Kessler, M. Petrache, I. I. Ceylan, M. Bronstein, and A. J. Bose. Fisher flow matching for generative modeling over discrete data. arXiv [cs.LG], May 2024

  13. [13]

    S. Dieleman, L. Sartran, A. Roshannai, N. Savinov, Y. Ganin, P. H. Richemond, A. Doucet, R. Strudel, C. Dyer, C. Durkan, C. Hawthorne, R. Leblond, W. Grathwohl, and J. Adler. Continuous diffusion for categorical data. arXiv.org, 2022

  14. [14]

    F. Eijkelboom, G. Bartosh, C. Andersson Naesseth, M. Welling, and J.-W. van de Meent. Variational flow matching for graph generation. Advances in Neural Information Processing Systems, 37:11735–11764, 2024

  15. [15]

    S. N. Ethier and T. G. Kurtz. Markov Processes: Characterisation and Convergence. Probability & Mathematical Statistics S. John Wiley & Sons, Nashville, TN, May 1986

  16. [16]

    G. Floto, T. Jonsson, M. Nica, S. Sanner, and E. Z. Zhu. Diffusion on the probability simplex. arXiv [cs.LG], Sept. 2023

  17. [17]

    F. Gotze. On the rate of convergence in the multivariate CLT. Ann. Probab., 19(2):724–739, 1991

  18. [18]

    R. C. Griffiths. Asymptotic line-of-descent distributions. J. Math. Biol., 21(1):67–75, Dec. 1984

  19. [19]

    N. Gruver, S. D. Stanton, N. C. Frey, T. G. J. Rudner, I. Hotzel, J. Lafrance-Vanasse, A. Rajpal, K. Cho, and A. G. Wilson. Protein design with guided discrete diffusion. In Thirty-seventh Conference on Neural Information Processing Systems, Nov. 2023

  20. [20]

    X. Han, S. Kumar, and Y. Tsvetkov. SSD-LM: Semi-autoregressive simplex-based diffusion language model for text generation and modular control. arXiv [cs.CL], Oct. 2022

  21. [21]

    B. L. Hie, V. R. Shanker, D. Xu, T. U. J. Bruun, P. A. Weidenbacher, S. Tang, W. Wu, J. E. Pak, and P. S. Kim. Efficient evolution of human antibodies from general protein language models. Nat. Biotechnol., 42(2):275–283, Apr. 2023

  22. [22]

    J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 6840–6851. Curran Associates, Inc., 2020

  23. [23]

    F. M. Hoppe. Polya-like urns and the ewens’ sampling formula. J. Math. Biol., 20(1):91–94, Aug. 1984

  24. [24]

    P. A. Jenkins and D. Spanò. Exact simulation of the Wright–Fisher diffusion. Ann. Appl. Probab., 27(3):1478–1509, June 2017

  25. [25]

    F. Johansson and Others. mpmath: a Python library for arbitrary-precision floating-point arithmetic (version 0.14), Feb. 2010

  26. [26]

    D. D. Johnson, J. Austin, R. van den Berg, and D. Tarlow. Beyond in-place corruption: Insertion and deletion in denoising probabilistic models. In ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, 2021

  27. [27]

    M. Kimura. Solution of a process of random genetic drift with a continuous model. Proc. Natl. Acad. Sci. U. S. A., 41(3):144–150, Mar. 1955

  28. [28]

    B. Li, Z. Gao, and L. Xu. Unifying continuous and discrete text diffusion with non-simultaneous diffusion processes. arXiv [cs.CL], May 2025

  29. [29]

    Z. Li, Y. Ni, G. Xia, W. Beardall, A. Das, G.-B. Stan, and Y. Zhao. Absorb & escape: Overcoming single model limitations in generating heterogeneous genomic sequences. Advances in Neural Information Processing Systems, 37:21949–21978, 2024

  30. [30]

    Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli, A. dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido, and A. Rives. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637):1123–1130, 2023. doi: 10.1126/science.ade2574. URL https://www.science.org/d...

  31. [31]

    A. Lou and S. Ermon. Reflected diffusion models. ICML, abs/2304.04740:22675–22701, Apr. 2023

  32. [32]

    A. Lou, C. Meng, and S. Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution. In 41st International Conference on Machine Learning, Oct. 2023

  33. [33]

    S. Luo, Y. Su, X. Peng, S. Wang, J. Peng, and J. Ma. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. In Advances in Neural Information Processing Systems 35. Cold Spring Harbor Laboratory, July 2022

  34. [34]

    R. K. Mahabadi, H. Ivison, J. Tae, J. Henderson, I. Beltagy, M. E. Peters, and A. Cohan. TESS: Text-to-text self-conditioned simplex diffusion. arXiv [cs.CL], May 2023

  35. [35]

    J. W. Miller. Asymptotic normality, concentration, and coverage of generalized posteriors. arXiv [math.ST], July 2019

  36. [36]

    J. Ou, S. Nie, K. Xue, F. Zhu, J. Sun, Z. Li, and C. Li. Your absorbing discrete diffusion secretly models the conditional distributions of clean data. arXiv [cs.LG], June 2024

  37. [37]

    A. Raghu, S. W. Ober, M. Kazman, and H. Elliott. Guided sequence-structure generative modeling for iterative antibody optimization. In ICLR 2025 Workshop on Generative and Experimental Perspectives for Biomolecular Design, 2025

  38. [38]

    P. H. Richemond, S. Dieleman, and A. Doucet. Categorical SDEs with simplex diffusion. arXiv [cs.LG], Oct. 2022

  39. [39]

    H. Robbins. A remark on stirling’s formula. Am. Math. Mon., 62(1):26, Jan. 1955

  40. [40]

    S. S. Sahoo, M. Arriola, Y. Schiff, A. Gokaslan, E. Marroquin, J. T. Chiu, A. Rush, and V. Kuleshov. Simple and effective masked diffusion language models. arXiv [cs.CL], June 2024

  41. [41]

    S. S. Sahoo, J. Deschenaux, A. Gokaslan, G. Wang, J. Chiu, and V. Kuleshov. The diffusion duality. arXiv [cs.LG], June 2025

  42. [42]

    A. Sarkar, Z. Tang, C. Zhao, and P. K. Koo. Designing DNA with tunable regulatory activity using discrete diffusion. bioRxiv, page 2024.05.23.595630, May 2024

  43. [43]

    A. Shabalin, V. Meshchaninov, and D. Vetrov. Smoothie: Smoothing diffusion on token embeddings for text generation. arXiv [cs.CL], May 2025

  44. [44]

    J. Shi, K. Han, Z. Wang, A. Doucet, and M. K. Titsias. Simplified and generalized masked diffusion for discrete data. arXiv [cs.LG], June 2024

  45. [45]

    H. Stark, B. Jing, C. Wang, G. Corso, B. Berger, R. Barzilay, and T. Jaakkola. Dirichlet flow matching with applications to DNA sequence design. arXiv [q-bio.BM], Feb. 2024

  46. [46]

    C. Stone. Limit theorems for random walks, birth and death processes, and diffusion processes. Illinois J. Math., 7(4):638–660, Dec. 1963

  47. [47]

    B. E. Suzek, H. Huang, P. McGarvey, R. Mazumder, and C. H. Wu. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics, 23(10):1282–1288, May 2007

  48. [48]

    S. Tang, Y. Zhang, A. Tong, and P. Chatterjee. Gumbel-softmax flow matching with straight-through guidance for controllable biological sequence generation. arXiv [cs.LG], Mar. 2025

  49. [49]

    S. Tavaré. Line-of-descent and genealogical processes, and their applications in population genetics models. Theor. Popul. Biol., 26(2):119–164, Oct. 1984

  50. [50]

    A. W. van der Vaart. Asymptotic Statistics. 1998

  51. [51]

    X. Wang, Z. Zheng, F. Ye, D. Xue, S. Huang, and Q. Gu. Diffusion language models are versatile protein learners. ICML, abs/2402.18567, Feb. 2024

  52. [52]

    X. Wang, Z. Zheng, F. Ye, D. Xue, S. Huang, and Q. Gu. DPLM-2: A multimodal diffusion protein language model. arXiv [cs.LG], Oct. 2024

  53. [53]

    L. Winkler, L. Richter, and M. Opper. Bridging discrete and continuous state spaces: Exploring the ehrenfest process in time-continuous diffusion models. arXiv [stat.ML], May 2024

  54. [54]

    R. Wu, F. Ding, R. Wang, R. Shen, X. Zhang, S. Luo, C. Su, Z. Wu, Q. Xie, B. Berger, J. Ma, and J. Peng. High-resolution de novo structure prediction from primary sequence. bioRxiv, page 2022.07.21.500999, July 2022

  55. [55]

    K. K. Yang, N. Fusi, and A. X. Lu. Convolutions are competitive with transformers for protein sequence pretraining. Cell Systems, 15(3):286–294, 2024

  56. [56]

    Y. Zhang, B. Hartl, H. Hazan, and M. Levin. Diffusion models are evolutionary algorithms. arXiv preprint arXiv:2410.02543, 2024

  57. [57]

    denoise” sequences; we call the choice of inputs and outputs of these neural networks the “parameterization

    K. Zheng, Y. Chen, H. Mao, M.-Y. Liu, J. Zhu, and Q. Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling. arXiv [cs.LG], Sept. 2024. A Extended related work: We add more related work beyond those in Sec. 2. Classical theories unifying discrete and continuous stochastic processes: There is a ...

  58. [58]

    $\mathrm{argmax}(w_t)$, and $x = z_0 = w_0$, they state “Since the transition $z_t \to z_s$ is Markov, we get: $q(z_s \mid w_t, z_t, x) = q(z_s \mid z_t, x)$

    has a similar idea, swapping the softmax for an asymmetric transformation and Gaussian diffusion with reflected Gaussian diffusion. With these simplifications however, the process is exactly (reflected) Gaussian diffusion except the input to the neural network is transformed onto a simplex; in particular, it doesn’t interact with the topology of the simpl...

  59. [59]

    Below we simply assume that $\mathbb{1}$ is not orthogonal to the top eigenspace of $\Lambda$. Decompose $\Lambda = \eta V \mathrm{diag}(\vec{\lambda}/\eta) V^T$ for a matrix $V \in \mathbb{R}^{B \times r}$ with orthonormal columns, a vector $\vec{\lambda}$ of eigenvalues, and a scalar $\eta > \max_i \lambda_i$ to be chosen later. For an orthonormal matrix $U \in \mathbb{R}^{r \times r}$ to be chosen later, define $\tilde{V} = \begin{bmatrix} V \mathrm{diag}(\vec{\lambda}/\eta)^{1/2} \\ U (I - \mathrm{diag}(\vec{\lambda}/\eta))^{1/2} \end{bmatrix}$ so $\tilde{V}$ has orthonormal columns. ...

  60. [60]

    (Convergence of marginals) $\vec{x}^\zeta_t \rightsquigarrow \vec{z}_t$ for each $t$.

  61. [61]

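    (Local uniform convergence of conditionals) Conditional distributions exist such that for each $\vec{v} \in \mathbb{R}^r$, $s < t$, and bounded compactly supported measurable function $f$, there is an $\epsilon > 0$ such that $\sup_{\|\vec{w} - \vec{v}\| < \epsilon} \left| \mathbb{E}_{\vec{x}^\zeta_t \mid \vec{x}^\zeta_s = \vec{w}} f - \mathbb{E}_{\vec{z}_t \mid \vec{z}_s = \vec{w}} f \right| \to 0$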

  62. [62]

    500 and predicts a positive 250-dimensional vector that represents the predicted “accessibility-profile

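    (Tightness) For every $[a, b] \subset (0,1)$, there are $\beta, \theta, M > 0$ such that for all $s, t \in [a, b]$, $\sup_{\zeta > M} \mathbb{E}\|\vec{x}^\zeta_s - \vec{x}^\zeta_t\|^\beta < C(s - t)^\theta$. Then, with the topology of convergence on compact sets, the paths converge in distribution $(\vec{x}^\zeta_t)_{t \in (0,1)} \rightsquigarrow (\vec{z}_t)_{t \in (0,1)}$. Proof. Pick a compact set $[a, b] \subset (0,1)$. We show $(\vec{x}^\zeta_t)_{t \in [a,b]} \rightsquigarrow (\vec{z}_t)_{t \in [a,b]}$. Say $(\vec{x}^{\zeta_m}_t)_{t \in \ldots}$...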