pith. machine review for the scientific record.

arxiv: 2604.10961 · v1 · submitted 2026-04-13 · ❄️ cond-mat.stat-mech

Recognition: unknown

Dynamical Regimes of Discrete Diffusion Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:18 UTC · model grok-4.3

classification ❄️ cond-mat.stat-mech
keywords discrete diffusion models · speciation transition · collapse transition · statistical mechanics · Ising model · backward dynamics · phase transitions · random energy model

The pith

A simple effective model for discrete diffusion on Ising data shows that backward dynamics feature a speciation transition via second-order phase transition and a collapse transition via random-energy-model condensation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the same two dynamical transitions seen in continuous diffusion models also appear in discrete versions by building and analyzing an effective statistical-mechanics model trained on two-class Ising variables with arbitrary mixture ratios. A sympathetic reader would care because these transitions mark the moments when generated samples first acquire the global structure of the training data and later lock onto individual training examples, directly affecting sample quality and diversity in practical discrete generative models. The authors derive an explicit expression for the speciation time whose scaling matches the continuous case once the noise schedule increases with time, exactly as used in real diffusion training. Numerical simulations of the model and experiments on trained discrete diffusion networks on real datasets confirm the predicted transition locations and scalings. This indicates that the original continuous-data framework remains applicable once the appropriate effective description is chosen.
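The forward half of this setup is easy to picture. Below is a minimal sketch, assuming a toy two-class ±1 mixture and an independent spin-flip noising kernel with a hand-picked exponential schedule — illustrative assumptions, not the paper's exact model, kernel, or parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000      # spins per sample
eta = 0.7     # mixture ratio: fraction of the + class (illustrative value)
m = 0.8       # mean magnetization along the class direction
T = 100       # number of forward steps

def sample_data(n):
    """Draw n samples from a toy two-class +/-1 mixture (not the paper's exact model)."""
    labels = np.where(rng.random(n) < eta, 1, -1)
    # each spin agrees with its class sign with probability (1 + m) / 2
    agree = rng.random((n, N)) < (1 + m) / 2
    return np.where(agree, labels[:, None], -labels[:, None]), labels

def forward_noise(x, t):
    """Independent spin-flip forward kernel; flip probability grows toward 1/2."""
    p_flip = 0.5 * (1 - np.exp(-4.0 * t / T))   # assumed schedule
    flip = rng.random(x.shape) < p_flip
    return np.where(flip, -x, x)

x0, labels = sample_data(200)
for t in (0, 25, 50, 100):
    overlap = (forward_noise(x0, t) * labels[:, None]).mean()
    print(f"t = {t:3d}   class overlap = {overlap:+.3f}")
```

Each spin flips with a probability that rises toward 1/2, so the overlap with the class direction decays from m toward zero; the backward process analyzed in the paper has to recover that signal.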

Core claim

In the proposed effective model, the backward dynamics undergo a speciation transition located by a second-order phase-transition calculation via high-temperature expansion, while the later collapse transition is identified with a condensation transition in the random energy model. An analytical formula for the speciation time follows, whose scaling matches the continuous-data result when the noise level grows with time.

What carries the argument

The simple effective model of discrete diffusion trained on two-class Ising variables with general mixture ratio, whose backward process is analyzed by high-temperature expansion for the speciation transition and by the random energy model for the collapse transition.

If this is right

  • The speciation transition time can be computed analytically from a high-temperature expansion of the effective model.
  • The collapse transition is equivalent to a condensation transition whose statistics are given by the random energy model.
  • When the noise schedule increases with time, the scaling of the speciation time matches the scaling already known for continuous diffusion models.
  • The same statistical-mechanics criteria used for continuous data therefore locate both transitions in the discrete setting.
  • Trained discrete diffusion models on real data sets exhibit the predicted transitions, confirming the effective-model description.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adjusting the noise schedule could be used to control the separation between speciation and collapse times in discrete generative models.
  • The same phase-transition analysis may extend to multi-class or higher-dimensional discrete data once an analogous effective model is constructed.
  • Sampling strategies that deliberately slow the collapse transition might improve diversity without sacrificing fidelity.
  • The framework supplies a quantitative route to predict how changes in training data statistics affect generation quality in discrete diffusion.

Load-bearing premise

The proposed simple effective model on two-class Ising data with a mixture ratio sufficiently captures the backward dynamics of general discrete diffusion models on real datasets.

What would settle it

If the analytically predicted speciation time or the locations of both transitions deviate measurably from those observed in numerical simulations of the effective model or in experiments with trained discrete diffusion networks on real data sets, the claimed equivalence of the discrete and continuous frameworks would be falsified.

Figures

Figures reproduced from arXiv: 2604.10961 by Takashi Takahashi, Tomoei Takahashi, Yoshiyuki Kabashima.

Figure 1. Comparison of the results for the class-balanced and class-imbalanced settings of t_S with N = 10000. We have set m = 0.8 and β = 10⁻⁴ in both cases. The orange plots represent trajectories with m_t > 0 at t = 0, while the light-blue plots represent those with m_t < 0 at t = 0. The number of displayed trajectories is 20 in total, combining the positive and negative classes for both the class-balanced and c…

Figure 2. Cloning probability φ(t) plotted as a function of the rescaled time t/t_S for increasing system sizes. For both cases we have set m = 1 and β = 10⁻⁴. (a) Results for the balanced data case with η = 0.5; the four φ(t) curves intersect at 0.770. (b) Results for the imbalanced data case with η = 0.9; the four φ(t) curves intersect at 0.967. …

Figure 3. Comparison between the collapse time t_C^REM, obtained as the numerical solution of s_t = 0, and the general criterion Eq. (30), for both the theoretical and empirical values of the Shannon entropy density S(t). For all cases we set m = 0.5, η = 0.5, and β = 5×10⁻⁴. (a) Comparison between ΔS(t) and the collapse time t_C^REM obtained as the numerical solution of s_t = 0. The solid navy-blue curve shows ΔS(t)…

Figure 4. Trajectories of generated data in the backward process of D3PM on BinMNIST, together with the theoretical prediction of t_S given by Eq. (83), shown as a dashed vertical line. The orange trajectories represent conditional generation for label 1, M_t^(1), while the light-blue trajectories correspond to conditional generation for label 8, M_t^(8). Ten trajectories are shown for each label. The result of the…

Figure 5. Empirical cloning probability φ_S^e(t) for different pairs of labels of BinMNIST. The horizontal axis represents time rescaled by t_S, for which we use the expression given in Eq. (27). The blue, yellow, and green curves correspond to the label pairs (1, 8), (0, 4), and (6, 7), respectively. For each pair, to set η = 0.5, φ_S^e(t) is computed using the same number of training samples, matched to the smaller class in…

Figure 6. (a) Backward dynamics of the entropy difference ΔS^e(t) for the three movie groups. The sample size at each time step (n_sample in Eq. (37)) is set to 10000. (b) Enlarged view of the region around ΔS^e(t) = 0 extracted from panel (a). Solid lines show a centered moving average over 50 time steps, while thin dotted lines indicate the corresponding raw trajectories. Vertical dashed lines denote the empiri…

Figure 7. The generated images of BinMNIST of labels 1 and 8 at every 100 timesteps as well as at t = t_S. Since images are generated only at integer timesteps, the sample displayed at t = t_S corresponds to t = 208, given that the speciation time is estimated as t_S = 207.91 according to Sec. 6.1 of the main text. For each label, we select from the 10 generated image sequences the one whose overlap with the corresp…
Original abstract

Diffusion models generate high-dimensional data such as images by learning a process that gradually removes noise from corrupted data. Recent studies have shown that the backward dynamics of diffusion models exhibit two characteristic transitions: the speciation transition, at which generated samples begin to capture the global structure of the training data, and the collapse transition, at which the generation dynamics starts committing to individual training samples. While these transitions have been theoretically analyzed for continuous data, the same theoretical criteria have not been applied for discrete diffusion models, which are diffusion models for discrete data. In this work, we propose a simple effective model for discrete diffusion models trained on two-class Ising variable data with a general mixture ratio and analyze its backward dynamics using methods from statistical mechanics. We show that, as in the previous study on continuous data, the speciation transition can be determined through a second-order phase transition analysis using high-temperature expansion, while the collapse transition corresponds to a condensation transition described by the Random Energy Model. An analytical expression for the speciation time is obtained, and we show that its scaling becomes consistent with that of the continuous case when the noise increases with time as in practical diffusion models. These theoretical predictions are confirmed by numerical simulations and experiments with trained discrete diffusion models on real datasets. These results suggest that the original theoretical framework for continuous data remain valid for discrete data, and may provide a useful starting point for the statistical-mechanics analysis of discrete generative diffusion in more realistic settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes a simple effective model for discrete diffusion models trained on two-class Ising variable data with a general mixture ratio. It analyzes the backward dynamics using statistical mechanics methods, showing that the speciation transition corresponds to a second-order phase transition analyzed via high-temperature expansion, while the collapse transition is a condensation transition described by the Random Energy Model. An analytical expression for the speciation time is derived, with its scaling shown to match the continuous-data case when noise increases with time. These predictions are confirmed by numerical simulations and experiments with trained discrete diffusion models on real datasets, suggesting that the theoretical framework developed for continuous data remains valid for discrete diffusion models.

Significance. If the effective model is representative, the work extends the statistical-mechanics analysis of dynamical transitions in diffusion models from continuous to discrete settings. The use of high-temperature expansion and REM provides analytical expressions and scaling relations that are confirmed numerically and experimentally, offering a useful starting point for understanding speciation and collapse in discrete generative models. This could inform training schedules and sampling strategies for discrete data applications.

major comments (1)
  1. [Abstract and experiments section] The central generalization—that the two-class Ising effective model with mixture ratio captures the backward dynamics of general discrete diffusion models on real (higher-dimensional, non-binary) datasets—underpins the claim that the continuous framework remains valid. While the abstract and experiments section report confirmation via trained-model experiments, the quantitative matching of the derived analytical speciation-time expression to observed transition times on complex real data is not sufficiently detailed to fully secure the mapping, especially given possible shifts in transition locations due to correlations absent in the binary-spin model.
minor comments (1)
  1. [Model definition] Notation for the mixture ratio and time-dependent noise schedule should be defined more explicitly at first use to aid readers unfamiliar with the continuous precursor work.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading of our manuscript and the recommendation for minor revision. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract and experiments section] The central generalization—that the two-class Ising effective model with mixture ratio captures the backward dynamics of general discrete diffusion models on real (higher-dimensional, non-binary) datasets—underpins the claim that the continuous framework remains valid. While the abstract and experiments section report confirmation via trained-model experiments, the quantitative matching of the derived analytical speciation-time expression to observed transition times on complex real data is not sufficiently detailed to fully secure the mapping, especially given possible shifts in transition locations due to correlations absent in the binary-spin model.

    Authors: We agree that the quantitative aspects of the comparison between the effective-model predictions and the trained-model experiments on real data merit additional detail to better substantiate the generalization. In the revised manuscript we will expand the experiments section with explicit tabulated values of the observed speciation times extracted from the trained discrete diffusion models (on datasets such as binarized MNIST and similar discrete image data) together with direct overlays of the analytical expression derived from the high-temperature expansion. We will also add a short discussion of the expected quantitative shifts arising from higher-order correlations and non-binary structure absent in the two-class Ising effective model, while noting that the scaling with noise schedule remains consistent with the continuous case. These additions will strengthen the evidence without changing the central claim that the statistical-mechanics framework carries over to discrete settings. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations on proposed effective model are independent

full rationale

The paper introduces a new effective model for discrete diffusion on two-class Ising variables with general mixture ratio, then applies standard high-temperature expansion to identify the speciation transition as a second-order phase transition and the Random Energy Model to identify the collapse transition. An analytical expression for speciation time is derived directly from this model, with scaling shown to match the continuous case under time-dependent noise. These steps constitute independent analysis on the proposed model rather than any reduction of outputs to fitted inputs, self-definitions, or load-bearing self-citations by construction. Confirmation via separate numerical simulations and experiments on trained models with real datasets supplies external benchmarks, rendering the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the validity of a simplified effective model for two-class Ising data and the direct applicability of high-temperature expansion and Random Energy Model condensation to the diffusion backward process.

axioms (2)
  • domain assumption High-temperature expansion accurately captures the second-order phase transition for the speciation time in the effective model
    Invoked to obtain analytical expression for speciation transition
  • domain assumption Collapse transition maps exactly onto condensation in the Random Energy Model
    Used to characterize the collapse transition
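For context on the second axiom: the textbook random energy model locates condensation where its entropy density vanishes, and the paper's s_t = 0 criterion for the collapse time is the analogous statement inside its effective model. A minimal reminder for the standard Derrida REM (2^N levels with i.i.d. Gaussian energies of variance N/2 — a textbook fact, not the paper's equations):

```latex
s(\beta) \;=\; \log 2 \;-\; \frac{\beta^{2}}{4},
\qquad
s(\beta_c) = 0 \;\Longrightarrow\; \beta_c = 2\sqrt{\log 2}.
```

Below this temperature the Gibbs measure condenses onto an O(1) number of low-energy levels, which is the mechanism the collapse transition inherits.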

pith-pipeline@v0.9.0 · 5555 in / 1523 out tokens · 88296 ms · 2026-05-10T16:18:24.151082+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

    cs.LG 2026-05 unverdicted novelty 7.0

    Symmetry breaking and nonlocality phase transitions occur nearly simultaneously during diffusion model generation in modern transformers.

Reference graph

Works this paper leans on

28 extracted references · 3 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Deep unsupervised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning, pages 2256–2265. pmlr, 2015

  2. [2]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems , 33:6840–6851, 2020

  3. [3]

    Generative modeling by estimating gradients of the data distribution

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems , 32, 2019

  4. [4]

    Diffusion models: A comprehensive survey of methods and applications

    Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Diffusion models: A comprehensive survey of methods and applications. ACM computing surveys , 56(4):1–39, 2023

  5. [5]

    Why diffusion models don't memorize: The role of implicit dynamical regularization in training

    Tony Bonnaire, Raphaël Urfin, Giulio Biroli, and Marc Mézard. Why diffusion models don't memorize: The role of implicit dynamical regularization in training. In Advances in Neural Information Processing Systems 38, 2025

  6. [6]

    An analytic theory of creativity in convolutional diffusion models

    Mason Kamb and Surya Ganguli. An analytic theory of creativity in convolutional diffusion models. In Forty-second International Conference on Machine Learning , 2025

  7. [7]

    Analysis of learning a flow-based generative model from limited sample complexity

    Hugo Cui, Florent Krzakala, Eric Vanden-Eijnden, and Lenka Zdeborova. Analysis of learning a flow-based generative model from limited sample complexity. In The Twelfth International Conference on Learning Representations , 2024

  8. [8]

    Dynamical regimes of diffusion models

    Giulio Biroli, Tony Bonnaire, Valentin De Bortoli, and Marc Mézard. Dynamical regimes of diffusion models. Nature Communications, 15(1):9957, 2024

  9. [9]

    Generative diffusion in very large dimensions

    Giulio Biroli and Marc Mézard. Generative diffusion in very large dimensions. Journal of Statistical Mechanics: Theory and Experiment , 2023(9):093402, 2023

  10. [10]

    Theory of speciation transitions in diffusion models with general class structure

    Beatrice Achilli, Marco Benedetti, Giulio Biroli, and Marc Mézard. Theory of speciation transitions in diffusion models with general class structure. arXiv preprint arXiv:2602.04404 , 2026

  11. [11]

    Memorization and generalization in generative diffusion under the manifold hypothesis

    Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello, Marc Mézard, and Enrico Ventura. Memorization and generalization in generative diffusion under the manifold hypothesis. Journal of Statistical Mechanics: Theory and Experiment , 2025(7):073401, 2025

  12. [12]

    Losing dimensions: Geometric memorization in generative diffusion

    Beatrice Achilli, Enrico Ventura, Gianluigi Silvestri, Bao Pham, Gabriel Raya, Dmitry Krotov, Carlo Lucibello, and Luca Ambrogioni. Losing dimensions: Geometric memorization in generative diffusion. arXiv preprint arXiv:2410.08727 , 2024

  13. [13]

    Manifolds, random matrices and spectral gaps: The geometric phases of generative diffusion

    Enrico Ventura, Beatrice Achilli, Gianluigi Silvestri, Carlo Lucibello, and Luca Ambrogioni. Manifolds, random matrices and spectral gaps: The geometric phases of generative diffusion. In The Thirteenth International Conference on Learning Representations , 2025

  14. [14]

    Emergence of Distortions in High-Dimensional Guided Diffusion Models

    Enrico Ventura, Beatrice Achilli, Luca Ambrogioni, and Carlo Lucibello. Emergence of distortions in high-dimensional guided diffusion models. arXiv preprint arXiv:2602.00716 , 2026

  15. [15]

    Diffusion- lm improves controllable text generation

    Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S Liang, and Tatsunori B Hashimoto. Diffusion- lm improves controllable text generation. Advances in neural information processing systems , 35:4328–4343, 2022

  16. [16]

    Text generation with diffusion language models: A pre-training approach with continuous paragraph denoise

    Zhenghao Lin, Yeyun Gong, Yelong Shen, Tong Wu, Zhihao Fan, Chen Lin, Nan Duan, and Weizhu Chen. Text generation with diffusion language models: A pre-training approach with continuous paragraph denoise. In International Conference on Machine Learning , pages 21051– 21064. PMLR, 2023

  17. [17]

    Diffusion of thought: Chain-of-thought reasoning in diffusion language models

    Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, et al. Diffusion of thought: Chain-of-thought reasoning in diffusion language models. Advances in Neural Information Processing Systems , 37:105345–105374, 2024

  18. [18]

    Structured denoising diffusion models in discrete state-spaces

    Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces. Advances in neural information processing systems, 34:17981–17993, 2021

  19. [19]

    Autoregressive diffusion models

    Emiel Hoogeboom, Alexey A. Gritsenko, Jasmijn Bastings, Ben Poole, Rianne van den Berg, and Tim Salimans. Autoregressive diffusion models. In International Conference on Learning Representations, 2022

  20. [20]

    Principles of Condensed Matter Physics

    P. M. Chaikin and T. C. Lubensky. Principles of Condensed Matter Physics. Cambridge University Press, 1995

  21. [21]

    Random-energy model: An exactly solvable model of disordered systems

    Bernard Derrida. Random-energy model: An exactly solvable model of disordered systems. Physical Review B , 24(5):2613, 1981

  22. [22]

    Exponential capacity of dense associative memories

    Carlo Lucibello and Marc Mézard. Exponential capacity of dense associative memories. Physical Review Letters, 132(7):077301, 2024

  23. [23]

    Finite-size scaling theory

    Vladimir Privman. Finite-size scaling theory. In Finite Size Scaling and Numerical Simulation of Statistical Systems , pages 1–98. World Scientific, 1990

  24. [24]

    Gradient-based learning applied to document recognition

    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE , 86(11):2278–2324, 1998

  25. [25]

    On the quantitative analysis of deep belief networks

    Ruslan Salakhutdinov and Iain Murray. On the quantitative analysis of deep belief networks. In Proceedings of the 25th International Conference on Machine Learning , pages 872–879. ACM, 2008

  26. [26]

    Movielens dataset

    GroupLens Research. Movielens dataset. https://grouplens.org/datasets/movielens/, 2014

  27. [27]

    The tag genome: Encoding community knowledge to support novel interaction

    Jesse Vig, Shilad Sen, and John Riedl. The tag genome: Encoding community knowledge to support novel interaction. In Proceedings of the International Conference on Intelligent User Interfaces (IUI) , 2012

  28. [28]

    Generation meets recommendation: Proposing novel items for groups of users

    Vinh Vo Thanh and Harold Soh. Generation meets recommendation: Proposing novel items for groups of users. In Proceedings of the 23rd International Conference on Intelligent User Interfaces (IUI) , pages 73–84. Association for Computing Machinery, 2018