Recognition: 2 theorem links · Lean Theorem
How Out-of-Equilibrium Phase Transitions can Seed Pattern Formation in Trained Diffusion Models
Pith reviewed 2026-05-15 08:33 UTC · model grok-4.3
The pith
Pattern formation in trained diffusion models arises as an out-of-equilibrium phase transition triggered by instabilities in low-frequency denoising modes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pattern formation in trained diffusion models can be explained as an out-of-equilibrium phase transition driven by instabilities in the denoising dynamics. The framework connects data symmetries and architectural constraints such as locality and translation equivariance to the emergence of collective spatial modes. Structure forms when low-frequency modes become unstable, producing a rapid growth of spatial correlations that organizes noise into coherent patterns. This is confirmed analytically in patch-based models and experimentally in trained convolutional models on Fashion-MNIST and large-scale ImageNet models, where the transition coincides with a peak in correlation length and a clear,
What carries the argument
Softening of low-frequency modes at a critical denoising time, which triggers exponential growth of spatial correlations through instabilities linked to locality and translation equivariance.
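The softening picture can be made concrete with a toy calculation. The sketch below uses the long-wavelength growth rate λ(k, t) = −r(t) − κ(t)k² quoted later on this page, and adds several assumptions that are not from the paper: a linear r(t) that changes sign at a hypothetical critical time `T_C`, a constant `KAPPA`, and a time variable that decreases during generation. It only illustrates why the k = 0 mode destabilizes first and why the unstable band widens past the critical time.

```python
import math

T_C = 1.0     # hypothetical critical time
KAPPA = 0.5   # hypothetical (constant) stiffness of the k^2 term
SLOPE = 2.0   # hypothetical slope of r(t) near T_C


def r(t):
    # Assumed linear form: r > 0 (stable) for t > T_C, r < 0 for t < T_C.
    return SLOPE * (t - T_C)


def growth_rate(k, t):
    # lambda(k, t) = -r(t) - kappa * k^2, truncated at order k^2.
    return -r(t) - KAPPA * k * k


def unstable_cutoff(t):
    # Largest wavenumber with positive growth rate; 0.0 if every mode is stable.
    return math.sqrt(-r(t) / KAPPA) if r(t) < 0 else 0.0


def log_amplification(k, t_start, t_end, steps=1000):
    # Integrate lambda along the generation direction (t decreasing from
    # t_start to t_end) with a midpoint rule; exp() of the result is the gain.
    dt = (t_start - t_end) / steps
    return sum(growth_rate(k, t_start - (i + 0.5) * dt) for i in range(steps)) * dt
```

With these toy values, `growth_rate(0.0, 0.5)` is positive while `growth_rate(2.0, 0.5)` is negative, so only long-wavelength modes grow. `log_amplification(0.0, 1.0, 0.0)` exceeds `log_amplification(1.0, 1.0, 0.0)`, which is the sense in which low-frequency modes dominate the growth of correlations.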
If this is right
- In patch-based models a sharp rise in correlation length occurs at the analytically predicted critical time together with mode softening.
- Trained convolutional models on Fashion-MNIST exhibit the same signatures of correlation growth and low-frequency weakening.
- Large-scale ImageNet diffusion models show pattern formation coinciding with a peak in estimated correlation length and pronounced weakening of spatial modes.
- Applying classifier guidance exactly at the identified critical stage produces significantly better class alignment than guidance applied at random times.
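The guidance comparison in the last item can be sketched as two schedules with the same budget: one that switches guidance on in a window around the critical time, and one that spends the same number of guided steps at random times. The function names, the hard on/off window, and the parameter values are illustrative assumptions. The paper's actual guidance rule and sampler are not specified on this page.

```python
import random


def guidance_scale(t, t_c, half_width, scale_on, scale_off=0.0):
    # Guidance strength at time t for a hard window centred on t_c (assumed shape).
    return scale_on if abs(t - t_c) <= half_width else scale_off


def guidance_schedule(times, t_c, half_width, scale_on):
    # One guidance strength per sampling time.
    return [guidance_scale(t, t_c, half_width, scale_on) for t in times]


def random_schedule(times, n_active, scale_on, seed=0):
    # Baseline: the same number of guided steps, placed at random times.
    chosen = set(random.Random(seed).sample(range(len(times)), n_active))
    return [scale_on if i in chosen else 0.0 for i in range(len(times))]
```

For example, with `times = [i / 10 for i in range(11)]`, `guidance_schedule(times, t_c=0.5, half_width=0.15, scale_on=3.0)` guides only the steps at 0.4, 0.5 and 0.6, and `random_schedule(times, n_active=3, scale_on=3.0)` gives a matched-budget baseline.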
Where Pith is reading between the lines
- The critical-time view suggests sampling algorithms could allocate most steps before and after the transition while using fewer steps exactly at the unstable point to save compute.
- Models with different locality constraints or symmetry-breaking layers might shift or suppress the transition, offering a route to control the scale of generated patterns.
- The same instability mechanism could appear in other iterative generative processes that combine local updates with global data constraints.
Load-bearing premise
Instabilities specifically in low-frequency modes, arising from data symmetries and constraints like locality and translation equivariance, are the primary driver of rapid spatial correlation growth and pattern formation.
What would settle it
Observing pattern formation in a model with the same architecture but without a corresponding softening of low-frequency modes, or finding a mismatch between the predicted critical time and the observed onset of correlation growth.
Original abstract
Diffusion models generate structure by progressively transforming noise into data, yet the mechanisms underlying this transition remain poorly understood. In this work, we show that pattern formation in trained diffusion models can be explained as an out-of-equilibrium phase transition driven by instabilities in the denoising dynamics. We develop a theoretical framework linking data symmetries and architectural constraints, such as locality and translation equivariance, to the emergence of collective spatial modes. In this view, structure arises when low-frequency modes become unstable, triggering a rapid growth of spatial correlations that organizes noise into coherent patterns. We validate this theory through a combination of analytical models and experiments. In a controlled patch-based model, we observe a sharp increase in correlation length and a simultaneous softening of low-frequency modes at a well-defined critical time, accurately predicted by theory. Similar signatures are found in trained convolutional diffusion models on Fashion-MNIST and in large-scale ImageNet models, where pattern formation coincides with a peak in estimated correlation length and a pronounced weakening of spatial modes. Finally, intervention experiments show that applying guidance precisely at this critical stage significantly improves class alignment compared to applying it at random times, demonstrating that this regime is not only descriptive but functionally important.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that pattern formation in trained diffusion models arises as an out-of-equilibrium phase transition driven by instabilities in the denoising dynamics. Data symmetries combined with architectural constraints (locality, translation equivariance) render low-frequency modes unstable at a critical time, triggering rapid growth in spatial correlations that organizes noise into coherent patterns. This is supported by an analytical patch-based model that predicts the critical time, matching observed correlation-length peaks and mode softening; analogous signatures appear in convolutional models trained on Fashion-MNIST and ImageNet; and guidance applied precisely at the critical stage improves class alignment relative to random timing.
Significance. If the central claim is upheld, the work supplies a physics-motivated account of structure emergence in diffusion models that could inform sampling schedules, guidance strategies, and architectural choices. The analytical prediction plus cross-scale empirical signatures and functional intervention constitute a coherent package; however, the evidence that low-frequency instabilities specifically cause the transition remains correlational, because they have not been isolated from other time-dependent factors.
major comments (2)
- [Patch-based analytical model] Patch-based model: the critical time is listed among the free parameters, which undercuts the claim that the observed correlation-length jump and mode softening are strict predictions from symmetries and constraints alone rather than post-hoc matching.
- [Intervention experiments] Intervention experiments: applying guidance at the critical time improves alignment, yet the design does not ablate low-frequency modes while preserving other dynamics; therefore the result does not rule out that the same temporal window is special for independent reasons (overall SNR, emergence of any coherent structure, or conditioning sensitivity).
minor comments (2)
- [Abstract] Abstract and methods: provide an explicit definition and estimation procedure for correlation length, including any smoothing or windowing choices, so that the reported peaks can be reproduced.
- [Experiments] Supplementary material: include the full derivation of the mode-softening prediction and quantitative error bars or statistical tests for the experimental matches on Fashion-MNIST and ImageNet.
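One possible estimator of the kind requested in the first minor comment is sketched below. It is not the paper's procedure, which this page does not describe. It assumes a periodic 2D field stored as a list of rows, measures the connected autocorrelation along rows only, and defines the correlation length as the shift at which the autocorrelation falls to 1/e of its zero-shift value. All of these are conventions chosen for illustration, with no smoothing or windowing.

```python
import math


def autocorrelation(field, max_shift):
    """Connected autocorrelation C(r) along rows of a periodic 2D field."""
    h, w = len(field), len(field[0])
    mean = sum(map(sum, field)) / (h * w)
    corr = []
    for shift in range(max_shift + 1):
        total = 0.0
        for row in field:
            for j in range(w):
                total += (row[j] - mean) * (row[(j + shift) % w] - mean)
        corr.append(total / (h * w))
    return corr


def correlation_length(field, max_shift):
    """Shift where C(r) first drops below C(0)/e, linearly interpolated."""
    corr = autocorrelation(field, max_shift)
    if corr[0] <= 0:
        return 0.0  # constant field: no fluctuations to correlate
    target = corr[0] / math.e
    for shift in range(1, len(corr)):
        if corr[shift] < target:
            prev, cur = corr[shift - 1], corr[shift]
            return (shift - 1) + (prev - target) / (prev - cur)
    return float(max_shift)  # no crossing inside the window
```

A field made of blocks four pixels wide gives a larger value than one that alternates pixel by pixel, which is the qualitative behaviour a correlation-length peak relies on.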
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which help clarify the presentation of our results. We respond to each major comment below and indicate the corresponding revisions.
- Referee: [Patch-based analytical model] Patch-based model: the critical time is listed among the free parameters, which undercuts the claim that the observed correlation-length jump and mode softening are strict predictions from symmetries and constraints alone rather than post-hoc matching.
  Authors: We appreciate the referee highlighting this point. In the patch-based model the critical time is obtained by solving the linear stability condition for the onset of instability in the low-frequency modes; this condition is expressed directly in terms of the data symmetry parameters and the locality scale of the convolutional kernel. The manuscript lists the resulting expression among the model parameters for notational convenience, but it is not adjusted to fit the observed jump. We will revise the relevant section to include the explicit derivation of the critical time from the instability criterion and to state that no post-hoc fitting is performed.
  Revision: yes
- Referee: [Intervention experiments] Intervention experiments: applying guidance at the critical time improves alignment, yet the design does not ablate low-frequency modes while preserving other dynamics; therefore the result does not rule out that the same temporal window is special for independent reasons (overall SNR, emergence of any coherent structure, or conditioning sensitivity).
  Authors: We agree that the guidance intervention demonstrates functional importance of the critical window but does not isolate low-frequency instabilities from other time-dependent factors. Performing a clean ablation of specific modes while leaving the remainder of the dynamics unchanged is technically difficult in the full model. We will add an explicit discussion of this limitation in the revised manuscript and note that the observed improvement is consistent with the proposed mechanism while remaining correlational; we will also suggest targeted mode-ablation experiments as future work.
  Revision: partial
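The filtering primitive behind a targeted mode-ablation experiment can be sketched on a periodic 1D lattice. This is a hypothetical illustration and not a procedure from the paper: it zeroes every Fourier mode with |k| at or below a chosen cutoff and leaves the rest untouched, using a naive O(n²) transform so that it needs only the standard library. How such a filter would be inserted into a trained model's denoising loop without disturbing the other dynamics is the difficulty the authors describe, and it is not addressed here.

```python
import cmath


def dft(signal):
    # Naive discrete Fourier transform of a real or complex sequence.
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    # Inverse transform; the real part is returned because the input was real.
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)).real / n
            for x in range(n)]


def ablate_low_frequencies(signal, k_cut):
    """Zero every Fourier mode with |k| <= k_cut on a periodic 1D lattice."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if min(k, n - k) <= k_cut:  # wrap-around distance to k = 0
            spectrum[k] = 0.0
    return inverse_dft(spectrum)
```

Applied to the sum of a slow and a fast cosine, `ablate_low_frequencies` with a cutoff between the two wavenumbers returns the fast component alone, up to floating-point error.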
Circularity Check
No significant circularity; derivation from symmetries remains independent
The paper builds its central claim by linking stated data symmetries and architectural constraints (locality, translation equivariance) to the emergence of unstable low-frequency modes via an analytical patch-based model. The critical time and correlation-length jump are derived as predictions from that model and then compared against observations in trained convolutional networks on Fashion-MNIST and ImageNet. The guidance-timing intervention supplies an external functional test rather than a re-fit of the same quantities. No equation or step reduces the claimed prediction to a post-hoc fit of the validation data, nor does any load-bearing premise rest on a self-citation chain whose content is itself unverified. The derivation chain is therefore self-contained against the supplied symmetries and constraints.
Axiom & Free-Parameter Ledger
free parameters (1)
- critical time
axioms (1)
- domain assumption: Data symmetries and architectural constraints (locality, translation equivariance) determine the emergence and instability of collective spatial modes.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: λ(k, t) = −r(t) − κ(t)k² + O(k⁴) … critical time t_c defined by λ(0, t_c) = 0 or equivalently r(t_c) = 0
- IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: patch score model … t_c ≈ log(1 + |Ω|)
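As a rough worked example of the quoted patch-model estimate t_c ≈ log(1 + |Ω|), the sketch below evaluates it for a few square patches. Two assumptions are made that the quoted passage does not confirm: |Ω| is taken to be the number of lattice sites in the patch, and the logarithm is taken to be natural. The function name is hypothetical.

```python
import math


def patch_critical_time(patch_sites):
    # t_c ≈ log(1 + |Omega|), with |Omega| assumed to be the number of
    # sites in the patch and log assumed to be the natural logarithm.
    return math.log(1 + patch_sites)


for side in (3, 5, 7):
    # Square patches of side 3, 5, 7 have 9, 25, 49 sites.
    print(side, round(patch_critical_time(side * side), 3))
# prints:
# 3 2.303
# 5 3.258
# 7 3.912
```

Under these assumptions the estimate grows only logarithmically with patch size, so enlarging the receptive field shifts the critical time slowly.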
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models
  Symmetry breaking and nonlocality phase transitions occur nearly simultaneously during diffusion model generation in modern transformers.
Locality: there exists a finite radius R such that s_{t,i}(x) depends only on {x_{i+u} : u ∈ Ω_R}.
3. Translation equivariance: s_t(τ_a x) = τ_a s_t(x) for all lattice shifts a.
4. Local Z₂ symmetry: s_t(−x) = −s_t(x).
We expand the dynamics around a translation-invariant symmetric branch, which without loss of generality we take to be x = 0. The linearization is given by a Jac...