pith. machine review for the scientific record.

arxiv: 2603.20092 · v5 · submitted 2026-03-20 · 💻 cs.LG

Recognition: 2 theorem links

· Lean Theorem

How Out-of-Equilibrium Phase Transitions can Seed Pattern Formation in Trained Diffusion Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 08:33 UTC · model grok-4.3

classification 💻 cs.LG
keywords diffusion models · pattern formation · phase transitions · denoising dynamics · spatial correlations · out-of-equilibrium systems · generative modeling · convolutional architectures

The pith

Pattern formation in trained diffusion models arises as an out-of-equilibrium phase transition triggered by instabilities in low-frequency denoising modes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that diffusion models generate coherent images by passing through a critical point where low-frequency spatial modes in the denoising process suddenly become unstable. These instabilities, arising from the model's local architecture and the translation symmetries in the training data, cause spatial correlations to grow rapidly and organize random noise into structured patterns. A sympathetic reader would see this as a physical mechanism that explains why structure emerges abruptly rather than gradually during generation. The theory is tested in controlled patch models where the critical time matches predictions, and similar signatures appear in convolutional models trained on Fashion-MNIST and ImageNet. Guidance applied exactly at this critical stage improves class alignment, showing the transition is functionally relevant for generation quality.
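The correlation-length signature at the heart of this mechanism can be probed directly on intermediate denoising states. A minimal sketch of such an estimator (the FFT autocorrelation and the 1/e threshold are illustrative choices of ours, not the paper's stated procedure):

```python
import numpy as np

def correlation_length(field):
    """Estimate the spatial correlation length of a 2D field from its
    autocorrelation (Wiener-Khinchin: the autocorrelation is the
    inverse FFT of the power spectrum)."""
    x = field - field.mean()
    power = np.abs(np.fft.fft2(x)) ** 2
    corr = np.fft.ifft2(power).real
    corr /= corr[0, 0]                        # normalize so C(0) = 1
    profile = corr[0, : field.shape[1] // 2]  # lags along one axis
    # Correlation length: first lag where C drops below 1/e.
    below = np.flatnonzero(profile < np.exp(-1))
    return float(below[0]) if below.size else float(profile.size)
```

Tracking this quantity along the reverse trajectory would, on the paper's account, show a sharp rise concentrated at the critical time rather than gradual growth.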

Core claim

Pattern formation in trained diffusion models can be explained as an out-of-equilibrium phase transition driven by instabilities in the denoising dynamics. The framework connects data symmetries and architectural constraints such as locality and translation equivariance to the emergence of collective spatial modes. Structure forms when low-frequency modes become unstable, producing a rapid growth of spatial correlations that organizes noise into coherent patterns. This is confirmed analytically in patch-based models and experimentally in trained convolutional models on Fashion-MNIST and large-scale ImageNet models, where the transition coincides with a peak in correlation length and a pronounced weakening of spatial modes.

What carries the argument

Softening of low-frequency modes at a critical denoising time, which triggers exponential growth of spatial correlations through instabilities linked to locality and translation equivariance.
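A schematic statement of this mechanism (the notation here is ours, not the paper's): write the reverse denoising dynamics as $\dot{x} = s_t(x)$ and linearize around a spatially homogeneous state; translation equivariance diagonalizes the Jacobian over Fourier modes, giving

```latex
\frac{d}{dt}\,\delta\hat{x}_k \;=\; \lambda_k(t)\,\delta\hat{x}_k
\qquad\Longrightarrow\qquad
\delta\hat{x}_k(t) \;\propto\; \exp\!\left(\int_{t_c}^{t}\lambda_k(t')\,dt'\right).
```

A mode "softens" when its rate $\lambda_k(t)$ crosses zero at a critical time $t_c$; if locality makes the rate fall off with wavenumber, say $\lambda_k \approx \lambda_0(t) - c\,\lvert k\rvert^2$ for a kernel of finite radius, the low-frequency (small $\lvert k\rvert$) modes destabilize first, and their exponential growth is what drives the rapid rise in spatial correlations.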

If this is right

  • In patch-based models a sharp rise in correlation length occurs at the analytically predicted critical time together with mode softening.
  • Trained convolutional models on Fashion-MNIST exhibit the same signatures of correlation growth and low-frequency weakening.
  • Large-scale ImageNet diffusion models show pattern formation coinciding with a peak in estimated correlation length and pronounced weakening of spatial modes.
  • Applying classifier guidance exactly at the identified critical stage produces significantly better class alignment than guidance applied at random times.
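The guidance-timing experiment in the last bullet can be sketched as a sampler that applies a guidance term only inside a short window around the critical time. Everything here is a toy stand-in (`score`, `guidance`, and the Euler loop are ours, not the paper's EDM2 setup):

```python
import numpy as np

def sample_with_pulse_guidance(score, guidance, x, ts, t_crit, width, scale):
    """Euler integration of dx/dt = score(x, t), adding scale * guidance(x, t)
    only while |t - t_crit| < width (the 'pulse')."""
    for t, t_next in zip(ts[:-1], ts[1:]):
        drift = score(x, t)
        if abs(t - t_crit) < width:   # guide only near the transition
            drift = drift + scale * guidance(x, t)
        x = x + (t_next - t) * drift
    return x
```

The sketch only fixes the mechanics of a windowed intervention; the paper's finding is that pulses placed at the identified critical stage outperform randomly timed ones.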

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The critical-time view suggests sampling algorithms could allocate most steps before and after the transition while using fewer steps exactly at the unstable point to save compute.
  • Models with different locality constraints or symmetry-breaking layers might shift or suppress the transition, offering a route to control the scale of generated patterns.
  • The same instability mechanism could appear in other iterative generative processes that combine local updates with global data constraints.
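The first extension above amounts to a non-uniform step schedule. One hypothetical way to build one is inverse-CDF placement against a chosen step-density; the density shape (here dipped at the critical time, per the bullet) is a free design choice, not something the paper specifies:

```python
import numpy as np

def allocate_steps(n_steps, density, t0=0.0, t1=1.0, grid=2000):
    """Place n_steps sampling times in [t0, t1] so that local step spacing
    is inversely proportional to density(t) (inverse-CDF placement)."""
    t = np.linspace(t0, t1, grid)
    w = np.maximum(density(t), 1e-9)
    cdf = np.cumsum(w)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])
    # Invert the CDF at equally spaced quantiles.
    return np.interp(np.linspace(0.0, 1.0, n_steps), cdf, t)

# Hypothetical density: fewer steps right at a critical time t_c = 0.5.
sparse_at_tc = lambda t: 1.0 - 0.8 * np.exp(-((t - 0.5) / 0.05) ** 2)
```

Swapping the dip for a peak would instead concentrate steps at the transition, so the same machinery covers either allocation strategy.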

Load-bearing premise

Instabilities specifically in low-frequency modes, arising from data symmetries and constraints like locality and translation equivariance, are the primary driver of rapid spatial correlation growth and pattern formation.

What would settle it

Observing pattern formation in a model with the same architecture but no corresponding softening of low-frequency modes, or a mismatch between the predicted critical time and the observed onset of correlation growth.
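Operationally, the mismatch test could look like this; the "steepest growth" definition of onset is one plausible choice of ours, not prescribed by the paper:

```python
import numpy as np

def onset_time(ts, corr_len):
    """Observed onset of pattern formation: time of steepest growth
    in the estimated correlation length."""
    return float(ts[int(np.argmax(np.gradient(corr_len, ts)))])

def matches_prediction(t_pred, ts, corr_len, tol):
    """True if the predicted critical time agrees with the observed onset
    within tolerance; a persistent mismatch would count against the theory."""
    return abs(onset_time(ts, corr_len) - t_pred) <= tol
```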

Figures

Figures reproduced from arXiv: 2603.20092 by Luca Ambrogioni.

Figure 2. Quantitative analysis of pattern formation dynamics in a trained EDM2 model …
Figure 1. Visualization of mode softening. Low frequency modes lose stability at the critical points, leading to long range pattern formation. The key insight of this work is that the structure of these transitions arises from the interplay between symmetries in the data and architectural constraints of the network, which impose locality, sparsity, and invariance constraints. Building on this perspective, we anal…
Figure 3. Five snapshots of the evolving configuration are shown at logarithmically spaced times …
Figure 4. Quantitative analysis of pattern formation dynamics in patch score model (top), two patterns …
Figure 5. Results of pulse guidance experiment. A) Example image generated by the EDM2 model …
Figure 6. Patch dictionary (top) and a representative reverse denoising trajectory (bottom). Five …
Figure 7. Quantitative analysis of pattern formation dynamics on binarized ConvNets trained on …
Figure 8. Quantitative analysis of pattern formation dynamics on non-binarized ConvNets trained …
read the original abstract

Diffusion models generate structure by progressively transforming noise into data, yet the mechanisms underlying this transition remain poorly understood. In this work, we show that pattern formation in trained diffusion models can be explained as an out-of-equilibrium phase transition driven by instabilities in the denoising dynamics. We develop a theoretical framework linking data symmetries and architectural constraints, such as locality and translation equivariance, to the emergence of collective spatial modes. In this view, structure arises when low-frequency modes become unstable, triggering a rapid growth of spatial correlations that organizes noise into coherent patterns. We validate this theory through a combination of analytical models and experiments. In a controlled patch-based model, we observe a sharp increase in correlation length and a simultaneous softening of low-frequency modes at a well-defined critical time, accurately predicted by theory. Similar signatures are found in trained convolutional diffusion models on Fashion-MNIST and in large-scale ImageNet models, where pattern formation coincides with a peak in estimated correlation length and a pronounced weakening of spatial modes. Finally, intervention experiments show that applying guidance precisely at this critical stage significantly improves class alignment compared to applying it at random times, demonstrating that this regime is not only descriptive but functionally important.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that pattern formation in trained diffusion models arises as an out-of-equilibrium phase transition driven by instabilities in the denoising dynamics. Data symmetries combined with architectural constraints (locality, translation equivariance) render low-frequency modes unstable at a critical time, triggering rapid growth in spatial correlations that organizes noise into coherent patterns. This is supported by an analytical patch-based model that predicts the critical time, matching observed correlation-length peaks and mode softening; analogous signatures appear in convolutional models trained on Fashion-MNIST and ImageNet; and guidance applied precisely at the critical stage improves class alignment relative to random timing.

Significance. If the central claim is upheld, the work supplies a physics-motivated account of structure emergence in diffusion models that could inform sampling schedules, guidance strategies, and architectural choices. The analytical prediction plus cross-scale empirical signatures and functional intervention constitute a coherent package; however, the causal specificity of low-frequency instabilities remains correlational rather than isolated.

major comments (2)
  1. [Patch-based analytical model] Patch-based model: the critical time is listed among the free parameters, which undercuts the claim that the observed correlation-length jump and mode softening are strict predictions from symmetries and constraints alone rather than post-hoc matching.
  2. [Intervention experiments] Intervention experiments: applying guidance at the critical time improves alignment, yet the design does not ablate low-frequency modes while preserving other dynamics; therefore the result does not rule out that the same temporal window is special for independent reasons (overall SNR, emergence of any coherent structure, or conditioning sensitivity).
minor comments (2)
  1. [Abstract] Abstract and methods: provide an explicit definition and estimation procedure for correlation length, including any smoothing or windowing choices, so that the reported peaks can be reproduced.
  2. [Experiments] Supplementary material: include the full derivation of the mode-softening prediction and quantitative error bars or statistical tests for the experimental matches on Fashion-MNIST and ImageNet.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which help clarify the presentation of our results. We respond to each major comment below and indicate the corresponding revisions.

read point-by-point responses
  1. Referee: [Patch-based analytical model] Patch-based model: the critical time is listed among the free parameters, which undercuts the claim that the observed correlation-length jump and mode softening are strict predictions from symmetries and constraints alone rather than post-hoc matching.

    Authors: We appreciate the referee highlighting this point. In the patch-based model the critical time is obtained by solving the linear stability condition for the onset of instability in the low-frequency modes; this condition is expressed directly in terms of the data symmetry parameters and the locality scale of the convolutional kernel. The manuscript lists the resulting expression among the model parameters for notational convenience, but it is not adjusted to fit the observed jump. We will revise the relevant section to include the explicit derivation of the critical time from the instability criterion and to state that no post-hoc fitting is performed. revision: yes

  2. Referee: [Intervention experiments] Intervention experiments: applying guidance at the critical time improves alignment, yet the design does not ablate low-frequency modes while preserving other dynamics; therefore the result does not rule out that the same temporal window is special for independent reasons (overall SNR, emergence of any coherent structure, or conditioning sensitivity).

    Authors: We agree that the guidance intervention demonstrates functional importance of the critical window but does not isolate low-frequency instabilities from other time-dependent factors. Performing a clean ablation of specific modes while leaving the remainder of the dynamics unchanged is technically difficult in the full model. We will add an explicit discussion of this limitation in the revised manuscript and note that the observed improvement is consistent with the proposed mechanism while remaining correlational; we will also suggest targeted mode-ablation experiments as future work. revision: partial
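The stability condition invoked in the first response can be made concrete in a Ginzburg–Landau-style reduction (our schematic, not the paper's exact expressions): with Fourier-mode growth rates determined by the data-symmetry parameters and the kernel radius $R$,

```latex
\lambda_k(t) \;\approx\; -\,r(t) \;-\; c(R)\,\lvert k\rvert^{2},
\qquad
t_c \ \text{solves}\ r(t_c) = 0 ,
```

so the critical time is a root of $r(t)$ fixed by model and data parameters rather than fitted, and for times past $t_c$ the unstable band $\lvert k\rvert^{2} < -r(t)/c(R)$ opens at low frequency first.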

Circularity Check

0 steps flagged

No significant circularity; derivation from symmetries remains independent

full rationale

The paper builds its central claim by linking stated data symmetries and architectural constraints (locality, translation equivariance) to the emergence of unstable low-frequency modes via an analytical patch-based model. The critical time and correlation-length jump are derived as predictions from that model and then compared against observations in trained convolutional networks on Fashion-MNIST and ImageNet. The guidance-timing intervention supplies an external functional test rather than a re-fit of the same quantities. No equation or step reduces the claimed prediction to a post-hoc fit of the validation data, nor does any load-bearing premise rest on a self-citation chain whose content is itself unverified. The derivation chain is therefore self-contained against the supplied symmetries and constraints.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Central claim rests on the premise that low-frequency mode instabilities arise directly from data symmetries and architectural constraints such as locality and translation equivariance; no explicit free parameters or invented entities are named in the abstract, but the critical time and correlation length appear to function as fitted or observed quantities.

free parameters (1)
  • critical time
    Time at which low-frequency modes soften and correlation length increases sharply; predicted by theory yet validated against observed peaks in experiments.
axioms (1)
  • domain assumption Data symmetries and architectural constraints (locality, translation equivariance) determine the emergence and instability of collective spatial modes.
    Invoked to link symmetries to the phase-transition mechanism in the theoretical framework.

pith-pipeline@v0.9.0 · 5501 in / 1235 out tokens · 45995 ms · 2026-05-15T08:33:24.511405+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

cs.LG · 2026-05 · unverdicted · novelty 7.0

    Symmetry breaking and nonlocality phase transitions occur nearly simultaneously during diffusion model generation in modern transformers.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    Beatrice Achilli, Marco Benedetti, Giulio Biroli, and Marc Mézard. Theory of speciation transitions in diffusion models with general class structure. arXiv preprint arXiv:2602.04404. doi: 10.48550/arXiv.2602.04404

  3. [3]

    Luca Ambrogioni. The statistical thermodynamics of generative diffusion models: Phase transitions, symmetry breaking, and critical instability. Entropy, 27(3):291, 2025. doi: 10.3390/e27030291

  4. [4]

    Giulio Biroli, Tony Bonnaire, Valentin de Bortoli, and Marc Mézard. Dynamical regimes of diffusion models. Nature Communications, 15:9957, 2024. doi: 10.1038/s41467-024-54281-3

  5. [5]

    Ahmed El Alaoui, Andrea Montanari, and Mark Sellke. Sampling from the Sherrington–Kirkpatrick Gibbs measure via algorithmic stochastic localization. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science, pp. 323–334, 2022. doi: 10.1109/FOCS54457.2022.00038

  6. [6]

    Maria Esteban-Casadevall, Rafal Karczewski, Alison Pouplin, Søren Hauberg, and Erik J. Bekkers. On the Fisher geometry of diffusion models’ latent space. In ICLR 2026 Workshop on Geometry-grounded Representation Learning and Generative Modeling, 2026

  7. [7]

    Davide Ghio, Yatin Dandi, Florent Krzakala, and Lenka Zdeborová. Sampling with flows, diffusion, and autoregressive neural networks from a spin-glass perspective. Proceedings of the National Academy of Sciences, 121(27):e2311810121, 2024. doi: 10.1073/pnas.2311810121

  8. [8]

    Florian Handke, Dejan Stančević, Felix Koulischer, Thomas Demeester, and Luca Ambrogioni. The entropic signature of class speciation in diffusion models. arXiv preprint arXiv:2602.09651. doi: 10.48550/arXiv.2602.09651

  10. [10]

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022. doi: 10.48550/arXiv.2207.12598

  11. [11]

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pp. 6840–6851, 2020

  12. [12]

    Leo P. Kadanoff. Scaling laws for Ising models near Tc. Physics, 2(6):263–272, 1966. doi: 10.1103/PhysicsPhysiqueFizika.2.263

  13. [13]

    Mason Kamb and Surya Ganguli. An analytic theory of creativity in convolutional diffusion models. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, 2025

  14. [14]

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems, volume 35, pp. 26565–26577, 2022

  15. [15]

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Analyzing and improving the training dynamics of diffusion models. arXiv preprint arXiv:2312.02696, 2024

  16. [16]

    Tom W. B. Kibble. Topology of cosmic domains and strings. Journal of Physics A: Mathematical and General, 9(8):1387–1398, 1976. doi: 10.1088/0305-4470/9/8/029

  17. [17]

    Lev D. Landau. On the theory of phase transitions. Zhurnal Eksperimental’noi i Teoreticheskoi Fiziki, 7:19–32, 1937

  18. [18]

    Marvin Li and Sitan Chen. Critical windows: Non-asymptotic theory for feature emergence in diffusion models. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp. 27474–27498, 2024

  19. [19]

    Marvin Li, Aayush Karan, and Sitan Chen. Blink of an eye: A simple theory for feature localization in generative models. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pp. 35047–35080, 2025

  20. [20]

    Andrea Montanari and Yuchen Wu. Posterior sampling in high dimension via diffusion processes. arXiv preprint, 2023. doi: 10.48550/arXiv.2304.11449

  21. [21]

    Maxime Oquab, Timothée Darcet, Theo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, and Piotr Bojanowski....

  22. [22]

    Gabriel Raya and Luca Ambrogioni. Spontaneous symmetry breaking in generative diffusion models. Journal of Statistical Mechanics: Theory and Experiment, 2024(10):104025, 2024. doi: 10.1088/1742-5468/ad64bd

  23. [23]

    Kotaro Sakamoto, Ryosuke Sakamoto, Masato Tanabe, Masatomo Akagawa, Yusuke Hayashi, Manato Yaguchi, Masahiro Suzuki, and Yutaka Matsuo. The geometry of diffusion models: Tubular neighbourhoods and singularities. In ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling, 2024

  24. [24]

    Antonio Sclocchi, Alessandro Favero, Noam Itzhak Levi, and Matthieu Wyart. Probing the latent hierarchical structure of data via diffusion models. Journal of Statistical Mechanics: Theory and Experiment, 2025(8):084005, 2025. doi: 10.1088/1742-5468/aded6c

  25. [25]

    Antonio Sclocchi, Alessandro Favero, and Matthieu Wyart. A phase transition in diffusion models reveals the hierarchical nature of data. Proceedings of the National Academy of Sciences, 122(1):e2408799121, 2025. doi: 10.1073/pnas.2408799121

  26. [26]

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, pp. 2256–2265, 2015

  27. [27]

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021

  28. [28]

    H. Eugene Stanley. Introduction to Phase Transitions and Critical Phenomena. Clarendon Press, Oxford, 1971

  29. [29]

    Kenneth G. Wilson. Renormalization group and critical phenomena. I. Renormalization group and the Kadanoff scaling picture. Physical Review B, 4(9):3174–3183, 1971. doi: 10.1103/PhysRevB.4.3174

  30. [30]

    Kenneth G. Wilson and Michael E. Fisher. Critical exponents in 3.99 dimensions. Physical Review Letters, 28(4):240–243, 1972. doi: 10.1103/PhysRevLett.28.240

  31. [31]

    Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017. doi: 10.48550/arXiv.1708.07747

  32. [32]

    Internal anchor (appendix assumptions on the score network): Locality: there exists a finite radius R such that s_{t,i}(x) depends only on {x_{i+u} : u ∈ Ω_R}. 3. Translation equivariance: s_t(τ_a x) = τ_a s_t(x) for all lattice shifts a. 4. Local Z_2 symmetry: s_t(−x) = −s_t(x). We expand the dynamics around a translation-invariant symmetric branch, which without loss of generality we take to be x = 0. The linearization is given by a Jac…
