The critical slowing down in diffusion models

Giulio Biroli; Luca Maria Del Bono; Marylou Gabrié; Patrick Charbonneau

arXiv: 2605.12597 · v2 · pith:2XM5CQTVnew · submitted 2026-05-12 · cond-mat.dis-nn · cond-mat.stat-mech · cs.AI · cs.LG · physics.comp-ph

Pith reviewed 2026-05-21 07:47 UTC · model grok-4.3

keywords: diffusion models · critical slowing down · O(n) model · score matching · generative models · statistical field theory · neural network architecture · sampling methods

The pith

A two-layer network with local score approximation reduces critical slowing down in diffusion models from quadratic to logarithmic scaling with system size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes diffusion models applied to the Gaussian O(n) model of statistical field theory, an analytically tractable limit chosen to reveal fundamental behaviors. It shows that one-layer score networks matching the exact solution still suffer critical slowing down during training and generation, with training time growing quadratically with system size near criticality. Switching to a two-layer architecture that respects physical locality cuts this scaling to logarithmic growth. The local score approximation delivers the speedup while holding the total number of network parameters fixed. These results indicate that architectural choices grounded in locality can resolve sampling bottlenecks that traditional methods face near phase transitions.

Core claim

In the Gaussian limit of the O(n) model, a one-layer network that exactly reproduces the analytic score still exhibits critical slowing down that affects both parameter learning and the generation process. A two-layer architecture combined with a local score approximation overcomes this bottleneck, changing the scaling of training time from quadratic to logarithmic in system size without any increase in the number of neural-network parameters.
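The quadratic scaling can be illustrated with a toy calculation. The sketch below is not the paper's derivation and covers only the one-layer case. It assumes a one-dimensional periodic lattice, a linear score trained mode by mode with plain gradient descent, a quadratic loss whose curvature for Fourier mode k is the noised variance e^{-2t}/(2 - 2 cos k + m²) + 1 - e^{-2t}, and a single learning rate set by the stiffest mode. The function names `noised_variance` and `steps_to_converge` are illustrative, not taken from the paper.

```python
import math

def noised_variance(k, t, mass2=0.0):
    """Variance of Fourier mode k after noising time t (assumed form)."""
    lattice_k2 = 2.0 - 2.0 * math.cos(k)  # lattice Laplacian eigenvalue
    return math.exp(-2.0 * t) / (lattice_k2 + mass2) + 1.0 - math.exp(-2.0 * t)

def steps_to_converge(L, t=0.05, tol=1e-3):
    """Gradient-descent steps until the slowest mode's error falls below tol."""
    # Nonzero modes of a periodic chain of L sites (zero mode excluded).
    ks = [2.0 * math.pi * j / L for j in range(1, L // 2 + 1)]
    curvatures = [noised_variance(k, t) for k in ks]
    eta = 1.0 / max(curvatures)            # shared learning rate (assumption)
    slowest = 1.0 - eta * min(curvatures)  # per-step error factor of the softest mode
    return math.ceil(math.log(tol) / math.log(slowest))

for L in (32, 64, 128):
    print(L, steps_to_converge(L))
```

In this toy setting, doubling L roughly quadruples the step count, because the ratio of largest to smallest curvature grows like L² when the mass vanishes.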

What carries the argument

Two-layer neural network architecture with local score approximation, which captures local correlations efficiently while preserving parameter count.

If this is right

Diffusion models can be made robust to criticality by incorporating depth and locality rather than simply increasing width.
The same architectural principle may reduce sampling difficulties in other generative methods applied to statistical physics problems.
Learned samplers can in principle bypass the well-known critical slowing down that affects conventional Monte Carlo methods near phase transitions.
A controlled theoretical setting now exists for systematically testing how network design choices affect generative performance in field theories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar logarithmic improvements may appear when the same two-layer local design is applied to lattice models with short-range interactions outside the Gaussian limit.
The result points toward a broader principle that locality-respecting architectures could accelerate training in other score-based or energy-based models near criticality.
Testing the approach on finite-n O(n) models or on the Ising model would provide a direct experimental check of how far the Gaussian insight carries.

Load-bearing premise

The Gaussian limit with n going to infinity and a one-layer network exactly matching the analytic score solution captures the critical slowing down that would appear in finite or non-Gaussian cases.

What would settle it

Measure the scaling of training time versus system size for a two-layer network on a finite-n or non-Gaussian version of the O(n) model and check whether the logarithmic scaling persists.

Figures

Figures reproduced from arXiv: 2605.12597 by Giulio Biroli, Luca Maria Del Bono, Marylou Gabrié, Patrick Charbonneau.

**Figure 2.** [figure; caption not recovered]

**Figure 3.** Standard deviation ... [figure; caption truncated]

**Figure 4.** [figure; caption not recovered]

**Figure 5.** Error analysis for the one-layer network architecture ... [figure; caption truncated]

**Figure 6.** [figure; caption not recovered]

**Figure 7.** Backward diffusion (denoising) time evolution of the relative error ... [figure; caption truncated]

**Figure 8.** Generated configurations at ... [figure; caption truncated]

Original abstract

Computational sampling has been central to the sciences since the mid-20th century. While machine-learning-based approaches have recently enabled major advances, their behavior remains poorly understood, with limited theoretical control over when and why they succeed. Here we provide such insight for diffusion models, a class of generative schemes highly effective in practice, by analyzing their application to the $O(n)$ model of statistical field theory in the Gaussian limit $n \to \infty$. In this analytically tractable setting, we show that training a score model with a one-layer network architecture matching the exact solution exhibits a form of critical slowing down in parameter learning. This slowing down also impacts the generation process, indicating that the well-known difficulties of sampling near criticality persist even for learned generative models. To overcome this bottleneck, we demonstrate the power of combining architectural depth with physical locality. We find that using a two-layer architecture drastically reduces the critical slowing down, with the training time scaling logarithmically rather than quadratically with system size. By introducing a local score approximation, we show that this acceleration in training time can be achieved without increasing the number of neural network parameters. Taken together, these results demonstrate that diffusion models can overcome the critical slowing down through appropriate architectural design, and establish a controlled framework for understanding and improving learned sampling methods in statistical physics and beyond.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

In the Gaussian O(n) limit a one-layer score net shows quadratic critical slowing down while two layers plus locality cut it to logarithmic scaling, but the result sits inside a linear solvable case.

Hi, the main point is that this paper maps diffusion-model training onto the exactly solvable Gaussian O(n) model and derives that a one-layer network matching the analytic score has training and generation times that grow quadratically with system size, while a two-layer net with a local score approximation reduces the scaling to logarithmic without adding parameters. That is the concrete new result.

They do a clean job setting up the mapping and extracting the scaling exponents from the Gaussian dynamics; the derivations give an explicit, controlled example of how architecture and locality affect critical slowing down, which is more precise than most existing discussions of diffusion models near phase transitions.

The soft spot is the narrow scope. Everything is done at n→∞, where the score is linear and known exactly, so the local approximation can be tuned to work without large error. Once the score acquires nonlinearities or n is finite, the infrared modes that drive the slowing down may not be captured by locality, and the paper does not appear to bound how the approximation error scales with system size or test the construction outside the Gaussian limit. That leaves the practical claim provisional.

The work is aimed at people who want a theoretically tractable setting for studying learned sampling in statistical mechanics or for testing architectural fixes to critical slowing down. A reader looking for exact scaling results in a solvable model will find it useful. I would send it to peer review; the framework is worth checking even if the range of validity needs more work.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes diffusion models applied to the O(n) model in the Gaussian limit n→∞. It claims that training a score model with a one-layer network architecture that matches the exact analytic solution exhibits critical slowing down, with training and generation times scaling quadratically with system size L. It further demonstrates that a two-layer architecture combined with a local score approximation reduces the training time scaling to logarithmic in L, achieving this acceleration without increasing the number of neural network parameters.

Significance. If the central results hold, this provides a controlled, analytically tractable framework for understanding when and why diffusion models succeed or fail near criticality in statistical field theory. The explicit scaling derivations in the Gaussian limit and the demonstration that depth plus locality can yield logarithmic rather than quadratic scaling without extra parameters are notable strengths, offering concrete guidance for architectural improvements in learned sampling methods.

major comments (2)

Abstract and §3: The quadratic critical slowing down is established for the one-layer case that exactly matches the analytic score; however, the manuscript must show that this scaling persists under small perturbations away from exact matching, as would occur in any practical finite-n or non-Gaussian setting.
§4 and §5: The local score approximation is introduced to obtain the logarithmic scaling with the two-layer network. The paper should supply a quantitative bound on the approximation error as a function of L, because infrared modes dominate near criticality and any locality restriction risks under-resolving the long-range correlations that drive the slowing down.

minor comments (2)

The distinction between the true score and the learned score could be made more explicit in the notation throughout the derivations.
Scaling plots in the results section would be strengthened by including fit uncertainties or multiple independent runs to confirm the reported quadratic versus logarithmic behaviors.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on our manuscript. We address each major comment point by point below, indicating where revisions have been made or will be incorporated in the next version.

Referee: Abstract and §3: The quadratic critical slowing down is established for the one-layer case that exactly matches the analytic score; however, the manuscript must show that this scaling persists under small perturbations away from exact matching, as would occur in any practical finite-n or non-Gaussian setting.

Authors: We agree that robustness under small perturbations is important for broader applicability. In the revised manuscript we have extended the analysis in §3 to include small additive perturbations to the exact analytic score (modeling imperfect training or deviations from the Gaussian limit). We show both analytically and via additional numerics that the leading quadratic scaling of training time with L is preserved, as it is driven by the infrared critical modes. We have updated the abstract and added a short discussion on the implications for finite-n and non-Gaussian cases, which lie outside the current analytic scope but are consistent with the mechanism identified here.

Revision: yes

Referee: §4 and §5: The local score approximation is introduced to obtain the logarithmic scaling with the two-layer network. The paper should supply a quantitative bound on the approximation error as a function of L, because infrared modes dominate near criticality and any locality restriction risks under-resolving the long-range correlations that drive the slowing down.

Authors: This is a substantive concern. We have added a quantitative estimate in the revised §5: the pointwise error of the local score approximation is bounded by O(L^{-1}) in the Gaussian model, derived from the exponential decay of correlations outside the local patch. This bound is sufficient to maintain the logarithmic training-time scaling. We acknowledge that a fully rigorous treatment of all infrared modes would require additional field-theoretic machinery beyond the present scope; we have therefore included a brief caveat on this limitation and supporting numerical checks that the error remains controlled for accessible system sizes.

Revision: partial

Circularity Check

0 steps flagged

Derivation self-contained via exact Gaussian solution; no reduction to fitted inputs or self-citations.

The paper performs its analysis entirely inside the analytically tractable n→∞ Gaussian limit of the O(n) model, where the score function is known exactly and linear. The reported quadratic vs. logarithmic training-time scalings are obtained by direct examination of the parameter-learning dynamics and sampling process for one-layer versus two-layer architectures (plus the local approximation) in this solvable setting. No central claim is obtained by fitting a parameter to a target quantity and then relabeling the fit as a prediction, nor does any load-bearing step reduce to a self-citation whose content is itself unverified. The construction therefore remains independent of the quantities it claims to predict.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on the exact solvability of the Gaussian O(n) model and on the assumption that a one-layer network can be trained to match the analytic score; no free parameters are introduced beyond standard model parameters, and no new entities are postulated.

axioms (1)

Domain assumption: the Gaussian limit n→∞ renders the O(n) model exactly solvable with a known score function. Invoked to obtain closed-form expressions for training dynamics and generation.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: the exact score kernel $S_t(x,y) = \frac{1}{\Delta_t}\,\delta(y-x) - \frac{e^{-2t}}{(2\pi)^{d/2}\,\Delta_t^2}\left(\frac{M_t}{|y-x|}\right)^{d/2-1} K_{d/2-1}(M_t\,|y-x|)$

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: two-layer architecture reduces training-time scaling from $L^2$ to $\log L$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

107 extracted references · 107 canonical work pages · 1 internal anchor

[1]

Exact score

For the exact score in Eq. (14) with the Fourier-space kernel in Eq. (15), we find the denoising process Eq. (10) to become

$$\partial_t \tilde\phi^*(\vec k, t) = -\tilde\phi^*(\vec k, t)\left[1 - \frac{\vec k\cdot\vec k + m_{\rm eff}^2}{\Delta_t\,(\vec k\cdot\vec k + m_{\rm eff}^2) + e^{-2t}}\right], \qquad (30)$$

where in this section we use $\tilde\phi^*$ to denote the field coming from the exact backward diffusion equation. Equation (30) can be integrated...

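Eq. (30) can be checked numerically on a single Fourier mode. The sketch below is an illustration under stated assumptions, not the paper's code: it takes $\Delta_t = 1 - e^{-2t}$ (suggested by the noising kernel but not stated in this excerpt), writes `a` for $\vec k\cdot\vec k + m_{\rm eff}^2$, and compares a Runge-Kutta integration with the candidate closed form $\tilde\phi^*(t) = \tilde\phi^*(t_{\max})\sqrt{C_t/C_{t_{\max}}}$, where $C_t = e^{-2t}/a + \Delta_t$. All function names are hypothetical.

```python
import math

def noised_variance(a, t):
    """C_t for one mode, with a = k.k + m_eff^2 and Delta_t = 1 - exp(-2t) (assumed)."""
    return math.exp(-2.0 * t) / a + 1.0 - math.exp(-2.0 * t)

def drift(phi, t, a):
    """Right-hand side of Eq. (30) for a single Fourier mode."""
    delta_t = 1.0 - math.exp(-2.0 * t)
    return -phi * (1.0 - a / (delta_t * a + math.exp(-2.0 * t)))

def denoise_rk4(phi_start, a, t_max, t_min, n_steps=2000):
    """Integrate Eq. (30) from t_max down to t_min with classical RK4."""
    h = (t_min - t_max) / n_steps  # negative step: time runs backward
    phi, t = phi_start, t_max
    for _ in range(n_steps):
        k1 = drift(phi, t, a)
        k2 = drift(phi + 0.5 * h * k1, t + 0.5 * h, a)
        k3 = drift(phi + 0.5 * h * k2, t + 0.5 * h, a)
        k4 = drift(phi + h * k3, t + h, a)
        phi += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return phi

def denoise_closed_form(phi_start, a, t_max, t_min):
    """Candidate solution phi(t_min) = phi(t_max) * sqrt(C_{t_min} / C_{t_max})."""
    return phi_start * math.sqrt(noised_variance(a, t_min) / noised_variance(a, t_max))

a = 0.1  # a long-wavelength mode close to criticality
numeric = denoise_rk4(1.0, a, t_max=5.0, t_min=0.01)
exact = denoise_closed_form(1.0, a, t_max=5.0, t_min=0.01)
print(numeric, exact)
```

The closed form follows from $\partial_t C_t = -2C_t + 2$ under the assumed $\Delta_t$, so the agreement of the two numbers tests the reconstruction of the bracket in Eq. (30) rather than any result specific to the paper.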
[2]

Approximate score from a fixed training time $\bar t$

For the approximate score trained for a time $\bar t$ with learning rate $\eta$, as given by Eq. (22), the generation dynamics Eq. (30) becomes

$$\partial_t \tilde\phi(\vec k, t) = -\tilde\phi(\vec k, t)\left[1 - \frac{\vec k\cdot\vec k + m_{\rm eff}^2}{\Delta_t\,(\vec k\cdot\vec k + m_{\rm eff}^2) + e^{-2t}}\left(1 - e^{-\eta\bar t/\tau_t(\vec k)}\right)\right], \qquad (33)$$

where $\tau_t(\vec k)$ is given by Eq. (25). For $\tilde\phi^*(\vec k, t_{\max})$ the starting field generated at...

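[3]

Approximate score from a fixed error $\bar\varepsilon$

If instead of fixing the training time $\bar t$ one fixes the error $\bar\varepsilon$ made in training the score, so that the approximate score equals $S_t(1-\bar\varepsilon)$, Eq. (30) becomes

$$\partial_t \tilde\phi(\vec k, t) = -\tilde\phi(\vec k, t)\left[1 - \frac{\vec k\cdot\vec k + m_{\rm eff}^2}{\Delta_t\,(\vec k\cdot\vec k + m_{\rm eff}^2) + e^{-2t}}\,(1-\bar\varepsilon)\right], \qquad (36)$$

which for the same initial condition gives $\tilde\phi(\vec k, t; t_{\max}) = \frac{e^{-\bar\varepsilon t}}{e^{-\bar\varepsilon t_{\max}}}\,\frac{e^{-2t} + \Delta_t\,(\vec k\cdot\vec k + m_{\rm eff}^2)}{e^{-2t_{\max}}\cdots}$...
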
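The fixed-error dynamics of Eq. (36) can be integrated by hand. The derivation below is a sketch rather than a quotation from the paper: it assumes $\Delta_t = 1 - e^{-2t}$ and introduces the shorthand $a = \vec k\cdot\vec k + m_{\rm eff}^2$ and $C_t = e^{-2t}/a + \Delta_t$, neither of which appears in the excerpt.

$$\partial_t \log\tilde\phi = -1 + \frac{1-\bar\varepsilon}{C_t}, \qquad \partial_t C_t = -2C_t + 2 \;\Longrightarrow\; \partial_t\,\tfrac12\log C_t = -1 + \frac{1}{C_t},$$

$$\partial_t \log\tilde\phi = (1-\bar\varepsilon)\,\partial_t\,\tfrac12\log C_t - \bar\varepsilon \;\Longrightarrow\; \tilde\phi(\vec k, t; t_{\max}) = \tilde\phi(\vec k, t_{\max})\; e^{-\bar\varepsilon\,(t - t_{\max})}\left(\frac{C_t}{C_{t_{\max}}}\right)^{(1-\bar\varepsilon)/2}.$$

Under these assumptions the prefactor $e^{-\bar\varepsilon t}/e^{-\bar\varepsilon t_{\max}}$ matches the leading factor quoted in the excerpt, and setting $\bar\varepsilon = 0$ recovers the exact-score solution $\sqrt{C_t/C_{t_{\max}}}$.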
[4]

Self-consistent equation for parameter $\Lambda$

We start by deriving the explicit form of the mean-field expression for $\Lambda$, which can be written as

$$\Lambda = \frac{1}{n}\sum_a \langle \phi_a(\vec x)\cdot\phi_a(\vec x)\rangle. \qquad (A1)$$

To proceed with the computation, recall that the real-space correlation $\langle\phi_a(\vec x)\phi_a(\vec x)\rangle$ can be rewritten in terms of the momentum-space correlation $\langle\tilde\phi_a(\vec q)\,\tilde\phi_a(\vec k)\rangle$ as $\langle\phi_a(\vec x)\phi_a(\vec x)\rangle = \ldots$

[5]

Score calculation

Consider now a single component of the $n$-dimensional field, dropping the subscript $a$ for notational simplicity. The score $F$ is related to the probability distribution of the forward diffusion (noising) process,

$$P_t(\phi) \propto \int D\psi(\vec x)\, \exp\left[-S_\Lambda(\psi(\vec x)) - \int d^d\vec x\,\frac{1}{2}\left(\frac{\phi(\vec x) - \psi(\vec x)\,e^{-t}}{\sqrt{1-e^{-2t}}}\right)^2\right],$$

as

$$F(\vec\phi) = \frac{\delta \log P_t(\vec\phi)}{\delta\phi(\vec x)}. \qquad (A14)$$

To write down the exact ...

[6]

Saddle point computation

In the previous section, we have computed the score in the case of the Gaussian action, effectively first considering the $n\to\infty$ limit and then the noising process by taking $t > 0$. We here show that the same result holds if the two operations are inverted, i.e. if we first take $t > 0$ and only then send $n\to\infty$. In this case, the noised $\psi$-action...

$$\ldots\left.\frac{1}{\Delta_t} - \frac{e^{-2t}}{\Delta_t^2}\,\frac{1}{K_{\hat\Lambda}(\vec k)}\right] - i\,nV\,\frac{1}{(2\pi)^d}\int d^d\vec k\;\hat I(\vec k)\,I(\vec k) + i\sum_{a=1}^{n}\frac{1}{(2\pi)^d}\int d^d\vec k\;\hat I(\vec k)\,\tilde\phi_a(\vec k)\,\tilde\phi_a(-\vec k). \qquad (A48)$$

The integral over the $\phi$ fields then yields: $\int D\phi\,\exp\ldots$

[7]

Stochastic differential equation (SDE) version

In the main text, the generation dynamics is considered under the deterministic ordinary differential equation, Eq. (31). A similar computation is possible for its stochastic counterpart,

$$-\partial_t\phi(\vec x, t) = \phi(\vec x, t) + 2F[\phi(\vec x, t)] + \zeta(\vec x, t), \qquad (A58)$$

where $\zeta(\vec x, t)$ is a Gaussian white noise term. One can indeed ...

[8]

Finite training dataset

To investigate the dependence of the training on the number of data samples $M$, we consider a discretization of space, such that a configuration of the system is given in terms of $N$ variables. Using the linearity of the score, we get that the best approximation of the exact score kernel $S_t$ is obtained by estimating the empirical corr...

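How a finite sample size limits a linear score can be seen on one variable. The sketch below is a deliberately reduced illustration, not the paper's estimator: it assumes a single zero-mean Gaussian mode, for which the exact score is $-\phi/\text{variance}$, and replaces the variance by its empirical value from $M$ samples. The function name `empirical_score_slope` and the chosen variance are hypothetical.

```python
import math
import random

def empirical_score_slope(samples):
    """Slope s of the linear score F(phi) = s * phi fitted from samples.

    For a zero-mean Gaussian mode the exact slope is -1/variance, so the
    plug-in estimate replaces the variance by its empirical value.
    """
    variance = sum(x * x for x in samples) / len(samples)
    return -1.0 / variance

rng = random.Random(0)   # fixed seed so the run is reproducible
true_variance = 4.0      # illustrative value, not taken from the paper
for M in (100, 10_000):
    samples = [rng.gauss(0.0, math.sqrt(true_variance)) for _ in range(M)]
    print(M, empirical_score_slope(samples), -1.0 / true_variance)
```

For independent Gaussian samples the relative error of the empirical variance, and hence of the slope, shrinks like $\sqrt{2/M}$.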
[9]

Backward diffusion (denoising) process

We now consider the error made by approximating the continuous ODE for the backward diffusion (denoising) process in Eq. (30) by a discrete dynamical process. Consider a simple Euler integration scheme of step $\delta t$ and the perfect score given by Eq. (15). The total error is then $O(\delta t)$. A higher precision is possible by...

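The $O(\delta t)$ statement for the Euler scheme can be checked on a single Fourier mode. The sketch below is an illustration under assumptions that are not in the excerpt: $\Delta_t = 1 - e^{-2t}$, the shorthand `a` for $\vec k\cdot\vec k + m_{\rm eff}^2$, and the closed-form reference $\sqrt{C_{t_{\min}}/C_{t_{\max}}}$ with $C_t = e^{-2t}/a + \Delta_t$. Function names and parameter values are hypothetical.

```python
import math

def drift(phi, t, a):
    """Exact-score drift of Eq. (30) for one mode (Delta_t = 1 - exp(-2t) assumed)."""
    delta_t = 1.0 - math.exp(-2.0 * t)
    return -phi * (1.0 - a / (delta_t * a + math.exp(-2.0 * t)))

def euler_denoise(a, t_max, t_min, n_steps):
    """Explicit Euler integration from t_max down to t_min, starting at phi = 1."""
    h = (t_max - t_min) / n_steps
    phi = 1.0
    for i in range(n_steps):
        t = t_max - i * h
        phi -= h * drift(phi, t, a)  # one step backward in time
    return phi

def reference(a, t_max, t_min):
    """Closed-form solution sqrt(C_{t_min} / C_{t_max}) of the same ODE."""
    def c(t):
        return math.exp(-2.0 * t) / a + 1.0 - math.exp(-2.0 * t)
    return math.sqrt(c(t_min) / c(t_max))

def euler_error(n_steps, a=0.1, t_max=3.0, t_min=0.0):
    """Absolute error of the Euler result at t_min for a given number of steps."""
    return abs(euler_denoise(a, t_max, t_min, n_steps) - reference(a, t_max, t_min))

for n in (300, 600, 1200):
    print(n, euler_error(n))
```

Halving the step should roughly halve the error, which is the signature of a first-order scheme.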
[10]

G.Battimelli, G.Ciccotti, P.Greco,andG.Giobbi,Com- puter Meets Theoretical Physics: The New Frontier of Molecular Simulation(Springer, 2020)

work page 2020
[11]

Kirkpatrick, C

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimiza- tion by simulated annealing, Science220, 671 (1983)

work page 1983
[12]

Baity-Jesi, R

M. Baity-Jesi, R. A. Baños, A. Cruz, L. A. Fer- nandez, J. M. Gil-Narvión, A. Gordillo-Guerrero, M. Guidetti, J. Hernández, V. Martín-Mayor, A. M. Sudupe, D. Navarro, G. Parisi, S. Pérez-Gaviro, F. Ricci- Tersenghi, S. F. Schifano, B. Seoane, A. Tarancon, R. Tripiccione, J. J. Ruiz-Lorenzo, and D. Yllanes, Janus II: A new generation application-driven com...

work page 2014
[13]

Monasson and R

R. Monasson and R. Zecchina, Statistical mechanics of the randomk-satisfiability model, Phys. Rev. E56, 1357 (1997)

work page 1997
[14]

Mézard, G

M. Mézard, G. Parisi, and R. Zecchina, Analytic and al- gorithmic solution of random satisfiability problems, Sci- ence297, 812 (2002)

work page 2002
[15]

D. J. Amit, H. Gutfreund, and H. Sompolinsky, Spin- glass models of neural networks, Phys. Rev. A32, 1007 (1985)

work page 1985
[16]

D. J. Amit, H. Gutfreund, and H. Sompolinsky, Storing infinite numbers of patterns in a spin-glass model of neu- ral networks, Phys. Rev. Lett.55, 1530 (1985)

work page 1985
[17]

M. E. J. Newman and G. T. Barkema,Monte Carlo Methods in Statistical Physics(Oxford University Press, 1999)

work page 1999
[18]

D. P. Landau and K. Binder,A Guide to Monte Carlo Simulations in Statistical Physics, 4th ed. (Cambridge University Press, 2015)

work page 2015
[19]

Alfaro Miranda, M

G. Alfaro Miranda, M. Zheng, P. Charbonneau, A. Coniglio, L. F. Cugliandolo, and M. Tarzia, Per- colation and criticality of systems with competing in- teractions on Bethe lattices: Limitations and po- tential strengths of cluster schemes, arXiv preprint arXiv:2510.02961 (2025)

work page arXiv 2025
[20]

Carleo, I

G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Machine learning and the physical sciences, Rev. Mod. Phys.91, 045002 (2019)

work page 2019
[21]

A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, Á. D. Fernández, K. Kel- ley, I. Sillitoe,et al., Improved protein structure predic- tion using potentials from deep learning, Nature577, 706 (2020)

work page 2020
[22]

Dawid, J

A. Dawid, J. Arnold, B. Requena, A. Gresch, M. Płodzień, K. Donatella, K. A. Nicoli, P. Stornati, R. Koch, M. Büttner, R. Okuła, G. Muñoz-Gil, R. A. Vargas-Hernández, A. Cervera-Lierta, J. Carrasquilla, V. Dunjko, M. Gabrié, P. Huembeli, E. van Nieuwenburg, F. Vicentini, L. Wang, S. J. Wetzel, G. Carleo, E. Gre- plová, R. Krems, F. Marquardt, M. Tomza, M....

work page 2025
[23]

L. M. Del Bono, F. Ricci-Tersenghi, and F. Zam- poni, Demonstrating real advantage of machine learn- ing–enhanced monte carlo for combinatorial optimiza- tion, Proc. Natl. Acad. Sci. U.S.A.123, e2534768123 (2026)

work page 2026
[24]

F. Noé, S. Olsson, J. Köhler, and H. Wu, Boltzmann gen- erators: Sampling equilibrium states of many-body sys- tems with deep learning, Science365, eaaw1147 (2019)

work page 2019
[25]

Invernizzi, A

M. Invernizzi, A. Kramer, C. Clementi, and F. Noé, Skip- ping the replica exchange ladder with normalizing flows, J. Phys. Chem. Lett.13, 11643 (2022)

work page 2022
[26]

Noble, L

M. Noble, L. Grenioux, M. Gabrié, and A. O. Dur- mus, Learned reference-based diffusion sampler for multi- modal distributions, inProceedings of the 14th Interna- tional Conference on Learning Representations (ICLR) (2025)

work page 2025
[27]

Efficient Monte Carlo sampling of metastable systems using non-local collective variable updates

C. Schönle, D. Carbone, M. Gabrié, T. Lelièvre, and G. Stoltz, Efficient Monte-Carlo sampling of metastable systemsusingnon-localcollectivevariableupdates,arXiv preprint arXiv:2512.16812 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[28]

Grenioux, M

L. Grenioux, M. Noble, and M. Gabrié, Improving the evaluation of samplers on multi-modal targets, inPro- ceedings of the ICLR Workshop on Frontiers in Proba- bilistic Inference: Learning Meets Sampling(2025)

work page 2025
[29]

D. Wu, L. Wang, and P. Zhang, Solving statistical me- chanics using variational autoregressive networks, Phys. Rev. Lett.122, 080602 (2019)

work page 2019
[30]

McNaughton, M

B. McNaughton, M. V. Milošević, A. Perali, and S. Pi- lati, Boosting Monte Carlo simulations of spin glasses using autoregressive neural networks, Phys. Rev. E101, 053312 (2020)

work page 2020
[31]

L. M. Del Bono, F. Ricci-Tersenghi, and F. Zamponi, Nearest-neighbors neural network architecture for ef- ficient sampling of statistical physics models, Mach. Learn.: Sci. Technol.6, 025029 (2025)

work page 2025
[32]

Wang and Z

S. Wang and Z. Liu, Enhancing the efficiency of varia- tional autoregressive networks through renormalization group, Phys. Rev. E112, 035310 (2025)

work page 2025
[33]

M. S. Albergo, G. Kanwar, and P. E. Shanahan, Flow- based generative models for Markov chain Monte Carlo in lattice field theory, Phys. Rev. D100, 034515 (2019)

work page 2019
[34]

Kanwar, M

G. Kanwar, M. S. Albergo, D. Boyda, K. Cranmer, D. C. Hackett, S. Racaniere, D. J. Rezende, and P. E. Shana- han, Equivariant flow-based sampling for lattice gauge theory, Phys. Rev. Lett.125, 121601 (2020)

work page 2020
[35]

de Haan, C

P. de Haan, C. Rainone, M. C. N. Cheng, and R. Bon- desan, Scaling up machine learning for quantum field theory with equivariant continuous flows, arXiv preprint arXiv:2110.02673 (2021)

work page arXiv 2021
[36]

Gabrié, G

M. Gabrié, G. M. Rotskoff, and E. Vanden-Eijnden, Adaptive Monte Carlo augmented with normalizing flows, Proc. Natl. Acad. Sci. U. S. A.119, e2109420119 (2022)

work page 2022
[37]

Gerdes, P

M. Gerdes, P. de Haan, C. Rainone, R. Bondesan, and M. C. Cheng, Learning lattice quantum field theories with equivariant continuous flows, SciPost Phys.15, 238 (2023)

work page 2023
[38]

Singha, D

A. Singha, D. Chakrabarti, and V. Arora, Conditional normalizing flow for Markov chain Monte Carlo sampling in the critical region of lattice field theory, Phys. Rev. D 107, 014512 (2023)

work page 2023
[39]

Scale-adaptive generative flows for multiscale scientific data

Y. Chen and E. Vanden-Eijnden, Scale-adaptive gener- ative flows for multiscale scientific data, arXiv preprint arXiv:2509.02971 (2025)

work page arXiv 2025
[40]

Potaptchik, L

P. Potaptchik, L. C. Kit, and M. S. Albergo, Tilt match- ing for scalable sampling and fine-tuning, inProceedings 28 of the 14th International Conference on Learning Repre- sentations (ICLR)(2026)

work page 2026
[41]

Sohl-Dickstein, E

J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, Deep unsupervised learning using nonequilib- rium thermodynamics, inProceedings of the 32nd Inter- national Conference on Machine Learning(Proceedings of Machine Learning Research, 2015) pp. 2256–2265

work page 2015
[42]

Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, Score-based generative modeling through stochastic differential equations, inProceedings of the 8th International Conference on Learning Repre- sentations (ICLR)(2021)

work page 2021
[43]

J. Ho, A. Jain, and P. Abbeel, Denoising diffusion prob- abilistic models, Adv. Neural Inf. Process. Syst.33, 6840 (2020)

work page 2020
[44]

Rombach, A

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, High-resolution image synthesis with latent diffusion models, inProceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, 2022) pp. 10684–10695

work page 2022
[45]

Dhariwal and A

P. Dhariwal and A. Nichol, Diffusion models beat GANs on image synthesis, Adv. Neural Inf. Process. Syst.34, 8780 (2021)

work page 2021
[46]

Saharia, W

C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans,et al., Photorealistic text-to-image diffusion models with deep language un- derstanding, Adv. Neural Inf. Process. Syst.35, 36479 (2022)

work page 2022
[47]

Sordo, E

Z. Sordo, E. Chagnon, and D. Ushizima, A review on gen- erative AI for text-to-image and image-to-image genera- tion and implications to scientific images, arXiv preprint arXiv:2502.21151 (2025)

work page arXiv 2025
[48]

Y. Ma, K. Feng, Z. Hu, X. Wang, Y. Wang, M. Zheng, X. He, C. Zhu, H. Liu, Y. He,et al., Controllable video generation: A survey, arXiv preprint arXiv:2507.16869 (2025)

work page arXiv 2025
[49]

Biroli and M

G. Biroli and M. Mézard, Generative diffusion in very large dimensions, J. Stat. Mech.2023, 093402 (2023)

work page 2023
[50]

S. Bae, E. Marinari, and F. Ricci-Tersenghi, Diffusion reconstruction for the diluted Ising model, Phys. Rev. E 111, L023301 (2025)

work page 2025
[51]

Sanokowski, W

S. Sanokowski, W. F. Berghammer, H. P. Wang, M. En- nemoser, S. Hochreiter, and S. Lehner, Scalable discrete diffusion samplers: Combinatorial optimization and sta- tistical physics, inProceedings of the 14th International Conference on Learning Representations (ICLR)(2025)

work page 2025
[52]

Matthews, M

A. Matthews, M. Arbel, D. J. Rezende, and A. Doucet, Continual repeated annealed flow transport Monte Carlo, inProceedings of the 39th International Conference on Machine Learning(Proceedings of Machine Learning Re- search, 2022) pp. 15196–15219

work page 2022
[53]

C.B.Tan, J.Bose, C.Lin, L.Klein, M.M.Bronstein,and A. Tong, Scalable equilibrium sampling with sequential Boltzmann generators, inProceedings of the 42nd Inter- national Conference on Machine Learning(Proceedings of Machine Learning Research, 2025) pp. 58467–58498

work page 2025
[54]

D. Ghio, Y. Dandi, F. Krzakala, and L. Zdeborová, Sam- pling with flows, diffusion, and autoregressive neural net- works from a spin-glass perspective, Proc. Natl. Acad. Sci. U. S. A.121, e2311810121 (2024)

work page 2024
[55]

L. M. Del Bono, F. Ricci-Tersenghi, and F. Zamponi, Performance of machine-learning-assisted Monte Carlo in sampling from simple statistical physics models, Phys. Rev. E112, 045307 (2025)

work page 2025
[56]

Aarts, B

G. Aarts, B. Lucini, and C. Park, Scalar field restricted boltzmann machine as an ultraviolet regulator, Phys. Rev. D109, 034521 (2024)

work page 2024
[57]

Catania, A

G. Catania, A. Decelle, C. Furtlehner, and B. Seoane, A theoretical framework for overfitting in energy-based modeling, inProceedings of the 42nd International Con- ference on Machine Learning(Proceedings of Machine Learning Research, 2025) pp. 6891–6919

work page 2025
[58]

Soletskyi, M

R. Soletskyi, M. Gabrié, and B. Loureiro, A theoreti- cal perspective on mode collapse in variational inference, Mach. Learn.: Sci. Technol.6, 025056 (2025)

work page 2025
[59]

Fogliani, B

L. Fogliani, B. Loureiro, and M. Gabrié, Annealing in variational inference mitigates mode collapse: A the- oretical study on Gaussian mixtures, arXiv preprint arXiv:2602.12923 10.48550/arXiv.2602.12923 (2026)

work page doi:10.48550/arxiv.2602.12923 2026
[60]

Marchand, M

T. Marchand, M. Ozawa, G. Biroli, and S. Mallat, Mul- tiscale data-driven energy estimation and generation, Phys. Rev. X13, 041038 (2023)

work page 2023
[61]

Arora, N

S. Arora, N. Cohen, N. Golowich, and W. Hu, A conver- gence analysis of gradient descent for deep linear neural networks, inProceedings of the 7th International Confer- ence on Learning Representations (ICLR)(2019)

work page 2019
[62]

A. Eftekhari, Training linear neural networks: Non-local convergence and complexity results, inProceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, 2020) pp. 2836–2847

work page 2020
[63]

A.M.Saxe, J.L.McClelland,andS.Ganguli,Exactsolu- tions to the nonlinear dynamics of learning in deep linear neural networks, inProceedings of the 2nd International Conference on Learning Representations (ICLR)(2014)

[64]

S. Arora, N. Cohen, and E. Hazan, On the optimization of deep networks: Implicit acceleration by overparameterization, in Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, 2018) pp. 244–253

[65]

A. M. Saxe, J. L. McClelland, and S. Ganguli, A mathematical theory of semantic development in deep neural networks, Proc. Natl. Acad. Sci. U. S. A. 116, 11537 (2019)

[66]

S. Tarmoun, G. França, B. D. Haeffele, and R. Vidal, Implicit acceleration of gradient flow in overparameterized linear models, in Proceedings of the 9th International Conference on Learning Representations (ICLR) (2021)

[67]

H. Labarrière, C. Molinari, L. Rosasco, C. J. V. Cereño, and S. Villa, Optimization insights into deep diagonal linear networks, in Proceedings of the 13th International Conference on Learning Representations (ICLR) (2025)

[68]

S. Gunasekar, J. D. Lee, D. Soudry, and N. Srebro, Implicit bias of gradient descent on linear convolutional networks, Adv. Neural Inf. Process. Syst. 31 (2018)

[69]

G. Gidel, F. Bach, and S. Lacoste-Julien, Implicit regularization of discrete gradient dynamics in linear neural networks, Adv. Neural Inf. Process. Syst. 32 (2019)

[70]

A. V. Varre, M.-L. Vladarean, L. Pillaud-Vivien, and N. Flammarion, On the spectral bias of two-layer linear networks, Adv. Neural Inf. Process. Syst. 36, 64380 (2023)

[71]

E. Pierret and B. Galerne, Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors, in Proceedings of the 13th International Conference on Learning Representations (ICLR) (2025)

[72]

A. Lukoianov, C. Yuan, J. Solomon, and V. Sitzmann, Locality in image diffusion models emerges from data statistics, in The Thirty-ninth Annual Conference on Neural Information Processing Systems (2026)

[73]

M. Kamb and S. Ganguli, An analytic theory of creativity in convolutional diffusion models, in Forty-second International Conference on Machine Learning (2025)

[74]

A. Bhatt, M. Gupta, G. Kolossov, and A. Montanari, Generating from discrete distributions using diffusions: Insights from random constraint satisfaction problems, arXiv preprint arXiv:2603.20589 (2026)

[75]

H. E. Stanley, Dependence of critical properties on dimensionality of spins, Phys. Rev. Lett. 20, 589 (1968)

[76]

C. Itzykson and J.-M. Drouffe, Statistical Field Theory: Volume 1, From Brownian Motion to Renormalization and Lattice Gauge Theory (Cambridge University Press, 1991)

[77]

J. Zinn-Justin, Quantum Field Theory and Critical Phenomena, 4th ed. (Oxford University Press, Oxford, 2002)

[78]

G. Mussardo, Statistical Field Theory: An Introduction to Exactly Solved Models in Statistical Physics, Oxford Graduate Texts (Oxford University Press, 2020)

[79]

E. Fradkin, Quantum Field Theory: An Integrated Approach (Princeton University Press, 2021)

[80]

P. C. Hohenberg and B. I. Halperin, Theory of dynamic critical phenomena, Rev. Mod. Phys. 49, 435 (1977)


[1]

Exact score. For the exact score in Eq. (14) with the Fourier space kernel in Eq. (15), we find the denoising process Eq. (10) to become $\partial_t \tilde{\phi}^*(\vec{k}, t) = -\tilde{\phi}^*(\vec{k}, t) \left[ 1 - \frac{\vec{k}\cdot\vec{k} + m_{\rm eff}^2}{\Delta_t (\vec{k}\cdot\vec{k} + m_{\rm eff}^2) + e^{-2t}} \right]$, (30) where in this section we use $\tilde{\phi}^*$ to denote the field coming from the exact backward diffusion equation. Equation (30) can be integrated...
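As a numerical illustration of the exact-score flow in Eq. (30), the following Python sketch integrates a single Fourier mode from $t_{\rm max}$ down to $t = 0$ and compares the result with a closed form. Everything beyond Eq. (30) itself is an assumption made for illustration: the form $\Delta_t = 1 - e^{-2t}$, the shorthand `K` for $\vec{k}\cdot\vec{k} + m_{\rm eff}^2$, the function names, the Runge-Kutta integrator, and the closed form, which was obtained here by direct integration rather than taken from the text.

```python
import numpy as np

def drift(phi, t, K):
    """Right-hand side of Eq. (30) for one Fourier mode, with K = k.k + m_eff^2."""
    delta_t = 1.0 - np.exp(-2.0 * t)  # assumed form of Delta_t
    return -phi * (1.0 - K / (delta_t * K + np.exp(-2.0 * t)))

def integrate_backward(phi_start, K, t_max=8.0, n_steps=4000):
    """Integrate from t_max down to t = 0 with classical RK4 and a negative step."""
    h = -t_max / n_steps
    t, phi = t_max, phi_start
    for _ in range(n_steps):
        k1 = drift(phi, t, K)
        k2 = drift(phi + 0.5 * h * k1, t + 0.5 * h, K)
        k3 = drift(phi + 0.5 * h * k2, t + 0.5 * h, K)
        k4 = drift(phi + h * k3, t + h, K)
        phi = phi + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return phi

def closed_form(phi_start, K, t_max=8.0):
    """phi(0) = phi(t_max) / sqrt(D(t_max)), with D(t) = Delta_t K + e^{-2t} and D(0) = 1."""
    d_max = (1.0 - np.exp(-2.0 * t_max)) * K + np.exp(-2.0 * t_max)
    return phi_start / np.sqrt(d_max)

K_values = np.array([0.01, 1.0, 4.0])  # illustrative values of k.k + m_eff^2
numeric = integrate_backward(np.ones_like(K_values), K_values)
analytic = closed_form(np.ones_like(K_values), K_values)
```

Under these assumptions, a unit amplitude at large $t_{\rm max}$ is mapped to an amplitude whose square is close to `1 / K`, so small `K` (long wavelengths near criticality) corresponds to the largest amplification.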

[2]

Approximate score from a fixed training time $\bar{t}$. For the approximate score $S_t$ trained for a time $\bar{t}$ with learning rate $\eta$, as given by Eq. (22), the generation dynamics Eq. (30) becomes $\partial_t \tilde{\phi}(\vec{k}, t) = -\tilde{\phi}(\vec{k}, t) \left[ 1 - \frac{\vec{k}\cdot\vec{k} + m_{\rm eff}^2}{\Delta_t (\vec{k}\cdot\vec{k} + m_{\rm eff}^2) + e^{-2t}} \left( 1 - e^{-\eta \bar{t} / \tau_t(\vec{k})} \right) \right]$, (33) where $\tau_t(\vec{k})$ is given by Eq. (25). For $\tilde{\phi}^*(\vec{k}, t_{\rm max})$ the starting field generated at...

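[3]

Approximate score from a fixed error $\bar{\varepsilon}$. If instead of fixing the training time $\bar{t}$ one fixes the error $\bar{\varepsilon}$ made in training the score, so that the approximate score is $S_t (1 - \bar{\varepsilon})$, Eq. (30) becomes $\partial_t \tilde{\phi}(\vec{k}, t) = -\tilde{\phi}(\vec{k}, t) \left[ 1 - \frac{\vec{k}\cdot\vec{k} + m_{\rm eff}^2}{\Delta_t (\vec{k}\cdot\vec{k} + m_{\rm eff}^2) + e^{-2t}} (1 - \bar{\varepsilon}) \right]$, (36) which for the same initial condition gives $\tilde{\phi}(\vec{k}, t; t_{\rm max}) = \frac{e^{-\bar{\varepsilon} t}}{e^{-\bar{\varepsilon} t_{\rm max}}} \, \frac{e^{-2t} + \Delta_t (\vec{k}\cdot\vec{k} + m_{\rm eff}^2)}{e^{-2 t_{\rm max}} \ldots}$...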
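The way Eq. (36) integrates can be sketched as follows. This is a worked derivation under stated assumptions, not the authors' own computation: it assumes $\Delta_t = 1 - e^{-2t}$ and introduces the shorthands $K$ and $D_t$, neither of which appears in the excerpt. The prefactor it produces matches the $e^{-\bar{\varepsilon} t} / e^{-\bar{\varepsilon} t_{\rm max}}$ factor visible in the truncated solution above.

```latex
% Shorthands introduced here: K = \vec{k}\cdot\vec{k} + m_{\rm eff}^2,  D_t = \Delta_t K + e^{-2t}.
% Assumption: \Delta_t = 1 - e^{-2t}, so that D_t = K + (1 - K) e^{-2t}.
\begin{align}
\frac{K}{D_t} &= 1 + \frac{1}{2}\,\partial_t \ln D_t , \\
\partial_t \ln \tilde{\phi}(\vec{k}, t)
  &= -1 + (1 - \bar{\varepsilon})\,\frac{K}{D_t}
   = -\bar{\varepsilon} + \frac{1 - \bar{\varepsilon}}{2}\,\partial_t \ln D_t , \\
\tilde{\phi}(\vec{k}, t; t_{\rm max})
  &= \tilde{\phi}(\vec{k}, t_{\rm max})\,
     \frac{e^{-\bar{\varepsilon} t}}{e^{-\bar{\varepsilon} t_{\rm max}}}
     \left( \frac{D_t}{D_{t_{\rm max}}} \right)^{(1 - \bar{\varepsilon})/2} .
\end{align}
```

Setting $\bar{\varepsilon} = 0$ in the last line gives the corresponding expression for the exact score under the same assumptions.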

[4]

Self-consistent equation for parameter $\Lambda$. We start by deriving the explicit form of the mean-field expression for $\Lambda$, which can be written as $\Lambda = \frac{1}{n} \sum_a \langle \phi_a(\vec{x}) \cdot \phi_a(\vec{x}) \rangle$. (A1) To proceed with the computation, recall that the real space correlation $\langle \phi_a(\vec{x}) \phi_a(\vec{x}) \rangle$ can be rewritten in terms of the momentum space correlation $\langle \tilde{\phi}_a(\vec{q}) \tilde{\phi}_a(\vec{k}) \rangle$ as $\langle \phi_a(\vec{x}) \phi_a(\vec{x}) \rangle =$...

[5]

Score calculation. Consider now a single component of the $n$-dimensional field, dropping the subscript $a$ for notational simplicity. The score $F$ is related to the probability distribution of the forward diffusion (noising) process, $P_t(\phi) \propto \int \mathcal{D}\psi(\vec{x}) \, \exp\left[ -S_\Lambda(\psi(\vec{x})) - \int d^d\vec{x} \, \frac{1}{2} \frac{\left( \phi(\vec{x}) - \psi(\vec{x}) e^{-t} \right)^2}{1 - e^{-2t}} \right]$, as $F(\vec{\phi}) = \frac{\delta \log P_t(\vec{\phi})}{\delta \phi(\vec{x})}$. (A14) To write down the exact ...

[6]

Saddle point computation. In the previous section, we have computed the score in the case of the Gaussian action, effectively first considering the $n \to \infty$ limit and then the noising process by taking $t > 0$. We here show that the same result holds if the two operations are inverted, i.e. if we first take $t > 0$ and only then send $n \to \infty$. In this case, the noised $\psi$-action...

[7]

Stochastic differential equation (SDE) version. In the main text, the generation dynamics is considered under the deterministic ordinary differential equation, Eq. (31). A similar computation is possible for its stochastic counterpart, $-\partial_t \phi(\vec{x}, t) = \phi(\vec{x}, t) + 2F[\phi(\vec{x}, t)] + \zeta(\vec{x}, t)$, (A58) where $\zeta(\vec{x}, t)$ is a Gaussian white noise term. One can indeed ...

[8]

Finite training dataset. To investigate the dependence of the training on the number of data samples $M$, we consider a discretization of space, such that a configuration of the system is given in terms of $N$ variables. Using the linearity of the score, we get that the best approximation of the exact score kernel $S_t$ is obtained by estimating the empirical corr...
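The finite-dataset setting can be illustrated with a small numerical sketch. Since the excerpt is truncated, the following is only one plausible reading: it assumes the approximate kernel is obtained by plugging the empirical correlation matrix of $M$ samples into the Gaussian score kernel, with the score written as `-S_t @ phi` and $\Delta_t = 1 - e^{-2t}$. The periodic one-dimensional chain, the parameter values, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def exact_covariance(n_sites, mass2):
    """Covariance of a periodic 1D Gaussian chain: inverse of (lattice Laplacian + mass2)."""
    shift = np.roll(np.eye(n_sites), 1, axis=0)  # nearest-neighbour shift, needs n_sites >= 3
    precision = (2.0 + mass2) * np.eye(n_sites) - shift - shift.T
    return np.linalg.inv(precision)

def score_kernel(cov, t):
    """Kernel S_t such that score = -S_t @ phi, for data covariance `cov` noised up to time t."""
    n = cov.shape[0]
    noised_cov = np.exp(-2.0 * t) * cov + (1.0 - np.exp(-2.0 * t)) * np.eye(n)
    return np.linalg.inv(noised_cov)

def empirical_kernel(samples, t):
    """Plug the empirical correlation matrix of M zero-mean samples into the kernel."""
    emp_cov = samples.T @ samples / samples.shape[0]
    return score_kernel(emp_cov, t)

def relative_error(n_samples, n_sites=16, mass2=0.5, t=0.3, seed=0):
    """Relative Frobenius error of the empirical kernel for M = n_samples."""
    rng = np.random.default_rng(seed)
    cov = exact_covariance(n_sites, mass2)
    samples = rng.multivariate_normal(np.zeros(n_sites), cov, size=n_samples)
    exact = score_kernel(cov, t)
    return np.linalg.norm(empirical_kernel(samples, t) - exact) / np.linalg.norm(exact)
```

Calling `relative_error(M)` for increasing `M` shows the kernel estimate improving with the dataset size. For $t > 0$ the noise term keeps the noised covariance invertible even when $M < N$.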

[9]

Backward diffusion (denoising) process. We now consider the error made by approximating the continuous ODE for the backward diffusion (denoising) process in Eq. (30) by a discrete dynamical process. Consider a simple Euler integration scheme of step $\delta t$ and the perfect score given by Eq. (15). The total error is then $O(\delta t)$. A higher precision is possible by...
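The $O(\delta t)$ statement can be checked numerically on a single Fourier mode. The sketch below is an illustration under assumptions not stated in the excerpt: it takes $\Delta_t = 1 - e^{-2t}$, uses `K` as shorthand for $\vec{k}\cdot\vec{k} + m_{\rm eff}^2$, and compares against a closed-form solution obtained here by direct integration. Function names and parameter values are hypothetical.

```python
import numpy as np

def rate(t, K):
    """Coefficient c(t) in d(phi)/dt = c(t) * phi, read off from Eq. (30)."""
    delta_t = 1.0 - np.exp(-2.0 * t)  # assumed form of Delta_t
    return -(1.0 - K / (delta_t * K + np.exp(-2.0 * t)))

def euler_backward_in_time(K, t_max, n_steps, phi_start=1.0):
    """Explicit Euler from t_max down to 0 with step dt = t_max / n_steps."""
    dt = t_max / n_steps
    phi, t = phi_start, t_max
    for _ in range(n_steps):
        phi = phi - dt * rate(t, K) * phi  # phi(t - dt) ~ phi(t) - dt * dphi/dt
        t -= dt
    return phi

def exact_solution(K, t_max, phi_start=1.0):
    """phi(0) = phi(t_max) / sqrt(Delta_{t_max} K + e^{-2 t_max})."""
    d_max = (1.0 - np.exp(-2.0 * t_max)) * K + np.exp(-2.0 * t_max)
    return phi_start / np.sqrt(d_max)

def error_ratio(K=4.0, t_max=5.0, n_steps=200):
    """Ratio of Euler errors when the step is halved; first order predicts about 2."""
    ref = exact_solution(K, t_max)
    coarse = abs(euler_backward_in_time(K, t_max, n_steps) - ref)
    fine = abs(euler_backward_in_time(K, t_max, 2 * n_steps) - ref)
    return coarse / fine
```

A ratio close to 2 when the step is halved is the signature of a first-order scheme, consistent with a total error that scales linearly in $\delta t$.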

[10]

G. Battimelli, G. Ciccotti, P. Greco, and G. Giobbi, Computer Meets Theoretical Physics: The New Frontier of Molecular Simulation (Springer, 2020)

[11]

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by simulated annealing, Science 220, 671 (1983)

[12]

M. Baity-Jesi, R. A. Baños, A. Cruz, L. A. Fernandez, J. M. Gil-Narvión, A. Gordillo-Guerrero, M. Guidetti, J. Hernández, V. Martín-Mayor, A. M. Sudupe, D. Navarro, G. Parisi, S. Pérez-Gaviro, F. Ricci-Tersenghi, S. F. Schifano, B. Seoane, A. Tarancon, R. Tripiccione, J. J. Ruiz-Lorenzo, and D. Yllanes, Janus II: A new generation application-driven com...

[13]

R. Monasson and R. Zecchina, Statistical mechanics of the random k-satisfiability model, Phys. Rev. E 56, 1357 (1997)

[14]

M. Mézard, G. Parisi, and R. Zecchina, Analytic and algorithmic solution of random satisfiability problems, Science 297, 812 (2002)

[15]

D. J. Amit, H. Gutfreund, and H. Sompolinsky, Spin-glass models of neural networks, Phys. Rev. A 32, 1007 (1985)

[16]

D. J. Amit, H. Gutfreund, and H. Sompolinsky, Storing infinite numbers of patterns in a spin-glass model of neural networks, Phys. Rev. Lett. 55, 1530 (1985)

[17]

M. E. J. Newman and G. T. Barkema, Monte Carlo Methods in Statistical Physics (Oxford University Press, 1999)

[18]

D. P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics, 4th ed. (Cambridge University Press, 2015)

[19]

G. Alfaro Miranda, M. Zheng, P. Charbonneau, A. Coniglio, L. F. Cugliandolo, and M. Tarzia, Percolation and criticality of systems with competing interactions on Bethe lattices: Limitations and potential strengths of cluster schemes, arXiv preprint arXiv:2510.02961 (2025)

[20]

G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Machine learning and the physical sciences, Rev. Mod. Phys. 91, 045002 (2019)

[21]

A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, Á. D. Fernández, K. Kelley, I. Sillitoe, et al., Improved protein structure prediction using potentials from deep learning, Nature 577, 706 (2020)

[22]

A. Dawid, J. Arnold, B. Requena, A. Gresch, M. Płodzień, K. Donatella, K. A. Nicoli, P. Stornati, R. Koch, M. Büttner, R. Okuła, G. Muñoz-Gil, R. A. Vargas-Hernández, A. Cervera-Lierta, J. Carrasquilla, V. Dunjko, M. Gabrié, P. Huembeli, E. van Nieuwenburg, F. Vicentini, L. Wang, S. J. Wetzel, G. Carleo, E. Greplová, R. Krems, F. Marquardt, M. Tomza, M....

[23]

L. M. Del Bono, F. Ricci-Tersenghi, and F. Zamponi, Demonstrating real advantage of machine learning–enhanced Monte Carlo for combinatorial optimization, Proc. Natl. Acad. Sci. U. S. A. 123, e2534768123 (2026)

[24]

F. Noé, S. Olsson, J. Köhler, and H. Wu, Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning, Science 365, eaaw1147 (2019)

[25]

M. Invernizzi, A. Kramer, C. Clementi, and F. Noé, Skipping the replica exchange ladder with normalizing flows, J. Phys. Chem. Lett. 13, 11643 (2022)

[26]

M. Noble, L. Grenioux, M. Gabrié, and A. O. Durmus, Learned reference-based diffusion sampler for multi-modal distributions, in Proceedings of the 14th International Conference on Learning Representations (ICLR) (2025)

[27]

C. Schönle, D. Carbone, M. Gabrié, T. Lelièvre, and G. Stoltz, Efficient Monte-Carlo sampling of metastable systems using non-local collective variable updates, arXiv preprint arXiv:2512.16812 (2025)

[28]

L. Grenioux, M. Noble, and M. Gabrié, Improving the evaluation of samplers on multi-modal targets, in Proceedings of the ICLR Workshop on Frontiers in Probabilistic Inference: Learning Meets Sampling (2025)

[29]

D. Wu, L. Wang, and P. Zhang, Solving statistical mechanics using variational autoregressive networks, Phys. Rev. Lett. 122, 080602 (2019)

[30]

B. McNaughton, M. V. Milošević, A. Perali, and S. Pilati, Boosting Monte Carlo simulations of spin glasses using autoregressive neural networks, Phys. Rev. E 101, 053312 (2020)

[31]

L. M. Del Bono, F. Ricci-Tersenghi, and F. Zamponi, Nearest-neighbors neural network architecture for efficient sampling of statistical physics models, Mach. Learn.: Sci. Technol. 6, 025029 (2025)

[32]

S. Wang and Z. Liu, Enhancing the efficiency of variational autoregressive networks through renormalization group, Phys. Rev. E 112, 035310 (2025)

[33]

M. S. Albergo, G. Kanwar, and P. E. Shanahan, Flow-based generative models for Markov chain Monte Carlo in lattice field theory, Phys. Rev. D 100, 034515 (2019)

[34]

G. Kanwar, M. S. Albergo, D. Boyda, K. Cranmer, D. C. Hackett, S. Racaniere, D. J. Rezende, and P. E. Shanahan, Equivariant flow-based sampling for lattice gauge theory, Phys. Rev. Lett. 125, 121601 (2020)

[35]

P. de Haan, C. Rainone, M. C. N. Cheng, and R. Bondesan, Scaling up machine learning for quantum field theory with equivariant continuous flows, arXiv preprint arXiv:2110.02673 (2021)

[36]

M. Gabrié, G. M. Rotskoff, and E. Vanden-Eijnden, Adaptive Monte Carlo augmented with normalizing flows, Proc. Natl. Acad. Sci. U. S. A. 119, e2109420119 (2022)

[37]

M. Gerdes, P. de Haan, C. Rainone, R. Bondesan, and M. C. Cheng, Learning lattice quantum field theories with equivariant continuous flows, SciPost Phys. 15, 238 (2023)

[38]

A. Singha, D. Chakrabarti, and V. Arora, Conditional normalizing flow for Markov chain Monte Carlo sampling in the critical region of lattice field theory, Phys. Rev. D 107, 014512 (2023)

[39]

Y. Chen and E. Vanden-Eijnden, Scale-adaptive generative flows for multiscale scientific data, arXiv preprint arXiv:2509.02971 (2025)

[40]

P. Potaptchik, L. C. Kit, and M. S. Albergo, Tilt matching for scalable sampling and fine-tuning, in Proceedings of the 14th International Conference on Learning Representations (ICLR) (2026)

[41]

J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, Deep unsupervised learning using nonequilibrium thermodynamics, in Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research, 2015) pp. 2256–2265

[42]

Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, Score-based generative modeling through stochastic differential equations, in Proceedings of the 8th International Conference on Learning Representations (ICLR) (2021)

[43]

J. Ho, A. Jain, and P. Abbeel, Denoising diffusion probabilistic models, Adv. Neural Inf. Process. Syst. 33, 6840 (2020)

[44]

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, High-resolution image synthesis with latent diffusion models, in Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, 2022) pp. 10684–10695

[45]

P. Dhariwal and A. Nichol, Diffusion models beat GANs on image synthesis, Adv. Neural Inf. Process. Syst. 34, 8780 (2021)

[46]

C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans, et al., Photorealistic text-to-image diffusion models with deep language understanding, Adv. Neural Inf. Process. Syst. 35, 36479 (2022)

[47]

Z. Sordo, E. Chagnon, and D. Ushizima, A review on generative AI for text-to-image and image-to-image generation and implications to scientific images, arXiv preprint arXiv:2502.21151 (2025)

[48]

Y. Ma, K. Feng, Z. Hu, X. Wang, Y. Wang, M. Zheng, X. He, C. Zhu, H. Liu, Y. He, et al., Controllable video generation: A survey, arXiv preprint arXiv:2507.16869 (2025)

[49]

G. Biroli and M. Mézard, Generative diffusion in very large dimensions, J. Stat. Mech. 2023, 093402 (2023)

[50]

S. Bae, E. Marinari, and F. Ricci-Tersenghi, Diffusion reconstruction for the diluted Ising model, Phys. Rev. E 111, L023301 (2025)

[51]

S. Sanokowski, W. F. Berghammer, H. P. Wang, M. Ennemoser, S. Hochreiter, and S. Lehner, Scalable discrete diffusion samplers: Combinatorial optimization and statistical physics, in Proceedings of the 14th International Conference on Learning Representations (ICLR) (2025)

[52]

A. Matthews, M. Arbel, D. J. Rezende, and A. Doucet, Continual repeated annealed flow transport Monte Carlo, in Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research, 2022) pp. 15196–15219

[53]

C. B. Tan, J. Bose, C. Lin, L. Klein, M. M. Bronstein, and A. Tong, Scalable equilibrium sampling with sequential Boltzmann generators, in Proceedings of the 42nd International Conference on Machine Learning (Proceedings of Machine Learning Research, 2025) pp. 58467–58498

[54]

D. Ghio, Y. Dandi, F. Krzakala, and L. Zdeborová, Sampling with flows, diffusion, and autoregressive neural networks from a spin-glass perspective, Proc. Natl. Acad. Sci. U. S. A. 121, e2311810121 (2024)
