pith. machine review for the scientific record.

arxiv: 2605.12597 · v1 · submitted 2026-05-12 · ❄️ cond-mat.dis-nn · cond-mat.stat-mech · cs.AI · cs.LG · physics.comp-ph

Recognition: unknown

The critical slowing down in diffusion models

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:26 UTC · model grok-4.3

classification ❄️ cond-mat.dis-nn · cond-mat.stat-mech · cs.AI · cs.LG · physics.comp-ph
keywords diffusion models · critical slowing down · O(n) model · score-based generative models · statistical field theory · sampling near criticality · neural network architecture · local approximations

The pith

Two-layer networks with local score approximation reduce critical slowing down in diffusion models to logarithmic scaling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies diffusion models on the O(n) model of statistical field theory in the exactly solvable Gaussian limit n → ∞. It shows that even a one-layer network whose architecture matches the exact solution exhibits critical slowing down in parameter learning, so that both training and sampling times grow quadratically with system size. Introducing a two-layer architecture combined with a local score approximation changes the scaling to logarithmic while keeping the total number of parameters fixed. The results indicate that the sampling difficulties long known near criticality survive in learned generative models but can be mitigated by depth and locality. This supplies a controlled setting in which to understand and improve machine-learning-based sampling methods.
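
The setting is small enough to reproduce in miniature. Below is a minimal sketch, assuming a 1D periodic lattice and a free-field precision matrix (illustrative stand-ins for the paper's Gaussian O(n) theory, not the authors' code): the Gaussian score is linear in the field, so a one-layer linear map can represent it exactly, while the smallest curvature of a quadratic training loss, which sets the slowest learning mode, closes as $(2\pi/L)^2$ at criticality.

```python
# Minimal sketch of the setting, not the authors' code. Assumptions: a 1D
# periodic lattice and a free-field precision matrix; the paper works with
# the O(n) model at n -> infinity, which reduces to a Gaussian theory.
import numpy as np

def precision_matrix(L, m_eff):
    """-Laplacian + m_eff^2 on an L-site periodic chain."""
    A = (2.0 + m_eff**2) * np.eye(L)
    A -= np.roll(np.eye(L), 1, axis=0) + np.roll(np.eye(L), -1, axis=0)
    return A

def exact_score(phi, A):
    """Score of the Gaussian density ~ exp(-phi.A.phi/2): linear in phi,
    so a one-layer linear network can represent it exactly."""
    return -A @ phi

L = 64
A = precision_matrix(L, m_eff=0.0)        # m_eff = 0: the critical point
eigs = np.sort(np.linalg.eigvalsh(A))
# For a quadratic training loss, the slowest learning mode is set by the
# smallest nonzero curvature, which closes as (2*pi/L)^2 at criticality:
print(eigs[1], (2 * np.pi / L) ** 2)      # eigs[0] is the uniform zero mode
```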

Core claim

In the Gaussian limit of the O(n) model, a score model trained with a one-layer network matching the exact solution displays critical slowing down in parameter learning that also slows the generation process. A two-layer architecture with a local score approximation reduces the training-time scaling from quadratic to logarithmic in system size without increasing the number of network parameters.
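
A hedged sketch of the mechanism behind the quadratic side of this claim (the generic picture for quadratic losses; the paper's own gradient-flow equations may differ in detail): each Fourier mode of the one-layer weights relaxes independently under gradient flow, $\partial_s w(\vec k) \propto -\lambda(\vec k)\,[w(\vec k) - w_\star(\vec k)]$ with $\lambda(\vec k) \sim \vec k \cdot \vec k + m_{\rm eff}^2$, so the slowest mode $|\vec k|_{\min} = 2\pi/L$ sets a training time $\tau \sim \lambda_{\min}^{-1} \sim L^2$ as $m_{\rm eff} \to 0$. The two-layer claim is that depth reshapes these relaxation dynamics so that the time to reach a fixed error grows only as $\log L$.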

What carries the argument

The two-layer network with local score approximation, which incorporates physical locality to accelerate training while preserving parameter count.
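
To make "depth plus locality at fixed parameter count" concrete, here is a hypothetical sketch (the width-3 kernels, 1D lattice, and linear layers are illustrative assumptions, not the paper's construction):

```python
# Hypothetical two-layer local score: composing two nearest-neighbor
# convolutions widens the receptive field (width 5 here) while the
# parameter count (6 numbers) stays independent of system size L.
# A dense one-layer coupling would instead cost O(L^2) parameters.
import numpy as np

def conv3(x, w):
    """Width-3 periodic convolution: w = (left, center, right) weights."""
    return w[0] * np.roll(x, 1) + w[1] * x + w[2] * np.roll(x, -1)

def local_two_layer_score(phi, w1, w2):
    return conv3(conv3(phi, w1), w2)

phi = np.random.randn(128)
s = local_two_layer_score(phi,
                          w1=np.array([-1.0, 2.0, -1.0]),  # ~ -Laplacian stencil
                          w2=np.array([0.1, 1.0, 0.1]))
print(s.shape)  # (128,): the same 6 parameters would serve any L
```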

If this is right

  • Training time for diffusion models near criticality scales logarithmically with system size under the two-layer local approximation.
  • Critical slowing down affects both parameter learning and the generation step in learned score-based models.
  • Architectural depth combined with locality overcomes the quadratic bottleneck without raising parameter count.
  • The same slowing-down mechanism known from traditional sampling persists in diffusion models but can be controlled by design choices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same two-layer local construction could be tested on finite-n or interacting versions of the O(n) model to check whether the logarithmic improvement survives beyond the Gaussian limit.
  • The framework offers a route to study whether depth-plus-locality strategies help other generative models near phase transitions.
  • One could measure the scaling of generation time itself, rather than only training time, in larger systems to quantify the remaining practical cost.
  • Extending the analysis to higher-dimensional lattices would test whether the locality advantage persists when the underlying correlation length grows.

Load-bearing premise

The Gaussian limit of the O(n) model together with a one-layer network that exactly matches its score function is representative of the critical slowing down seen in practical diffusion models.

What would settle it

A numerical experiment on the O(n) model in the Gaussian limit that finds quadratic rather than logarithmic growth of training time with system size when the two-layer local approximation is used would falsify the claimed reduction.
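
A minimal sketch of that experiment (the timings below are synthetic placeholders; in practice T(L) would be the measured time to reach a fixed training loss at criticality):

```python
# Fit measured training times T(L) against quadratic and logarithmic laws;
# whichever fits decisively would settle the scaling claim. Synthetic data
# here stand in for real measurements.
import numpy as np

Ls = np.array([16, 32, 64, 128, 256, 512], dtype=float)
rng = np.random.default_rng(0)
T = np.log(Ls) + 0.05 * rng.standard_normal(len(Ls))  # placeholder timings

quad = np.polyfit(Ls**2, T, 1)          # T ~ a L^2 + b
logf = np.polyfit(np.log(Ls), T, 1)     # T ~ c log L + d
res_quad = np.sum((np.polyval(quad, Ls**2) - T) ** 2)
res_log = np.sum((np.polyval(logf, np.log(Ls)) - T) ** 2)
print(f"quadratic residual {res_quad:.3g} vs logarithmic residual {res_log:.3g}")
# A decisively smaller quadratic residual would falsify the claimed reduction.
```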

Figures

Figures reproduced from arXiv: 2605.12597 by Giulio Biroli, Luca Maria Del Bono, Marylou Gabrié, Patrick Charbonneau.

Figure 1 [figures/full_fig_p004_1.png]
Figure 2 [figures/full_fig_p007_2.png]
Figure 3: Standard deviation [figures/full_fig_p009_3.png]
Figure 4 [figures/full_fig_p010_4.png]
Figure 5: Error analysis for the one-layer network architecture [figures/full_fig_p011_5.png]
Figure 6 [figures/full_fig_p014_6.png]
Figure 7: Backward diffusion (denoising) time evolution of the relative error [figures/full_fig_p015_7.png]
Figure 8: Generated configurations [figures/full_fig_p016_8.png]
Original abstract

Computational sampling has been central to the sciences since the mid-20th century. While machine-learning-based approaches have recently enabled major advances, their behavior remains poorly understood, with limited theoretical control over when and why they succeed. Here we provide such insight for diffusion models, a class of generative schemes highly effective in practice, by analyzing their application to the $O(n)$ model of statistical field theory in the Gaussian limit $n \to \infty$. In this analytically tractable setting, we show that training a score model with a one-layer network architecture matching the exact solution exhibits a form of critical slowing down in parameter learning. This slowing down also impacts the generation process, indicating that the well-known difficulties of sampling near criticality persist even for learned generative models. To overcome this bottleneck, we demonstrate the power of combining architectural depth with physical locality. We find that using a two-layer architecture drastically reduces the critical slowing down, with the training time scaling logarithmically rather than quadratically with system size. By introducing a local score approximation we show that this acceleration in training time can be achieved without increasing the number of neural network parameters. Taken together, these results demonstrate that diffusion models can overcome the critical slowing down through appropriate architectural design, and establish a controlled framework for understanding and improving learned sampling methods in statistical physics and beyond.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper analyzes diffusion models applied to the O(n) model in the Gaussian limit n→∞. It shows that training a score model with a one-layer network matching the exact solution exhibits critical slowing down, with training time scaling quadratically with system size L; this also affects the generation process. A two-layer architecture reduces the slowing down to logarithmic scaling in L. Introducing a local score approximation achieves this acceleration while keeping the number of neural network parameters fixed.

Significance. If the results hold, this establishes a controlled, analytically tractable framework for understanding critical slowing down in learned generative models for statistical physics systems near criticality. The explicit scaling comparisons in the Gaussian limit and the demonstration that depth plus locality can mitigate quadratic scaling without parameter growth are notable strengths that could guide architectural improvements for diffusion models in physics applications.

major comments (1)
  1. [Results on two-layer architecture and local score approximation] The central claim of logarithmic training-time scaling under the local score approximation (see the section deriving the two-layer results and the local approximation) assumes the approximation error remains sub-dominant as L grows at criticality. However, the exact score contains long-range correlations set by the diverging length scale; without an explicit error bound or scaling analysis of the approximation error in the loss landscape and gradient flow, the claimed improvement over the one-layer quadratic scaling is not fully supported. (A numerical probe of this kernel-tail question is sketched after the minor comments.)
minor comments (2)
  1. [Abstract] The abstract and quantitative claims provide no error bars, finite-n checks, or details on construction/validation of the local score approximation, which would strengthen the presentation of the scaling results.
  2. [Methods] Notation for the local approximation and its dependence on nearest-neighbor fields should be clarified to allow readers to assess its range.
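
On the major point, a minimal numerical probe (assuming a VP-type noising $x_t = e^{-t} x_0 + \sqrt{1 - e^{-2t}}\,\xi$ and a 1D free-field spectrum; the paper's precise kernel may differ): transform the exact diffusion-time score kernel to real space and measure how much of its weight falls outside a nearest-neighbor stencil as the mass, hence the inverse correlation length, goes to zero.

```python
# Fraction of the exact diffusion-time score kernel lying outside a local
# stencil, as a proxy for the local approximation's error near criticality.
import numpy as np

def kernel_tail_fraction(L, m, t, radius=2):
    k = 2 * np.pi * np.fft.fftfreq(L)
    spec = 4 * np.sin(k / 2) ** 2 + m ** 2                # prior precision
    C_t = np.exp(-2 * t) / spec + (1 - np.exp(-2 * t))    # noised covariance
    K = np.real(np.fft.ifft(1.0 / C_t))                   # real-space score kernel
    dist = np.minimum(np.arange(L), L - np.arange(L))     # periodic distance
    return np.abs(K[dist > radius]).sum() / np.abs(K).sum()

for m in [1.0, 0.1, 0.01]:                                # m -> 0: criticality
    print(m, kernel_tail_fraction(L=512, m=m, t=0.5))
```

If the tail fraction stays bounded or shrinks as $m \to 0$, the referee's worry is likely benign; if it grows, the local approximation's error could compete with the claimed logarithmic scaling.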

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments. We address the major comment below.

read point-by-point responses
  1. Referee: The central claim of logarithmic training-time scaling under the local score approximation (see the section deriving the two-layer results and the local approximation) assumes the approximation error remains sub-dominant as L grows at criticality. However, the exact score contains long-range correlations set by the diverging length scale; without an explicit error bound or scaling analysis of the approximation error in the loss landscape and gradient flow, the claimed improvement over the one-layer quadratic scaling is not fully supported.

    Authors: We thank the referee for highlighting this important point. The manuscript derives the logarithmic scaling explicitly for the two-layer network in the Gaussian limit by solving the gradient flow equations. For the local score approximation, we show through direct calculation that it reproduces the same scaling as the full two-layer model for the leading terms. However, we agree that a rigorous bound on the approximation error as L → ∞ at criticality would strengthen the result. In the revised version, we add an appendix with a scaling analysis of the error in the loss function, demonstrating that the long-range contributions lead to corrections that do not alter the logarithmic scaling. This addresses the concern without changing the main conclusions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations start from exact Gaussian score

full rationale

The paper begins from the analytically known exact score of the Gaussian O(n) model at n→∞ and derives the critical slowing down for a one-layer network by explicit comparison of the loss and gradient flow to that exact score. The logarithmic scaling improvement with two-layer depth plus local approximation is obtained by direct analysis of the resulting optimization dynamics and parameter count, without any reported scaling being forced by a fitted parameter or by redefinition of the input. No load-bearing step reduces to a self-citation chain or to an ansatz smuggled from prior work; the central claims remain independent of the reported measurements.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on the exact solvability of the Gaussian O(n) model and the assumption that a one-layer network can be made to match its score function exactly; no free parameters are introduced in the abstract, and no new entities are postulated.

axioms (1)
  • domain assumption The Gaussian limit n→∞ of the O(n) model is exactly solvable and its score function can be matched by a one-layer network.
    Invoked to obtain an analytically tractable setting for studying critical slowing down.

pith-pipeline@v0.9.0 · 5551 in / 1285 out tokens · 27968 ms · 2026-05-14T20:26:53.131007+00:00 · methodology

discussion (0)

