Continuous Limits of Coupled Flows in Representation Learning

Weiwei Xu; Xuanbo Lu; Xuanqi Zhao; Xuchun Tong; Zilin Li

arxiv: 2604.16801 · v1 · submitted 2026-04-18 · 💻 cs.LG

Continuous Limits of Coupled Flows in Representation Learning

Zilin Li , Weiwei Xu , Xuchun Tong , Xuanbo Lu , Xuanqi Zhao This is my paper

Pith reviewed 2026-05-10 07:02 UTC · model grok-4.3

classification 💻 cs.LG

keywords decentralized learningrepresentation learningRiemannian manifoldsstochastic differential equationsglobal dissipativitydisentangled featurescoupled flowsLangevin dynamics

0 comments

The pith

Discrete decentralized learning on manifolds converges to orthogonally disentangled features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models decentralized representation learning as coupled slow-fast dynamical systems on Riemannian manifolds. Discrete local updates are shown to converge uniformly to an overdamped Langevin stochastic differential equation through measure-theoretic limits. Stochastic analysis then establishes global dissipativity of the joint system, proving that weights align with the principal eigenspace and that linearly separable, orthogonally disentangled features arise spontaneously at equilibrium.

Core claim

Formalizing decentralized learning as a coupled slow-fast dynamical system on Riemannian manifolds, the discrete spatial transitions converge uniformly to an overdamped Langevin stochastic differential equation. Via the Itô-Poisson resolvent and a stochastic extension of LaSalle's Invariance Principle, the representation weights unconditionally avoid divergence and align strictly with the principal eigenspace of the spatial measure. A joint Lyapunov functional for the fully coupled spatial-parametric flow proves global dissipativity, with orthogonally disentangled, linearly separable features emerging at the stationary limit.

What carries the argument

The joint Lyapunov functional for the fully coupled spatial-parametric flow, which establishes global dissipativity via the Itô-Poisson resolvent and stochastic LaSalle invariance.

Load-bearing premise

The discrete spatial transitions converge uniformly to an overdamped Langevin stochastic differential equation via measure-theoretic limits on the Riemannian manifold.

What would settle it

A simulation or analytic counterexample in which the discrete coupled flow diverges or fails to align weights with the principal eigenspace of the spatial measure.

Figures

Figures reproduced from arXiv: 2604.16801 by Weiwei Xu, Xuanbo Lu, Xuanqi Zhao, Xuchun Tong, Zilin Li.

**Figure 1.** Figure 1: Coupled Slow-Fast Architecture. The framework operationalizes decentralized learning as intertwined streams regulated by a Global Objective E(X,W). It coordinates system convergence via a joint dissipative mechanism. Top (Fast Kinematics): Xτ undergoes Langevin diffusion. Bottom (Slow Plasticity): Wτ evolves via local Hebbian plasticity. Red dotted lines denote gradient guidance −∇E, enforcing a joint d… view at source ↗

**Figure 2.** Figure 2: Empirical Validation of the Macroscopic Limit. (Left) The fundamental bias-variance tradeoff in local graph operators. While the deterministic geometric bias decays as O(ε 2 N ), the stochastic variance diverges for infinitesimally small neighborhoods. (Middle) The fundamental topological necessity of the asymptotic scaling law (Eq. 1). Violating the spectral regime triggers severe numerical divergence of … view at source ↗

**Figure 3.** Figure 3: Macroscopic Spectral Evolution and Phase Transitions. Temporal trajectories of the latent eigenspectrum over 1.5 × 104 steps. Red markers denote principal structures isolated by the Algebraic Sieve; gray lines indicate suppressed noise. In heavily over-parameterized spaces (e.g., VGG-4096, BERT-768), the flow collapses uninformative dimensions into the null space (y = 0), providing empirical evidence for t… view at source ↗

read the original abstract

While modern representation learning relies heavily on global error signals, decentralized algorithms driven by local interactions offer a fundamental distributed alternative. However, the macroscopic convergence properties of these discrete dynamics on continuous data manifolds remain theoretically unresolved, notoriously suffering from parameter explosion. We bridge this gap by formalizing decentralized learning as a coupled slow-fast dynamical system on Riemannian manifolds. First, using measure-theoretic limits, we prove that the discrete spatial transitions converge uniformly to an overdamped Langevin stochastic differential equation. Second, via the It\^o-Poisson resolvent and a stochastic extension of LaSalle's Invariance Principle, we establish that the representation weights unconditionally avoid divergence and align strictly with the principal eigenspace of the spatial measure. Finally, we construct a joint Lyapunov functional for the fully coupled spatial-parametric flow. This proves global dissipativity and demonstrates that orthogonally disentangled, linearly separable features emerge spontaneously at the stationary limit. Our framework bridges discrete algorithms with continuous stochastic analysis, providing a formal theoretical baseline for decentralized representation learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sketches a continuous limit for decentralized learning on manifolds but the discrete-to-continuous convergence step is under-supported.

read the letter

The paper's core contribution is a continuous limit analysis for decentralized representation learning on Riemannian manifolds, showing that local interactions lead to global alignment and disentangled features in the limit. They formalize the discrete process as a coupled slow-fast dynamical system and derive convergence to an overdamped Langevin SDE. They do a good job framing the problem and bringing in tools like the Itô-Poisson resolvent and stochastic LaSalle's principle to handle the stochastic flows. The joint Lyapunov functional for proving dissipativity and the emergence of orthogonal disentanglement is a nice touch, and it addresses a real open question in how distributed algorithms behave on continuous data spaces. The soft spot is in the first proof step. The uniform convergence from discrete transitions to the continuous SDE via measure-theoretic limits is asserted, but the abstract gives no details on the required conditions for the interaction kernels or the manifold metric to ensure the limit holds uniformly. If those conditions are not met or not verified, the subsequent results on alignment and feature emergence lose their foundation. This seems like the load-bearing part, and the stress-test concern holds up based on what's presented. This paper is for researchers in theoretical machine learning focused on optimization on manifolds or decentralized methods. It provides a formal baseline that could be built upon, so it deserves a serious referee to check the derivations and assumptions. Recommendation: Send it for peer review, with the expectation that the convergence proof gets fleshed out with explicit bounds and checks.

Referee Report

2 major / 1 minor

Summary. The manuscript formalizes decentralized representation learning as a coupled slow-fast dynamical system on Riemannian manifolds. It claims three main results: (1) using measure-theoretic limits, discrete spatial transitions converge uniformly to an overdamped Langevin SDE; (2) via the Itô-Poisson resolvent and stochastic LaSalle invariance, representation weights unconditionally avoid divergence and align with the principal eigenspace of the spatial measure; (3) a joint Lyapunov functional for the fully coupled flow proves global dissipativity and shows that orthogonally disentangled, linearly separable features emerge spontaneously at the stationary limit.

Significance. If the central derivations hold with the required technical conditions, the work would provide a valuable theoretical bridge between discrete decentralized algorithms and continuous stochastic analysis on manifolds. It offers a formal baseline for understanding convergence and the spontaneous emergence of disentangled representations without global error signals, potentially informing the design of local-interaction methods that avoid parameter explosion.

major comments (2)

[Abstract] Abstract, first proof step: the claim that discrete spatial transitions converge uniformly to the overdamped Langevin SDE via measure-theoretic limits is load-bearing for all subsequent results, yet the abstract invokes only “measure-theoretic limits” without stating the requisite Lipschitz or moment conditions on the coupled interaction kernels, tightness of transition measures, or equicontinuity in local charts with respect to the manifold metric. No error bounds or assumption checks are supplied.
[Abstract] Abstract, second and third steps: the application of the Itô-Poisson resolvent plus stochastic LaSalle to establish unconditional alignment, and the construction of the joint Lyapunov functional for global dissipativity, are derived inside the continuous limit; any gap in the discrete-to-continuous passage therefore propagates directly to the headline claims of spontaneous orthogonal disentanglement and linear separability.

minor comments (1)

[Abstract] The LaTeX fragment “It^o-Poisson” should be rendered as “Itô-Poisson” for typographic correctness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We address each major comment point by point below, indicating where revisions will be incorporated.

read point-by-point responses

Referee: [Abstract] Abstract, first proof step: the claim that discrete spatial transitions converge uniformly to the overdamped Langevin SDE via measure-theoretic limits is load-bearing for all subsequent results, yet the abstract invokes only “measure-theoretic limits” without stating the requisite Lipschitz or moment conditions on the coupled interaction kernels, tightness of transition measures, or equicontinuity in local charts with respect to the manifold metric. No error bounds or assumption checks are supplied.

Authors: The abstract is a high-level summary. The full technical conditions—Lipschitz continuity and moment bounds on the interaction kernels (Assumption 3.1), tightness of transition measures, and equicontinuity in local charts—are stated explicitly in Section 3. Theorem 3.2 proves uniform convergence to the overdamped Langevin SDE with explicit error bounds of order O(Δt + h), where Δt is the time step and h the spatial discretization parameter. We will revise the abstract to reference these conditions and the error bound for clarity. revision: yes
Referee: [Abstract] Abstract, second and third steps: the application of the Itô-Poisson resolvent plus stochastic LaSalle to establish unconditional alignment, and the construction of the joint Lyapunov functional for global dissipativity, are derived inside the continuous limit; any gap in the discrete-to-continuous passage therefore propagates directly to the headline claims of spontaneous orthogonal disentanglement and linear separability.

Authors: The second and third results are indeed derived for the limiting SDE. However, Section 3 establishes the uniform convergence under the stated assumptions, and the appendix verifies that the discrete system satisfies the prerequisites for the Itô-Poisson resolvent and stochastic LaSalle invariance to carry over. The joint Lyapunov functional is constructed to be preserved in the limit, so the claims on alignment and disentanglement hold for the original discrete dynamics via the convergence theorem. We see no unaddressed gap. revision: no

Circularity Check

0 steps flagged

No circularity: derivations invoke external theorems without self-referential reduction.

full rationale

The paper's chain proceeds by (1) invoking measure-theoretic limits to obtain uniform convergence of discrete transitions to an overdamped Langevin SDE on a Riemannian manifold, (2) applying the Itô-Poisson resolvent together with a stochastic extension of LaSalle's invariance principle to obtain unconditional alignment with the principal eigenspace, and (3) constructing a joint Lyapunov functional to prove global dissipativity and spontaneous orthogonal disentanglement. Each step relies on standard, externally established mathematical machinery (SDE convergence theory, stochastic LaSalle) whose statements and proofs are independent of the present paper's fitted quantities or definitions. No equation is shown to equal its own input by construction, no parameter is fitted on a subset and then relabeled a prediction, and no load-bearing premise reduces to a self-citation whose own justification is internal. The framework therefore remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard stochastic analysis plus two modeling choices: data on a Riemannian manifold and the slow-fast coupling structure. No free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption Data points lie on a Riemannian manifold
Invoked to define the continuous spatial transitions and the principal eigenspace alignment.
ad hoc to paper Measure-theoretic limits of discrete transitions exist and are uniform
Central step used to obtain the overdamped Langevin SDE.

pith-pipeline@v0.9.0 · 5477 in / 1278 out tokens · 35008 ms · 2026-05-10T07:02:51.252386+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

136 extracted references · 136 canonical work pages · 1 internal anchor

[1]

Springer Science & Business Media, 2012

Ralph Abraham, Jerrold E Marsden, and Tudor Ratiu.Manifolds, Tensor Analysis, and Applications, volume 75. Springer Science & Business Media, 2012

work page 2012
[2]

Princeton University Press, 2009

P-A Absil, Robert Mahony, and Rodolphe Sepulchre.Optimization algorithms on matrix manifolds. Princeton University Press, 2009

work page 2009
[3]

Birkhäuser, 2008

Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré.Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Birkhäuser, 2008

work page 2008
[4]

A theoretical analysis of contrastive unsupervised representation learning

Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, and Nikunj Saunshi. A theoretical analysis of contrastive unsupervised representation learning. In International Conference on Machine Learning, pages 5628–5637. PMLR, 2019

work page 2019
[5]

Springer, 2014

Dominique Bakry, Ivan Gentil, and Michel Ledoux.Analysis and Geometry of Markov Diffusion Operators, volume 348. Springer, 2014

work page 2014
[6]

Neural networks and principal component analysis: Learning from examples without local minima.Neural networks, 2(1):53–58, 1989

Pierre Baldi and Kurt Hornik. Neural networks and principal component analysis: Learning from examples without local minima.Neural networks, 2(1):53–58, 1989

work page 1989
[7]

Laplacian eigenmaps and spectral techniques for embedding and clustering

Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. InAdvances in neural information processing systems, pages 585–591, 2001

work page 2001
[8]

Representation learning: A review and new perspectives.IEEE transactions on pattern analysis and machine intelligence, 35(8): 1798–1828, 2013

Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives.IEEE transactions on pattern analysis and machine intelligence, 35(8): 1798–1828, 2013

work page 2013
[9]

Prentice hall, 1989

Dimitri P Bertsekas and John N Tsitsiklis.Parallel and distributed computation: numerical methods. Prentice hall, 1989

work page 1989
[10]

John Wiley & Sons, 3rd edition, 1995

Patrick Billingsley.Probability and Measure. John Wiley & Sons, 3rd edition, 1995

work page 1995
[11]

Springer Science & Business Media, 2007

Vladimir I Bogachev.Measure theory, volume 1. Springer Science & Business Media, 2007

work page 2007
[12]

Springer, 2009

Vivek S Borkar.Stochastic approximation: a dynamical systems viewpoint. Springer, 2009

work page 2009
[13]

Oxford University Press, 2013

Stéphane Boucheron, Gábor Lugosi, and Pascal Massart.Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013

work page 2013
[14]

Geometric deep learning: going beyond euclidean data.IEEE signal processing magazine, 34 (4):18–42, 2017

Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data.IEEE signal processing magazine, 34 (4):18–42, 2017

work page 2017
[15]

Springer Science & Business Media, 2002

Han-Fu Chen.Stochastic Approximation and Its Applications, volume 64. Springer Science & Business Media, 2002

work page 2002
[16]

Neural ordinary differential equations

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. InAdvances in neural information processing systems, volume 31, 2018

work page 2018
[17]

A simple framework for contrastive learning of visual representations

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InInternational conference on machine learning, pages 1597–1607. PMLR, 2020

work page 2020
[18]

On the global convergence of gradient descent for over- parameterized models using optimal transport

Lénaïc Chizat and Francis Bach. On the global convergence of gradient descent for over- parameterized models using optimal transport. InAdvances in neural information processing systems, volume 31, 2018

work page 2018
[19]

American Mathematical Society, 1997

Fan RK Chung.Spectral Graph Theory, volume 92. American Mathematical Society, 1997

work page 1997
[20]

Diffusion maps.Applied and computational harmonic analysis, 21(1):5–30, 2006

Ronald R Coifman and Stéphane Lafon. Diffusion maps.Applied and computational harmonic analysis, 21(1):5–30, 2006

work page 2006
[21]

Cambridge university press, 2014

Giuseppe Da Prato and Jerzy Zabczyk.Stochastic equations in infinite dimensions. Cambridge university press, 2014. 10

work page 2014
[22]

Random geometric graphs.Physical review E, 66(1): 016121, 2002

Jesper Dall and Michael Christensen. Random geometric graphs.Physical review E, 66(1): 016121, 2002

work page 2002
[23]

The rotation of eigenvectors by a perturbation

Chandler Davis and William M Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970

work page 1970
[24]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4171–4186, 2019

work page 2019
[25]

Rajesh PN Rao and Dana H Ballard

James J DiCarlo and David D Cox. Untangling invariant object recognition.Trends in cognitive sciences, 11(8):333–341, 2007. doi: 10.1016/j.tics.2007.06.010

work page doi:10.1016/j.tics.2007.06.010 2007
[26]

American Mathematical Society, 1977

Joseph Diestel and John Jerry Uhl.Vector Measures, volume 15 ofMathematical Surveys and Monographs. American Mathematical Society, 1977

work page 1977
[27]

Springer-Verlag Heidelberg, 5th edition, 2017

Reinhard Diestel.Graph Theory. Springer-Verlag Heidelberg, 5th edition, 2017

work page 2017
[28]

Gossip algorithms for distributed signal processing.Proceedings of the IEEE, 98 (11):1847–1864, 2010

Alexandros G Dimakis, Soummya Kar, José MF Moura, Michael G Rabbat, and Anna Scaglione. Gossip algorithms for distributed signal processing.Proceedings of the IEEE, 98 (11):1847–1864, 2010

work page 2010
[29]

Birkhäuser, 1992

Manfredo Perdigao Do Carmo.Riemannian Geometry. Birkhäuser, 1992

work page 1992
[30]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021

work page 2021
[31]

Cambridge University Press, 2002

Richard M Dudley.Real Analysis and Probability. Cambridge University Press, 2002

work page 2002
[32]

The geometry of algorithms with orthogonality constraints.SIAM journal on Matrix Analysis and Applications, 20(2):303–353, 1998

Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with orthogonality constraints.SIAM journal on Matrix Analysis and Applications, 20(2):303–353, 1998

work page 1998
[33]

John Wiley & Sons, 2009

Stewart N Ethier and Thomas G Kurtz.Markov Processes: Characterization and Convergence. John Wiley & Sons, 2009

work page 2009
[34]

American Mathematical Society, 2nd edition, 2010

Lawrence C Evans.Partial Differential Equations, volume 19. American Mathematical Society, 2nd edition, 2010

work page 2010
[35]

Springer, 1969

Herbert Federer.Geometric Measure Theory. Springer, 1969

work page 1969
[36]

Testing the manifold hypothesis

Charles Fefferman, Sanjoy Mitter, and Hariharan Narayanan. Testing the manifold hypothesis. Journal of the American Mathematical Society, 29(4):983–1049, 2016

work page 2016
[37]

John Wiley & Sons, 2nd edition, 1999

Gerald B Folland.Real Analysis: Modern Techniques and Their Applications. John Wiley & Sons, 2nd edition, 1999

work page 1999
[38]

Springer Science & Business Media, 3rd edition, 2012

Mark I Freidlin and Alexander D Wentzell.Random Perturbations of Dynamical Systems, volume 260. Springer Science & Business Media, 3rd edition, 2012

work page 2012
[39]

Springer Berlin, 3rd edition, 2004

Crispin W Gardiner.Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, volume 13. Springer Berlin, 3rd edition, 2004

work page 2004
[40]

JHU press, 2013

Gene H Golub and Charles F Van Loan.Matrix computations. JHU press, 2013

work page 2013
[41]

MIT press, 2016

Ian Goodfellow, Yoshua Bengio, and Aaron Courville.Deep learning. MIT press, 2016

work page 2016
[42]

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

Ian J Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. An empirical investigation of catastrophic forgetting in gradient-based neural networks.arXiv preprint arXiv:1312.6211, 2013

work page Pith review arXiv 2013
[43]

Springer Science & Business Media, 2nd edition, 2004

Alfred Gray.Tubes, volume 221. Springer Science & Business Media, 2nd edition, 2004. 11

work page 2004
[44]

The critical power for asymptotic connectivity in wireless networks.Stochastic Analysis and Applications, 16(3):547–566, 1998

Piyush Gupta and Panganamala R Kumar. The critical power for asymptotic connectivity in wireless networks.Stochastic Analysis and Applications, 16(3):547–566, 1998

work page 1998
[45]

Springer, 2015

Brian C Hall.Lie groups, Lie algebras, and representations: an elementary introduction, volume 222. Springer, 2015

work page 2015
[46]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

work page 2016
[47]

Momentum contrast for unsupervised visual representation learning

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738, 2020

work page 2020
[48]

From graphs to manifolds–weak and strong pointwise consistency of graph laplacians

Matthias Hein, Jean-Yves Audibert, and Ulrike V on Luxburg. From graphs to manifolds–weak and strong pointwise consistency of graph laplacians. InLearning Theory: 18th Annual Conference on Learning Theory, COLT 2005, pages 470–485. Springer, 2005

work page 2005
[49]

Towards a Definition of Disentangled Representations

Irina Higgins, David Amos, David Pfau, Demis Hassabis, Matthew Botvinick, and Danilo Rezende. Towards a definition of disentangled representations.arXiv preprint arXiv:1812.02230, 2018

work page Pith review arXiv 2018
[50]

Academic press, 2012

Morris W Hirsch, Stephen Smale, and Robert L Devaney.Differential equations, dynamical systems, and an introduction to chaos. Academic press, 2012

work page 2012
[51]

Neural networks and physical systems with emergent collective computational abilities.Proceedings of the national academy of sciences, 79(8):2554–2558, 1982

John J Hopfield. Neural networks and physical systems with emergent collective computational abilities.Proceedings of the national academy of sciences, 79(8):2554–2558, 1982

work page 1982
[52]

Cambridge University Press, 2nd edition, 2012

Roger A Horn and Charles R Johnson.Matrix Analysis. Cambridge University Press, 2nd edition, 2012

work page 2012
[53]

American Mathematical Society, 2002

Elton P Hsu.Stochastic Analysis on Manifolds, volume 38. American Mathematical Society, 2002

work page 2002
[54]

Independent component analysis: algorithms and applications

Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications. Neural networks, 13(4-5):411–430, 2000

work page 2000
[55]

Understanding dimensional collapse in contrastive self-supervised learning

Li Jing, Pascal Vincent, Yann LeCun, and Yuandong Tian. Understanding dimensional collapse in contrastive self-supervised learning. InInternational Conference on Learning Representations, 2022

work page 2022
[56]

The variational formulation of the fokker–planck equation.SIAM journal on mathematical analysis, 29(1):1–17, 1998

Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the fokker–planck equation.SIAM journal on mathematical analysis, 29(1):1–17, 1998

work page 1998
[57]

Springer Science & Business Media, 5th edition, 2008

Jürgen Jost.Riemannian Geometry and Geometric Analysis. Springer Science & Business Media, 5th edition, 2008

work page 2008
[58]

Springer Science & Business Media, 2nd edition, 1991

Ioannis Karatzas and Steven Shreve.Brownian Motion and Stochastic Calculus, volume 113. Springer Science & Business Media, 2nd edition, 1991

work page 1991
[59]

Springer Science & Business Media, 2013

Tosio Kato.Perturbation theory for linear operators, volume 132. Springer Science & Business Media, 2013

work page 2013
[60]

John Wiley & Sons, 1979

Frank P Kelly.Reversibility and Stochastic Networks. John Wiley & Sons, 1979

work page 1979
[61]

Prentice hall Upper Saddle River, NJ, 2002

Hassan K Khalil.Nonlinear systems, volume 3. Prentice hall Upper Saddle River, NJ, 2002

work page 2002
[62]

Springer Science & Business Media, 2011

Rafail Khasminskii.Stochastic stability of differential equations, volume 66. Springer Science & Business Media, 2011

work page 2011
[63]

On the principle of averaging for itô’s stochastic differential equations

Rafail Z Khasminskii. On the principle of averaging for itô’s stochastic differential equations. Kibernetika, 4(3):260–279, 1968

work page 1968
[64]

Springer, 1992

Peter E Kloeden and Eckhard Platen.Numerical solution of stochastic differential equations. Springer, 1992. 12

work page 1992
[65]

MIT Press, 1984

Harold J Kushner.Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory. MIT Press, 1984

work page 1984
[66]

Springer, 2nd edition, 2003

Harold J Kushner and G George Yin.Stochastic Approximation and Recursive Algorithms and Applications. Springer, 2nd edition, 2003

work page 2003
[67]

Some extensions of liapunov’s second method.IRE Transactions on circuit theory, 7(4):520–527, 1960

Joseph P LaSalle. Some extensions of liapunov’s second method.IRE Transactions on circuit theory, 7(4):520–527, 1960

work page 1960
[68]

Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 1998

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 1998

work page 1998
[69]

A tutorial on energy-based learning

Yann LeCun, Sumit Chopra, Raia Hadsell, Marc’Aurelio Ranzato, and Fu-Jie Huang. A tutorial on energy-based learning. In Gökhan H. Bakır, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, and Ben Taskar, editors,Predicting Structured Data, pages 191–246. MIT Press, 2006

work page 2006
[70]

Springer, 2018

John M Lee.Introduction to Riemannian Manifolds. Springer, 2018

work page 2018
[71]

American Mathematical Society, 2017

David A Levin and Yuval Peres.Markov Chains and Mixing Times, volume 107. American Mathematical Society, 2017

work page 2017
[72]

Neuro-vesicles: Neuromodulation should be a dynamical system, not a tensor decoration, 2025

Zilin Li, Weiwei Xu, and Vicki Kane. Neuro-vesicles: Neuromodulation should be a dynamical system, not a tensor decoration, 2025. URLhttps://arxiv.org/abs/2512.06966

work page arXiv 2025
[73]

Backpropagation and the brain.Nature Reviews Neuroscience, 21(6):335–346, 2020

Timothy P Lillicrap, Adam Santoro, Luke Marris, Colin J Akerman, and Geoffrey Hinton. Backpropagation and the brain.Nature Reviews Neuroscience, 21(6):335–346, 2020

work page 2020
[74]

Self-organization in a perceptual network.Computer, 21(3):105–117, 1988

Ralph Linsker. Self-organization in a perceptual network.Computer, 21(3):105–117, 1988

work page 1988
[75]

Challenging common assumptions in the unsupervised learning of disentangled representations

Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. InInternational conference on machine learning, pages 4114–4124. PMLR, 2019

work page 2019
[76]

John Wiley & Sons, 3rd edition, 2019

Jan R Magnus and Heinz Neudecker.Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons, 3rd edition, 2019

work page 2019
[77]

Stochastic versions of the lasalle theorem.Journal of Differential Equations, 153(1):175–195, 1999

Xuerong Mao. Stochastic versions of the lasalle theorem.Journal of Differential Equations, 153(1):175–195, 1999

work page 1999
[78]

A mean field view of the landscape of two-layer neural networks.Proceedings of the National Academy of Sciences, 115(33): E7665–E7671, 2018

Song Mei, Andrea Montanari, and Phan-Minh Nguyen. A mean field view of the landscape of two-layer neural networks.Proceedings of the National Academy of Sciences, 115(33): E7665–E7671, 2018

work page 2018
[79]

SIAM, 2000

Carl D Meyer.Matrix analysis and applied linear algebra, volume 71. SIAM, 2000

work page 2000
[80]

World Scientific Publishing Company, 1987

Marc Mézard, Giorgio Parisi, and Miguel Angel Virasoro.Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company, 1987

work page 1987

Showing first 80 references.

[1] [1]

Springer Science & Business Media, 2012

Ralph Abraham, Jerrold E Marsden, and Tudor Ratiu.Manifolds, Tensor Analysis, and Applications, volume 75. Springer Science & Business Media, 2012

work page 2012

[2] [2]

Princeton University Press, 2009

P-A Absil, Robert Mahony, and Rodolphe Sepulchre.Optimization algorithms on matrix manifolds. Princeton University Press, 2009

work page 2009

[3] [3]

Birkhäuser, 2008

Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré.Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Birkhäuser, 2008

work page 2008

[4] [4]

A theoretical analysis of contrastive unsupervised representation learning

Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, and Nikunj Saunshi. A theoretical analysis of contrastive unsupervised representation learning. In International Conference on Machine Learning, pages 5628–5637. PMLR, 2019

work page 2019

[5] [5]

Springer, 2014

Dominique Bakry, Ivan Gentil, and Michel Ledoux.Analysis and Geometry of Markov Diffusion Operators, volume 348. Springer, 2014

work page 2014

[6] [6]

Neural networks and principal component analysis: Learning from examples without local minima.Neural networks, 2(1):53–58, 1989

Pierre Baldi and Kurt Hornik. Neural networks and principal component analysis: Learning from examples without local minima.Neural networks, 2(1):53–58, 1989

work page 1989

[7] [7]

Laplacian eigenmaps and spectral techniques for embedding and clustering

Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. InAdvances in neural information processing systems, pages 585–591, 2001

work page 2001

[8] [8]

Representation learning: A review and new perspectives.IEEE transactions on pattern analysis and machine intelligence, 35(8): 1798–1828, 2013

Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives.IEEE transactions on pattern analysis and machine intelligence, 35(8): 1798–1828, 2013

work page 2013

[9] [9]

Prentice hall, 1989

Dimitri P Bertsekas and John N Tsitsiklis.Parallel and distributed computation: numerical methods. Prentice hall, 1989

work page 1989

[10] [10]

John Wiley & Sons, 3rd edition, 1995

Patrick Billingsley.Probability and Measure. John Wiley & Sons, 3rd edition, 1995

work page 1995

[11] [11]

Springer Science & Business Media, 2007

Vladimir I Bogachev.Measure theory, volume 1. Springer Science & Business Media, 2007

work page 2007

[12] [12]

Springer, 2009

Vivek S Borkar.Stochastic approximation: a dynamical systems viewpoint. Springer, 2009

work page 2009

[13] [13]

Oxford University Press, 2013

Stéphane Boucheron, Gábor Lugosi, and Pascal Massart.Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013

work page 2013

[14] [14]

Geometric deep learning: going beyond euclidean data.IEEE signal processing magazine, 34 (4):18–42, 2017

Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data.IEEE signal processing magazine, 34 (4):18–42, 2017

work page 2017

[15] [15]

Springer Science & Business Media, 2002

Han-Fu Chen.Stochastic Approximation and Its Applications, volume 64. Springer Science & Business Media, 2002

work page 2002

[16] [16]

Neural ordinary differential equations

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. InAdvances in neural information processing systems, volume 31, 2018

work page 2018

[17] [17]

A simple framework for contrastive learning of visual representations

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InInternational conference on machine learning, pages 1597–1607. PMLR, 2020

work page 2020

[18] [18]

On the global convergence of gradient descent for over- parameterized models using optimal transport

Lénaïc Chizat and Francis Bach. On the global convergence of gradient descent for over- parameterized models using optimal transport. InAdvances in neural information processing systems, volume 31, 2018

work page 2018

[19] [19]

American Mathematical Society, 1997

Fan RK Chung.Spectral Graph Theory, volume 92. American Mathematical Society, 1997

work page 1997

[20] [20]

Diffusion maps.Applied and computational harmonic analysis, 21(1):5–30, 2006

Ronald R Coifman and Stéphane Lafon. Diffusion maps.Applied and computational harmonic analysis, 21(1):5–30, 2006

work page 2006

[21] [21]

Cambridge university press, 2014

Giuseppe Da Prato and Jerzy Zabczyk.Stochastic equations in infinite dimensions. Cambridge university press, 2014. 10

work page 2014

[22] [22]

Random geometric graphs.Physical review E, 66(1): 016121, 2002

Jesper Dall and Michael Christensen. Random geometric graphs.Physical review E, 66(1): 016121, 2002

work page 2002

[23] [23]

The rotation of eigenvectors by a perturbation

Chandler Davis and William M Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970

work page 1970

[24] [24]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4171–4186, 2019

work page 2019

[25] [25]

Rajesh PN Rao and Dana H Ballard

James J DiCarlo and David D Cox. Untangling invariant object recognition.Trends in cognitive sciences, 11(8):333–341, 2007. doi: 10.1016/j.tics.2007.06.010

work page doi:10.1016/j.tics.2007.06.010 2007

[26] [26]

American Mathematical Society, 1977

Joseph Diestel and John Jerry Uhl.Vector Measures, volume 15 ofMathematical Surveys and Monographs. American Mathematical Society, 1977

work page 1977

[27] [27]

Springer-Verlag Heidelberg, 5th edition, 2017

Reinhard Diestel.Graph Theory. Springer-Verlag Heidelberg, 5th edition, 2017

work page 2017

[28] [28]

Gossip algorithms for distributed signal processing.Proceedings of the IEEE, 98 (11):1847–1864, 2010

Alexandros G Dimakis, Soummya Kar, José MF Moura, Michael G Rabbat, and Anna Scaglione. Gossip algorithms for distributed signal processing.Proceedings of the IEEE, 98 (11):1847–1864, 2010

work page 2010

[29] [29]

Birkhäuser, 1992

Manfredo Perdigao Do Carmo.Riemannian Geometry. Birkhäuser, 1992

work page 1992

[30] [30]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021

work page 2021

[31] [31]

Cambridge University Press, 2002

Richard M Dudley.Real Analysis and Probability. Cambridge University Press, 2002

work page 2002

[32] [32]

The geometry of algorithms with orthogonality constraints.SIAM journal on Matrix Analysis and Applications, 20(2):303–353, 1998

Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with orthogonality constraints.SIAM journal on Matrix Analysis and Applications, 20(2):303–353, 1998

work page 1998

[33] [33]

John Wiley & Sons, 2009

Stewart N Ethier and Thomas G Kurtz.Markov Processes: Characterization and Convergence. John Wiley & Sons, 2009

work page 2009

[34] [34]

American Mathematical Society, 2nd edition, 2010

Lawrence C Evans.Partial Differential Equations, volume 19. American Mathematical Society, 2nd edition, 2010

work page 2010

[35] [35]

Springer, 1969

Herbert Federer.Geometric Measure Theory. Springer, 1969

work page 1969

[36] [36]

Testing the manifold hypothesis

Charles Fefferman, Sanjoy Mitter, and Hariharan Narayanan. Testing the manifold hypothesis. Journal of the American Mathematical Society, 29(4):983–1049, 2016

work page 2016

[37] [37]

John Wiley & Sons, 2nd edition, 1999

Gerald B Folland.Real Analysis: Modern Techniques and Their Applications. John Wiley & Sons, 2nd edition, 1999

work page 1999

[38] [38]

Springer Science & Business Media, 3rd edition, 2012

Mark I Freidlin and Alexander D Wentzell.Random Perturbations of Dynamical Systems, volume 260. Springer Science & Business Media, 3rd edition, 2012

work page 2012

[39] [39]

Springer Berlin, 3rd edition, 2004

Crispin W Gardiner.Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, volume 13. Springer Berlin, 3rd edition, 2004

work page 2004

[40] [40]

JHU press, 2013

Gene H Golub and Charles F Van Loan.Matrix computations. JHU press, 2013

work page 2013

[41] [41]

MIT press, 2016

Ian Goodfellow, Yoshua Bengio, and Aaron Courville.Deep learning. MIT press, 2016

work page 2016

[42] [42]

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

Ian J Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. An empirical investigation of catastrophic forgetting in gradient-based neural networks.arXiv preprint arXiv:1312.6211, 2013

work page Pith review arXiv 2013

[43] [43]

Springer Science & Business Media, 2nd edition, 2004

Alfred Gray.Tubes, volume 221. Springer Science & Business Media, 2nd edition, 2004. 11

work page 2004

[44] [44]

The critical power for asymptotic connectivity in wireless networks.Stochastic Analysis and Applications, 16(3):547–566, 1998

Piyush Gupta and Panganamala R Kumar. The critical power for asymptotic connectivity in wireless networks.Stochastic Analysis and Applications, 16(3):547–566, 1998

work page 1998

[45] [45]

Springer, 2015

Brian C Hall.Lie groups, Lie algebras, and representations: an elementary introduction, volume 222. Springer, 2015

work page 2015

[46] [46]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

work page 2016

[47] [47]

Momentum contrast for unsupervised visual representation learning

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738, 2020

work page 2020

[48] [48]

From graphs to manifolds–weak and strong pointwise consistency of graph laplacians

Matthias Hein, Jean-Yves Audibert, and Ulrike V on Luxburg. From graphs to manifolds–weak and strong pointwise consistency of graph laplacians. InLearning Theory: 18th Annual Conference on Learning Theory, COLT 2005, pages 470–485. Springer, 2005

work page 2005

[49] [49]

Towards a Definition of Disentangled Representations

Irina Higgins, David Amos, David Pfau, Demis Hassabis, Matthew Botvinick, and Danilo Rezende. Towards a definition of disentangled representations.arXiv preprint arXiv:1812.02230, 2018

work page Pith review arXiv 2018

[50] [50]

Academic press, 2012

Morris W Hirsch, Stephen Smale, and Robert L Devaney.Differential equations, dynamical systems, and an introduction to chaos. Academic press, 2012

work page 2012

[51] [51]

Neural networks and physical systems with emergent collective computational abilities.Proceedings of the national academy of sciences, 79(8):2554–2558, 1982

John J Hopfield. Neural networks and physical systems with emergent collective computational abilities.Proceedings of the national academy of sciences, 79(8):2554–2558, 1982

work page 1982

[52] [52]

Cambridge University Press, 2nd edition, 2012

Roger A Horn and Charles R Johnson.Matrix Analysis. Cambridge University Press, 2nd edition, 2012

work page 2012

[53] [53]

American Mathematical Society, 2002

Elton P Hsu.Stochastic Analysis on Manifolds, volume 38. American Mathematical Society, 2002

work page 2002

[54] [54]

Independent component analysis: algorithms and applications

Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications. Neural networks, 13(4-5):411–430, 2000

work page 2000

[55] [55]

Understanding dimensional collapse in contrastive self-supervised learning

Li Jing, Pascal Vincent, Yann LeCun, and Yuandong Tian. Understanding dimensional collapse in contrastive self-supervised learning. InInternational Conference on Learning Representations, 2022

work page 2022

[56] [56]

The variational formulation of the fokker–planck equation.SIAM journal on mathematical analysis, 29(1):1–17, 1998

Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the fokker–planck equation.SIAM journal on mathematical analysis, 29(1):1–17, 1998

work page 1998

[57] [57]

Springer Science & Business Media, 5th edition, 2008

Jürgen Jost.Riemannian Geometry and Geometric Analysis. Springer Science & Business Media, 5th edition, 2008

work page 2008

[58] [58]

Springer Science & Business Media, 2nd edition, 1991

Ioannis Karatzas and Steven Shreve.Brownian Motion and Stochastic Calculus, volume 113. Springer Science & Business Media, 2nd edition, 1991

work page 1991

[59] [59]

Springer Science & Business Media, 2013

Tosio Kato.Perturbation theory for linear operators, volume 132. Springer Science & Business Media, 2013

work page 2013

[60] [60]

John Wiley & Sons, 1979

Frank P Kelly.Reversibility and Stochastic Networks. John Wiley & Sons, 1979

work page 1979

[61] [61]

Prentice hall Upper Saddle River, NJ, 2002

Hassan K Khalil.Nonlinear systems, volume 3. Prentice hall Upper Saddle River, NJ, 2002

work page 2002

[62] [62]

Springer Science & Business Media, 2011

Rafail Khasminskii.Stochastic stability of differential equations, volume 66. Springer Science & Business Media, 2011

work page 2011

[63] [63]

On the principle of averaging for itô’s stochastic differential equations

Rafail Z Khasminskii. On the principle of averaging for itô’s stochastic differential equations. Kibernetika, 4(3):260–279, 1968

work page 1968

[64] [64]

Springer, 1992

Peter E Kloeden and Eckhard Platen.Numerical solution of stochastic differential equations. Springer, 1992. 12

work page 1992

[65] [65]

MIT Press, 1984

Harold J Kushner.Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory. MIT Press, 1984

work page 1984

[66] [66]

Springer, 2nd edition, 2003

Harold J Kushner and G George Yin.Stochastic Approximation and Recursive Algorithms and Applications. Springer, 2nd edition, 2003

work page 2003

[67] [67]

Some extensions of liapunov’s second method.IRE Transactions on circuit theory, 7(4):520–527, 1960

Joseph P LaSalle. Some extensions of liapunov’s second method.IRE Transactions on circuit theory, 7(4):520–527, 1960

work page 1960

[68] [68]

Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 1998

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 1998

work page 1998

[69] [69]

A tutorial on energy-based learning

Yann LeCun, Sumit Chopra, Raia Hadsell, Marc’Aurelio Ranzato, and Fu-Jie Huang. A tutorial on energy-based learning. In Gökhan H. Bakır, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, and Ben Taskar, editors,Predicting Structured Data, pages 191–246. MIT Press, 2006

work page 2006

[70] [70]

Springer, 2018

John M Lee.Introduction to Riemannian Manifolds. Springer, 2018

work page 2018

[71] [71]

American Mathematical Society, 2017

David A Levin and Yuval Peres.Markov Chains and Mixing Times, volume 107. American Mathematical Society, 2017

work page 2017

[72] [72]

Neuro-vesicles: Neuromodulation should be a dynamical system, not a tensor decoration, 2025

Zilin Li, Weiwei Xu, and Vicki Kane. Neuro-vesicles: Neuromodulation should be a dynamical system, not a tensor decoration, 2025. URLhttps://arxiv.org/abs/2512.06966

work page arXiv 2025

[73] [73]

Backpropagation and the brain.Nature Reviews Neuroscience, 21(6):335–346, 2020

Timothy P Lillicrap, Adam Santoro, Luke Marris, Colin J Akerman, and Geoffrey Hinton. Backpropagation and the brain.Nature Reviews Neuroscience, 21(6):335–346, 2020

work page 2020

[74] [74]

Self-organization in a perceptual network.Computer, 21(3):105–117, 1988

Ralph Linsker. Self-organization in a perceptual network.Computer, 21(3):105–117, 1988

work page 1988

[75] [75]

Challenging common assumptions in the unsupervised learning of disentangled representations

Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. InInternational conference on machine learning, pages 4114–4124. PMLR, 2019

work page 2019

[76] [76]

John Wiley & Sons, 3rd edition, 2019

Jan R Magnus and Heinz Neudecker.Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons, 3rd edition, 2019

work page 2019

[77] [77]

Stochastic versions of the lasalle theorem.Journal of Differential Equations, 153(1):175–195, 1999

Xuerong Mao. Stochastic versions of the lasalle theorem.Journal of Differential Equations, 153(1):175–195, 1999

work page 1999

[78] [78]

A mean field view of the landscape of two-layer neural networks.Proceedings of the National Academy of Sciences, 115(33): E7665–E7671, 2018

Song Mei, Andrea Montanari, and Phan-Minh Nguyen. A mean field view of the landscape of two-layer neural networks.Proceedings of the National Academy of Sciences, 115(33): E7665–E7671, 2018

work page 2018

[79] [79]

SIAM, 2000

Carl D Meyer.Matrix analysis and applied linear algebra, volume 71. SIAM, 2000

work page 2000

[80] [80]

World Scientific Publishing Company, 1987

Marc Mézard, Giorgio Parisi, and Miguel Angel Virasoro.Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company, 1987

work page 1987