An entropy formula for the Deep Linear Network

Govind Menon; Tianmin Yu

arxiv: 2509.09088 · v3 · pith:K4KXYQM3new · submitted 2025-09-11 · 💻 cs.LG · math.DG · math.DS

Pith reviewed 2026-05-22 12:50 UTC · model grok-4.3

classification 💻 cs.LG · math.DG · math.DS

keywords deep linear networks · Riemannian geometry · Boltzmann entropy · group orbits · balanced manifold · Riemannian submersion · Jacobi matrices

The pith

A Boltzmann entropy for deep linear networks is defined from group orbits on the balanced manifold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a thermodynamic description of learning in deep linear networks by studying their Riemannian geometry. Group actions are used to analyze overparametrization, and a Boltzmann entropy is defined and computed from the foliation of the balanced manifold by group orbits. Riemannian submersion from the balanced manifold produces the geometry on the space of observables. The central technical step is an explicit orthonormal basis for the tangent space of the balanced manifold constructed via the theory of Jacobi matrices.

Core claim

The Riemannian geometry on the space of observables is obtained by Riemannian submersion of the balanced manifold, and a Boltzmann entropy is defined and computed using the foliation of the balanced manifold by group orbits. The explicit construction of an orthonormal basis for the tangent space of the balanced manifold using Jacobi matrices enables the entropy calculation.
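For readers unfamiliar with the term, the following block recalls what a Jacobi matrix is and the three-term recurrence satisfied by its leading principal minors. These are textbook facts; how the paper applies them to the tangent space is not spelled out on this page, and the constant-coefficient example at the end is chosen here purely for illustration.

```latex
% A Jacobi matrix: real, symmetric, tridiagonal, with off-diagonal entries b_k > 0.
J_n =
\begin{pmatrix}
a_1 & b_1    &         &         \\
b_1 & a_2    & \ddots  &         \\
    & \ddots & \ddots  & b_{n-1} \\
    &        & b_{n-1} & a_n
\end{pmatrix},
\qquad
D_k = a_k D_{k-1} - b_{k-1}^2 D_{k-2}, \quad D_0 = 1, \quad D_1 = a_1,

% where D_k is the determinant of the leading k x k block.
% Illustrative constant-coefficient case: a_k = \alpha + \beta, b_k = \sqrt{\alpha\beta},
% with \alpha \neq \beta positive. The recurrence then gives
D_n = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha - \beta}.
```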

What carries the argument

Balanced manifold in parameter space, foliated by group orbits, with Riemannian submersion to the space of observables and tangent-space orthonormal basis from Jacobi matrices.
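A minimal NumPy sketch of one standard way to produce a point on the balanced manifold. It assumes square layers of a common width d, a full-rank end-to-end matrix X, and the usual balancedness condition W_{p+1}^T W_{p+1} = W_p W_p^T; none of these conventions is stated on this page, and the function names are illustrative only.

```python
import numpy as np

def random_orthogonal(d, rng):
    # The Q factor of a Gaussian matrix is an orthogonal matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def balanced_factors(X, depth, rng):
    """Return [W_1, ..., W_N] with W_N ... W_1 = X and W_{p+1}^T W_{p+1} = W_p W_p^T."""
    U, s, Vt = np.linalg.svd(X)
    lam = np.diag(s ** (1.0 / depth))      # N-th root of the singular values of X
    d = X.shape[0]
    # Q_0 = V and Q_N = U are fixed by X; the inner Q_1, ..., Q_{N-1} are free.
    Qs = [Vt.T] + [random_orthogonal(d, rng) for _ in range(depth - 1)] + [U]
    return [Qs[p + 1] @ lam @ Qs[p].T for p in range(depth)]

def end_to_end(Ws):
    """Observable W_N ... W_1 for Ws = [W_1, ..., W_N]."""
    X = np.eye(Ws[0].shape[1])
    for W in Ws:                           # W_1 is applied first, W_N last
        X = W @ X
    return X

def balance_defect(Ws):
    """Largest Frobenius norm of W_{p+1}^T W_{p+1} - W_p W_p^T over adjacent layers."""
    return max(np.linalg.norm(Ws[p + 1].T @ Ws[p + 1] - Ws[p] @ Ws[p].T)
               for p in range(len(Ws) - 1))

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
Ws = balanced_factors(X, depth=4, rng=rng)
print("reconstruction error:", np.linalg.norm(end_to_end(Ws) - X))
print("balance defect:", balance_defect(Ws))
```

Different choices of the inner orthogonal matrices give different balanced factorizations of the same X, which is the freedom the orbit foliation is meant to capture.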

If this is right

The entropy admits an explicit formula for any depth and width of the deep linear network.
Overparametrization is characterized by the dimension of the group orbits.
The geometry of observables matches the earlier definition obtained by direct construction.
A thermodynamic account of the learning dynamics becomes available through this entropy.
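The link between overparametrization and orbit dimension can be illustrated numerically. The sketch below assumes the group is a product of N-1 orthogonal groups acting by W_p -> Q_p W_p Q_{p-1}^T with Q_0 = Q_N = I, and square layers of width d; these are assumptions made here, not details taken from this page. It checks that the action leaves the end-to-end matrix unchanged and that, at a generic point, the orbit has dimension (N-1) d (d-1) / 2.

```python
import numpy as np

def act(Ws, Qs):
    """Apply W_p -> Q_p W_p Q_{p-1}^T, with Qs = [Q_1, ..., Q_{N-1}] and Q_0 = Q_N = I."""
    d = Ws[0].shape[0]
    full = [np.eye(d)] + list(Qs) + [np.eye(d)]
    return [full[p + 1] @ W @ full[p].T for p, W in enumerate(Ws)]

def product(Ws):
    """End-to-end matrix W_N ... W_1 for Ws = [W_1, ..., W_N]."""
    X = np.eye(Ws[0].shape[1])
    for W in Ws:
        X = W @ X
    return X

def orbit_tangent_rank(Ws):
    """Rank of the infinitesimal action A_p W_p - W_p A_{p-1}, with A_0 = A_N = 0."""
    N, d = len(Ws), Ws[0].shape[0]
    zero = np.zeros((d, d))
    columns = []
    for slot in range(1, N):                   # which inner generator A_p is switched on
        for i in range(d):
            for j in range(i + 1, d):
                A = np.zeros((d, d))
                A[i, j], A[j, i] = 1.0, -1.0   # elementary skew-symmetric matrix
                As = [zero] * (N + 1)
                As[slot] = A
                dW = [As[p + 1] @ Ws[p] - Ws[p] @ As[p] for p in range(N)]
                columns.append(np.concatenate([M.ravel() for M in dW]))
    return np.linalg.matrix_rank(np.stack(columns, axis=1))

rng = np.random.default_rng(0)
d, N = 3, 4
Ws = [rng.standard_normal((d, d)) for _ in range(N)]
Qs = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(N - 1)]
print("observable unchanged:", np.allclose(product(act(Ws, Qs)), product(Ws)))
print("orbit dimension:", orbit_tangent_rank(Ws), "expected:", (N - 1) * d * (d - 1) // 2)
```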

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same orbit-foliation construction may supply entropy measures for other overparametrized models whose parameter space admits a balanced manifold.
Entropy gradients along this foliation could be compared with the loss landscape to test whether entropy increase tracks generalization.
Numerical evaluation of the Jacobi-matrix basis on small networks would give concrete entropy values that can be tracked during gradient descent.
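As a rough illustration of the kind of numerical evaluation suggested above, the sketch below computes half the log-determinant of the Gram matrix of orbit directions at a balanced point and compares it with a closed form in the singular values. It assumes square layers, distinct positive singular values, the action W_p -> Q_p W_p Q_{p-1}^T, and the Frobenius metric on parameter space. The closed form used here comes from the tridiagonal determinant identity det = (a^N - b^N) / (a - b) applied pair by pair; it is a sanity check on how ratios of Vandermonde-type products can arise, not the paper's normalization, and the constant c_d is not modeled.

```python
import numpy as np
from itertools import combinations

def balanced_point(sigma, N, rng):
    """Balanced factors [W_1, ..., W_N] whose product has singular values sigma."""
    d = len(sigma)
    lam = np.diag(np.asarray(sigma, dtype=float) ** (1.0 / N))
    Qs = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(N + 1)]
    return [Qs[p + 1] @ lam @ Qs[p].T for p in range(N)]

def orbit_gram(Ws):
    """Gram matrix (Frobenius inner product) of the directions A_p W_p - W_p A_{p-1}."""
    N, d = len(Ws), Ws[0].shape[0]
    zero = np.zeros((d, d))
    vecs = []
    for slot in range(1, N):
        for i, j in combinations(range(d), 2):
            A = np.zeros((d, d))
            A[i, j], A[j, i] = 1.0, -1.0
            As = [zero] * (N + 1)
            As[slot] = A
            dW = [As[p + 1] @ Ws[p] - Ws[p] @ As[p] for p in range(N)]
            vecs.append(np.concatenate([M.ravel() for M in dW]))
    V = np.stack(vecs)
    return V @ V.T

def half_logdet_numeric(Ws):
    _, logdet = np.linalg.slogdet(orbit_gram(Ws))
    return 0.5 * logdet

def half_logdet_closed_form(sigma, N):
    """Pairwise sum; each pair (i, j) contributes a tridiagonal determinant."""
    s = np.asarray(sigma, dtype=float)
    total = 0.0
    for i, j in combinations(range(len(s)), 2):
        ratio = (s[i] ** 2 - s[j] ** 2) / (s[i] ** (2.0 / N) - s[j] ** (2.0 / N))
        total += (N - 1) * np.log(2.0) + np.log(ratio)
    return 0.5 * total

rng = np.random.default_rng(0)
sigma, N = [3.0, 1.5, 0.7], 4
Ws = balanced_point(sigma, N, rng)
print("numeric    :", half_logdet_numeric(Ws))
print("closed form:", half_logdet_closed_form(sigma, N))
```

Evaluating `half_logdet_numeric` along a discretized training trajectory would be one way to track such a quantity during gradient descent, provided the iterates stay close to the balanced manifold.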

Load-bearing premise

The foliation of the balanced manifold by group orbits supplies a natural and physically meaningful definition of Boltzmann entropy for the learning process.

What would settle it

Direct computation of the entropy formula on a trained deep linear network that shows no consistent relation to overparametrization degree or to changes in the learning trajectory would falsify the definition.

Original abstract

We study the Riemannian geometry of the Deep Linear Network (DLN) as a foundation for a thermodynamic description of the learning process. The main tools are the use of group actions to analyze overparametrization and the use of Riemannian submersion from the space of parameters to the space of observables. The foliation of the balanced manifold in the parameter space by group orbits is used to define and compute a Boltzmann entropy. We also show that the Riemannian geometry on the space of observables defined in [2] is obtained by Riemannian submersion of the balanced manifold. The main technical step is an explicit construction of an orthonormal basis for the tangent space of the balanced manifold using the theory of Jacobi matrices.
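To make the link between parameter space and the space of observables concrete, the sketch below checks a well-known identity associated with [1] and [2]: at a balanced point, pushing the Euclidean gradient through the product map phi(W) = W_N ... W_1 yields the operator Z -> sum_p (X X^T)^{(N-p)/N} Z (X^T X)^{(p-1)/N}. Square layers and a full-rank X are assumed, normalization conventions may differ from those of the paper, and this is not the paper's submersion argument.

```python
import numpy as np

def psd_power(S, t):
    """Fractional power of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, 0.0, None) ** t) @ V.T

def balanced_point(X, N, rng):
    U, s, Vt = np.linalg.svd(X)
    lam = np.diag(s ** (1.0 / N))
    d = X.shape[0]
    inner = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(N - 1)]
    Qs = [Vt.T] + inner + [U]
    return [Qs[p + 1] @ lam @ Qs[p].T for p in range(N)]

def chain(Ws, lo, hi, d):
    """Product W_hi ... W_lo (1-based, inclusive); the identity if the range is empty."""
    P = np.eye(d)
    for p in range(lo, hi + 1):
        P = Ws[p - 1] @ P
    return P

def pushforward_of_gradient(Ws, Z):
    """d(phi) applied to d(phi)^T Z, where phi is the product map."""
    N, d = len(Ws), Z.shape[0]
    out = np.zeros_like(Z)
    for p in range(1, N + 1):
        left = chain(Ws, p + 1, N, d)      # W_N ... W_{p+1}
        right = chain(Ws, 1, p - 1, d)     # W_{p-1} ... W_1
        out += left @ left.T @ Z @ right.T @ right
    return out

def metric_operator(X, Z, N):
    """sum_p (X X^T)^{(N-p)/N} Z (X^T X)^{(p-1)/N}."""
    XXt, XtX = X @ X.T, X.T @ X
    return sum(psd_power(XXt, (N - p) / N) @ Z @ psd_power(XtX, (p - 1) / N)
               for p in range(1, N + 1))

rng = np.random.default_rng(1)
N = 3
X = rng.standard_normal((3, 3))
Z = rng.standard_normal((3, 3))
Ws = balanced_point(X, N, rng)
print("identity holds:", np.allclose(pushforward_of_gradient(Ws, Z), metric_operator(X, Z, N)))
```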

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives an explicit Boltzmann entropy for deep linear networks from group-orbit volumes on the balanced manifold, but the Jacobi-matrix tangent basis needs checking for general depth and width.

The paper supplies an explicit Boltzmann entropy for deep linear networks derived from the volumes of group orbits on the balanced manifold. It also recovers the Riemannian geometry on the space of observables through a submersion of that manifold. What is new is the construction of an orthonormal basis for the tangent space using the theory of Jacobi matrices. This allows a direct computation of the entropy from the foliation geometry without relying on prior fitted quantities.

The paper does well in giving a thermodynamic interpretation that links overparametrization to orbit volumes and connects optimization to statistical mechanics ideas. The approach stays within the geometric program for DLNs and avoids circularity by defining entropy geometrically.

The main soft spot is whether the proposed Jacobi-matrix basis remains orthonormal for arbitrary depth and width. The tangent space identification involves products of weight matrices, and while Jacobi recurrences are standard, it is not obvious they preserve orthonormality once the balanced condition is imposed on those products. The paper should provide a check for L greater than 2 or unequal widths to confirm this. If the basis works, the entropy formula and submersion result are on solid ground. Otherwise the thermodynamic picture needs revision.

This paper is for people studying geometric and statistical mechanics views of neural network training. Readers already working on Riemannian geometry of deep linear networks will find the explicit formulas and the basis construction useful. It deserves a serious referee because the claims are concrete and build directly on existing work in the area. I recommend sending it for peer review with attention to verifying the orthonormality for general network architectures.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a Riemannian geometric framework for deep linear networks (DLNs) as a foundation for a thermodynamic description of learning. It analyzes overparametrization via group actions on the parameter space, defines a Boltzmann entropy from the foliation of the balanced manifold by group orbits, and shows that the Riemannian geometry on the space of observables arises via Riemannian submersion from the balanced manifold. The central technical contribution is an explicit construction of an orthonormal basis for the tangent space to the balanced manifold using the theory of Jacobi matrices.

Significance. If the technical claims are established, the work supplies a concrete geometric and entropic toolset for DLNs that could connect optimization dynamics to thermodynamic principles. The explicit orthonormal basis and submersion result are strengths that enable parameter-free derivations of entropy and geometry; these could support falsifiable predictions about learning trajectories once the construction is verified for general architectures.

major comments (2)

[Main technical section on tangent-space basis construction] The central claim that an explicit orthonormal basis for T_p M_bal is obtained via Jacobi matrices (main technical step) must be shown to hold independently of depth L and width. The recurrence relations for tridiagonal Jacobi operators do not automatically guarantee orthonormality once the tangent-space identification involves products of weight matrices whose singular values are constrained only by the balanced condition; this must be verified explicitly for L>2 and unequal widths, as it is load-bearing for both the Riemannian submersion and the well-definedness of the entropy as log-volume of orbits.
[Section defining and computing the Boltzmann entropy] The definition of Boltzmann entropy via the foliation by group orbits assumes this geometric volume supplies a physically meaningful entropy for the learning process. The manuscript should supply a concrete check (e.g., reduction to a known case for L=2 or comparison with an alternative entropy) showing that the resulting formula is not an artifact of the chosen foliation.
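The explicit verification requested in the first major comment could be automated along the following lines. This is a hedged sketch, not the paper's construction: it assumes square layers and the balancedness condition W_{p+1}^T W_{p+1} = W_p W_p^T, takes only the orbit directions as candidate tangent vectors, and orthonormalizes them numerically by QR rather than through Jacobi matrices. The two checks (Gram matrix equal to the identity, and vanishing residual of the linearized balance equations) would apply unchanged to any candidate basis supplied by the authors.

```python
import numpy as np

def balanced_point(sigma, N, rng):
    d = len(sigma)
    lam = np.diag(np.asarray(sigma, dtype=float) ** (1.0 / N))
    Qs = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(N + 1)]
    return [Qs[p + 1] @ lam @ Qs[p].T for p in range(N)]

def orbit_directions(Ws):
    """Vectors A_p W_p - W_p A_{p-1} for elementary skew A_p (A_0 = A_N = 0)."""
    N, d = len(Ws), Ws[0].shape[0]
    zero, out = np.zeros((d, d)), []
    for slot in range(1, N):
        for i in range(d):
            for j in range(i + 1, d):
                A = np.zeros((d, d))
                A[i, j], A[j, i] = 1.0, -1.0
                As = [zero] * (N + 1)
                As[slot] = A
                out.append([As[p + 1] @ Ws[p] - Ws[p] @ As[p] for p in range(N)])
    return out

def flatten(v):
    return np.concatenate([M.ravel() for M in v])

def orthonormalize(vectors):
    """QR on the flattened vectors, then reshape each column back into layer blocks."""
    shapes = [M.shape for M in vectors[0]]
    Q, _ = np.linalg.qr(np.stack([flatten(v) for v in vectors], axis=1))
    splits = np.cumsum([int(np.prod(s)) for s in shapes])[:-1]
    return [[blk.reshape(s) for blk, s in zip(np.split(Q[:, k], splits), shapes)]
            for k in range(Q.shape[1])]

def gram_defect(vectors):
    """Largest entrywise deviation of the Gram matrix from the identity."""
    F = np.stack([flatten(v) for v in vectors])
    return np.abs(F @ F.T - np.eye(len(vectors))).max()

def tangency_defect(Ws, v):
    """Residual of the linearized balance equations at Ws in the direction v."""
    res = 0.0
    for p in range(len(Ws) - 1):
        lin = (v[p + 1].T @ Ws[p + 1] + Ws[p + 1].T @ v[p + 1]
               - v[p] @ Ws[p].T - Ws[p] @ v[p].T)
        res = max(res, np.abs(lin).max())
    return res

rng = np.random.default_rng(0)
Ws = balanced_point([2.0, 1.0, 0.5], N=3, rng=rng)
basis = orthonormalize(orbit_directions(Ws))
print("Gram defect:", gram_defect(basis))
print("tangency defect:", max(tangency_defect(Ws, v) for v in basis))
```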

minor comments (2)

[Introduction / Preliminaries] Notation for the balanced manifold and the group action should be introduced with a short diagram or explicit coordinate chart in the first technical section to aid readability.
[Abstract] The abstract states that the geometry on observables is recovered by submersion but does not preview the explicit entropy formula; adding one sentence would orient readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We address each major comment below and will revise the manuscript accordingly to strengthen the technical claims and interpretations.

Referee: [Main technical section on tangent-space basis construction] The central claim that an explicit orthonormal basis for T_p M_bal is obtained via Jacobi matrices (main technical step) must be shown to hold independently of depth L and width. The recurrence relations for tridiagonal Jacobi operators do not automatically guarantee orthonormality once the tangent-space identification involves products of weight matrices whose singular values are constrained only by the balanced condition; this must be verified explicitly for L>2 and unequal widths, as it is load-bearing for both the Riemannian submersion and the well-definedness of the entropy as log-volume of orbits.

Authors: We thank the referee for this observation. The construction in the manuscript uses the theory of Jacobi matrices to produce the orthonormal basis for the tangent space to the balanced manifold, with the balanced condition ensuring the singular values satisfy the recurrence relations that preserve orthonormality for arbitrary depth and widths. However, to address the request for explicit verification, the revised manuscript will include an appendix with direct computations for L=3 and unequal widths (e.g., 4-3-5), confirming that the inner products remain zero off-diagonal and unity on-diagonal under the balanced constraint. This will make the independence from specific L and widths fully explicit.
revision: yes
Referee: [Section defining and computing the Boltzmann entropy] The definition of Boltzmann entropy via the foliation by group orbits assumes this geometric volume supplies a physically meaningful entropy for the learning process. The manuscript should supply a concrete check (e.g., reduction to a known case for L=2 or comparison with an alternative entropy) showing that the resulting formula is not an artifact of the chosen foliation.

Authors: We agree that an explicit consistency check strengthens the interpretation. The revised manuscript will add a subsection reducing the general entropy formula to the L=2 case. In this reduction the balanced manifold and group orbits recover the standard matrix factorization geometry, and the entropy matches the known expression for the volume of orbits in the two-layer setting. This demonstrates that the formula is consistent with prior results rather than an artifact of the foliation. A brief comparison with the entropy induced by the loss level sets will also be included to further support its relevance.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via explicit construction

The paper constructs an explicit orthonormal basis for the tangent space to the balanced manifold using Jacobi matrix theory and defines Boltzmann entropy directly from the volume of group orbits in the foliation. It then derives the observable-space geometry as a Riemannian submersion of this manifold, referencing [2] only for comparison rather than as a load-bearing premise. No step reduces a claimed prediction or result to a fitted parameter, self-definition, or unverified self-citation chain; the central claims rest on the provided geometric constructions and are independent of the target entropy formula.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on standard differential geometry and Lie group theory with the novel elements being the entropy definition and the explicit tangent-space basis.

axioms (1)

standard math: Standard axioms of Riemannian manifolds and Lie group actions on parameter space.
Invoked to analyze overparametrization and to define the balanced manifold and its foliation.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

The main technical step is an explicit construction of an orthonormal basis for the tangent space of the balanced manifold using the theory of Jacobi matrices.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective unclear

S(X) = (N−1) log c_d + ½ log [ van(Σ²) / van(Σ^{2/N}) ]

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Geometric and Spectral Alignment for Deep Neural Network I
cs.LG 2026-05 unverdicted novelty 6.0

Residual network Jacobians under Frobenius normalization have singular spectra that form trace-normalized Cartan orbits satisfying slack-aware margin inequalities bounding exponent drift to order (log M)/L in zero-sla...

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · cited by 1 Pith paper

[1] S. Arora, N. Cohen, and E. Hazan, On the optimization of deep networks: Implicit acceleration by overparameterization, in International Conference on Machine Learning, PMLR, 2018, pp. 244–253
[2] B. Bah, H. Rauhut, U. Terstiege, and M. Westdickenberg, Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers, Information and Inference: A Journal of the IMA, 11 (2022), pp. 307–353
[3] P. Baldi and K. Hornik, Learning in linear neural networks: a survey, IEEE Transactions on Neural Networks, 6 (1995), pp. 837–858
[4] M. Belkin, D. Hsu, S. Ma, and S. Mandal, Reconciling modern machine-learning practice and the classical bias–variance trade-off, Proceedings of the National Academy of Sciences, 116 (2019), pp. 15849–15854
[5] M. Belkin, D. Hsu, and J. Xu, Two models of double descent for weak features, SIAM Journal on Mathematics of Data Science, 2 (2020), pp. 1167–1180
[6] P. Bréchet, K. Papagiannouli, J. An, and G. Montúfar, Critical points and convergence analysis of generative deep linear networks trained with Bures-Wasserstein loss, in International Conference on Machine Learning, PMLR, 2023, pp. 3106–3147
[7] R. Brockett, Modeling the transient behavior of stochastic gradient algorithms, in 2011 50th IEEE Conference on Decision and Control and European Control Conference, IEEE, 2011, pp. 4461–4466
[8] R. W. Brockett, Dynamical systems that sort lists, diagonalize matrices, and solve linear programming problems, Linear Algebra Appl., 146 (1991), pp. 79–91
[9] R. W. Brockett, Dynamical systems and their associated automata, in Systems and networks: mathematical theory and applications, Vol. I (Regensburg, 1993), vol. 77 of Math. Res., Akademie-Verlag, Berlin, 1994, pp. 49–69
[10] A. Chen, Geodesics in the deep linear network, Preprint, (2025)
[11] A. Chen, T. S. Kotwal, and G. Menon, Equilibrium measures in the deep linear network, Preprint, (2025)
[12] T. Chen and P. M. Ewald, Geometric structure of Deep Learning networks and construction of global L^2 minimizers, arXiv:2309.10639, (2024)
[13] L. Chizat, M. Colombo, X. Fernández-Real, and A. Figalli, Infinite-width limit of deep linear neural networks, Communications on Pure and Applied Mathematics, 77 (2024), pp. 3958–4007
[14] H. T. M. Chu, S. Ghosh, C. T. Lam, and S. S. Mukherjee, Implicit regularization via spectral neural networks and non-linear matrix sensing, arXiv:2402.17595, (2024)
[15] N. Cohen, G. Menon, and Z. Veraszto, Deep linear networks for matrix completion – an infinite depth limit, SIAM Journal on Applied Dynamical Systems, 22 (2023), pp. 3208–3232
[16] R. Ge, C. Jin, and Y. Zheng, No spurious local minima in nonconvex low rank problems: A unified geometric analysis, in International Conference on Machine Learning, PMLR, 2017, pp. 1233–1242
[17] G. H. Golub and C. F. Van Loan, Matrix computations, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, fourth ed., 2013
[18] S. Gunasekar, J. D. Lee, D. Soudry, and N. Srebro, Implicit bias of gradient descent on linear convolutional networks, Advances in Neural Information Processing Systems, 31 (2018)
[19] S. Gunasekar, B. E. Woodworth, S. Bhojanapalli, B. Neyshabur, and N. Srebro, Implicit regularization in matrix factorization, in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2017, pp. 6151–6159
[20] Z. Ji and M. Telgarsky, Directional convergence and alignment in deep learning, Advances in Neural Information Processing Systems, 33 (2020), pp. 17176–17186
[21] T. Kato, Perturbation theory for linear operators, Band 132 of Die Grundlehren der mathematischen Wissenschaften, Springer-Verlag New York, Inc., New York, 1966
[22] K. Kohn, T. Merkh, G. Montúfar, and M. Trager, Geometry of linear convolutional networks, SIAM J. Appl. Algebra Geom., 6 (2022), pp. 368–406
[23] K. Kohn, G. Montúfar, V. Shahverdi, and M. Trager, Function space and critical points of linear convolutional networks, SIAM J. Appl. Algebra Geom., 8 (2024), pp. 333–362
[24] K. Lindsey and G. Menon, Regularization implies balancedness in the deep linear network, Preprint, (2025)
[25] G. Menon, The geometry of the deep linear network, in XIV Symposium on Probability and Stochastic Processes, C. G. H. Chan, J. A. L. Mimbela, and C. G. P. Sergio I. López, eds., Progress in Probability, Birkhäuser Cham, 2025
[26] G. Menon and T. Yu, The Riemannian Langevin equation and conic programs, Bulletin of the Institute of Mathematics Academia Sinica (New Series), 20 (2025), pp. 197–213
[27] G. Menon and T. Yu, A Riemannian Langevin equation for the deep linear network, Preprint, (2025)
[28] G. M. Nguegnang, H. Rauhut, and U. Terstiege, Convergence of gradient descent for learning linear neural networks, Adv. Contin. Discrete Models, (2024), Paper No. 23, 28 pp.
[29] F. W. Ponting and H. S. A. Potter, The volume of orthogonal and unitary space, The Quarterly Journal of Mathematics, os-20 (1949), pp. 146–154
[30] G. Vardi, On the implicit bias in deep-learning algorithms, Communications of the ACM, 66 (2023), pp. 86–93
