In double asymptotic limits, the squared singular value process of non-square matrix products obeys geometric Dyson Brownian motion whose T-transform solves a Burgers equation, producing the free log-normal law via free multiplicative convolution.
arXiv preprint arXiv:2402.10127
4 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Geometric Dyson Brownian Motions and the Free Log-Normal Limit for a Non-Square Product of Random Matrices
  In double asymptotic limits, the squared singular value process of non-square matrix products obeys geometric Dyson Brownian motion whose T-transform solves a Burgers equation, producing the free log-normal law via free multiplicative convolution.
- Spectral phase transitions and trainability in neural network learning dynamics
  SGD on neural network weights induces a BBP phase transition that detaches signal eigenvalues from the random bulk, yielding an analytically solvable phase diagram for trainability in a linear teacher-student model.
- Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss
  Double preconditioning (DoPr) improves downstream task performance in test-time feedback settings without consistent gains in validation loss.
- Bayesian Inference with Shaped Deep Non-linear MLPs
  In the LP/N = Θ(1) regime, Bayesian predictive posteriors for deep MLPs equal those of data-dependent kernels to first order, with a criterion identifying data processes that benefit from larger effective depth.