From SGD to spectra: A theory of neural network weight dynamics. arXiv preprint arXiv:2507.12709

Brian Richard Olsen, Sam Fatehmanesh, Frank Xiao, Adarsh Kumarappan, Anirudh Gajula · 2025 · arXiv 2507.12709

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K--V Asymmetry

cs.LG · 2026-04-03 · unverdicted · novelty 8.0

Transformer weight spectra exhibit transient compression waves that propagate layer-wise, persistent non-monotonic depth gradients in power-law exponents, and Q/K-V asymmetry, with the spectral exponent alpha predicting layer importance and enabling pruning gains of 1.1x-3.6x over Last-N baselines.

Random Matrix Theory of Early-Stopped Gradient Flow: A Transient BBP Scenario

stat.ML · 2026-04-20 · unverdicted · novelty 7.0

In an anisotropic random-matrix model of gradient flow, the teacher signal produces a transient BBP transition where the outlier eigenvalue emerges only in an intermediate time window before overfitting.

SpectralLoRA: Is Low-Frequency Structure Sufficient for LoRA Adaptation? A Spectral Analysis of Weight Updates

cs.LG · 2026-04-12 · unverdicted · novelty 7.0

LoRA weight updates are spectrally sparse, with 33% of DCT coefficients capturing 90% of energy on average, enabling 10x storage reduction and occasional gains by masking high frequencies.

citing papers explorer

Showing 3 of 3 citing papers.

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K--V Asymmetry · cs.LG · 2026-04-03 · unverdicted · none · ref 11
Random Matrix Theory of Early-Stopped Gradient Flow: A Transient BBP Scenario · stat.ML · 2026-04-20 · unverdicted · none · ref 10
SpectralLoRA: Is Low-Frequency Structure Sufficient for LoRA Adaptation? A Spectral Analysis of Weight Updates · cs.LG · 2026-04-12 · unverdicted · none · ref 8
