Similarity of neural network representations revisited
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
method 1
citation-polarity summary
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
PRISM supplies a geometric upper bound on LLM variant risk that splits drift into scale, shape, and head axes and doubles as a differentiable regularizer against forgetting.
citing papers explorer
- Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
  Low-rank pre-training methods converge to geometrically and spectrally distinct basins and show diverging activations compared to full-rank training at 60M-350M scales.
- PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head
  PRISM supplies a geometric upper bound on LLM variant risk that splits drift into scale, shape, and head axes and doubles as a differentiable regularizer against forgetting.