Scale vectors in Pre-Norm LLMs aid optimization via preconditioning on linear layers rather than expressivity, and three lightweight modifications to them reduce terminal loss across model scales.
Seednorm: Self-rescaled dynamic normalization. arXiv preprint arXiv:2510.22777, 2025.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models