Representation geometry in language models aligns with the unembedding readout subspace in a scale-dependent manner, preserved throughout training in large models but progressively lost in late layers of small models despite continued loss improvement.
Yamini Bansal, Preetum Nakkiran, and Boaz Barak
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Scale Determines Whether Language Models Organize Representation Geometry for Prediction
Representation geometry in language models aligns with the unembedding readout subspace in a scale-dependent manner, preserved throughout training in large models but progressively lost in late layers of small models despite continued loss improvement.