Symmetry in language statistics shapes the geometry of model representations

Andres Nava; Daniel J. Korchinski; Dhruva Karkada; Matthieu Wyart; Yasaman Bahri

arxiv: 2602.15029 · v3 · pith:CDNVENGAnew · submitted 2026-02-16 · 💻 cs.LG · cond-mat.dis-nn · cs.CL

Dhruva Karkada, Daniel J. Korchinski, Andres Nava, Matthieu Wyart, Yasaman Bahri

classification 💻 cs.LG · cond-mat.dis-nn · cs.CL

keywords language, models, statistics, embedding, geometry, months, representations, symmetry

The internal representations learned by language models consistently exhibit striking geometric structure: calendar months organize into a circle, historical years form a smooth one-dimensional manifold, and cities' latitudes and longitudes can be decoded using a linear probe. To explain this neural code, we first show that language statistics exhibit translation symmetry (for example, the frequency with which any two months co-occur in text depends only on the time interval between them). We prove that this symmetry governs these geometric structures in high-dimensional word embedding models, and we analytically derive the manifold geometry of word representations. These predictions empirically match large text embedding models and large language models. Moreover, the representational geometry persists at moderate embedding dimension even when the relevant statistics are perturbed (e.g., by removing all sentences in which two months co-occur). We prove that this robustness emerges naturally when the co-occurrence statistics are controlled by an underlying latent variable. Our results indicate that these representational manifolds originate in the statistical symmetries of natural language.
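
The core mechanism in the abstract is that co-occurrence depends only on the time interval between two months. A toy calculation can illustrate why that kind of symmetry produces a circle. The sketch below is not the authors' code or model. It assumes a hypothetical 12×12 month co-occurrence matrix whose entries decay as exp(-0.5 · gap), with the gap measured around the calendar cycle. It also assumes that embeddings are read off the leading non-constant eigenvectors of that matrix, each scaled by the square root of its eigenvalue. Under those assumptions the matrix is circulant, so its eigenvectors are discrete Fourier modes, and the lowest nonzero frequency places the twelve months evenly on a circle.

```python
import math

N_MONTHS = 12
DECAY = 0.5  # hypothetical rate at which co-occurrence falls off with the time gap


def circular_gap(i, j, n=N_MONTHS):
    """Number of months separating month i and month j around the calendar cycle."""
    d = abs(i - j) % n
    return min(d, n - d)


def cooccurrence_matrix(n=N_MONTHS, decay=DECAY):
    """Hypothetical translation-symmetric matrix: entry (i, j) depends only on the gap."""
    return [[math.exp(-decay * circular_gap(i, j, n)) for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Plain-Python matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def fourier_eigenvalue(m, k):
    """Eigenvalue at frequency k of a symmetric circulant matrix (cosine sum over its first row)."""
    n = len(m)
    return sum(m[0][j] * math.cos(2 * math.pi * j * k / n) for j in range(n))


M = cooccurrence_matrix()
angles = [2 * math.pi * i / N_MONTHS for i in range(N_MONTHS)]
cos_mode = [math.cos(a) for a in angles]
sin_mode = [math.sin(a) for a in angles]

# Translation symmetry makes M circulant, so the k = 1 cosine and sine vectors
# should be exact eigenvectors sharing one eigenvalue. Measure the residual.
lam1 = fourier_eigenvalue(M, 1)
residual = max(
    abs(mv - lam1 * v)
    for mode in (cos_mode, sin_mode)
    for mv, v in zip(matvec(M, mode), mode)
)

# Eigenvalues for frequencies k = 0..6. k = 0 is the constant mode, which is
# identical for every month and so carries no ordering information.
spectrum = [fourier_eigenvalue(M, k) for k in range(N_MONTHS // 2 + 1)]

# Each mode has squared norm N_MONTHS / 2. Normalize it and scale by sqrt(eigenvalue).
scale = math.sqrt(lam1 / (N_MONTHS / 2))
embedding = [(scale * c, scale * s) for c, s in zip(cos_mode, sin_mode)]

radii = [math.hypot(x, y) for x, y in embedding]
steps = [math.dist(embedding[i], embedding[(i + 1) % N_MONTHS]) for i in range(N_MONTHS)]
print("spectrum:", [round(x, 3) for x in spectrum])
print("radius range:", min(radii), max(radii))
```

With this hypothetical decay profile, the k = 1 eigenvalue is the largest one after the constant k = 0 mode. The resulting two-dimensional embedding is therefore a regular 12-gon: all radii are equal, and consecutive months are equally spaced. A different, non-monotonic profile could reorder the spectrum and change which frequency dominates.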

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence
cs.CL 2026-05 unverdicted novelty 7.0

Hierarchical concept geometry in embeddings emerges from the spectral properties of word co-occurrence statistics mirroring WordNet hypernym trees.
Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization
math.OC 2026-05 conditional novelty 7.0

Symmetries in next-token prediction targets induce corresponding geometric symmetries such as circulant matrices and equiangular tight frames in the optimal weights and embeddings of a layer-peeled LLM surrogate model.
ToxiREX: A Dataset on Toxic REasoning in ConteXt
cs.CL 2026-06 unverdicted novelty 6.0

ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.
Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate
cs.LG 2026-05 unverdicted novelty 6.0

A framework quantifies hyperparameter transfer via scaling-law fit quality, extrapolation robustness, and loss penalty, with ablations showing that μP's advantage over standard parameterization stems from maximizing t...
RSD: Moving Local Triangular Charts for Auditing Language-Model Hidden States
cs.CL 2026-05 unverdicted novelty 6.0

RSD fits shared three-anchor charts S_t to GPT-2 hidden states for target words, derives co-membership readouts M_t, and audits against WiC same-sense labels, passing 16 of 53 words as diagnostic coverage.
Convergent Evolution: How Different Language Models Learn Similar Number Representations
cs.CL 2026-04 unverdicted novelty 6.0

Diverse language models converge on similar periodic number features with a two-tier hierarchy of Fourier sparsity and geometric separability, acquired via language co-occurrences or multi-token arithmetic.
Probing for Representation Manifolds in Superposition
cs.LG 2026-05 unverdicted novelty 5.0

Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.
RSD: Moving Local Triangular Charts for Auditing Language-Model Hidden States
cs.CL 2026-05 unverdicted novelty 5.0

RSD performs recursive binary neural decomposition on word embeddings to extract local semantic axes, with residuals used as qualitative diagnostics for ambiguous words in context tests.
Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations
cs.AI 2026-05 unverdicted novelty 4.0

Perceptual geometry for color, pitch, emotion and taste emerges transiently in intermediate layers of transformer LLMs despite purely textual training.
There Will Be a Scientific Theory of Deep Learning
stat.ML 2026-04 unverdicted novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universa...