Geometric Organization of Cognitive States in Transformer Embedding Spaces

Sophie Zhao

arxiv: 2512.22227 · v2 · submitted 2025-12-23 · 💻 cs.CL · cs.LG

Pith reviewed 2026-05-16 20:29 UTC · model grok-4.3

keywords sentence embeddings · transformer models · cognitive annotations · geometric organization · linear probes · permutation tests · energy scores

The pith

Transformer embeddings recover annotated cognitive energy scores and seven-tier labels above chance

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether sentence embeddings from transformers contain geometric structure aligned with human-annotated cognitive attributes. It assembles 480 natural-language sentences labeled with continuous energy scores from -5 to +5 and seven ordered tiers that progress from constricted or reactive states to more coherent and integrative ones. Linear probes applied to fixed embeddings from multiple models decode both the scores and the tiers, and nonparametric permutation tests that randomize the labels confirm performance exceeds chance under both regression and classification baselines. UMAP projections display a coherent low-to-high gradient, while confusion matrices show errors occur mostly between adjacent tiers.

Core claim

Fixed sentence embeddings from transformer models contain recoverable information about continuous energy scores and discrete seven-tier cognitive labels; linear and shallow nonlinear probes decode these annotations reliably, with performance exceeding randomized-label baselines in permutation tests.

What carries the argument

Linear and shallow nonlinear probes trained on fixed transformer sentence embeddings, evaluated against label-permutation null distributions and visualized with UMAP to show tier gradients.
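
A minimal sketch of what such linear probes could look like is given below. It is not the author's code: the embeddings, energy scores, and tiers are synthetic stand-ins generated by the hypothetical helper `make_synthetic`, and the ridge penalty, softmax learning rate, and 80/20 split are assumptions chosen only for illustration, since the paper's own settings are not reported here.

```python
import numpy as np

def make_synthetic(n=480, dim=32, seed=0):
    """Synthetic stand-in for (embeddings, energy scores, tiers). NOT the paper's data."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    raw = X @ direction + 0.3 * rng.normal(size=n)
    # Rescale to the -5..+5 range used for the energy scores.
    energy = 10 * (raw - raw.min()) / (raw.max() - raw.min()) - 5
    # Seven ordered tiers from equal-width energy bins (an assumption).
    edges = np.linspace(-5, 5, 8)[1:-1]
    tiers = np.digitize(energy, edges)  # integer labels 0..6
    return X, energy, tiers

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - mu_y))
    return w, mu_y - mu_x @ w

def fit_softmax(X, labels, n_classes, lr=0.5, steps=500, l2=1e-3):
    """Multinomial logistic regression trained by plain gradient descent."""
    n, d = X.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b

X, energy, tiers = make_synthetic()
idx = np.random.default_rng(1).permutation(len(X))
train, test = idx[:384], idx[384:]  # assumed 80/20 split

# Regression probe: held-out R^2 for the continuous energy score.
W, b = fit_ridge(X[train], energy[train])
pred = X[test] @ W + b
reg_r2 = 1 - np.sum((energy[test] - pred) ** 2) / np.sum((energy[test] - energy[test].mean()) ** 2)

# Classification probe: held-out accuracy for the seven tiers.
Wc, bc = fit_softmax(X[train], tiers[train], n_classes=7)
acc = np.mean(np.argmax(X[test] @ Wc + bc, axis=1) == tiers[test])
print(f"held-out R^2 = {reg_r2:.3f}, tier accuracy = {acc:.3f}")
```

Because the synthetic labels are built from a single linear direction plus noise, this example is decodable by construction; it illustrates the mechanics of a probe, not the paper's result.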

If this is right

Both continuous energy scores and discrete tier labels are linearly decodable from the embeddings.
The organization forms a smooth gradient with predominantly adjacent-tier confusions.
The structure appears across multiple transformer models and survives basic linear readout.
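
One way to quantify "predominantly adjacent-tier confusions" is the share of all misclassifications that land exactly one tier away from the true tier. The sketch below assumes a confusion matrix with true tiers in rows and predicted tiers in columns; the 4-tier matrix is a hypothetical example, not data from the paper, and the function names are illustrative.

```python
import numpy as np

def adjacent_error_fraction(conf):
    """Fraction of misclassifications that fall exactly one tier from the truth."""
    conf = np.asarray(conf, dtype=float)
    k = conf.shape[0]
    rows, cols = np.indices((k, k))
    dist = np.abs(rows - cols)
    errors = conf[dist > 0].sum()
    if errors == 0:
        return float("nan")  # no errors, so the fraction is undefined
    return conf[dist == 1].sum() / errors

def uniform_adjacent_baseline(k):
    """Adjacent share if errors were spread evenly over all off-diagonal cells.

    There are k*(k-1) off-diagonal cells, of which 2*(k-1) are adjacent.
    """
    return 2 / k

# Hypothetical 4-tier confusion matrix (rows = true, columns = predicted).
conf = [[8, 2, 0, 0],
        [1, 7, 2, 0],
        [0, 3, 6, 1],
        [1, 0, 2, 7]]
frac = adjacent_error_fraction(conf)
print(frac, uniform_adjacent_baseline(4))
```

For seven tiers the evenly-spread reference value is 2/7, so an observed adjacent share well above that would be consistent with an ordered gradient. The even-spread baseline is itself a simplifying assumption, since it ignores class imbalance.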

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the geometric alignment is real, training objectives in language models may implicitly induce representations ordered by cognitive coherence.
The same probing approach could be applied to other psychological or linguistic annotation schemes to test generality.
Disrupting the alignment through targeted fine-tuning would provide a direct test of whether the structure is a byproduct of pretraining.

Load-bearing premise

The human annotations of energy scores and seven-tier cognitive progression accurately capture stable cognitive attributes rather than surface linguistic features or annotator bias.

What would settle it

A permutation test in which probe accuracy on the true labels becomes statistically indistinguishable from accuracy on fully randomized labels would falsify the claim of significant geometric alignment.
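
The sketch below shows one way such a label-permutation test could be run for a regression probe. It is an illustration under stated assumptions, not the paper's procedure: the data are synthetic, the probe is a cross-validated ridge regression, and the number of permutations (200), the fold count, and the function names `cv_r2` and `permutation_test` are all hypothetical choices.

```python
import numpy as np

def cv_r2(X, y, alpha=1.0, n_folds=5, seed=0):
    """Cross-validated R^2 of a ridge probe (out-of-fold predictions only)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    preds = np.empty(len(y))
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        mu_x, mu_y = X[train].mean(axis=0), y[train].mean()
        Xc = X[train] - mu_x
        w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                            Xc.T @ (y[train] - mu_y))
        preds[test] = (X[test] - mu_x) @ w + mu_y
    return 1 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)

def permutation_test(X, y, n_perm=200, seed=0):
    """Compare the observed score with scores obtained on shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = cv_r2(X, y)
    null = np.array([cv_r2(X, rng.permutation(y)) for _ in range(n_perm)])
    # The +1 terms count the observed score as one draw from the null,
    # so the p-value can never be exactly zero.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p

# Synthetic data with a built-in linear signal (assumption, for illustration).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 16))
y = X[:, 0] - 0.5 * X[:, 1] + 0.5 * rng.normal(size=200)

observed, null, p = permutation_test(X, y)
print(f"observed R^2 = {observed:.3f}, null mean = {null.mean():.3f}, p = {p:.4f}")
```

Shuffling `y` breaks the pairing between embeddings and labels while preserving the label distribution, which is what makes the shuffled scores a chance reference. Under the falsifying scenario described above, the observed score would sit inside the bulk of `null` and `p` would be large.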

Figures

Figures reproduced from arXiv: 2512.22227 by Sophie Zhao.

**Figure 2.** 3D UMAP visualization of sentence embeddings colored by energy scores.

**Figure 3.** Permutation test results. Vertical lines indicate observed probe performance.

**Figure 4.** Confusion matrices for tier classification across three embedding models.

**Figure 5.** 2D UMAP visualization of sentence embeddings colored by continuous energy scores.

Original abstract

Recent work has shown that transformer-based language models learn rich geometric structure in their embedding spaces. In this work, we investigate whether sentence embeddings exhibit structured geometric organization aligned with human-interpretable cognitive or psychological attributes. We construct a dataset of 480 natural-language sentences annotated with both continuous energy scores (ranging from -5 to +5) and discrete tier labels spanning seven ordered cognitive annotation tiers, intended to capture a graded progression from highly constricted or reactive expressions toward more coherent and integrative cognitive states. Using fixed sentence embeddings from multiple transformer models, we evaluate the recoverability of these annotations via linear and shallow nonlinear probes. Across models, both continuous energy scores and tier labels are reliably decodable, with linear probes already capturing substantial structure. To assess statistical significance, we conduct nonparametric permutation tests that randomize labels, showing that probe performance exceeds chance under both regression and classification null hypotheses. Qualitative analyses using UMAP visualizations and tier-level confusion matrices further reveal a coherent low-to-high gradient and predominantly local (adjacent-tier) confusions. Together, these results indicate that transformer embedding spaces exhibit statistically significant geometric organization aligned with the annotated cognitive structure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

The paper introduces a new 480-sentence dataset with energy scores and seven-tier cognitive labels that are linearly decodable from transformer embeddings above permutation-test chance, but the labels may simply track surface features already present in the embeddings.

The main thing to know is that the authors built a dataset of 480 sentences annotated with continuous energy scores from -5 to +5 and seven ordered cognitive tiers, then showed that fixed sentence embeddings from multiple transformers recover both the scores and the tiers with linear probes at levels that beat randomized-label baselines. UMAP visualizations display a low-to-high gradient and confusion matrices show mostly adjacent-tier errors, which together give a coherent picture of geometric structure aligned with the annotations.

Referee Report

3 major / 2 minor

Summary. The paper claims that sentence embeddings from transformer models exhibit statistically significant geometric organization aligned with human-annotated cognitive attributes. On a dataset of 480 natural-language sentences labeled with continuous energy scores (-5 to +5) and seven ordered cognitive tiers, linear and shallow nonlinear probes recover both annotations above chance, as confirmed by permutation tests; UMAP visualizations show coherent low-to-high gradients and confusion matrices indicate predominantly adjacent-tier errors.

Significance. If the annotations prove independent of surface linguistic features, the result would indicate that transformer embedding spaces encode structured, human-interpretable representations of cognitive progression. This could inform interpretability research at the intersection of NLP and cognitive science. The use of multiple models, permutation baselines, and both regression and classification probes provides a reproducible empirical foundation, though the claim's strength depends on ruling out confounds.

major comments (3)

[Methods] Methods section: The manuscript provides no details on the specific transformer models (sizes, layers from which embeddings are taken), probe architectures (e.g., hidden-layer dimensions or activation functions for the shallow nonlinear probes), or data-splitting procedures (train/test ratios, stratification, or cross-validation). These omissions prevent assessment of reproducibility and potential overfitting.
[Results] Results and Analysis sections: No control analyses are reported to test whether energy scores or tier labels correlate with surface features such as sentence length, lexical sentiment, or polarity. Because such features are known to be linearly decodable in transformer embeddings, their presence would explain the observed probe accuracies and UMAP gradients without requiring a cognitive interpretation.
[Statistical Analysis] Statistical Analysis subsection: The permutation tests are described only qualitatively; the number of permutations, exact p-value distributions, and correction for multiple comparisons across models and probe types are not reported, weakening the ability to evaluate the strength of the 'statistically significant' claim.
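
To make the multiple-comparisons point in the Statistical Analysis comment concrete, the sketch below applies a Holm-Bonferroni adjustment to a family of permutation p-values. Holm's procedure is one standard choice and is swapped in here as an assumption; the manuscript does not say which correction, if any, was used. The six p-values are hypothetical (for example, three models times two probe types) and are not taken from the paper.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for six model-by-probe tests (illustrative only).
raw = [0.001, 0.004, 0.02, 0.03, 0.001, 0.2]
adj = holm_adjust(raw)
significant = [a <= 0.05 for a in adj]
print(adj, significant)
```

With the usual (b + 1) / (B + 1) estimator, the smallest p-value attainable from B permutations is 1 / (B + 1), so the number of permutations bounds what can survive any correction. This is one reason the permutation count needs to be reported.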

minor comments (2)

[Abstract] Abstract: Include the number of models tested and a one-sentence summary of dataset construction for completeness.
[Figures] Figure captions: UMAP visualizations and confusion matrices would benefit from explicit color-bar labels and tier-number legends to improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which identifies key areas for improving the manuscript's clarity, reproducibility, and rigor. We address each major comment below and will incorporate the necessary revisions.

Referee: [Methods] Methods section: The manuscript provides no details on the specific transformer models (sizes, layers from which embeddings are taken), probe architectures (e.g., hidden-layer dimensions or activation functions for the shallow nonlinear probes), or data-splitting procedures (train/test ratios, stratification, or cross-validation). These omissions prevent assessment of reproducibility and potential overfitting.

Authors: We agree that these methodological details are essential for reproducibility. In the revised manuscript, we will expand the Methods section to fully specify the transformer models used (including model names, sizes, and the exact layers from which sentence embeddings were extracted), describe the probe architectures in detail (including hidden-layer dimensions, activation functions, and training procedures for both linear and shallow nonlinear probes), and outline the data-splitting approach, including train/test ratios, any stratification by tier or energy score, and whether cross-validation was employed.
revision: yes

Referee: [Results] Results and Analysis sections: No control analyses are reported to test whether energy scores or tier labels correlate with surface features such as sentence length, lexical sentiment, or polarity. Because such features are known to be linearly decodable in transformer embeddings, their presence would explain the observed probe accuracies and UMAP gradients without requiring a cognitive interpretation.

Authors: This is a valid and important concern. To strengthen the cognitive interpretation, the revised manuscript will include new control analyses examining correlations between the annotated energy scores and tier labels with surface features such as sentence length, lexical sentiment, and polarity. We will also report probe performance when trained solely on these surface features and compare it against performance on the cognitive annotations to evaluate whether the observed structure exceeds what surface features alone can explain.
revision: yes

Referee: [Statistical Analysis] Statistical Analysis subsection: The permutation tests are described only qualitatively; the number of permutations, exact p-value distributions, and correction for multiple comparisons across models and probe types are not reported, weakening the ability to evaluate the strength of the 'statistically significant' claim.

Authors: We acknowledge that more quantitative details are required. In the revision, we will report the precise number of permutations performed for each test, provide summary statistics or visualizations of the p-value distributions under the null, and specify any corrections applied for multiple comparisons across models and probe types (regression and classification).
revision: yes
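
A surface-feature control of the kind discussed in the second response could be sketched as follows. Everything here is an assumption made for illustration: the two-feature extractor `surface_features`, the tiny `POSITIVE` and `NEGATIVE` word lists, the synthetic data, and the comparison of held-out R^2 for surface features alone, embeddings alone, and both together. None of it comes from the manuscript.

```python
import numpy as np

# Hypothetical mini-lexicons; a real control would use an established resource.
POSITIVE = {"calm", "open", "grateful"}
NEGATIVE = {"stuck", "angry", "afraid"}

def surface_features(sentence):
    """Return [token count, positive-minus-negative word count] for a sentence."""
    tokens = [t.strip(".,!?") for t in sentence.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return [len(tokens), pos - neg]

def heldout_r2(F, y, train, test, alpha=1.0):
    """Held-out R^2 of a ridge probe trained on feature matrix F."""
    mu_f, mu_y = F[train].mean(axis=0), y[train].mean()
    Fc = F[train] - mu_f
    w = np.linalg.solve(Fc.T @ Fc + alpha * np.eye(F.shape[1]),
                        Fc.T @ (y[train] - mu_y))
    pred = (F[test] - mu_f) @ w + mu_y
    return 1 - np.sum((y[test] - pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)

print(surface_features("I feel calm and open."))

# Synthetic stand-ins: S plays the role of surface features, X of embeddings.
rng = np.random.default_rng(0)
n, d = 480, 32
S = np.column_stack([rng.integers(5, 30, size=n),
                     rng.integers(-2, 3, size=n)]).astype(float)
X = rng.normal(size=(n, d))
# By construction the label depends on BOTH a surface cue and an embedding axis.
y = 0.8 * S[:, 1] + X[:, 0] + 0.3 * rng.normal(size=n)

idx = rng.permutation(n)
train, test = idx[:384], idx[384:]
r2_surface = heldout_r2(S, y, train, test)
r2_embed = heldout_r2(X, y, train, test)
r2_both = heldout_r2(np.column_stack([S, X]), y, train, test)
print(f"surface only {r2_surface:.3f}, embeddings only {r2_embed:.3f}, both {r2_both:.3f}")
```

The quantity of interest is the gap between the surface-only score and the combined score. If embeddings add little beyond surface features on the real data, the cognitive interpretation would be hard to sustain; a large gap would leave room for it without proving it.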

Circularity Check

0 steps flagged

No significant circularity; results are empirical measurements

The paper reports probe accuracies for recovering held-out human annotations (energy scores and seven-tier labels) from fixed transformer embeddings, with significance established via external permutation tests that randomize labels independently of the embeddings. No equations, derivations, or self-citations reduce the reported performance to the annotations by construction; the central claim is a statistical measurement of recoverability rather than a tautological prediction. The permutation baseline and held-out evaluation are independent of the fitted probe parameters.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on the assumption that the provided annotations constitute valid cognitive ground truth and that above-chance linear probe accuracy implies geometric organization in the embedding space; no free parameters beyond standard probe weights are introduced, and no new entities are postulated.

axioms (1)

Domain assumption: Human annotations of energy and tier labels reflect stable cognitive attributes independent of surface lexical cues.
Invoked when interpreting probe success as evidence of cognitive geometry rather than linguistic correlation.

pith-pipeline@v0.9.0 · 5484 in / 1226 out tokens · 27192 ms · 2026-05-16T20:29:55.797425+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We employ UMAP to examine the global geometry of embedding spaces, linear and shallow nonlinear probes to quantify the decodability of the annotated hierarchy, and nonparametric permutation tests to assess whether observed patterns can be explained by chance or surface lexical cues.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Across models, both continuous energy scores and tier labels are reliably decodable, with linear probes already capturing substantial structure.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages · 2 internal anchors

[1] Kawin Ethayarajh. How contextual are contextualized word representations? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.

[2] Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.

[3] Niklas Muennighoff, Nils Reimers, Andreas Rücklé, and Iryna Gurevych. SGPT: GPT sentence embeddings for semantic search. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 5881–5896. Association for Computational Linguistics, 2022.

[4] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2019.

[5] Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. arXiv preprint arXiv:2002.10957.

[6] Liang Xiao et al. C-Pack: Packaged resources to advance general Chinese embedding. arXiv preprint arXiv:2309.07597.
