Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds
Pith reviewed 2026-05-10 16:07 UTC · model grok-4.3
The pith
Mature small language models from different architectures share nearly identical 21-emotion geometries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We extract 21-emotion vector sets from twelve small language models under a unified comprehension-mode pipeline at fp16 precision and compare the resulting geometries via representational similarity analysis on raw cosine RDMs. The five mature architectures share nearly identical 21-emotion geometry, with pairwise RDM Spearman correlations of 0.74-0.92. This universality persists across diametrically opposed behavioral profiles: Qwen 2.5 and Llama 3.2 occupy opposite poles of MTI Compliance facets yet produce nearly identical emotion RDMs (rho = 0.81), so behavioral facet differences arise above the shared emotion representation. Gemma-3 1B base exhibits extreme residual-stream anisotropy (0.997) and is restructured by RLHF across all geometric descriptors, whereas the five mature families show within-family base-instruct RDM correlations of rho >= 0.92.
What carries the argument
21-emotion vector sets extracted via a unified comprehension-mode pipeline at fp16 precision, compared through representational similarity analysis on cosine RDMs
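The RSA machinery that carries the claim is standard and can be sketched in a few lines: build a cosine representational dissimilarity matrix (RDM) per model, then correlate the RDM upper triangles with Spearman's rho. The vectors below are random stand-ins, not the paper's actual emotion vectors, and all names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def cosine_rdm(vectors):
    """Representational dissimilarity matrix: pairwise cosine distances
    between one model's 21 emotion vectors (shape: 21 x d)."""
    return squareform(pdist(vectors, metric="cosine"))

def rdm_spearman(rdm_a, rdm_b):
    """Compare two RDMs on their upper triangles (diagonal excluded),
    the usual convention in representational similarity analysis."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

# toy stand-ins for two models' 21 emotion vectors (dims are illustrative)
rng = np.random.default_rng(0)
model_a = rng.normal(size=(21, 64))
model_b = model_a + 0.3 * rng.normal(size=(21, 64))  # a perturbed copy

print(rdm_spearman(cosine_rdm(model_a), cosine_rdm(model_b)))
```

A value in the paper's 0.74-0.92 range would indicate that the two models rank emotion-pair dissimilarities almost identically, even if the raw activation spaces differ.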
If this is right
- Emotion geometries in mature models remain stable between base and instruct versions with RDM correlations of at least 0.92.
- Behavioral differences between models arise from layers above the shared emotion representation.
- Only immature models undergo geometric restructuring from RLHF.
- Cross-study comparisons of emotion vectors require decomposing method effects into coarse dissociation, sub-parameter sensitivity, precision effects, and cross-experiment biases.
- The shared geometry is largely determined by pretraining rather than later fine-tuning in mature architectures.
Where Pith is reading between the lines
- This convergence may imply that emotion geometry emerges as a stable byproduct of next-token prediction on large text corpora.
- The finding could extend to testing whether the same geometry appears in models trained on non-English data or different tokenizers.
- Researchers could examine if the shared emotion RDMs predict similar patterns of emotional bias or response in downstream applications across model families.
- The four-layer decomposition of method effects suggests that single-pipeline studies risk misattributing similarities between any two models.
Load-bearing premise
The unified comprehension-mode pipeline at fp16 precision extracts emotion vectors in a comparable way across architectures without introducing architecture-specific biases that could artifactually inflate the reported similarities.
What would settle it
Extracting emotion vectors from a new mature-architecture model using the same pipeline and observing RDM Spearman correlations below 0.7 with the existing mature set, or finding that an immature model shows no geometric restructuring after RLHF.
read the original abstract
We extract 21-emotion vector sets from twelve small language models (six architectures x base/instruct, 1B-8B parameters) under a unified comprehension-mode pipeline at fp16 precision, and compare the resulting geometries via representational similarity analysis on raw cosine RDMs. The five mature architectures (Qwen 2.5 1.5B, SmolLM2 1.7B, Llama 3.2 3B, Mistral 7B v0.3, Llama 3.1 8B) share nearly identical 21-emotion geometry, with pairwise RDM Spearman correlations of 0.74-0.92. This universality persists across diametrically opposed behavioral profiles: Qwen 2.5 and Llama 3.2 occupy opposite poles of MTI Compliance facets yet produce nearly identical emotion RDMs (rho = 0.81), so behavioral facet differences arise above the shared emotion representation. Gemma-3 1B base, the one immature case in our dataset, exhibits extreme residual-stream anisotropy (0.997) and is restructured by RLHF across all geometric descriptors, whereas the five already-mature families show within-family base x instruct RDM correlations of rho >= 0.92 (Mistral 7B v0.3 at rho = 0.985), suggesting RLHF restructures only representations that are not yet organized. Methodologically, we show that what prior work has read as a single comprehension-vs-generation method effect in fact decomposes into four distinct layers -- a coarse method-dependent dissociation, robust sub-parameter sensitivity within generation, a true precision (fp16 vs INT8) effect, and a conflated cross-experiment bias that distorts in opposite directions for different models -- so that a single rho between two prior emotion-vector studies is not a safe basis for interpretation without the layered decomposition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript extracts 21-emotion vector sets from twelve small language models (six architectures, base/instruct variants, 1B-8B parameters) under a single comprehension-mode pipeline at fp16 precision. It compares the resulting geometries via representational similarity analysis on raw cosine RDMs, reporting that five mature architectures (Qwen 2.5 1.5B, SmolLM2 1.7B, Llama 3.2 3B, Mistral 7B v0.3, Llama 3.1 8B) share nearly identical emotion geometry with pairwise RDM Spearman correlations of 0.74-0.92. This similarity holds across opposed behavioral profiles (e.g., Qwen 2.5 and Llama 3.2 at rho=0.81). Gemma-3 1B (immature) shows extreme residual anisotropy (0.997) and is restructured by RLHF, while mature families show high base-instruct stability (rho >= 0.92). The paper further decomposes prior method confounds into four layers: coarse method dissociation, sub-parameter sensitivity, precision effects, and cross-experiment bias.
Significance. If the results hold after addressing pipeline controls, the work would demonstrate architecture-independent 21-emotion geometry in mature SLMs, with behavioral differences arising above this shared representation. The four-layer confound decomposition is a clear methodological strength, providing a falsifiable framework for interpreting prior RSA studies and avoiding over-reliance on single rho values. Direct empirical RDM comparisons and within-family base/instruct correlations (e.g., Mistral at rho=0.985) add reproducibility value. Significance is limited by the absence of explicit bias controls, which could affect the universality claim.
major comments (2)
- [Abstract / Methods] Abstract / Methods (unified pipeline description): The central universality claim for the five mature architectures depends on the fp16 comprehension pipeline producing functionally equivalent emotion vectors across models. No controls are reported for tokenizer differences, layer selection/normalization, or residual anisotropy (explicitly noted only for Gemma). High RDM Spearman values (0.74-0.92) could therefore arise from shared extraction artifacts rather than intrinsic geometric convergence, even after the four-layer decomposition. Explicit ablation or normalization steps are needed to secure this link.
- [Abstract] Abstract (mature vs. immature distinction): The classification of models as 'mature' (high base-instruct stability, low anisotropy) versus 'immature' (Gemma) appears derived from the observed geometric outcomes rather than a pre-specified criterion. This risks post-hoc interpretation when claiming that RLHF restructures only immature representations. A clearer a priori definition or additional models would make the distinction load-bearing for the RLHF restructuring claim.
minor comments (2)
- [Abstract] The abstract packs multiple distinct claims (geometry universality, behavioral independence, RLHF effects, and four-layer decomposition) into one paragraph; separating the methodological contribution would improve readability.
- The 21 emotions are treated as a fixed set; briefly noting their selection rationale or source (even if standard) would clarify this free parameter.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments identify important methodological considerations for strengthening the universality claim and the mature/immature distinction. We respond to each major comment below and indicate the revisions we will make.
read point-by-point responses
Referee: [Abstract / Methods] Abstract / Methods (unified pipeline description): The central universality claim for the five mature architectures depends on the fp16 comprehension pipeline producing functionally equivalent emotion vectors across models. No controls are reported for tokenizer differences, layer selection/normalization, or residual anisotropy (explicitly noted only for Gemma). High RDM Spearman values (0.74-0.92) could therefore arise from shared extraction artifacts rather than intrinsic geometric convergence, even after the four-layer decomposition. Explicit ablation or normalization steps are needed to secure this link.
Authors: We acknowledge that the unified pipeline, while consistent in prompt template and precision, does not include explicit ablations for tokenizer effects or layer-specific normalization, and reports residual anisotropy only for Gemma. The four-layer confound decomposition and the observation that high correlations persist across models with divergent tokenizers and opposed behavioral profiles provide supporting evidence that the geometry is not solely artifactual. Nevertheless, to directly address the concern, we will revise the Methods and Results sections to add: (i) residual anisotropy metrics for all twelve models, (ii) a sensitivity analysis comparing last-layer versus averaged final-layer vectors, and (iii) vector normalization prior to RDM computation, with the corresponding RDM correlations reported in a new supplementary table. These controls will be presented as robustness checks rather than altering the primary findings. revision: yes
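Two of the proposed robustness checks can be sketched concretely. The sketch below assumes anisotropy is operationalized as mean pairwise cosine similarity of activation vectors (one common convention; the review does not spell out the paper's exact metric) and normalization as centering plus unit-norming before computing the RDM. All data and names are illustrative stand-ins.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mean_pairwise_cosine(acts):
    """Anisotropy proxy: mean pairwise cosine similarity of activation
    vectors (rows). Isotropic activations score near 0; a value near 1
    means nearly all vectors point in the same direction."""
    sims = 1.0 - pdist(acts, metric="cosine")
    return sims.mean()

def rdm(vectors, normalize=False):
    """Cosine RDM, optionally after centering and unit-norming each
    vector -- one way to remove a shared anisotropy direction before
    comparing geometries."""
    v = vectors.astype(float)
    if normalize:
        v = v - v.mean(axis=0)  # remove the common component
        v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return squareform(pdist(v, metric="cosine"))

rng = np.random.default_rng(1)
iso = rng.normal(size=(21, 64))       # roughly isotropic activations
shared = rng.normal(size=64) * 20     # dominant common direction
aniso = iso + shared                  # highly anisotropic activations

print(mean_pairwise_cosine(iso))      # near 0
print(mean_pairwise_cosine(aniso))    # near 1
```

In this toy setup, centering before the RDM removes the shared direction exactly, so the normalized RDM of the anisotropic set recovers the underlying structure; an analogous check on real activations would show whether the reported cross-model correlations survive anisotropy correction.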
Referee: [Abstract] Abstract (mature vs. immature distinction): The classification of models as 'mature' (high base-instruct stability, low anisotropy) versus 'immature' (Gemma) appears derived from the observed geometric outcomes rather than a pre-specified criterion. This risks post-hoc interpretation when claiming that RLHF restructures only immature representations. A clearer a priori definition or additional models would make the distinction load-bearing for the RLHF restructuring claim.
Authors: We agree that the mature/immature label was informed by the observed base-instruct stability and anisotropy values, creating a potential circularity for the RLHF claim. In the revision we will replace the post-hoc framing with an a priori operational definition: models are classified as mature if they belong to families with documented multi-stage post-training and exceed 1B parameters (with Gemma-3 1B noted as the exception due to its limited training scale). We will also add an explicit limitations paragraph stating that the RLHF-restructuring observation is based on a single immature case and requires replication with additional immature models before it can be treated as general. This change preserves the empirical pattern while removing the risk of post-hoc interpretation. revision: yes
Circularity Check
No significant circularity: the claims rest on direct empirical RDM comparisons and are not true by construction.
full rationale
The paper extracts 21-emotion vectors from multiple models under a single described pipeline, computes raw cosine RDMs, and reports Spearman correlations (0.74-0.92) as observed empirical outcomes. No equations or steps define a quantity in terms of itself, fit parameters on a subset then relabel the fit as a prediction, or invoke self-citations to establish uniqueness or force an ansatz. The methodological decomposition into four confound layers is likewise presented as an empirical finding from the data rather than a definitional necessity. The central universality claim is therefore a measured pattern, not a tautology or self-referential derivation.
Axiom & Free-Parameter Ledger
free parameters (2)
- Selection of 21 emotions
- Model selection across 12 specific models
axioms (1)
- domain assumption Cosine similarity is an appropriate measure for comparing emotion vectors in residual streams.
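The ledger's one axiom can be probed cheaply: if cosine and Euclidean RDMs of the same vector set rank emotion pairs similarly, the reported geometry is not an artifact of the metric choice. A minimal sketch with random stand-in vectors (not the paper's data):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def metric_agreement(vectors):
    """Spearman correlation between the cosine and Euclidean RDMs of
    the same vector set: a cheap probe of how metric-dependent the
    measured geometry is."""
    d_cos = pdist(vectors, metric="cosine")
    d_euc = pdist(vectors, metric="euclidean")
    rho, _ = spearmanr(d_cos, d_euc)
    return rho

rng = np.random.default_rng(2)
emotions = rng.normal(size=(21, 64))   # stand-in for one model's vectors
unit = emotions / np.linalg.norm(emotions, axis=1, keepdims=True)

print(metric_agreement(emotions))  # high when vector norms are comparable
print(metric_agreement(unit))      # exactly 1.0: on unit vectors the metrics rank pairs identically
```

On unit-norm vectors Euclidean distance is a monotone function of cosine distance, so the agreement is exact; a low agreement on raw activations would signal that the cosine axiom is doing real work.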
Reference graph
Works this paper leans on
- [1] Werra, L., & Wolf, T. (2025). SmolLM2: When Smol Goes Big — Data-Centric Training of a Small Language Model. arXiv:2502.02737.
  Anthropic. (2026). Emotion Concepts and their Function in a Large Language Model. Transformer Circuits Thread. https://transformer-circuits.pub/2026/emotions/
- [2] Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., et al. (2024). The Llama 3 Herd of Models. arXiv:2407.21783.
  Gemma Team. (2025). Gemma 3 Technical Report. arXiv:2503.19786.
- [3] Jeong, J. (2026a). Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison. arXiv:2604.04064.
- [4] Jeong, J. (2026b). MTI: A Behavior-Based Temperament Profiling System for AI Agents. arXiv:2604.02145.
- [5] Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Le Scao, T., Lavril, T., Wang, T., Lacroix, T., & El Sayed, W. (2023). Mistral 7B. arXiv:2310.06825.
- [6] Meng, K., Bau, D., Andonian, A., & Belinkov, Y.
  Lin, B. Y., Ravichander, A., Lu, X., Dziri, N., Sclar, M., Chandu, K., Bhagavatula, C., & Choi, Y. (2024). The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning. In International Conference on Learning Representations (ICLR). arXiv:2312.01552.
- [7]
- [8] Park, K., Choe, Y. J., & Veitch, V. (2024). The Linear Representation Hypothesis and the Geometry of Large Language Models. In Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR 235:39643–39666. arXiv:2311.03658.
- [9] Rimsky, N., Gabrieli, N., Schulz, J., Tong, M., Hubinger, E., & Turner, A. M. (2024). Steering Llama 2 via Contrastive Activation Addition. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 15504–15522. arXiv:2312.06681.
- [10] Yang, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Li, C., Liu, D., Huang, F., Wei, H., et al. (2025). Qwen2.5 Technical Report. arXiv:2412.15115.
- [11] Ghosh, G., Lewis, M., Zettlemoyer, L., & Levy, O. (2023). LIMA: Less Is More for Alignment. In Advances in Neural Information Processing Systems (NeurIPS). arXiv:2305.11206.