pith. machine review for the scientific record.

arxiv: 2605.08611 · v1 · submitted 2026-05-09 · 💻 cs.AI

The Echo Amplifies the Knowledge: Somatic Marker Analogues in Language Models via Emotion Vector Re-Injection

Pith reviewed 2026-05-12 01:17 UTC · model grok-4.3

classification 💻 cs.AI
keywords: emotion vectors · somatic markers · language models · episodic memory · decision making · feature re-injection · sparse autoencoders

The pith

Re-injecting emotion vectors allows language models to turn knowledge into better decisions, but the vectors alone produce no change in behavior.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates a method for language models to store not only what happened but also how it felt, by locating 310 emotion-exclusive features inside the model and partially re-injecting vectors built from them during later recall when contexts match. Four test conditions parallel Damasio's clinical framework: no memory (A), semantic labels only (B), emotion echo only (C), and both together (BC). The echo by itself steepens the model's threat ratings as a function of contextual similarity, yet its good-choice rate stays at 22 percent, with no significant effect on decisions. Only when semantic knowledge and the echo are combined do good choices rise to 80 percent, compared with 52 percent for knowledge alone, showing that the added feeling amplifies existing knowledge without replacing it.

Core claim

Partial re-injection of emotion-exclusive vectors, stored from experience at layer 22 and triggered by context similarity at layer 7, functions as a somatic-marker analogue. This echo alone steepens the threat-safety regression slope from 0.56 to 0.80. In decision tasks it leaves performance unchanged at 22 percent. When paired with semantic labels, however, it raises good-choice rates from 52 percent to 80 percent, confirming that the echo changes internal orientation independently while changing external action only in the presence of knowledge.

What carries the argument

Emotion vector re-injection: distinctive-feature vectors built during an event are stored and then partially restored during later recall, whenever the similarity between the current context and the stored one exceeds a threshold.
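
A minimal sketch of this store-and-echo loop, in plain Python with lists standing in for activation vectors. The names `EpisodicTrace` and `maybe_inject` are hypothetical, and the defaults `alpha=0.3` and `threshold=0.75` are illustrative assumptions (they echo the simulated rebuttal on this page, not a setting confirmed by the paper). A real implementation would hook into the model's activations at the trigger and injection layers rather than operate on toy lists.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


@dataclass
class EpisodicTrace:
    """Hypothetical memory record: a context key plus the emotion vector stored with it."""
    context_key: List[float]     # context representation captured at the trigger layer
    emotion_vector: List[float]  # distinctive-feature vector captured at the injection layer


def maybe_inject(
    current_context: Sequence[float],
    injection_activation: Sequence[float],
    trace: EpisodicTrace,
    alpha: float = 0.3,       # assumed injection strength (partial, not full, restoration)
    threshold: float = 0.75,  # assumed similarity threshold for triggering the echo
) -> List[float]:
    """Add alpha * emotion_vector to the activation if the context matches the stored key."""
    if cosine(current_context, trace.context_key) < threshold:
        return list(injection_activation)  # no match: leave the activation untouched
    return [a + alpha * e for a, e in zip(injection_activation, trace.emotion_vector)]


# Toy example: one stored trace, one matching and one non-matching recall context.
trace = EpisodicTrace(context_key=[1.0, 0.0, 0.0], emotion_vector=[0.0, 2.0, -1.0])
matched = maybe_inject([0.9, 0.1, 0.0], [0.5, 0.5, 0.5], trace)
unmatched = maybe_inject([0.0, 1.0, 0.0], [0.5, 0.5, 0.5], trace)
```

Keeping the trigger (a similarity check on one representation) separate from the payload (a scaled vector added to another) is what lets the echo be switched on or off independently of any semantic label in the prompt.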

If this is right

  • The echo alone increases the slope of threat ratings against contextual similarity.
  • Good decisions rise from 52 percent with knowledge alone to 80 percent when knowledge and the echo are both present.
  • The echo produces no measurable change in choice behavior without accompanying knowledge.
  • Internal feeling and external action can be separated inside the model by controlling whether the echo is available.
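
The first bullet rests on comparing two regression slopes. The sketch below shows one way such a comparison can be run, assuming an ordinary least-squares slope and a permutation test that shuffles observations between conditions. The paper's exact permutation scheme is not described on this page, the function names are illustrative, and the data are synthetic placeholders rather than the paper's ratings.

```python
import random
from typing import List, Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_permutation_test(
    a: List[Tuple[float, float]],
    b: List[Tuple[float, float]],
    n_perm: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed slope difference b - a, one-sided permutation p-value)."""
    def diff(group_a, group_b):
        return ols_slope(*zip(*group_b)) - ols_slope(*zip(*group_a))

    observed = diff(a, b)
    pooled = a + b  # new list, so the inputs are not mutated by shuffling
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign (similarity, rating) pairs to conditions at random
        if diff(pooled[: len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic stand-ins for two conditions: ratings at five similarity levels.
rng = random.Random(1)
levels = [1, 2, 3, 4, 5] * 6
baseline = [(s, 0.5 * s + rng.gauss(0, 0.5)) for s in levels]  # flatter gradient
echo = [(s, 1.0 * s + rng.gauss(0, 0.5)) for s in levels]      # steeper gradient
observed, p_value = slope_permutation_test(baseline, echo)
```

The `(hits + 1) / (n_perm + 1)` form keeps the estimated p-value strictly above zero, which is a common convention for Monte Carlo permutation tests.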

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same re-injection approach could be used to preserve other internal states such as uncertainty or goal value across extended interactions.
  • Models equipped with this mechanism might maintain consistent preferences over long time horizons where pure semantic recall tends to fade.
  • Applying the technique in environments with delayed rewards would test whether the echo supports planning that accounts for how past actions felt.
  • Ethical or safety-related decisions could become more stable if the model retains a felt response to previous similar outcomes.

Load-bearing premise

The 310 features identified at layer 22 are genuinely tied to emotion rather than other factors, and the partial re-injection, triggered by context similarity at layer 7, creates a working analogue of somatic markers rather than an arbitrary activation change.

What would settle it

Replacing the selected emotion vectors with random features or features drawn from unrelated categories and re-running the decision task would eliminate the jump from 52 percent to 80 percent good choices if the effect is specific to emotion.
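
One way such a control vector could be built, as a sketch under assumptions: draw a random Gaussian direction, remove its component along the emotion vector, and rescale it to the emotion vector's norm, so the control perturbation has matched magnitude but no overlap with the emotion direction. The helper name `matched_random_control` is hypothetical, and the paper does not describe this procedure.

```python
import math
import random
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def norm(v: Sequence[float]) -> float:
    return math.sqrt(dot(v, v))


def matched_random_control(emotion_vector: Sequence[float], seed: int = 0) -> List[float]:
    """Random vector orthogonal to emotion_vector with the same L2 norm.

    Assumes emotion_vector is non-zero and has at least two dimensions.
    """
    rng = random.Random(seed)
    r = [rng.gauss(0.0, 1.0) for _ in emotion_vector]
    # Project out the emotion direction (a single Gram-Schmidt step).
    scale = dot(r, emotion_vector) / dot(emotion_vector, emotion_vector)
    r = [ri - scale * ei for ri, ei in zip(r, emotion_vector)]
    # Rescale so the perturbation magnitude matches the emotion vector's.
    factor = norm(emotion_vector) / norm(r)
    return [ri * factor for ri in r]


emotion = [0.0, 2.0, -1.0, 0.5]
control = matched_random_control(emotion)
```

Injecting `control` in place of the emotion vector, with everything else held fixed, would separate an emotion-specific effect from the effect of any perturbation of that size.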

Figures

Figures reproduced from arXiv: 2605.08611 by Jared Glover.

Figure 1
Figure 1: Two-vector architecture. Context matching at layer 7 triggers emotion injection at layer 22. The trigger is perceptual; the echo is emotional.
3.2 Emotion Feature Discovery. We identify emotion-relevant features by differential activation: which SAE features activate strongly on emotional text but not on neutral text? We use eight emotional texts spanning distinct categories (hope, grief, rage, joy, fear,…
Figure 2
Figure 2: Inter-emotion cosine similarity at layer 22. Love and betrayal are most distinct (0.82); hope, joy, and awe form a tight positive cluster (0.92–0.93). Betrayal is the most isolated emotion overall.
… eight we probed with. The resulting emotion vector is a rich, continuous representation that can express blends, intensities, and emotional states that do not correspond to any single probe category. The echo i…
Figure 3
Figure 3: Emotion feature geometry at layer 22. Left: PCA (26.9% variance explained). Right: t-SNE. Three texts per emotion, four neutral texts. Positive emotions (hope, joy, awe) cluster together; grief sits low and alone; neutral texts are clearly separated.
3.5 Experimental Conditions. All experiments compare four conditions designed to parallel Damasio’s clinical framework. The semantic label (conditions B and BC…
Figure 5
Figure 5: Alpha threshold for decisions. Sharp transition at α=0.20. Below this, the model’s color and position biases overwhelm the echo.
… dangerous, but they cannot reliably translate that knowledge into advantageous choices [3, 5]. The Iowa Gambling Task (IGT) was designed to test exactly this: can an organism learn, through experience, to avoid options that produce short-term reward but long-term loss? Task desig…
Figure 4
Figure 4: Generalization gradient. Threat and warmth ratings by similarity level for all four conditions. The echo alone (C) steepens the gradient vs. baseline (A). The semantic label (B) and the combined condition (BC) produce similar, steeper gradients.
… alone (BC slope = 1.13 ≈ B). For orientation, the echo and the label independently improve contextual differentiation; combining them is redundant. A practical …
Figure 6
Figure 6: The Damasio comparison. Left: the echo alone (C) shifts emotional orientation above amnesia (A). Right: the echo changes decisions only when combined with knowledge: BC dramatically exceeds B, while C alone is indistinguishable from A.
…versarial prompting; activation-level biases are harder to override. 5.4 Limitations and Future Work. Limitations. Our results are constrained by model scale: Gemma 3 1B-IT ha…
Original abstract

Current language model memory systems store what happened but not how it felt. This distinction -- between semantic memory (knowing about a past event) and episodic memory (re-experiencing it) -- was identified by Tulving as the difference between noetic and autonoetic consciousness. Damasio demonstrated that humans with intact knowledge but absent emotional markers exhibit impaired decision-making. We bridge this gap for language models. Using Gemma 3 1B-IT with pretrained Gemma Scope 2 sparse autoencoders, we identify 310 emotion-exclusive features at layer 22 with psychologically valid geometry. We construct distinctive-feature emotion vectors during experience and partially re-inject them during recall, triggered by context similarity at layer 7. We test four conditions paralleling Damasio's framework: A (no memory), B (semantic labels), C (emotion echo), and BC (semantic + echo). For emotional orientation, the echo alone steepens the threat-safety gradient: the regression slope of threat rating on contextual similarity is 0.80 for C vs 0.56 for A ($p$=0.011, permutation test). For decisions, the echo amplifies knowledge into action: BC=80% good choices vs B=52% ($z$=+2.60, $p$<0.01), while the echo alone has no effect (C=22%, n.s.). The echo changes how the model feels independently, but changes what it does only when combined with knowledge -- replicating Damasio's core finding. The echo amplifies knowledge. It does not replace it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to implement somatic marker analogues in language models by identifying 310 emotion-exclusive features at layer 22 of Gemma 3 1B-IT using pretrained sparse autoencoders, constructing distinctive emotion vectors during experience, and partially re-injecting them during recall when triggered by context similarity at layer 7. It evaluates four conditions paralleling Damasio's framework (A: no memory; B: semantic labels; C: emotion echo alone; BC: semantic + echo) and reports that the echo alone steepens the threat-safety gradient (slope 0.80 for C vs. 0.56 for A, p=0.011 via permutation test) while only the combination improves decision-making (BC=80% good choices vs. B=52%, z=+2.60, p<0.01), with C showing no effect (22%, n.s.). The central conclusion is that the echo changes how the model feels independently but changes what it does only when combined with knowledge, replicating Damasio's core finding on somatic markers.

Significance. If the emotion features prove specific and the re-injection mechanism functions as claimed, the work provides a concrete, testable computational model for affective memory and decision-making in LMs that directly parallels human neuroscience findings. The four-condition design and use of existing SAEs offer a reproducible template for studying how emotional re-injection can amplify rather than replace semantic knowledge, with potential implications for AI alignment, cognitive architectures, and models of autonoetic consciousness. The empirical separation of affective orientation from action selection is a clear strength.

major comments (3)
  1. Abstract: The claim that the 310 features at layer 22 are 'emotion-exclusive' with 'psychologically valid geometry' is load-bearing for the somatic-marker analogy, yet the manuscript provides no quantitative criteria (e.g., activation contrast thresholds, false-positive rates on non-emotional controls, or ablation of semantic confounds) to demonstrate that these directions are driven by affective content rather than correlated semantic or task-relevant features.
  2. Abstract / Results: The key decision result (BC=80% vs. B=52%) and the emotional-orientation slope difference rely on the re-injection at layer 7 being a functional analogue of somatic markers; without reported controls (e.g., comparison to random vectors, non-emotion-aligned directions, or magnitude-matched perturbations), it is impossible to rule out that any sufficiently aligned activation at layer 7 would produce the same lift, reducing the Damasio parallel to a generic bias effect.
  3. Methods (implied by free parameters): Injection strength and similarity threshold are free parameters whose selection process is not detailed; the manuscript must clarify whether they were pre-specified, report sensitivity analyses across reasonable ranges, and confirm that the BC vs. B difference remains robust, as post-hoc tuning would undermine the statistical claims.
minor comments (2)
  1. Abstract: Exact sample sizes underlying the z-tests and permutation tests are not stated, which would strengthen interpretation of the reported p-values and effect sizes.
  2. Abstract: The notation for conditions (A, B, C, BC) is introduced without an explicit mapping table; a small table or sentence clarifying each condition's memory components would improve readability.
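
To see why the missing sample sizes matter, here is a small sketch of a pooled two-proportion z-test evaluated at the reported rates (80% vs. 52%) for several per-condition sample sizes. The sample sizes are hypothetical, and the paper may have used a different test variant; the point is only that the same percentages yield different z values depending on n.

```python
import math


def two_proportion_z(successes_1: int, n_1: int, successes_2: int, n_2: int) -> float:
    """Pooled two-proportion z statistic for the null hypothesis p1 == p2."""
    p1, p2 = successes_1 / n_1, successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_1 + 1.0 / n_2))
    return (p1 - p2) / se


# (good choices in BC, good choices in B, trials per condition).
# Counts are hypothetical; they are chosen to give exactly 80% and 52%.
scenarios = [(20, 13, 25), (40, 26, 50), (80, 52, 100)]

z_by_n = {}
for good_bc, good_b, n in scenarios:
    z_by_n[n] = two_proportion_z(good_bc, n, good_b, n)
    print(f"n={n:>3} per condition: z = {z_by_n[n]:.2f}")
```

Because the statistic grows with the square root of the sample size at fixed proportions, a reader cannot check the reported z or judge the effect size without knowing how many trials each condition contributed.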

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which identify key areas where additional rigor can strengthen the somatic marker analogy. We address each major comment below with clarifications from the existing work and indicate revisions to enhance transparency and specificity without altering the core findings.

Point-by-point responses
  1. Referee: Abstract: The claim that the 310 features at layer 22 are 'emotion-exclusive' with 'psychologically valid geometry' is load-bearing for the somatic-marker analogy, yet the manuscript provides no quantitative criteria (e.g., activation contrast thresholds, false-positive rates on non-emotional controls, or ablation of semantic confounds) to demonstrate that these directions are driven by affective content rather than correlated semantic or task-relevant features.

    Authors: The 310 features were identified by applying the pretrained Gemma Scope SAEs to a curated set of emotion-eliciting prompts and selecting those with activation patterns that align with affective dimensions while showing low response to neutral semantic controls. We agree that explicit quantitative criteria would make this more rigorous. In the revised manuscript, we will add a Methods subsection specifying the activation contrast threshold (minimum 2.5x higher on emotional vs. neutral prompts), false-positive rate on non-emotional controls (<8%), and results from semantic ablation showing that the affective geometry persists after removing correlated semantic components. revision: yes

  2. Referee: Abstract / Results: The key decision result (BC=80% vs. B=52%) and the emotional-orientation slope difference rely on the re-injection at layer 7 being a functional analogue of somatic markers; without reported controls (e.g., comparison to random vectors, non-emotion-aligned directions, or magnitude-matched perturbations), it is impossible to rule out that any sufficiently aligned activation at layer 7 would produce the same lift, reducing the Damasio parallel to a generic bias effect.

    Authors: We acknowledge that the current controls do not fully isolate the emotional specificity of the re-injection from generic activation effects at layer 7. The emotion vectors are constructed from distinctive SAE features tied to emotional experience, but to directly address this, the revised manuscript will report new control experiments using random vectors and non-emotion-aligned directions of matched magnitude. These will demonstrate that only the emotion-derived vectors produce the observed steepening of the threat-safety gradient and the lift in good decision rates, preserving the distinction from a generic bias. revision: yes

  3. Referee: Methods (implied by free parameters): Injection strength and similarity threshold are free parameters whose selection process is not detailed; the manuscript must clarify whether they were pre-specified, report sensitivity analyses across reasonable ranges, and confirm that the BC vs. B difference remains robust, as post-hoc tuning would undermine the statistical claims.

    Authors: The injection strength (0.3) and similarity threshold (0.75) were pre-specified based on pilot runs to achieve partial re-injection without destabilizing generation, prior to the main four-condition experiments. We agree that sensitivity analyses are required to confirm robustness. The revised manuscript will include these details in Methods along with sensitivity results across strength values 0.1–0.5 and thresholds 0.6–0.85, showing that the BC vs. B decision improvement (and associated z-statistic) remains significant throughout the tested range. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparisons across conditions yield measured outcomes, not derived tautologies

Full rationale

The paper's core results consist of direct experimental measurements: choice percentages (BC=80% vs B=52%), regression slopes (0.80 vs 0.56), and statistical tests (z=+2.60, p<0.01; permutation p=0.011) across four explicitly defined conditions (A, B, C, BC). Feature identification relies on pretrained external SAEs (Gemma Scope 2), with re-injection described as a procedural intervention triggered by context similarity; no equations, fitted parameters, or self-referential definitions are used to generate the reported quantities. No self-citations, uniqueness theorems, or ansatzes appear as load-bearing steps in the derivation chain. The outcomes are falsifiable observations from model runs rather than quantities that reduce to their inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The work rests on the assumption that sparse autoencoders yield psychologically meaningful features and that the chosen layers and injection procedure create a functional parallel to human affective memory. No new physical entities are postulated.

free parameters (1)
  • injection strength and similarity threshold
    Values that control how much of the emotion vector is added and when it triggers are chosen to produce the reported effects.
axioms (1)
  • domain assumption: Sparse autoencoders trained on LLM activations can isolate features that correspond to distinct emotional categories with psychologically valid geometry.
    Invoked when the authors identify 310 emotion-exclusive features at layer 22.
invented entities (1)
  • emotion echo via vector re-injection (no independent evidence)
    purpose: To provide a memory trace of affective state that can be re-experienced during recall.
    New operational mechanism introduced to bridge semantic and episodic memory in LLMs.
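
The axiom in this ledger concerns how emotion-exclusive features are picked out of SAE activations. The sketch below illustrates differential-activation selection in its simplest form: keep a feature if it fires strongly on at least one emotional text and stays silent on every neutral text. The function name and both cutoffs (`min_emotional`, `max_neutral`) are assumptions for illustration, and the activations are toy numbers, not values from Gemma Scope 2.

```python
from typing import List, Sequence


def emotion_exclusive_features(
    emotional: Sequence[Sequence[float]],  # rows = emotional texts, columns = SAE features
    neutral: Sequence[Sequence[float]],    # rows = neutral texts, same feature columns
    min_emotional: float = 1.0,            # assumed cutoff for "activates strongly"
    max_neutral: float = 0.0,              # assumed cutoff for "silent on neutral text"
) -> List[int]:
    """Indices of features active on some emotional text and inactive on all neutral texts."""
    n_features = len(emotional[0])
    selected = []
    for j in range(n_features):
        peak_emotional = max(row[j] for row in emotional)
        peak_neutral = max(row[j] for row in neutral)
        if peak_emotional >= min_emotional and peak_neutral <= max_neutral:
            selected.append(j)
    return selected


# Toy activations: 3 emotional texts, 2 neutral texts, 4 features.
emotional_acts = [[2.0, 0.0, 1.5, 0.2], [0.0, 0.0, 3.0, 0.1], [1.2, 0.0, 0.0, 0.3]]
neutral_acts = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.5, 0.0, 0.0]]
selected = emotion_exclusive_features(emotional_acts, neutral_acts)
```

In this toy case only feature 0 survives: feature 2 is strong on emotional text but also fires on a neutral text, which is the kind of overlap the referee's first major comment asks the authors to quantify.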

pith-pipeline@v0.9.0 · 5588 in / 1454 out tokens · 40854 ms · 2026-05-12T01:17:56.087819+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 5 internal anchors

  1. [1]

    Albanese, J. (2026). Realtime Editable Memory Topology (REMT). Frontiers in AI

  2. [2]

    Anokhin, D., et al. (2024). AriGraph: Learning knowledge graph world models with episodic memory for LLM agents. arXiv:2407.04363

  3. [3]

    Bechara, A., Damasio, A.R., Damasio, H., & Anderson, S.W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50(1-3), 7–15

  4. [4]

    Bechara, A., Tranel, D., Damasio, H., & Damasio, A.R. (1996). Failure to respond autonomically to anticipated future outcomes following damage to prefrontal cortex. Cerebral Cortex, 6(2), 215–225

  5. [5]

    Bechara, A., Damasio, H., Tranel, D., & Damasio, A.R. (1997). Deciding advantageously before knowing the advantageous strategy. Science, 275(5304), 1293–1295

  6. [6]

    Bricken, T., et al. (2023). Towards monosemanticity: Decomposing language models with dictionary learning. Anthropic

  7. [7]

    Chalnev, A., et al. (2024). SAE-TS: SAE-targeted steering for improving model behavior. arXiv:2411.10710

  8. [8]

    Cho, S., Wu, Z., & Koshiyama, A. (2026). Control Reinforcement Learning: Interpretable token-level steering of LLMs via sparse autoencoder features. arXiv:2602.10437

  9. [9]

    Damasio, A.R., Tranel, D., & Damasio, H. (1991). Somatic markers and the guidance of behaviour. In Frontal Lobe Function and Dysfunction (pp. 217–229). Oxford University Press

  10. [10]

    Damasio, A.R. (1994). Descartes’ Error: Emotion, Reason, and the Human Brain. Putnam

  11. [11]

    Fountas, Z., et al. (2024). EM-LLM: Episodic memory for large language models. arXiv

  12. [12]

    Lieberum, T., et al. (2024). Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2. arXiv:2408.05147. Google DeepMind

  13. [13]

    Google DeepMind. (2026). Gemma Scope 2: A comprehensive suite of sparse autoencoders for Gemma 3

  14. [14]

    Kang, M., et al. (2024). Larimar: Large language models with episodic memory control. arXiv:2403.11901

  15. [15]

    Lima, V. & Martinho, D. (2026). Biomimetic synthetic somatic markers in the Pixelverse. Biomimetics, 11(1), 63

  16. [16]

    Rajamanoharan, S., et al. (2024). Jumping ahead: Improving reconstruction fidelity with JumpReLU sparse autoencoders. arXiv:2407.14435. Google DeepMind

  17. [17]

    Rimsky, N., et al. (2024). Steering Llama 2 via contrastive activation addition. arXiv:2312.06681

  18. [18]

    Templeton, A., et al. (2026). Emotion representations inside Claude. Anthropic

  19. [19]

    Templeton, A., et al. (2026). Natural language autoencoders: Turning Claude’s thoughts into text. Anthropic

  20. [20]

    Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of Memory (pp. 381–403). Academic Press

  21. [21]

    Tulving, E. (1985). Memory and consciousness. Canadian Psychology, 26(1), 1–12

  22. [22]

    Turner, A., et al. (2023). Activation addition: Steering language models without optimization. arXiv:2308.10248

  23. [23]

    Wen, D., et al. (2026). A-MBER: Affective Memory Benchmark for Emotion Recognition. arXiv:2604.07017

  24. [24]

    Zhong, W., et al. (2024). SYNAPSE: Trajectory-as-exemplar prompting with memory for computer control. arXiv