Mechanistic interpretability of emotion inference in large language models

Mechanistic Interpretability of Emotion Inference in Large Language Models · 2025 · arXiv 2502.05489

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Emotion Concepts and their Function in a Large Language Model

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Claude Sonnet 4.5 exhibits functional emotions via abstract internal representations of emotion concepts that causally influence its preferences and misaligned behaviors without implying subjective experience.

Relational Intervention During Functional Collapse in Large Language Models: A Lexical-Statistical Ablation and a Structure x Register Factorial

cs.AI · 2026-05-31 · unverdicted · novelty 6.0

A 2x2 factorial experiment on Qwen3.5-4B shows that relational structure and first-person register interact to drive behavioral persistence after functional collapse, while attention tracks lexical surprise and emotion probes track structure alone.

Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications

cs.CL · 2026-07-01 · unverdicted · novelty 5.0

An NSM-based explication parser with fixed semantic rules produces emotion labels for events, achieving 0.33 accuracy on held-out crowd-sourced data while shifting empirical risk to an inspectable parser.

A Navigable Manifold of Hypothesized Consciousness-Spectrum States in Language Model Representations

cs.LG · 2026-06-04 · unverdicted · novelty 5.0

Language model embeddings encode a globally organized, navigable manifold corresponding to a consciousness-spectrum taxonomy, with trajectories moving from lower- to higher-level regions.

AIPsy-Affect: A Keyword-Free Clinical Stimulus Battery for Mechanistic Interpretability of Emotion in Language Models

cs.CL · 2026-04-26 · unverdicted · novelty 5.0

AIPsy-Affect supplies 480 keyword-free clinical vignettes and matched neutral controls for mechanistic interpretability studies of emotion in language models.

citing papers explorer

Showing 5 of 5 citing papers.

Emotion Concepts and their Function in a Large Language Model
cs.AI · 2026-04-09 · unverdicted · none · ref 24

Relational Intervention During Functional Collapse in Large Language Models: A Lexical-Statistical Ablation and a Structure x Register Factorial
cs.AI · 2026-05-31 · unverdicted · none · ref 11

Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications
cs.CL · 2026-07-01 · unverdicted · none · ref 11

A Navigable Manifold of Hypothesized Consciousness-Spectrum States in Language Model Representations
cs.LG · 2026-06-04 · unverdicted · none · ref 13

AIPsy-Affect: A Keyword-Free Clinical Stimulus Battery for Mechanistic Interpretability of Emotion in Language Models
cs.CL · 2026-04-26 · unverdicted · none · ref 9
