Transformer Circuits Thread , year=

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning , author=

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

browse 5 citing papers

citation-role summary

method 2

citation-polarity summary

use method 2

representative citing papers

WriteSAE: Sparse Autoencoders for Recurrent State

cs.LG · 2026-05-12 · unverdicted · novelty 8.0 · 3 refs

WriteSAE introduces sparse autoencoders with rank-1 matrix atoms for recurrent state updates, allowing replacement tests that outperform deletion on 92.4% of positions and a formula predicting logit changes with R²=0.98.

Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations

cs.LG · 2026-05-02 · unverdicted · novelty 7.0

Transformer activations show spectral anti-concentration for concepts in the tail while syntax prefers high-variance directions, forming a dual geometry.

Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Health foundation model embeddings contain an interpretable symbolic organization shared across modalities that supports cross-domain transfer without joint training.

Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

Sparse autoencoder analysis of PatchTST FFN activations shows sparse, stable representations with no empirical support for superposition on standard time series forecasting tasks.

Emergent Semantic Role Understanding in Language Models

cs.AI · 2026-05-09 · unverdicted · novelty 5.0

Semantic role understanding partially emerges during language model pre-training, with linear probes on frozen representations achieving substantial performance that improves with scale but does not match fine-tuned models, and representations shifting toward more distributed forms at larger scales.

citing papers explorer

Showing 5 of 5 citing papers.

WriteSAE: Sparse Autoencoders for Recurrent State cs.LG · 2026-05-12 · unverdicted · none · ref 1 · 3 links
WriteSAE introduces sparse autoencoders with rank-1 matrix atoms for recurrent state updates, allowing replacement tests that outperform deletion on 92.4% of positions and a formula predicting logit changes with R²=0.98.
Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations cs.LG · 2026-05-02 · unverdicted · none · ref 17
Transformer activations show spectral anti-concentration for concepts in the tail while syntax prefers high-variance directions, forming a dual geometry.
Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer cs.LG · 2026-05-08 · unverdicted · none · ref 17
Health foundation model embeddings contain an interpretable symbolic organization shared across modalities that supports cross-domain transfer without joint training.
Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting cs.LG · 2026-05-06 · unverdicted · none · ref 9
Sparse autoencoder analysis of PatchTST FFN activations shows sparse, stable representations with no empirical support for superposition on standard time series forecasting tasks.
Emergent Semantic Role Understanding in Language Models cs.AI · 2026-05-09 · unverdicted · none · ref 42
Semantic role understanding partially emerges during language model pre-training, with linear probes on frozen representations achieving substantial performance that improves with scale but does not match fine-tuned models, and representations shifting toward more distributed forms at larger scales.

Transformer Circuits Thread , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer