Towards monosemanticity: Decomposing language models with dictionary learning. Transformer Circuits Thread

Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Krave

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Beyond Semantics: Disentangling Information Scope in Sparse Autoencoders for CLIP

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

The paper proposes information scope as a new interpretability axis for SAE features in CLIP and introduces the Contextual Dependency Score to separate local from global scope features, showing they influence model predictions differently.

SPG: Sparse-Projected Guides with Sparse Autoencoders for Zero-Shot Anomaly Detection

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

SPG uses sparse autoencoders to learn guide coefficients that generate normal and anomalous reference vectors, achieving competitive zero-shot anomaly detection and strong segmentation on MVTec AD and VisA without target adaptation.

citing papers explorer

Beyond Semantics: Disentangling Information Scope in Sparse Autoencoders for CLIP
cs.CV · 2026-04-07 · unverdicted · none · ref 4

SPG: Sparse-Projected Guides with Sparse Autoencoders for Zero-Shot Anomaly Detection
cs.CV · 2026-04-03 · unverdicted · none · ref 3
