Towards monosemanticity: Decomposing language models with dictionary learning

Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nicholas L Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Alex · 2023

1 Pith paper cites this work.

representative citing papers

Mechanistic Interpretability of ASR models using Sparse Autoencoders

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

Sparse autoencoders applied to Whisper ASR reveal monosemantic features across linguistic boundaries and demonstrate cross-lingual feature steering.
