Sparse autoencoders applied to Whisper ASR reveal monosemantic features across linguistic boundaries and demonstrate cross-lingual feature steering.
Towards monosemanticity: Decomposing language models with dictionary learning
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Mechanistic Interpretability of ASR models using Sparse Autoencoders
Sparse autoencoders applied to Whisper ASR reveal monosemantic features across linguistic boundaries and demonstrate cross-lingual feature steering.