Aligned training reparameterizes SAEs to enforce unit inner product between encoder and decoder directions, eliminating dead features and enhancing stability without hyperparameters.
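As an illustration only, one way to enforce a unit inner product between a feature's encoder direction and its decoder direction is to reparameterize the decoder so the constraint holds by construction. The sketch below assumes one particular decomposition: the decoder direction equals enc / ||enc||² plus an unconstrained vector projected onto the subspace orthogonal to enc. This decomposition is hypothetical and is not necessarily the formulation used in the Aligned Training paper. The names `aligned_decoder`, `dot`, and `free` are invented here for the example.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def aligned_decoder(enc: Vector, free: Vector) -> Vector:
    """Hypothetical reparameterization: build a decoder direction d with <enc, d> == 1.

    d = enc / ||enc||^2 + (free - proj_enc(free)).
    The first term contributes exactly 1 to <enc, d>; the second term is
    orthogonal to enc, so the unconstrained parameters in `free` can move d
    without changing the inner product.
    """
    sq_norm = dot(enc, enc)
    if sq_norm == 0.0:
        raise ValueError("encoder direction must be non-zero")
    coef = dot(free, enc) / sq_norm  # component of `free` along enc
    return [e / sq_norm + (f - coef * e) for e, f in zip(enc, free)]


# Usage: the constraint holds regardless of the value of `free`.
enc = [3.0, 4.0]
free = [1.0, -2.0]
d = aligned_decoder(enc, free)
print(round(dot(enc, d), 6))  # 1.0
```

Because the constraint is built into the parameterization rather than added as a penalty term, this style of construction needs no weighting coefficient to tune, which is consistent with the "parameter-free" framing above.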
Taking features out of superposition with sparse autoencoders. Alignment Forum, 2023.
1 Pith paper cites this work.
Fields: cs.LG (1); Years: 2026 (1); Verdicts: UNVERDICTED (1)
Representative citing papers:
Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE)