SASA replaces single-vector decoders in SAEs with learned subspaces plus block sparsity and nuclear-norm regularization, proving that a single group becomes the global minimizer once block size meets intrinsic dimension and yielding polynomial rather than exponential sample complexity.
Inference-time decomposition of activations (itda): A scalable approach to interpreting large language models.arXiv preprint arXiv:2505.17769,
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2representative citing papers
ICALens applies an optimized ICA workflow to LLM activations and recovers compact interpretable directions that match or exceed public SAEs on SAEBench probing and perturbation tasks without per-layer dictionary training.
citing papers explorer
-
Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability
SASA replaces single-vector decoders in SAEs with learned subspaces plus block sparsity and nuclear-norm regularization, proving that a single group becomes the global minimizer once block size meets intrinsic dimension and yielding polynomial rather than exponential sample complexity.