Expander SAEs apply left-d-regular expander masks to TopK SAEs, learning only dn decoder parameters instead of mn; the resulting storage-fidelity frontier reaches 293x compression while retaining 84% of performance on Qwen2.5-3B.
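The sketch below illustrates the parameter count in the Expander SAE summary. It reads n as the number of SAE latents, m as the activation width, and d as the number of output coordinates each latent may write to. Under that reading the decoder stores n·d values instead of n·m, a storage ratio of m/d. This is a minimal NumPy sketch under those assumptions, not the authors' code. The mask is a random left-d-regular bipartite graph (random left-regular graphs are expanders with high probability for suitable parameters, but the paper's construction may differ), the encoder is left dense, no training loop is shown, and all names (`left_d_regular_mask`, `ExpanderTopKSAE`) are hypothetical.

```python
import numpy as np


def left_d_regular_mask(n_latents: int, m_dim: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: give each latent (left node) exactly d distinct
    neighbours among the m_dim output coordinates (right nodes), chosen at random."""
    return np.stack([rng.choice(m_dim, size=d, replace=False) for _ in range(n_latents)])


class ExpanderTopKSAE:
    """Sketch (assumed design): TopK SAE whose decoder is restricted to a left-d-regular mask."""

    def __init__(self, m_dim: int, n_latents: int, d: int, k: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.m_dim, self.n_latents, self.k = m_dim, n_latents, k
        self.W_enc = rng.standard_normal((m_dim, n_latents)) / np.sqrt(m_dim)  # dense encoder (assumption)
        self.b_enc = np.zeros(n_latents)
        self.idx = left_d_regular_mask(n_latents, m_dim, d, rng)  # fixed mask, not learned
        self.dec_vals = rng.standard_normal((n_latents, d)) / np.sqrt(d)  # the only decoder weights stored
        self.b_dec = np.zeros(m_dim)

    def decoder_params(self) -> int:
        return self.dec_vals.size  # n_latents * d instead of n_latents * m_dim

    def encode(self, x: np.ndarray) -> np.ndarray:
        pre = x @ self.W_enc + self.b_enc
        top = np.argpartition(pre, -self.k, axis=1)[:, -self.k:]  # indices of the k largest per row
        rows = np.arange(pre.shape[0])[:, None]
        z = np.zeros_like(pre)
        z[rows, top] = np.maximum(pre[rows, top], 0.0)  # keep top-k, clipped at zero
        return z

    def dense_decoder(self) -> np.ndarray:
        # Materialized only for illustration; a real implementation would stay sparse.
        W = np.zeros((self.n_latents, self.m_dim))
        W[np.arange(self.n_latents)[:, None], self.idx] = self.dec_vals
        return W

    def decode(self, z: np.ndarray) -> np.ndarray:
        return z @ self.dense_decoder() + self.b_dec


sae = ExpanderTopKSAE(m_dim=64, n_latents=512, d=4, k=8)
x = np.random.default_rng(1).standard_normal((3, 64))
x_hat = sae.decode(sae.encode(x))
print(x_hat.shape, sae.decoder_params(), 512 * 64)  # (3, 64) 2048 32768
```

With these toy sizes the masked decoder stores 2,048 values instead of 32,768, a 16x reduction (m/d = 64/4). The same ratio m/d governs the decoder storage savings at any scale under this reading.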
From superposition to sparse codes: interpretable representations in neural networks
5 Pith papers cite this work.
Representative citing papers (5, all from 2026):
LeJEPA achieves linear identifiability of latent variables only when the latents are Gaussian, in worlds with stationary additive-noise transitions.
Block-sparse featurizers recover visual concepts as two- to four-dimensional manifolds and describe activations more compactly than direction-based methods via minimum-description-length comparison.
SRF factorizes similarity matrices into low-dimensional, non-negative, interpretable dimensions; it is shown to work on sparse data and to match task-specific models across simulations and real datasets.
Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.
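The SRF summary describes factorizing a similarity matrix into a few non-negative dimensions. SRF's actual objective and solver are not given here, so the sketch below swaps in a generic, well-known technique with the same shape: symmetric non-negative matrix factorization, S ≈ W Wᵀ with W ≥ 0, fitted by damped multiplicative updates. Each column of W is read as one dimension and each row as an item's non-negative loadings on those dimensions. The function name and defaults are hypothetical, and S is assumed to be symmetric and non-negative.

```python
import numpy as np


def symmetric_nmf(S: np.ndarray, k: int, n_iter: int = 500, beta: float = 0.5,
                  seed: int = 0, eps: float = 1e-12) -> np.ndarray:
    """Generic symmetric NMF, S ~ W @ W.T with W >= 0, via damped
    multiplicative updates. A stand-in, not SRF's published algorithm."""
    rng = np.random.default_rng(seed)
    W = rng.random((S.shape[0], k))  # positive initialisation
    for _ in range(n_iter):
        numer = S @ W
        denom = W @ (W.T @ W) + eps
        # For non-negative S every factor below is non-negative, so W stays non-negative.
        W *= (1.0 - beta) + beta * numer / denom
    return W


rng = np.random.default_rng(1)
H = rng.random((30, 3))
S = H @ H.T  # synthetic non-negative similarity matrix of rank 3
W = symmetric_nmf(S, k=3)
rel_err = np.linalg.norm(S - W @ W.T) / np.linalg.norm(S)
print(W.shape, bool(W.min() >= 0))  # (30, 3) True
print(f"relative reconstruction error: {rel_err:.4f}")
```

The damping factor beta blends each multiplicative step with the previous iterate, which tends to make the updates more stable than an undamped ratio; the non-negativity of W, rather than any explicit constraint, is what makes each dimension readable as an additive "amount" per item.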