From superposition to sparse codes: interpretable representations in neural networks

Klindt, D · 2025 · arXiv 2503.01824

5 Pith papers cite this work.

representative citing papers

Expander Sparse Autoencoders: Parameter-Efficient Dictionaries for Mechanistic Interpretability

cs.LG · 2026-07-02 · conditional · novelty 8.0

Expander SAEs apply left-d-regular expander masks to TopK SAEs, learning only d·n decoder parameters instead of m·n, and trace a storage-fidelity frontier that reaches 293× compression while retaining 84% of performance on Qwen2.5-3B.
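As a rough illustration of where the decoder-side parameter saving in the Expander SAE summary comes from, here is a minimal pure-Python sketch. It assumes m is the model (output) dimension, n the number of latents, and d the number of output dimensions each latent connects to. The mask is sampled uniformly at random as a stand-in for whatever expander construction the paper actually uses, and the helper names `make_mask`, `topk`, and `decode` are hypothetical, not the authors' code.

```python
import random

def make_mask(m, n, d, seed=0):
    # Hypothetical left-d-regular bipartite mask: latent j (a left vertex)
    # connects to exactly d distinct output dimensions out of m.
    rng = random.Random(seed)
    return [rng.sample(range(m), d) for _ in range(n)]

def topk(z, k):
    # TopK sparsity: keep the k largest pre-activations, zero the rest.
    keep = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    return [z[i] if i in keep else 0.0 for i in range(len(z))]

def decode(codes, mask, values, m):
    # values[j][t] is the weight on edge (latent j -> dim mask[j][t]);
    # only n * d such weights are stored instead of a dense m * n decoder.
    x_hat = [0.0] * m
    for j, c in enumerate(codes):
        if c == 0.0:
            continue  # inactive latents contribute nothing
        for t, r in enumerate(mask[j]):
            x_hat[r] += c * values[j][t]
    return x_hat

m, n, d, k = 16, 64, 4, 3
mask = make_mask(m, n, d)
rng = random.Random(1)
values = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
codes = topk([rng.gauss(0, 1) for _ in range(n)], k)
x_hat = decode(codes, mask, values, m)

masked_params = sum(len(v) for v in values)  # d * n
dense_params = m * n                         # dense decoder size
print(masked_params, dense_params, dense_params / masked_params)  # 256 1024 4.0
```

In this count the decoder compression ratio is simply m/d; the figures reported for Qwen2.5-3B come from the paper and are not reproduced by this toy setup.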

When Does LeJEPA Learn a World Model?

stat.ML · 2026-05-25 · unverdicted · novelty 8.0

LeJEPA achieves linear identifiability of latent variables uniquely when the latents are Gaussian in worlds with stationary additive-noise transitions.

Structuring Sparsity: Block-Sparse Featurizers Capture Visual Concept Manifolds

cs.CV · 2026-06-23 · unverdicted · novelty 7.0

Block-sparse featurizers recover visual concepts as two- to four-dimensional manifolds and describe activations more compactly than direction-based methods via minimum-description-length comparison.

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

SRF factorizes similarity matrices into low-dimensional, non-negative, interpretable dimensions; it is shown to work on sparse data and to match task-specific models across simulations and real datasets.

Probing for Representation Manifolds in Superposition

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.