arxiv: 2606.25234 · v1 · pith:WCHWDGAHnew · submitted 2026-06-23 · 💻 cs.CV

Structuring Sparsity: Block-Sparse Featurizers Capture Visual Concept Manifolds

Pith reviewed 2026-06-25 23:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords block sparsity · visual concept manifolds · neural interpretability · structured sparsity · activation space geometry · manifold steering · diffusion models

The pith

Block-sparse featurizers recover visual concepts as low-dimensional manifolds instead of isolated directions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes that treating concepts as single directions in neural activations overlooks their structure as low-dimensional manifolds. Block sparsity provides a prior that groups directions into blocks, allowing representations to be modeled as sparse sums of these manifolds. This aligns with the idea from visual neuroscience that features are carried by coordinated groups of neurons. Using minimum description length analysis, the authors demonstrate that block-sparse featurizers describe the activations more compactly, with most concepts spanning two to four dimensions. They apply this to reinterpret existing detectors, find new ones, and steer generation in diffusion models.

Core claim

Block sparsity, which groups directions into blocks, is the prior matched to a generative model in which a representation is a sparse sum of low-dimensional manifolds. This is the modern, learned form of a classical idea in visual neuroscience, where a visual feature is carried by a coordinated group of neurons rather than a single tuned one. All three variants of block-sparse featurizers describe activations more compactly than direction-based featurizers, with the recovered concepts typically two- to four-dimensional.
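
Written out, the generative model behind this claim takes roughly the following form. The notation follows the Figure 2 and Figure 3 captions (block code z_g, block dictionary D_g, contribution m_g). The support set S, the count N_blocks, and the group-lasso form of the penalty are assumptions made here for illustration, since the paper's own equations are not reproduced on this page.

```latex
% Sparse sum of low-dimensional manifolds (notation partly assumed)
x \;=\; \sum_{g \in S} m_g, \qquad m_g = z_g D_g \in \mathcal{M}_g, \qquad |S| \ll N_{\text{blocks}}
% z_g \in \mathbb{R}^{b} is the block code, D_g \in \mathbb{R}^{b \times d} the block dictionary.

% Rotating the frame inside a block leaves the contribution unchanged:
z_g D_g \;=\; (z_g Q^{\top})(Q D_g), \qquad
\lVert z_g Q^{\top} \rVert_2 = \lVert z_g \rVert_2 \quad \text{for } Q \in O(b)

% A penalty that depends only on block norms (standard group-lasso form):
\Omega(z) \;=\; \sum_{g} \lVert z_g \rVert_2
```

With b = 1 each block norm is just |z_g|, so this penalty reduces to the usual L1 penalty on individual directions.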

What carries the argument

Block-sparse featurizers that enforce sparsity over blocks of directions to model low-dimensional manifolds in activation space.
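
A minimal sketch of what enforcing sparsity over blocks can look like: the snippet below implements a block soft-threshold, the shrinkage operator applied to each block's norm that the Figure 3 caption describes. The function name `block_soft_threshold`, the threshold `lam`, and the (G, b) code layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def block_soft_threshold(z, lam):
    """Shrink each row (block) of z toward zero by lam, measured in L2 norm.

    z   : array of shape (G, b), one b-dimensional code per block (assumed layout)
    lam : non-negative threshold (hypothetical hyperparameter)
    """
    norms = np.linalg.norm(z, axis=1, keepdims=True)           # ||z_g||_2 per block
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * z                                           # a block is kept or zeroed as a whole

z = np.array([[3.0, 4.0],    # norm 5.0: shrunk to norm 4.0, direction unchanged
              [0.3, 0.4]])   # norm 0.5: below lam, so the whole block is zeroed
out = block_soft_threshold(z, lam=1.0)
```

Because the operator only rescales each block by a function of its norm, it keeps the direction within a surviving block intact, which is what lets a block carry a position on a manifold rather than a single scalar activation.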

If this is right

  • Prior curve detectors in InceptionV1 are shown to read from a single continuous curve manifold.
  • New manifolds for shadows and lighting are discovered in DINOv3.
  • Manifold steering provides interpretable control over image generation in SDXL diffusion models.
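
For a single activation vector, the steering in the last bullet can be pictured as swapping one block's contribution for a chosen point in that block's subspace. This is only a sketch under stated assumptions: the shapes, the name `steer_block`, and the random decoders are hypothetical, and how the edit is hooked into SDXL during sampling is not shown here.

```python
import numpy as np

def steer_block(x, z, D, g, target):
    """Swap block g's contribution in activation x for a chosen point.

    x      : (d,) activation
    z      : (G, b) block codes for x
    D      : (G, b, d) per-block decoders, so block g contributes z[g] @ D[g]
    target : (b,) desired code for block g (a point in the block's subspace)
    """
    old = z[g] @ D[g]
    new = target @ D[g]
    return x - old + new          # everything outside block g is left untouched

rng = np.random.default_rng(0)
G, b, d = 4, 3, 8                 # toy sizes, chosen for illustration only
D = rng.normal(size=(G, b, d))
z = rng.normal(size=(G, b))
x = np.einsum('gb,gbd->d', z, D) + 0.1 * rng.normal(size=d)   # reconstruction plus residual
target = np.array([1.0, 0.0, 0.0])
x_steered = steer_block(x, z, D, g=2, target=target)
```

Subtracting the old contribution and adding the new one, rather than re-decoding the whole code, preserves the residual that the featurizer did not explain.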

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This suggests that interpretability methods should shift from individual directions to structured groups.
  • Similar structured sparsity approaches might improve analysis in non-visual domains.
  • Testing other block sizes or sparsity patterns could yield even more efficient descriptions of activations.

Load-bearing premise

The minimum-description-length comparison between block-sparse and direction-based featurizers directly measures which generative model better matches the true structure of the activations.

What would settle it

If direction-based featurizers were found to yield lower minimum description lengths than block-sparse featurizers across the tested models and layers, the superiority of the block-sparse approach would be challenged.
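
To make the compared quantity concrete, here is a hedged sketch of a generic two-part code length for one sparse code. It is not the paper's Eq. 5, which is not reproduced on this page. The function name, the fixed `bits_per_coeff`, and the toy settings are all assumptions, and the comparison presumes both codes reach the same reconstruction fidelity.

```python
import math

def description_length_bits(num_blocks, active_blocks, block_dim, bits_per_coeff):
    """Two-part code length for one activation (illustrative, not the paper's Eq. 5).

    Part 1: which blocks are active, log2 C(num_blocks, active_blocks) bits.
    Part 2: the coefficients, active_blocks * block_dim * bits_per_coeff bits.
    """
    support_bits = math.log2(math.comb(num_blocks, active_blocks))
    coeff_bits = active_blocks * block_dim * bits_per_coeff
    return support_bits + coeff_bits

# Toy settings (made up for illustration): the same 4096 directions,
# arranged as 4096 single directions or as 1024 blocks of dimension 4,
# with 32 stored coefficients in both cases.
sae_bits = description_length_bits(4096, 32, 1, 8)
bsf_bits = description_length_bits(1024, 8, 4, 8)
```

In this toy the coefficient cost is identical, so any difference comes from indexing the support: naming 8 of 1024 blocks is cheaper than naming 32 of 4096 directions. Whether real activations allow the block code to match the fidelity of the direction code is the empirical question the paper's analysis addresses.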

Figures

Figures reproduced from arXiv: 2606.25234 by Aiden Swann, Atticus Geiger, Can Rager, Curt Tigges, Daniel Wurgaft, Dron Hazra, Ekdeep Singh Lubana, Fenil Doshi, Jack Merullo, Lee Sharkey, Lucius Bushnaq, Matthew Kowal, Michael Pearce, Mozes Jacobs, Nick Cammarata, Owen Lewis, Satchel Grant, Siddharth Boppana, Tal Haklay, Thomas Fel, Thomas Icard, Thomas McGrath, Thomas Serre, Usha Bhalla, Vasudev Shyam.

Figure 1
Figure 1: Block-sparse featurizers (BSF) capture the internal geometry of concepts. Sparse autoencoders (SAEs) are a popular method for decomposing neural representations using isolated directions as primitive atoms. On the right, we show how two SAE features activate over the image of a rabbit where darker red means a higher activation; observe that one feature picks out the rabbit’s ears and the other its face. Ye…
Figure 2
Figure 2: Activation x as a sparse sum of points drawn from a few low-dimensional regions M_i; x lives in a Minkowski sum of manifolds (Def. 1). There is a deep duality between the architecture of a featurizer and the assumed geometry of neural representations (Hindupur et al., 2025). To design a new featurizer, we first hypothesize a data-generating process (DGP) based on recent work on representation manifolds (F…
Figure 3
Figure 3: The block soft-threshold. The shrinkage operator (applied on the norm of the group) against ReLU and JumpReLU (Rajamanoharan et al., 2024). A few remarks follow. First, the penalty sees each block only through its norm ∥z_g∥_2 because the same point m_g = z_g D_g can be written in any rotated frame, z_g D_g = (z_g Q^⊤)(Q D_g) for Q ∈ O(b), so a penalty intrinsic to the feature must be “blind” to the choice of basis, wh…
Figure 4
Figure 4: Block sparsity recovers an additive manifold superposition. A controlled instance of Def. 1: M known low-dimensional manifolds embedded in R^d and summed |S| at a time (here six factors, one per row). The leftmost column is the ground-truth contribution m_i; each remaining column is the contribution recovered by a featurizer, projected into the true factor’s 3-D principal frame and colored by that frame (h…
Figure 5
Figure 5: Per-block recovery. Per-block R^2, averaged over factors. The block-sparse featurizers (purple) sit near the oracle ceiling; the two repaired variants (bottom) recover well above their original forms. The same principle repairs SMixAE and MFA. The toy also lets us probe two recent featurizers that, like ours, take a multi-dimensional unit but are built on different assumptions: SMixAE (Francel, 2026) and …
Figure 6
Figure 6: Distortion estimation and reconstruction sparsity trade-off. (top) Estimating the reconstruction fidelity each task requires. DINOv3 activations are degraded by progressive quantization and passed to a frozen linear probe; the relative task metric is plotted against the fidelity R^2(x, x_q) of the corrupted features. Performance is essentially unaffected down to R^2 ≈ 0.8 for classification and segmentation…
Figure 7
Figure 7: Block structure yields a lower description length than SAEs. Description length L_δ(x) (Eq. 5) for the Grassmannian BSF at the 20% distortion floor, against block dimension b, with one panel per dictionary width G and one curve per block sparsity k. Codes with b>1 describe DINOv3 activations in fewer bits than the b=1 SAE across every width and sparsity, the minimum falling at a moderate b between 2 and 4 t…
Figure 8
Figure 8: Block dimensionality stabilizes between two and four. Mean stable rank of the per-block code, against the block dimension b the featurizer was granted. Although blocks are allotted up to b = 16 coordinates, the dimension they occupy saturates near 3, indicating that DINOv3 concepts are on average between two- and four-dimensional regardless of the room made available to them.
Figure 9
Figure 9: A block carries a concept together with its intrinsic geometry. Six blocks from a Grassmann featurizer on DINOv3. For every block we overlay the patches it fires on a sample of images, coloring each patch by the first three principal coordinates of its contribution u_g mapped to the red, green, and blue channels, so that hue reports where on the concept a patch lies rather than merely how strongly it is pre…
Figure 10
Figure 10: Feature visualization on one recovered concept manifold. A single Grassmannian BSF block trained on DINOv3 recovers a cabbage/cauliflower manifold. The central colored point cloud is a projection of the block’s active contributions, with hue encoding the first three principal coordinates of the block’s intrinsic geometry. White points mark locations sampled along paths through this cloud, and the images a…
Figure 11
Figure 11: A single block recovers the InceptionV1 curve manifold, and discovers higher-order Fourier modes. (a) Radial tuning curves on synthetic oriented stimuli: individual neurons (Cammarata et al., 2020) and Top-k SAE atoms (Gorton, 2024) each fire for a narrow wedge of orientations, shattering the family into many petals, whereas a single BSF block covers all orientations as one connected region. (b) Decompos…
Figure 12
Figure 12: Block structure yields class-selective, spatially coherent concepts. Best single-concept detector F1 (left) and concept-map smoothness measured by total variation (center) and Dirichlet energy (right), against sparsity k on the native W=32,768 grid; across every k the block-SAEs (b>1) reach higher F1 and lower total variation and Dirichlet energy than the b=1 SAE, the gain widening with b.
Figure 13
Figure 13: Recovered concepts track scene illumination, consistently across objects. A Grassmannian BSF on DINOv3 is probed on a series of twelve Blender models rendered while the sun sweeps in azimuth and elevation. (Top left) The block norm ∥z_g∥ of the luminance block against sun azimuth, one curve per elevation in {15°, ..., 90°}; (Top middle) the ground-truth luminance over the same sweep, which the block n…
Figure 14
Figure 14: Steering the pretzel manifold uncovered in SDXL by BSF. We steer an SDXL block from the component that represents the concept of a pretzel. We prompt SDXL for an image of a pretzel and then run four steps of diffusion while fixing the pretzel block to a particular value. We visualize the points within a block with UMAP because b = 16, but color the plot according to the principal components of the blo…
Figure 15
Figure 15: Steering blocks in layer down.2.1 of SDXL. Generations produced by setting the block’s contribution and fitting a Kohonen map to fit the natural geometry of the subspace (…
Figure 16
Figure 16: A gallery of feature visualizations on recovered concept manifolds. Representative blocks from a Grassmannian BSF trained on DINOv3, visualized with the same convention as …
Figure 17
Figure 17: A gallery of feature visualizations on recovered concept manifolds. Follow-up of Fig. 16.
Figure 18
Figure 18: Reconstruction improves monotonically with every structural parameter, for all three featurizers. Held-out R^2 against block sparsity ℓ, one row per featurizer (top to bottom: Block, Grassmannian, Group Lasso), one column per dictionary width G (expansions 8× to 64×), with curves colored by block dimension k (k=1, in black, is the directional SAE). Quality rises with ℓ, with G, and with k throughout, and t…
Figure 19
Figure 19: Grassmannian featurizer, full description-length landscape. Description length L_δ(x) (Eq. 5) against block dimension k, with one panel per dictionary width G and one curve per block sparsity ℓ, at each distortion level. The minimum of each curve marks the description-optimal block dimension, which sits at an interior k ≈ 3 and eases downward as G widens. C.3 Description length favours structure at every d…
Figure 20
Figure 20: Block featurizer, full description-length landscape. As in …
Figure 21
Figure 21: Group Lasso featurizer, full description-length landscape. As in …
Figure 22
Figure 22: Rank measures and utilization against block dimension. (top) Mean stable rank, participation ratio, and effective rank of the per-block code, against the allotted block dimension k, for the three featurizers; the dashed line is full occupancy (rank = k). All three measures grow far more slowly than the diagonal, so a block uses only a fraction of the dimensions it is given. (bottom) Utilization, the effec…
Figure 23
Figure 23: The higher harmonics survive a matched null. Share of the AC variance of the recovered curve manifold m_i(θ) carried by each Fourier mode k as the stimulus orientation sweeps [0, 360°). The measured manifold (blue) retains 18.3% and 12.0% of its variance at k=2 and k=3, while the two nulls leave these modes empty: a pure first-harmonic input (pink, k=1 only) places all of its power on the orientation circ…
Figure 24
Figure 24: Linear Probe Evaluation. We train linear probes on codes extracted by Vanilla SAEs and BSFs (Block-SAEs) on ImageNet-1k (classification), ADE20k (semantic segmentation) and NYUv2 (monocular depth estimation). We report validation set performance on each dataset. [Plot residue: panels Classification, Segmentation, …; x-axis: group size K; y-axis: cos(probe direction, k-atom subspace)]
Figure 25
Figure 25: Probe Recovery. We treat linear probes as activations and measure how well different SAEs can reconstruct them. [Plot residue: panels 4,096 / 8,192 / 16,384 / 32,768 concepts; x-axis: sparsity L0 (active groups/token); y-axis: Total Variation; legend: SAE (K=1), K=2, K=3, K=4, K=6, K=8, …]
Figure 26
Figure 26: BSFs learn more spatially coherent concepts. We show total variation as a function of number of concepts predicted for a sweep of 96 SAEs.
Figure 27
Figure 27: Minimal description length comparison between Grassmannian BSFs and MFAs. MFAs are naturally dense, resulting in high description lengths during inference. Hard top-k thresholding can be applied to MFAs to induce sparsity, as shown by the open circles (sparsity of k=1), significantly reducing the MDL without impacting R^2 significantly. Adapting SMixAE and MFA. SMixAE encodes through a rectified expert sp…
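
Figures 8 and 22 above report rank measures of the per-block code. The sketch below computes two of them using their standard definitions (stable rank as squared Frobenius norm over squared spectral norm, participation ratio from covariance eigenvalues). Whether the paper centers or normalizes the codes first is not stated on this page, so treat those details, the function names, and the synthetic data as assumptions.

```python
import numpy as np

def stable_rank(codes):
    """||A||_F^2 / ||A||_2^2 for an (n_samples, b) matrix of per-block codes."""
    s = np.linalg.svd(codes, compute_uv=False)     # singular values, largest first
    return float(np.sum(s**2) / s[0]**2)

def participation_ratio(codes):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues) of the code covariance."""
    lam = np.linalg.eigvalsh(np.cov(codes, rowvar=False))
    return float(lam.sum()**2 / np.sum(lam**2))

rng = np.random.default_rng(0)
# Hypothetical block with b = 16 allotted coordinates of which only 3 are used.
codes = np.zeros((1000, 16))
codes[:, :3] = rng.normal(size=(1000, 3))

sr = stable_rank(codes)            # the matrix has rank 3, so this is at most 3
pr = participation_ratio(codes)    # likewise at most 3, despite b = 16
```

Both measures are bounded by the true rank and ignore unused coordinates, which is why they can report an occupied dimension well below the allotted b.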
Original abstract

What is the geometry of a visual percept? The most widely used protocols for decomposing neural network representations into interpretable parts treat concepts as isolated directions, yet recent work shows that concepts are often realized as geometric structures in low-dimensional regions of activation space. We turn to the literature on structured sparsity to close this gap, and show that block sparsity, which groups directions into blocks, is the prior matched to a generative model in which a representation is a sparse sum of low-dimensional manifolds: the modern, learned form of a classical idea in visual neuroscience, where a visual feature is carried by a coordinated group of neurons rather than a single tuned one. We implement three variants of block-sparse featurizers (BSFs) and, through a minimum-description-length analysis, show that all three describe activations more compactly than direction-based featurizers, with the recovered concepts typically two- to four-dimensional. We then use BSFs to (i) recontextualize prior work, showing that curve detectors in InceptionV1 actually read from a single continuous curve manifold, (ii) discover novel manifolds including shadows and lighting in DINOv3, and (iii) support interpretable control of image generation in diffusion models (SDXL) via manifold steering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims that block sparsity is the prior matched to a generative model of neural activations as sparse sums of low-dimensional manifolds. It implements three block-sparse featurizer (BSF) variants and shows via minimum-description-length (MDL) analysis that they yield lower description lengths than direction-based featurizers, with recovered blocks typically 2-4 dimensional. Applications include recontextualizing curve detectors in InceptionV1 as reading from a single continuous manifold, discovering novel manifolds (e.g., shadows/lighting) in DINOv3, and enabling manifold-based steering in SDXL diffusion models.

Significance. If the central MDL result holds and the generative-model interpretation is supported, the work would advance interpretability by supplying a structured-sparsity prior aligned with manifold geometry rather than isolated directions, extending classical neuroscience ideas to modern networks. The cross-model applications and control demonstration would be concrete strengths.

major comments (1)
  1. [MDL analysis] MDL analysis section: the claim that lower description length establishes that block sparsity is 'the prior matched to' the sparse-sum-of-manifolds generative model is not isolated by the reported experiments. The comparison is consistent with BSFs simply capturing local correlations or clustered directions more efficiently; a controlled test on synthetic data generated from each process is needed to distinguish the two stories.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below.

Point-by-point responses
  1. Referee: MDL analysis section: the claim that lower description length establishes that block sparsity is 'the prior matched to' the sparse-sum-of-manifolds generative model is not isolated by the reported experiments. The comparison is consistent with BSFs simply capturing local correlations or clustered directions more efficiently; a controlled test on synthetic data generated from each process is needed to distinguish the two stories.

    Authors: We agree that the MDL comparison performed on real network activations does not isolate the sparse-sum-of-manifolds generative model from alternative explanations such as efficient capture of local correlations or clustered directions. The manuscript's theoretical section motivates block sparsity from the manifold model, and the recovered blocks are observed to be low-dimensional (typically 2-4D) with applications that align with manifold geometry, but these observations remain correlational. A controlled synthetic experiment comparing description lengths under data generated from each process would strengthen the causal claim. We will add such an experiment to the revised manuscript.

    Revision: yes
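
One arm of the controlled test discussed in this exchange could be generated along the following lines. This is a sketch in the spirit of the toy described in the Figure 4 caption (known low-dimensional manifolds summed a few at a time), not the authors' setup: circles as the manifolds, mutually orthogonal subspaces, and every name and default value are assumptions. The competing arm (clustered directions without manifold structure) is not shown.

```python
import numpy as np

def sample_sparse_manifold_sum(n, d=32, num_manifolds=6, active=2, seed=0):
    """Draw n activations, each a sum of points from `active` circle manifolds.

    Every manifold is a unit circle living in its own 2-D subspace of R^d;
    the subspaces are mutually orthogonal in this toy (an assumption).
    """
    rng = np.random.default_rng(seed)
    # Orthonormal columns, split into one 2-D frame per manifold.
    q, _ = np.linalg.qr(rng.normal(size=(d, 2 * num_manifolds)))
    frames = q.T.reshape(num_manifolds, 2, d)
    x = np.zeros((n, d))
    support = np.zeros((n, num_manifolds), dtype=bool)
    for i in range(n):
        chosen = rng.choice(num_manifolds, size=active, replace=False)
        theta = rng.uniform(0.0, 2.0 * np.pi, size=active)
        for g, t in zip(chosen, theta):
            x[i] += np.cos(t) * frames[g, 0] + np.sin(t) * frames[g, 1]
        support[i, chosen] = True
    return x, support, frames

x, support, frames = sample_sparse_manifold_sum(n=200)
```

Returning the ground-truth support and frames alongside the samples makes it possible to score any featurizer's recovered blocks against the known factors.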

Circularity Check

0 steps flagged

No circularity; central claim rests on independent empirical MDL comparison

full rationale

The paper advances its claim that block sparsity matches a sparse-sum-of-manifolds generative model solely via an empirical minimum-description-length analysis comparing BSF variants to direction-based featurizers on real activations. No equations, fitting procedures, or self-citations are shown that would reduce any prediction or uniqueness result to the inputs by construction. The MDL metric functions as an external benchmark rather than a self-referential definition, and the recovered block dimensionalities (2-4) are reported outcomes rather than fitted inputs renamed as predictions. The derivation chain is therefore self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central modeling choice is the generative story of sparse sums of low-dimensional manifolds; no free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption: Concepts are realized as geometric structures in low-dimensional regions of activation space rather than isolated directions.
    Stated as the premise drawn from recent work that the block-sparse prior is intended to match.

Reference graph

Works this paper leans on

152 extracted references · 14 canonical work pages

  1. [1]

    and Gillis, N

    Abdolali, M. and Gillis, N. Beyond linear subspace clustering: A comparative study of nonlinear manifold clustering algorithms. Computer Science Review, 2021

  2. [2]

    Sanity checks for saliency maps

    Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., and Kim, B. Sanity checks for saliency maps. Advances in Neural Information Processing Systems (NIPS), 2018

  3. [3]

    Adelson, E. H. and Bergen, J. R. Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2 0 (2): 0 284--299, 1985. doi:10.1364/JOSAA.2.000284

  4. [4]

    T., and Sharkey, L

    Ayonrinde, K., Pearce, M. T., and Sharkey, L. Interpretability as compression: Reconsidering sae explanations of neural activations with mdl-saes. arXiv preprint arXiv:2410.11179, 2024

  5. [5]

    Structured sparsity through convex optimization

    Bach, F., Jenatton, R., Mairal, J., and Obozinski, G. Structured sparsity through convex optimization. Statistical Science, 2012

  6. [6]

    On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation

    Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. Public Library of Science (PloS One), 2015

  7. [7]

    Bao, P., She, L., McGill, M., and Tsao, D. Y. A map of object space in primate inferotemporal cortex. Nature, 583 0 (7814): 0 103--108, 2020. doi:10.1038/s41586-020-2350-5

  8. [8]

    G., Cevher, V., Duarte, M

    Baraniuk, R. G., Cevher, V., Duarte, M. F., and Hegde, C. Model-based compressive sensing. IEEE Transactions on information theory, 2010

  9. [9]

    Barlow, H. B. et al. Possible principles underlying the transformation of sensory messages. Sensory communication, 1961

  10. [10]

    and Niyogi, P

    Belkin, M. and Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems (NeurIPS), 2001

  11. [11]

    Do sparse autoencoders capture concept manifolds? A r X iv e-print , 2026 a

    Bhalla, U., Fel, T., Rager, C., Feucht, S., Haklay, T., Wurgaft, D., Boppana, S., Kowal, M., Shyam, V., Merullo, J., et al. Do sparse autoencoders capture concept manifolds? A r X iv e-print , 2026 a

  12. [12]

    M., Lakkaraju, H., and Calmon, F

    Bhalla, U., Oesterling, A., Verdun, C. M., Lakkaraju, H., and Calmon, F. P. Temporal sparse autoencoders: Leveraging the sequential nature of language for interpretability. 2026 b . URL https://arxiv.org/abs/2511.05541

  13. [13]

    E., Hume, T., Carter, S., Henighan, T., and Olah, C

    Bricken, T., Templeton, A., Batson, J., Chen, B., Jermyn, A., Conerly, T., Turner, N., Anil, C., Denison, C., Askell, A., Lasenby, R., Wu, Y., Kravec, S., Schiefer, N., Maxwell, T., Joseph, N., Hatfield-Dodds, Z., Tamkin, A., Nguyen, K., McLean, B., Burke, J. E., Hume, T., Carter, S., Henighan, T., and Olah, C. Towards monosemanticity: Decomposing languag...

  14. [14]

    Batchtopk sparse autoencoders

    Bussmann, B., Leask, P., and Nanda, N. Batchtopk sparse autoencoders. A r X iv e-print , 2024

  15. [15]

    Cadieu, C. F. and Olshausen, B. A. Learning intermediate-level representations of form and motion from natural movies. Neural Computation, 24 0 (4): 0 827--866, 2012. doi:10.1162/NECO_a_00247

  16. [16]

    Curve detectors

    Cammarata, N., Goh, G., Carter, S., Schubert, L., Petrov, M., and Olah, C. Curve detectors. Distill.pub, 2020

  17. [17]

    Penalized regression, standard errors, and bayesian lassos

    Casella, G., Ghosh, M., Gill, J., and Kyung, M. Penalized regression, standard errors, and bayesian lassos. 2010

  18. [18]

    and Tsao, D

    Chang, L. and Tsao, D. Y. The code for facial identity in the primate brain. Cell, 169 0 (6): 0 1013--1028, 2017. doi:10.1016/j.cell.2017.05.011

  19. [19]

    and Abbott, L

    Chung, S. and Abbott, L. F. Neural population geometry: An approach for understanding biological and artificial neural networks. Current opinion in neurobiology, 70: 0 137--144, 2021

  20. [20]

    D., and Sompolinsky, H

    Chung, S., Lee, D. D., and Sompolinsky, H. Classification and geometry of general perceptual manifolds. Physical Review X, 8 0 (3): 0 031003, 2018

  21. [21]

    M., Cunningham, J

    Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., and Shenoy, K. V. Neural population dynamics during reaching. Nature, 2012

  22. [22]

    Coifman, R. R. and Lafon, S. Diffusion maps. Applied and computational harmonic analysis, 2006

  23. [23]

    What i cannot predict, i do not understand: A human-centered evaluation framework for explainability methods

    Colin, J., Fel, T., Cad \`e ne, R., and Serre, T. What i cannot predict, i do not understand: A human-centered evaluation framework for explainability methods. Advances in Neural Information Processing Systems (NeurIPS), 2021

  24. [24]

    S., Tolooshams, B., and Ba, D

    Costa, V., Fel, T., Lubana, E. S., Tolooshams, B., and Ba, D. From flat to hierarchical: Extracting sparse representations with matching pursuit. arXiv preprint arXiv:2506.03093, 2025

  25. [25]

    Sparse autoencoders find highly interpretable features in language models

    Cunningham, H., Ewart, A., Riggs, L., Huben, R., and Sharkey, L. Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600, 2023

  26. [26]

    Dalili, S. A. and Mahdavi, M. Subspace-aware sparse autoencoders for effective mechanistic interpretability. A r X iv e-print , 2026

  27. [27]

    Donoho, D. L. and Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 2003

  28. [28]

    Doshi, F. R. and Konkle, T. Cortical topographic motifs emerge in a self-organized map of object space. Science Advances, 2023

  29. [29]

    and Irofti, P

    Dumitrescu, B. and Irofti, P. Dictionary learning algorithms and applications. 2018

  30. [30]

    Ebitz, R. B. and Hayden, B. Y. The population doctrine in cognitive neuroscience. Neuron, 2021

  31. [31]

    Sparse and redundant representations: from theory to applications in signal and image processing

    Elad, M. Sparse and redundant representations: from theory to applications in signal and image processing. Springer Science & Business Media, 2010

  32. [32]

    Eldar, Y. C. and Bolcskei, H. Block-sparsity: Coherence and efficient recovery. 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

  33. [33]

    and Vidal, R

    Elhamifar, E. and Vidal, R. Sparse manifold clustering and embedding. Advances in Neural Information Processing Systems (NeurIPS), 2011

  34. [34]

    and Vidal, R

    Elhamifar, E. and Vidal, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE transactions on pattern analysis and machine intelligence, 2013

  35. [35]

    J., Liao, I., Gurnee, W., and Tegmark, M

    Engels, J., Michaud, E. J., Liao, I., Gurnee, W., and Tegmark, M. Not all language model features are one-dimensionally linear. arXiv preprint arXiv:2405.14860, 2024

  36. [36]

    Look at the variance! efficient black-box explanations with sobol-based sensitivity analysis

    Fel, T., Cadene, R., Chalvidal, M., Cord, M., Vigouroux, D., and Serre, T. Look at the variance! efficient black-box explanations with sobol-based sensitivity analysis. Advances in Neural Information Processing Systems (NeurIPS), 2021

  37. [37]

    Unlocking feature visualization for deeper networks with magnitude constrained optimization

    Fel, T., Boissin, T., Boutin, V., Picard, A., Novello, P., Colin, J., Linsley, D., Rousseau, T., Cadène, R., Gardes, L., and Serre, T. Unlocking feature visualization for deeper networks with magnitude constrained optimization. Advances in Neural Information Processing Systems (NeurIPS), 2023 a

  38. [38]

    A holistic approach to unifying automatic concept extraction and concept importance estimation

    Fel, T., Boutin, V., Moayeri, M., Cadene, R., Bethune, L., Chalvidal, M., and Serre, T. A holistic approach to unifying automatic concept extraction and concept importance estimation. Advances in Neural Information Processing Systems (NeurIPS), 2023 b

  39. [39]

    Craft: Concept recursive activation factorization for explainability

    Fel, T., Picard, A., Bethune, L., Boissin, T., Vigouroux, D., Colin, J., Cadène, R., and Serre, T. Craft: Concept recursive activation factorization for explainability. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023 c

  40. [40]

    A., Kowal, M., Lee, A., Balestriero, R., Joseph, S., Lubana, E

    Fel, T., Wang, B., Lepori, M. A., Kowal, M., Lee, A., Balestriero, R., Joseph, S., Lubana, E. S., Konkle, T., Ba, D., et al. Into the rabbit hull: From task-relevant concepts in dino to minkowski geometry. arXiv preprint arXiv:2510.08638, 2025

  41. [41]

    S., et al

    Feucht, S., Haklay, T., Bhalla, U., Wurgaft, D., Rager, C., Sarfati, R., Merullo, J., McGrath, T., Lewis, O., Lubana, E. S., et al. Arithmetic in the wild: Llama uses base-10 addition to reason about cyclic concepts. A r X iv e-print , 2026

  42. [42]

    and Endres, D

    Foldiak, P. and Endres, D. M. Sparse coding. A r X iv e-print , 2008

  43. [43]

    Fong, R. C. and Vedaldi, A. Interpretable explanations of black boxes by meaningful perturbation. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017

  44. [44]

    Smixae: Towards unsupervised manifold discovery in language models

    Francel, C. Smixae: Towards unsupervised manifold discovery in language models. A r X iv e-print , 2026

  45. [45]

    Convergent evolution: How different language models learn similar number representations

    Fu, D., Zhou, T., Belkin, M., Sharan, V., and Jia, R. Convergent evolution: How different language models learn similar number representations. A r X iv e-print , 2026

  46. [46]

    K., and Rigotti, M

    Fusi, S., Miller, E. K., and Rigotti, M. Why neurons mix: high dimensionality for higher cognition. Current Opinion in Neurobiology, 37: 0 66--74, 2016. doi:10.1016/j.conb.2016.01.010

  47. [47]

    A., Perich, M

    Gallego, J. A., Perich, M. G., Miller, L. E., and Solla, S. A. Neural manifolds for the control of movement. Neuron, 2017

  48. [48]

    D., Tillman, H., Goh, G., Troll, R., Radford, A., Sutskever, I., Leike, J., and Wu, J

    Gao, L., la Tour, T. D., Tillman, H., Goh, G., Troll, R., Radford, A., Sutskever, I., Leike, J., and Wu, J. Scaling and evaluating sparse autoencoders. arXiv preprint arXiv:2406.04093, 2024

  49. [49]

    S., Fel, T., Merullo, J., Lewis, O., and McGrath, T

    Geiger, A., Lubana, E. S., Fel, T., Merullo, J., Lewis, O., and McGrath, T. The world inside neural networks: How neural geometry will unlock understanding and control of ai. Goodfire, May 2026

  50. [50]

    P., Schwartz, A

    Georgopoulos, A. P., Schwartz, A. B., and Kettner, R. E. Neuronal population coding of movement direction. Science, 233 0 (4771): 0 1416--1419, 1986

  51. [51]

    Interpretation of neural networks is fragile

    Ghorbani, A., Abid, A., and Zou, J. Interpretation of neural networks is fragile. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2017

  52. [52]

    Ghorbani, A., Wexler, J., Zou, J. Y., and Kim, B. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems (NeurIPS), 2019

  53. [53]

    The missing curve detectors of inceptionv1: Applying sparse autoencoders to inceptionv1 early vision

    Gorton, L. The missing curve detectors of inceptionv1: Applying sparse autoencoders to inceptionv1 early vision. arXiv e-print, 2024

  54. [54]

    Gregor, K. and LeCun, Y. Learning fast approximations of sparse coding. Proceedings of the International Conference on Machine Learning (ICML), 2010

  55. [55]

    When models manipulate manifolds: The geometry of a counting task

    Gurnee, W., Ameisen, E., Kauvar, I., Tarng, J., Pearce, A., Olah, C., and Batson, J. When models manipulate manifolds: The geometry of a counting task. Transformer Circuits Thread, 2025. URL https://transformer-circuits.pub/2025/linebreaks/index.html

  56. [56]

    Guthikonda, S. M. Kohonen self-organizing maps. Wittenberg University, 2005

  57. [57]

    The out-of-distribution problem in explainability and search methods for feature importance explanations

    Hase, P., Xie, H., and Bansal, M. The out-of-distribution problem in explainability and search methods for feature importance explanations. Advances in Neural Information Processing Systems (NeurIPS), 2021

  58. [58]

    Hindupur, S. S. R., Lubana, E. S., Fel, T., and Ba, D. Projecting assumptions: The duality between sparse autoencoders and concept geometry. arXiv preprint arXiv:2503.01822, 2025

  59. [59]

    Evaluations and methods for explanation through robustness analysis

    Hsieh, C.-Y., Yeh, C.-K., Liu, X., Ravikumar, P., Kim, S., Kumar, S., and Hsieh, C.-J. Evaluations and methods for explanation through robustness analysis. Proceedings of the International Conference on Learning Representations (ICLR), 2021

  60. [60]

    Learning with structured sparsity

    Huang, J., Zhang, T., and Metaxas, D. Learning with structured sparsity. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 417–424, 2009

  61. [61]

    Hyvärinen, A. and Hoyer, P. Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 12(7):1705–1720, 2000. doi:10.1162/089976600300015312

  62. [62]

    Hyvärinen, A., Hoyer, P. O., and Inki, M. Topographic independent component analysis. Neural Computation, 13(7):1527–1558, 2001. doi:10.1162/089976601750264992

  63. [63]

    Structured sparse principal component analysis

    Jenatton, R., Obozinski, G., and Bach, F. Structured sparse principal component analysis. International Conference on Artificial Intelligence and Statistics, 2010

  64. [64]

    Deep subspace clustering networks

    Ji, P., Zhang, T., Li, H., Salzmann, M., and Reid, I. Deep subspace clustering networks. Advances in Neural Information Processing Systems (NeurIPS), 2017

  65. [65]

    Kantamneni, S. and Tegmark, M. Language models use trigonometry to do addition. arXiv preprint arXiv:2502.00873, 2025a

  66. [66]

    Kantamneni, S. and Tegmark, M. Language models use trigonometry to do addition. arXiv e-print, 2025b

  67. [67]

    Karkada, D., Korchinski, D. J., Nava, A., Wyart, M., and Bahri, Y. Symmetry in language statistics shapes the geometry of model representations. arXiv preprint arXiv:2602.15029, 2026

  68. [68]

    Karklin, Y. and Lewicki, M. S. Emergence of complex cell properties by learning to generalize in natural scenes. Nature, 457(7225):83–86, 2009. doi:10.1038/nature07481

  69. [69]

    Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav)

    Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pp. 2668–2677. PMLR, 2018

  70. [70]

    Identifying interpretable visual features in artificial and biological neural systems

    Klindt, D., Sanborn, S., Acosta, F., Poitevin, F., and Miolane, N. Identifying interpretable visual features in artificial and biological neural systems. arXiv e-print, 2023

  71. [71]

    From superposition to sparse codes: interpretable representations in neural networks

    Klindt, D., O'Neill, C., Reizinger, P., Maurer, H., and Miolane, N. From superposition to sparse codes: interpretable representations in neural networks. arXiv preprint arXiv:2503.01824, 2025

  72. [72]

    Emergent organization of multiple visuotopic maps without a feature hierarchy

    Konkle, T. Emergent organization of multiple visuotopic maps without a feature hierarchy. bioRxiv, 2021

  73. [73]

    Kowal, M., Dave, A., Ambrus, R., Gaidon, A., Derpanis, K. G., and Tokmakov, P. Understanding video transformers via universal concept discovery. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024a

  74. [74]

    Kowal, M., Wildes, R. P., and Derpanis, K. G. Visual concept connectome (vcc): Open world concept discovery and their interlayer connections in deep models. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024b

  75. [75]

    Structured sparse subspace clustering: A joint affinity learning and subspace clustering framework

    Li, C.-G., You, C., and Vidal, R. Structured sparse subspace clustering: A joint affinity learning and subspace clustering framework. IEEE Transactions on Image Processing, 2017

  76. [76]

    Li, Z., Chen, Y., LeCun, Y., and Sommer, F. T. Neural manifold clustering and embedding. arXiv e-print, 2022

  77. [77]

    Robust subspace segmentation by low-rank representation

    Liu, G., Lin, Z., and Yu, Y. Robust subspace segmentation by low-rank representation. Proceedings of the International Conference on Machine Learning (ICML), 2010

  78. [78]

    Robust recovery of subspace structures by low-rank representation

    Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., and Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE transactions on pattern analysis and machine intelligence, 2012

  79. [79]

    Lubana, E. S., Rager, C., Hindupur, S. S. R., Costa, V., Tuckute, G., Patel, O., Murthy, S. K., Fel, T., Wurgaft, D., Bigelow, E. J., et al. Priors in time: Missing inductive biases for language model interpretability. arXiv preprint arXiv:2511.01836, 2025

  80. [80]

    Sparse modeling for image and vision processing

    Mairal, J., Bach, F., and Ponce, J. Sparse modeling for image and vision processing. Foundations and Trends in Computer Graphics and Vision, 2014

Showing first 80 references.