Mechanistic Interpretability with Sparse Autoencoder Neural Operators

Ailsa Shen; Anima Anandkumar; Bahareh Tolooshams

arXiv: 2509.03738 · v4 · submitted 2025-09-03 · cs.LG · cs.AI · eess.SP · stat.ML


Pith reviewed 2026-05-18 18:56 UTC · model grok-4.3

keywords sparse autoencoders · neural operators · mechanistic interpretability · functional representations · Fourier neural operators · concept sparsity · domain sparsity

The pith

Sparse autoencoder neural operators represent concepts as functions to capture where and how they appear across an input domain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces sparse autoencoder neural operators that work directly in function space instead of fixed vectors. They formalize the idea that data arise from sparse combinations of structured functions, and implement this by parameterizing each concept as a function that can vary over the domain. Using Fourier neural operators as the base, the models apply joint sparsity to choose active concepts and to select where each one is expressed. On vision data these SAE-FNOs learn localized patterns, require fewer active concepts, and keep concept properties stable when sparsity changes. They also adjust automatically to new domain sizes and continue to work at resolutions never seen in training, settings where ordinary sparse autoencoders stop functioning.

Core claim

Moving from vector-valued to functional parameterizations, together with joint concept and domain sparsity, extends sparse autoencoders from merely indicating concept presence to modeling the structured spatial or spectral expression of those concepts, as shown by improved localization, efficiency, stability, and generalization across discretizations on vision tasks.

What carries the argument

SAE-FNOs, which instantiate sparse autoencoders with Fourier neural operators so that each concept is an integral operator in the Fourier domain, controlled by separate sparsity penalties on which concepts activate and where they act across the input domain.
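
A minimal sketch of the joint sparsity idea, assuming hard top-k selection in place of the sparsity penalties described above. The function name `joint_sparsity`, the parameters `k_concepts` and `k_domain`, and the energy-based ranking are all illustrative assumptions, not the paper's implementation. Each row of `codes` stands for one concept's activation function sampled on a grid.

```python
import numpy as np

def joint_sparsity(codes, k_concepts, k_domain):
    """Hypothetical joint sparsity on codes of shape (num_concepts, num_points).

    Concept sparsity: keep the k_concepts rows with the largest energy.
    Domain sparsity: within each kept row, keep the k_domain grid points
    with the largest magnitude and zero out the rest.
    """
    energy = np.sum(codes ** 2, axis=1)          # one energy value per concept
    keep = np.argsort(energy)[-k_concepts:]      # indices of the active concepts
    out = np.zeros_like(codes)
    for c in keep:
        row = codes[c]
        idx = np.argsort(np.abs(row))[-k_domain:]  # where this concept is expressed
        out[c, idx] = row[idx]
    return out

rng = np.random.default_rng(0)
codes = rng.normal(size=(8, 32))                 # 8 concepts on a 32-point grid
sparse = joint_sparsity(codes, k_concepts=2, k_domain=5)

active_concepts = int(np.count_nonzero(np.any(sparse != 0, axis=1)))
active_points = np.count_nonzero(sparse, axis=1)  # nonzero locations per concept
```

The two selection steps are kept separate so that the number of active concepts and the extent of each concept over the domain can be tuned independently, which mirrors the split between concept sparsity and domain sparsity.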

If this is right

SAE-FNOs learn localized patterns on vision data.
They activate fewer concepts than standard SAEs while maintaining performance.
Concept properties remain stable when the sparsity level is varied.
The models automatically adapt when the size of the input domain changes.
They continue to operate correctly at grid resolutions higher than those used during training.
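
The resolution claims above rest on a property of Fourier-domain parameterizations that can be illustrated with a small sketch. It assumes a 1D periodic signal, a band-limited input, and a hypothetical weight vector `weights` acting on the lowest Fourier modes; `spectral_conv_1d` is an illustrative name and not the paper's code.

```python
import numpy as np

def spectral_conv_1d(x, weights):
    """Scale the lowest len(weights) Fourier modes of x and discard the rest.

    norm="forward" divides the forward transform by the grid size, so the
    coefficients of a band-limited signal do not depend on the resolution.
    """
    n = x.shape[-1]
    modes = len(weights)
    x_hat = np.fft.rfft(x, norm="forward")
    out_hat = np.zeros_like(x_hat)
    out_hat[:modes] = x_hat[:modes] * weights    # same weights at every resolution
    return np.fft.irfft(out_hat, n=n, norm="forward")

rng = np.random.default_rng(0)
weights = rng.normal(size=4) + 1j * rng.normal(size=4)  # hypothetical weights for modes 0..3

def signal(n):
    """Band-limited test function sampled on an n-point uniform grid."""
    t = np.arange(n) / n
    return np.sin(2 * np.pi * t) + 0.5 * np.cos(2 * np.pi * 3 * t)

y_coarse = spectral_conv_1d(signal(32), weights)    # "training" resolution
y_fine = spectral_conv_1d(signal(128), weights)     # 4x finer grid, same weights
max_gap = float(np.max(np.abs(y_fine[::4] - y_coarse)))
```

In this sketch the two outputs agree on the shared grid points up to floating-point error, because the weights are attached to modes rather than to grid indices. The agreement relies on the input being band-limited below the coarse grid's Nyquist frequency; with higher-frequency content, aliasing on the coarse grid would make the outputs differ.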

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same functional approach could be applied to time-series or physical simulation data where spatial or frequency structure is central.
Standard vector SAEs may be fundamentally limited when the underlying data vary continuously across a domain.
Choosing different operator bases could let practitioners inject domain knowledge directly into the interpretability model.

Load-bearing premise

Data are generated by sparse compositions of structured functions rather than by scalar activations inside a fixed-dimensional vector space.

What would settle it

A direct comparison on the same vision benchmark in which SAE-FNOs either fail to generalize to resolutions outside the training grid or require as many or more active concepts as a standard SAE to reach the same reconstruction quality.

Figures

Figures reproduced from arXiv: 2509.03738 by Ailsa Shen, Anima Anandkumar, Bahareh Tolooshams.

**Figure 1.** Model Recovery with SAEs. a) Architectural comparison of SAE, lifted SAE, and SAE Neural Operators. b) Learning in sampled Euclidean spaces vs. function spaces.

**Figure 2.** SAE-CNN vs. SAE-FNO. a) Lifting accelerates learning. b) SAE-FNO’s superiority in recovering smooth concepts via truncated Fourier modes. c) Equivalent learning when SAE-FNO uses all Fourier modes and matched spatial receptive field of SAE-CNNs.

**Figure 3.** SAE-FNO Upsampling Robustness Across Resolutions. SAE-FNO successfully infers the underlying sparse representations and reconstructs data across multiple discretization levels. The left panels show inference of 1-sparse code supports across 5 kernels, and the right panels display spatial-domain signal reconstruction.

**Figure 4.** Lifting as a preconditioner. Lifting accelerates learning. Axes: Training Iterations (x), Dictionary Recovery Error (y). Curves: SAE-CNN, L-SAE-CNN.

**Figure 5.** Lifting. When the lifting operator satisfies the orthogonal condition L⊤L = I, the lifted SAE-CNN (L-SAE-CNN) exhibits equivalent learning dynamics to the SAE-CNN.

**Figure 6.** SAEs vs. L-SAEs. Learning the lifting operator L accelerates model recovery. (a) Dictionary recovery error converges faster for L-SAE-MLP, confirming the preconditioning effect of lifting; (b) Reconstruction loss follows similar convergence trends as dictionary recovery error; (c) Lifting encourages the effective dictionary D to learn more orthogonal (less correlated) atoms early in training, creating a mo…

**Figure 7.** SAE-FNO Upsampling Robustness Across Resolutions. SAE-FNO successfully infers the underlying sparse representations and reconstructs data across multiple discretization levels. The left panels show inference of 1-sparse code supports across 5 kernels, while the right panels display spatial-domain signal reconstruction. (a) Original resolution (1×): Baseline performance at training resolution. (b-d) Higher …

Original abstract

We introduce sparse autoencoder neural operators (SAE-NOs), a new class of sparse autoencoders that operate in function spaces rather than fixed-dimensional Euclidean representations. We formalize the functional representation hypothesis, where data are explained through sparse compositions of structured functions. Unlike standard SAEs that represent concepts with scalar activations, SAE-NOs parameterize concepts as functions, enabling representations that capture not only a concept's presence, but also how and where it is expressed across the input domain. We achieve this through joint sparsity: concept sparsity selects active concepts, while domain sparsity governs where they are expressed. We instantiate this framework using Fourier neural operators (SAE-FNOs), parameterizing concepts as integral operators in the Fourier domain. This functional and spectral parameterization is particularly advantageous when data exhibit spatial structure across scales or when concepts are frequency-structured. We characterize SAE-FNO on vision data and demonstrate that it learns localized patterns, uses concepts more efficiently, and exhibits stable concept characteristics across sparsity levels. We further show that SAE-FNO adapts to changes in domain size and generalizes across discretizations, operating at resolutions beyond those seen during training, where standard SAEs fail. We also introduce lifting into SAEs and show theoretically and empirically that it acts as a preconditioner that accelerates optimization. Overall, our results show that moving from vector-valued to functional parameterizations, with concept and domain sparsity, extends SAEs from representing concept presence to modeling structured concept expression, highlighting the importance of parameterization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This extends SAEs into function space with neural operators and joint sparsity, which is a real parameterization shift, but the cross-resolution generalization rests on thin empirical ground.

The main takeaway is that the authors have moved sparse autoencoders out of fixed vectors and into function space by using neural operators, specifically Fourier ones, so concepts are represented as integral operators rather than scalar activations. They add domain sparsity alongside the usual concept sparsity, which lets the model decide both which concepts fire and where they appear across the input. That setup, plus the lifting trick they treat as a preconditioner, is the actual novelty here and it directly targets the spatial and resolution issues that come up in vision or scientific models.

On the positive side, the functional representation hypothesis is clearly stated and the choice of FNO instantiation fits frequency-structured data. They report that the resulting SAE-FNOs pick up localized patterns, use fewer concepts for similar reconstruction, and keep concept behavior stable across sparsity levels. The lifting result also looks useful: it speeds optimization both in theory and in their runs. Those pieces are grounded enough to be worth discussing.

The weaker part is the headline claim that the model adapts to new domain sizes and generalizes to unseen discretizations where standard SAEs break. The abstract gives qualitative observations but no numbers, error bars, or ablation details on how the Fourier modes and domain mask behave under grid changes. If the learned spectral coefficients or sparsity pattern end up encoding training-grid artifacts, the apparent invariance would not hold up. That concern is worth checking because the whole advantage over vector SAEs hinges on it.

This paper is for people working on mechanistic interpretability for spatial or continuous data. It deserves a serious referee because the parameterization move is substantive even if the experiments need tightening. I would send it out for review.

Referee Report

3 major / 2 minor

Summary. The paper introduces sparse autoencoder neural operators (SAE-NOs), a new class of sparse autoencoders that operate in function spaces. It formalizes the functional representation hypothesis, positing that data are explained through sparse compositions of structured functions rather than scalar activations. The framework employs joint sparsity (concept sparsity to select active concepts and domain sparsity to control where they are expressed) and instantiates the approach via Fourier neural operators as SAE-FNOs, parameterizing concepts as integral operators in the Fourier domain. Empirical results on vision data claim that SAE-FNOs learn localized patterns, use concepts more efficiently, exhibit stable concept characteristics across sparsity levels, adapt to domain size changes, and generalize across discretizations where standard SAEs fail. The work also introduces lifting into SAEs and provides theoretical and empirical support that it acts as a preconditioner accelerating optimization.

Significance. If the results hold, the work offers a substantive extension of mechanistic interpretability by shifting from vector-valued to functional representations, enabling modeling of structured concept expression across domains. This is especially relevant for spatially structured data. Strengths include the explicit parameterization choice, the joint sparsity mechanism, and the theoretical motivation for lifting as a preconditioner. The reported resolution-invariance properties, if quantitatively validated, would distinguish the approach from standard SAEs and support broader applicability in scientific machine learning.

major comments (3)

[Abstract and Results section] The central claims that SAE-FNOs adapt to domain size changes and generalize across discretizations (where standard SAEs fail) are load-bearing for the contribution, yet rest on qualitative observations without reported quantitative metrics such as reconstruction error, concept stability scores, or cross-resolution ablation results. This weakens the ability to evaluate whether the Fourier parameterization and joint sparsity truly deliver the asserted invariance.
[§3 (Method, joint sparsity and SAE-FNO definition)] The interaction between the domain sparsity mask and the discrete Fourier integral operator is not shown to preserve resolution invariance under changes in grid size or sampling; if the learned spectral coefficients encode grid-specific artifacts via the FFT implementation, the cross-discretization generalization would be an artifact rather than a property of the functional form. A concrete test (e.g., explicit quadrature or mode truncation analysis) is needed.
[Theoretical section on lifting] The claim that lifting acts as a preconditioner is presented as both theoretical and empirical, but the specific derivation (e.g., the relevant equation showing the preconditioning effect on the optimization landscape) is not clearly isolated, making it hard to verify the acceleration result independently of the empirical curves.
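
For orientation on the third comment, here is a short sketch of how a preconditioning term of the form L⊤L can arise. It is not the paper's derivation. It assumes a fixed linear lifting operator L, an effective dictionary D obtained by projecting a lifted dictionary back with L⊤, a loss f, and a step size η; all of these symbols except L and D are introduced here for illustration.

```latex
% Assumptions (not from the paper): L \in \mathbb{R}^{m \times n} is fixed,
% the effective dictionary is D = L^\top \tilde{D}, and f is the training loss.
\begin{align}
\nabla_{\tilde{D}} f &= L \, \nabla_{D} f
  && \text{(chain rule through } D = L^\top \tilde{D}\text{)} \\
\tilde{D}^{+} &= \tilde{D} - \eta \, L \, \nabla_{D} f
  && \text{(gradient step in the lifted space)} \\
D^{+} = L^\top \tilde{D}^{+} &= D - \eta \, L^\top L \, \nabla_{D} f
  && \text{(induced step on the effective dictionary)}
\end{align}
```

Under these assumptions the effective dictionary follows gradient descent preconditioned by L⊤L. When L⊤L = I the last line reduces to the unlifted update, which is consistent with the equivalent learning dynamics described in the Figure 5 caption.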

minor comments (2)

[Introduction] Clarify the precise distinction and notation between the general SAE-NO framework and the specific SAE-FNO instantiation at first use to improve readability.
[Figures] Add error bars or statistical details to any quantitative plots in the experimental figures, even when the primary emphasis is qualitative.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us strengthen the presentation and empirical support for our claims. We address each major comment below and have revised the manuscript to incorporate quantitative evidence, additional analysis, and clearer derivations where needed.

Referee: [Abstract and Results section] The central claims that SAE-FNOs adapt to domain size changes and generalize across discretizations (where standard SAEs fail) are load-bearing for the contribution, yet rest on qualitative observations without reported quantitative metrics such as reconstruction error, concept stability scores, or cross-resolution ablation results. This weakens the ability to evaluate whether the Fourier parameterization and joint sparsity truly deliver the asserted invariance.

Authors: We agree that quantitative metrics are necessary to substantiate the resolution-invariance claims. In the revised manuscript we have added reconstruction error tables across multiple grid resolutions, concept stability scores computed via functional cosine similarity between concepts learned at different discretizations, and explicit cross-resolution ablation experiments comparing SAE-FNOs against standard SAEs. These additions demonstrate that the observed generalization is accompanied by measurable improvements in reconstruction fidelity and concept consistency, directly supporting the contribution of the Fourier parameterization and joint sparsity. revision: yes
Referee: [§3 (Method, joint sparsity and SAE-FNO definition)] The interaction between the domain sparsity mask and the discrete Fourier integral operator is not shown to preserve resolution invariance under changes in grid size or sampling; if the learned spectral coefficients encode grid-specific artifacts via the FFT implementation, the cross-discretization generalization would be an artifact rather than a property of the functional form. A concrete test (e.g., explicit quadrature or mode truncation analysis) is needed.

Authors: We appreciate this observation and have added a dedicated analysis subsection in §3. Because concepts are represented by a fixed set of Fourier modes whose coefficients are learned independently of the spatial grid, the parameterization is theoretically resolution-invariant; the domain sparsity mask is applied in the spatial domain after the inverse Fourier transform and therefore does not introduce grid-dependent artifacts into the spectral coefficients. We now include a mode-truncation study and quadrature-error bounds showing that reconstruction error remains stable under grid refinement, together with an empirical test that trains on one discretization and evaluates on another with different sampling density. These results indicate that the generalization arises from the functional form rather than implementation artifacts. revision: yes
Referee: [Theoretical section on lifting] The claim that lifting acts as a preconditioner is presented as both theoretical and empirical, but the specific derivation (e.g., the relevant equation showing the preconditioning effect on the optimization landscape) is not clearly isolated, making it hard to verify the acceleration result independently of the empirical curves.

Authors: We acknowledge that the derivation of the preconditioning effect was not sufficiently isolated. In the revised theoretical section we have extracted and numbered the key equations that demonstrate how lifting improves the conditioning of the loss landscape (specifically, how it reduces the Lipschitz constant of the gradient with respect to the encoder parameters). A step-by-step derivation is now provided, showing the relationship between the lifted representation and the Hessian spectrum, followed by the empirical curves that corroborate the predicted acceleration. This separation allows readers to verify the theoretical argument independently. revision: yes
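
The first response mentions concept stability scores based on functional cosine similarity but does not define the measure. One plausible reading, sketched here as an assumption, treats each concept as a periodic function on [0, 1), resamples both concepts to the finer grid by linear interpolation, and uses a Riemann-sum approximation of the L2 inner product. The name `functional_cosine` is hypothetical.

```python
import numpy as np

def functional_cosine(f_vals, g_vals):
    """Cosine similarity between two periodic functions on [0, 1).

    The inputs are samples on uniform grids that may have different sizes.
    Both are resampled to the finer grid, then the L2 inner product is
    approximated by a mean over grid points (a Riemann sum).
    """
    n = max(len(f_vals), len(g_vals))
    grid = np.arange(n) / n

    def resample(v):
        src = np.arange(len(v)) / len(v)
        return np.interp(grid, src, v, period=1.0)  # periodic linear interpolation

    f, g = resample(np.asarray(f_vals)), resample(np.asarray(g_vals))
    inner = np.mean(f * g)
    return float(inner / np.sqrt(np.mean(f ** 2) * np.mean(g ** 2)))

t64 = np.arange(64) / 64
t256 = np.arange(256) / 256
# Same underlying function sampled at two resolutions.
same = functional_cosine(np.sin(2 * np.pi * t64), np.sin(2 * np.pi * t256))
# Two orthogonal functions sampled at two resolutions.
different = functional_cosine(np.sin(2 * np.pi * t64), np.cos(2 * np.pi * t256))
```

Comparing functions on a shared grid, rather than comparing raw sample vectors, is what lets the score be computed across discretizations of different sizes. Other choices of interpolation or quadrature would give slightly different values.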

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical tests of a design choice rather than definitional reduction

full rationale

The paper defines SAE-NOs by extending existing SAE and FNO architectures with joint sparsity and a functional representation hypothesis that is explicitly formalized within the work. Generalization across discretizations is presented as an empirical outcome demonstrated on vision data (training on one resolution, testing on others), leveraging the known resolution-invariance properties of Fourier neural operators rather than deriving it tautologically from fitted parameters or self-citations. No equation or result is shown to equal its inputs by construction; the lifting preconditioner claim is supported by both theory and experiments. The derivation chain remains self-contained against external benchmarks of neural operator behavior.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the newly stated functional representation hypothesis and the choice of Fourier parameterization; no explicit free parameters or invented physical entities are described in the abstract.

axioms (1)

Domain assumption. Functional representation hypothesis: data are explained through sparse compositions of structured functions.
Stated in the abstract as the foundation for moving from scalar to functional concept representations.

invented entities (1)

SAE-NO / SAE-FNO (no independent evidence)
purpose: Sparse autoencoder that operates in function space with joint concept and domain sparsity.
New class introduced by the paper; no independent evidence outside the work itself.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

We formalize the functional representation hypothesis, where data are explained through sparse compositions of structured functions... SAE-FNOs... parameterizing concepts as integral operators in the Fourier domain... lifting... acts as a preconditioner that accelerates optimization.

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · echoes

Lifting... has the effective update... L⊤L acts as a preconditioner... SAE-FNO with truncated modes exhibits an inductive bias that favours recovery of smooth concepts

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

75 extracted references · 75 canonical work pages · 6 internal anchors

[1]

Brain-score: Which artificial neural network for object recognition is most brain-like?,

M. Schrimpf, J. Kubilius, H. Hong, N. J. Majaj, R. Rajalingham, E. B. Issa, K. Kar, P. Bashivan, J. Prescott-Roy, F. Geiger,et al., “Brain-score: Which artificial neural network for object recognition is most brain-like?,”BioRxiv, p. 407007, 2018

work page 2018
[2]

The topology and geometry of neural representations,

B. Lin and N. Kriegeskorte, “The topology and geometry of neural representations,”Proceedings of the National Academy of Sciences, vol. 121, no. 42, p. e2317881121, 2024

work page 2024
[3]

High-level visual representations in the human brain are aligned with large language models,

A. Doerig, T. C. Kietzmann, E. Allen, Y . Wu, T. Naselaris, K. Kay, and I. Charest, “High-level visual representations in the human brain are aligned with large language models,”Nature Machine Intelligence, pp. 1–15, 2025

work page 2025
[4]

Stabilization of a brain–computer interface via the alignment of low-dimensional spaces of neural activity,

A. D. Degenhart, W. E. Bishop, E. R. Oby, E. C. Tyler-Kabara, S. M. Chase, A. P. Batista, and B. M. Yu, “Stabilization of a brain–computer interface via the alignment of low-dimensional spaces of neural activity,”Nature biomedical engineering, vol. 4, no. 7, pp. 672–685, 2020

work page 2020
[5]

Universality and individuality in neural dynamics across large populations of recurrent networks,

N. Maheswaranathan, A. H. Williams, M. D. Golub, S. Ganguli, and D. Sussillo, “Universality and individuality in neural dynamics across large populations of recurrent networks,”Advances in Neural Information Processing Systems, vol. 32, 2019

work page 2019
[6]

Equivalence between representational similarity analysis, centered kernel alignment, and canonical correlations analysis,

A. H. Williams, “Equivalence between representational similarity analysis, centered kernel alignment, and canonical correlations analysis,” inProceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, pp. 10–23, PMLR, 2024

work page 2024
[7]

Soft matching distance: A metric on neural representations that captures single-neuron tuning,

M. Khosla and A. H. Williams, “Soft matching distance: A metric on neural representations that captures single-neuron tuning,” inProceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, pp. 326–341, PMLR, 2024

work page 2024
[8]

Representation topology divergence: A method for comparing neural network representations.,

S. Barannikov, I. Trofimov, N. Balabin, and E. Burnaev, “Representation topology divergence: A method for comparing neural network representations.,” inInternational Conference on Machine Learning, pp. 1607–1626, PMLR, 2022

work page 2022
[9]

Representational similarity analysis–connecting the branches of systems neuroscience,

N. Kriegeskorte, M. Mur, and P. A. Bandettini, “Representational similarity analysis–connecting the branches of systems neuroscience,”Frontiers in Systems Neuroscience, vol. 2, p. 4, 2008

work page 2008
[10]

Position: The platonic representation hypothesis,

M. Huh, B. Cheung, T. Wang, and P. Isola, “Position: The platonic representation hypothesis,” inForty-first International Conference on Machine Learning, 2024

work page 2024
[11]

Proof of a perfect platonic representation hypothesis,

L. Ziyin and I. Chuang, “Proof of a perfect platonic representation hypothesis,”arXiv preprint arXiv:2507.01098, 2025

work page arXiv 2025
[12]

Neural Operator: Graph Kernel Network for Partial Differential Equations

Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Neural operator: Graph kernel network for partial differential equations,”arXiv preprint arXiv:2003.03485, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2003
[13]

Fourier neural operator for parametric partial differential equations,

Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. liu, K. Bhattacharya, A. Stuart, and A. Anand- kumar, “Fourier neural operator for parametric partial differential equations,” inInternational Conference on Learning Representations, 2021

work page 2021
[14]

Neural operator: Learning maps between function spaces with applications to pdes,

N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Neural operator: Learning maps between function spaces with applications to pdes,”Journal of Machine Learning Research, vol. 24, no. 89, pp. 1–97, 2023

work page 2023
[15]

Neural operators for accelerating scientific simulations and design,

K. Azizzadenesheli, N. Kovachki, Z. Li, M. Liu-Schiaffini, J. Kossaifi, and A. Anandkumar, “Neural operators for accelerating scientific simulations and design,”Nature Reviews Physics, pp. 1–9, 2024

work page 2024
[16]

Vars-fusi: Variable sampling for fast and efficient functional ultrasound imaging using neural operators,

B. Tolooshams, L. Lydia, T. Callier, J. Wang, S. Pal, A. Chandrashekar, C. Rabut, Z. Li, C. Blagden, S. L. Norman, K. Azizzadenesheli, C. Liu, M. G. Shapiro, R. A. Andersen, and A. Anandkumar, “Vars-fusi: Variable sampling for fast and efficient functional ultrasound imaging using neural operators,”bioRxiv, pp. 2025–04, 2025. 6

work page 2025
[17]

Noble–neural operator with biologically-informed latent embeddings to capture experimental variability in biological neuron models,

L. Ghafourpour, V . Duruisseaux*, B. Tolooshams*, P. H. Wong, C. A. Anastassiou, and A. Anandkumar, “Noble–neural operator with biologically-informed latent embeddings to capture experimental variability in biological neuron models,”arXiv:2506.04536, 2025

work page arXiv 2025
[18]

FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

J. Pathak, S. Subramanian, P. Harrington, S. Raja, A. Chattopadhyay, M. Mardani, T. Kurth, D. Hall, Z. Li, K. Azizzadenesheli,et al., “Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators,”arXiv preprint arXiv:2202.11214, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[19]

Geometry-informed neural operator for large-scale 3d pdes,

Z. Li, N. Kovachki, C. Choy, B. Li, J. Kossaifi, S. Otta, M. A. Nabian, M. Stadler, C. Hundt, K. Azizzadenesheli,et al., “Geometry-informed neural operator for large-scale 3d pdes,”Ad- vances in Neural Information Processing Systems, vol. 36, 2024

work page 2024
[20]

Unify- ing subsampling pattern variations for compressed sensing mri with neural operators,

A. S. Jatyani, J. Wang, Z. Wu, M. Liu-Schiaffini, B. Tolooshams, and A. Anandkumar, “Unify- ing subsampling pattern variations for compressed sensing mri with neural operators,”IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

work page 2025
[21]

Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav),

B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas,et al., “Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav),” inInternational conference on machine learning, pp. 2668–2677, PMLR, 2018

work page 2018
[22]

Emergence of simple-cell receptive field properties by learning a sparse code for natural images,

B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,”Nature, vol. 381, no. 6583, pp. 607–609, 1996

work page 1996
[23]

Sparse coding with an overcomplete basis set: A strategy employed by v1?,

B. A. Olshausen and D. J. Field, “Sparse coding with an overcomplete basis set: A strategy employed by v1?,”Vision research, vol. 37, no. 23, pp. 3311–3325, 1997

work page 1997
[24]

Toy Models of Superposition

N. Elhage, T. Hume, C. Olsson, N. Schiefer, T. Henighan, S. Kravec, Z. Hatfield-Dodds, R. Lasenby, D. Drain, C. Chen,et al., “Toy models of superposition,”arXiv:2209.10652, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[25]

Sparse autoencoders find highly interpretable features in language models,

R. Huben, H. Cunningham, L. R. Smith, A. Ewart, and L. Sharkey, “Sparse autoencoders find highly interpretable features in language models,” inThe Twelfth International Conference on Learning Representations, 2023

work page 2023
[26]

Towards monosemanticity: Decomposing language models with dictionary learning,

T. Bricken, A. Templeton, J. Batson, B. Chen, A. Jermyn, T. Conerly, N. Turner, C. Anil, C. Denison, A. Askell,et al., “Towards monosemanticity: Decomposing language models with dictionary learning,”Transformer Circuits Thread, vol. 2, 2023

work page 2023
[27]

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

S. Rajamanoharan, T. Lieberum, N. Sonnerat, A. Conmy, V . Varma, J. Kramár, and N. Nanda, “Jumping ahead: Improving reconstruction fidelity with jumprelu sparse autoencoders,”arXiv preprint arXiv:2407.14435, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[28]

The linear representation hypothesis and the geometry of large language models,

K. Park, Y . J. Choe, and V . Veitch, “The linear representation hypothesis and the geometry of large language models,” inInternational Conference on Machine Learning, pp. 39643–39666, PMLR, 2024

work page 2024
[29]

Scaling monosemanticity: Extracting interpretable features from claude 3 sonnet,

A. Templeton, T. Conerly, J. Marcus, J. Lindsey, T. Bricken, B. Chen, A. Pearce, C. Citro, E. Ameisen, A. Jones, H. Cunningham, N. L. Turner, C. McDougall, M. MacDiarmid, C. D. Freeman, T. R. Sumers, E. Rees, J. Batson, A. Jermyn, S. Carter, C. Olah, and T. Henighan, “Scaling monosemanticity: Extracting interpretable features from claude 3 sonnet,”Transfo...

[30] T. Lieberum, S. Rajamanoharan, A. Conmy, L. Smith, N. Sonnerat, V. Varma, J. Kramár, A. Dragan, R. Shah, and N. Nanda, “Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2,” in The 7th BlackboxNLP Workshop, 2024.

[31] L. Gao, T. D. la Tour, H. Tillman, G. Goh, R. Troll, A. Radford, I. Sutskever, J. Leike, and J. Wu, “Scaling and evaluating sparse autoencoders,” in The Thirteenth International Conference on Learning Representations, 2025.

[32] T. Fel, E. S. Lubana, J. S. Prince, M. Kowal, V. Boutin, I. Papadimitriou, B. Wang, M. Wattenberg, D. Ba, and T. Konkle, “Archetypal SAE: Adaptive and stable dictionary learning for concept extraction in large vision models,” arXiv preprint arXiv:2502.12892, 2025.

[33] S. Marks, C. Rager, E. J. Michaud, Y. Belinkov, D. Bau, and A. Mueller, “Sparse feature circuits: Discovering and editing interpretable causal graphs in language models,” in The Thirteenth International Conference on Learning Representations, 2025.

[34] A. Karvonen, C. Rager, J. Lin, C. Tigges, J. I. Bloom, D. Chanin, Y.-T. Lau, E. Farrell, C. S. McDougall, K. Ayonrinde, D. Till, M. Wearden, A. Conmy, S. Marks, and N. Nanda, “SAEBench: A comprehensive benchmark for sparse autoencoders in language model interpretability,” in Forty-second International Conference on Machine Learning, 2025.

[35] C. W. Groetsch and C. Groetsch, Inverse problems in the mathematical sciences, vol. 52. Springer, 1993.

[36] T. Hastie, R. Tibshirani, and M. Wainwright, Statistical learning with sparsity: the lasso and generalizations. CRC Press, 2015.

[37] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.

[38] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.

[39] E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, 2008.

[40] M. Lopes, “Estimating unknown sparsity in compressed sensing,” in International Conference on Machine Learning, pp. 217–225, PMLR, 2013.

[41] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in Proceedings of the 26th Annual International Conference on Machine Learning, pp. 689–696, 2009.

[42] A. Agarwal, A. Anandkumar, P. Jain, and P. Netrapalli, “Learning sparsely used overcomplete dictionaries via alternating minimization,” SIAM Journal on Optimization, vol. 26, no. 4, pp. 2775–2799, 2016.

[43] N. S. Chatterji and P. L. Bartlett, “Alternating minimization for dictionary learning: Local convergence guarantees,” arXiv preprint arXiv:1711.03634, 2017.

[44] B. Tolooshams, Deep Learning for Inverse Problems in Engineering and Science. PhD thesis, Harvard University, 2023.

[45] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proceedings of the International Conference on Machine Learning, pp. 399–406, 2010.

[46] P. Ablin, T. Moreau, M. Massias, and A. Gramfort, “Learning step sizes for unfolded sparse coding,” in Proceedings of Advances in Neural Information Processing Systems, vol. 32, pp. 1–11, 2019.

[47] B. Malézieux, T. Moreau, and M. Kowalski, “Understanding approximate and unrolled dictionary learning for pattern recovery,” in International Conference on Learning Representations, 2022.

[48] B. Tolooshams and D. E. Ba, “Stable and interpretable unrolled dictionary learning,” Transactions on Machine Learning Research, 2022.

[49] T. V. Nguyen, R. K. Wong, and C. Hegde, “On the dynamics of gradient descent for autoencoders,” in Proceedings of International Conference on Artificial Intelligence and Statistics, pp. 2858–2867, PMLR, 2019.

[50] S. Arora, R. Ge, T. Ma, and A. Moitra, “Simple, efficient, and neural algorithms for sparse coding,” in Proceedings of Conference on Learning Theory (P. Grünwald, E. Hazan, and S. Kale, eds.), vol. 40 of Proceedings of Machine Learning Research, (Paris, France), pp. 113–149, PMLR, 03–06 Jul 2015.

[51] X. Chen, J. Liu, Z. Wang, and W. Yin, “Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds,” in Proceedings of Advances in Neural Information Processing Systems, vol. 31, pp. 1–11, 2018.

[52] A. Rangamani, A. Mukherjee, A. Basu, A. Arora, T. Ganapathi, S. Chin, and T. D. Tran, “Sparse coding and autoencoders,” in Proceedings of IEEE International Symposium on Information Theory (ISIT), pp. 36–40, 2018.

[53] B. Tolooshams, A. Song, S. Temereanca, and D. Ba, “Convolutional dictionary learning based auto-encoders for natural exponential-family distributions,” in Proceedings of the 37th International Conference on Machine Learning (H. Daumé III and A. Singh, eds.), vol. 119 of Proceedings of Machine Learning Research, pp. 9493–9503, PMLR, Jul 2020.

[54] S. Rambhatla, X. Li, and J. Haupt, “NOODL: Provable online dictionary learning and sparse coding,” in Proceedings of International Conference on Learning Representations, pp. 1–11, 2018.

[55] S. S. R. Hindupur, E. S. Lubana, T. Fel, and D. Ba, “Projecting assumptions: The duality between sparse autoencoders and concept geometry,” arXiv preprint arXiv:2503.01822, 2025.

[56] M. Elad, Sparse and redundant representations: from theory to applications in signal and image processing. Springer Science & Business Media, 2010.

[57] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

[58] B. Cleary, L. Cong, A. Cheung, E. S. Lander, and A. Regev, “Efficient generation of transcriptomic profiles by random composite measurements,” Cell, vol. 171, no. 6, pp. 1424–1436.e18, 2017.

[59] B. Cleary, B. Simonton, J. Bezney, E. Murray, S. Alam, A. Sinha, E. Habibi, J. Marshall, E. S. Lander, F. Chen, et al., “Compressed sensing for highly efficient imaging transcriptomics,” Nature Biotechnology, pp. 1–7, 2021.

[60] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.

[61] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Review, vol. 43, no. 1, pp. 129–159, 2001.

[62] N. Parikh and S. Boyd, “Proximal algorithms,” Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014.

[63] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413–1457, 2004.

[64] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.

[65] M. A. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, “Efficient learning of sparse representations with an energy-based model,” in Advances in Neural Information Processing Systems, vol. 19, MIT Press, 2007.

[66] M. A. Ranzato, Y.-L. Boureau, and Y. LeCun, “Sparse feature learning for deep belief networks,” in Proceedings of Advances in Neural Information Processing Systems (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), vol. 20, 2008.

[67] J. R. Hershey, J. L. Roux, and F. Weninger, “Deep unfolding: Model-based inspiration of novel deep architectures,” preprint arXiv:1409.2574, 2014.

[68] V. Monga, Y. Li, and Y. C. Eldar, “Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing,” IEEE Signal Processing Magazine, vol. 38, no. 2, pp. 18–44, 2021.

[69] V. Papyan, Y. Romano, and M. Elad, “Convolutional neural networks analyzed via convolutional sparse coding,” Journal of Machine Learning Research, vol. 18, no. 83, pp. 1–52, 2017.

[70] V. Papyan, J. Sulam, and M. Elad, “Working locally thinking globally: Theoretical guarantees for convolutional sparse coding,” IEEE Transactions on Signal Processing, vol. 65, no. 21, pp. 5687–5701, 2017.

[71] D. Ba, “Deeply-sparse signal representations (DS2P),” IEEE Transactions on Signal Processing, vol. 68, pp. 4727–4742, 2020.

[72] F. Doshi-Velez and B. Kim, “Towards a rigorous science of interpretable machine learning,” preprint arXiv:1702.08608, 2017.

A Appendix - Acknowledgments

A.S. conducted this work as a Dale and Suzanne Burger SURF Fellow through the Summer Undergraduate Research Fellowship (SURF) program at Caltech and gratefully acknowledges its funding. A.A. was supp...

From Proposition D.4, the architectural inference of an SAE-FNO is equivalent to that of an SAE-CNN. From Proposition D.1, the architectural inference of an SAE-CNN is equivalent to that of an L-SAE-CNN. From Proposition D.5, the architectural inference of an L-SAE-CNN is equivalent to that of an L-SAE-FNO. By the transitivity of these equivalences, we can establish a direct architectural inference equivalence between SAE-FNO and L-SAE-FNO under the same lifting-projection conditions. ■

[24] [24]

Toy Models of Superposition

N. Elhage, T. Hume, C. Olsson, N. Schiefer, T. Henighan, S. Kravec, Z. Hatfield-Dodds, R. Lasenby, D. Drain, C. Chen,et al., “Toy models of superposition,”arXiv:2209.10652, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[25] [25]

Sparse autoencoders find highly interpretable features in language models,

R. Huben, H. Cunningham, L. R. Smith, A. Ewart, and L. Sharkey, “Sparse autoencoders find highly interpretable features in language models,” inThe Twelfth International Conference on Learning Representations, 2023

work page 2023

[26] [26]

Towards monosemanticity: Decomposing language models with dictionary learning,

T. Bricken, A. Templeton, J. Batson, B. Chen, A. Jermyn, T. Conerly, N. Turner, C. Anil, C. Denison, A. Askell,et al., “Towards monosemanticity: Decomposing language models with dictionary learning,”Transformer Circuits Thread, vol. 2, 2023

work page 2023

[27] [27]

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

S. Rajamanoharan, T. Lieberum, N. Sonnerat, A. Conmy, V . Varma, J. Kramár, and N. Nanda, “Jumping ahead: Improving reconstruction fidelity with jumprelu sparse autoencoders,”arXiv preprint arXiv:2407.14435, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[28] [28]

The linear representation hypothesis and the geometry of large language models,

K. Park, Y . J. Choe, and V . Veitch, “The linear representation hypothesis and the geometry of large language models,” inInternational Conference on Machine Learning, pp. 39643–39666, PMLR, 2024

work page 2024

[29] [29]

Scaling monosemanticity: Extracting interpretable features from claude 3 sonnet,

A. Templeton, T. Conerly, J. Marcus, J. Lindsey, T. Bricken, B. Chen, A. Pearce, C. Citro, E. Ameisen, A. Jones, H. Cunningham, N. L. Turner, C. McDougall, M. MacDiarmid, C. D. Freeman, T. R. Sumers, E. Rees, J. Batson, A. Jermyn, S. Carter, C. Olah, and T. Henighan, “Scaling monosemanticity: Extracting interpretable features from claude 3 sonnet,”Transfo...

work page 2024

[30] [30]

Gemma scope: Open sparse autoencoders everywhere all at once on gemma 2,

T. Lieberum, S. Rajamanoharan, A. Conmy, L. Smith, N. Sonnerat, V . Varma, J. Kramar, A. Dragan, R. Shah, and N. Nanda, “Gemma scope: Open sparse autoencoders everywhere all at once on gemma 2,” inThe 7th BlackboxNLP Workshop, 2024

work page 2024

[31] [31]

Scaling and evaluating sparse autoencoders,

L. Gao, T. D. la Tour, H. Tillman, G. Goh, R. Troll, A. Radford, I. Sutskever, J. Leike, and J. Wu, “Scaling and evaluating sparse autoencoders,” inThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[32] [32]

Archetypal sae: Adaptive and stable dictionary learning for concept extraction in large vision models

T. Fel, E. S. Lubana, J. S. Prince, M. Kowal, V . Boutin, I. Papadimitriou, B. Wang, M. Wat- tenberg, D. Ba, and T. Konkle, “Archetypal sae: Adaptive and stable dictionary learning for concept extraction in large vision models,”arXiv preprint arXiv:2502.12892, 2025. 7

work page arXiv 2025

[33] [33]

Sparse feature circuits: Discovering and editing interpretable causal graphs in language models,

S. Marks, C. Rager, E. J. Michaud, Y . Belinkov, D. Bau, and A. Mueller, “Sparse feature circuits: Discovering and editing interpretable causal graphs in language models,” inThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[34] [34]

SAEBench: A comprehensive benchmark for sparse autoencoders in language model interpretability,

A. Karvonen, C. Rager, J. Lin, C. Tigges, J. I. Bloom, D. Chanin, Y .-T. Lau, E. Farrell, C. S. Mc- Dougall, K. Ayonrinde, D. Till, M. Wearden, A. Conmy, S. Marks, and N. Nanda, “SAEBench: A comprehensive benchmark for sparse autoencoders in language model interpretability,” in Forty-second International Conference on Machine Learning, 2025

work page 2025

[35] [35]

C. W. Groetsch and C. Groetsch,Inverse problems in the mathematical sciences, vol. 52. Springer, 1993

work page 1993

[36] [36]

Hastie, R

T. Hastie, R. Tibshirani, and M. Wainwright,Statistical learning with sparsity: the lasso and generalizations. CRC press, 2015

work page 2015

[37] [37]

Compressed sensing,

D. L. Donoho, “Compressed sensing,”IEEE Transactions on information theory, vol. 52, no. 4, pp. 1289–1306, 2006

work page 2006

[38] [38]

Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,

E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,”IEEE Transactions on information theory, vol. 52, no. 2, pp. 489–509, 2006

work page 2006

[39] [39]

An introduction to compressive sampling,

E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,”IEEE signal processing magazine, vol. 25, no. 2, pp. 21–30, 2008

work page 2008

[40] [40]

Estimating unknown sparsity in compressed sensing,

M. Lopes, “Estimating unknown sparsity in compressed sensing,” inInternational Conference on Machine Learning, pp. 217–225, PMLR, 2013

work page 2013

[41] [41]

Online dictionary learning for sparse coding,

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in Proceedings of the 26th annual international conference on machine learning, pp. 689–696, 2009

work page 2009

[42] [42]

Learning sparsely used overcomplete dictionaries via alternating minimization,

A. Agarwal, A. Anandkumar, P. Jain, and P. Netrapalli, “Learning sparsely used overcomplete dictionaries via alternating minimization,”SIAM Journal on Optimization, vol. 26, no. 4, pp. 2775–2799, 2016

work page 2016

[43] [43]

Alternating minimization for dictionary learning: Local convergence guarantees,

N. S. Chatterji and P. L. Bartlett, “Alternating minimization for dictionary learning: Local convergence guarantees,”arXiv preprint arXiv:1711.03634, 2017

work page arXiv 2017

[44] [44]

Tolooshams,Deep Learning for Inverse Problems in Engineering and Science

B. Tolooshams,Deep Learning for Inverse Problems in Engineering and Science. PhD thesis, Harvard University, 2023

work page 2023

[45] [45]

Learning fast approximations of sparse coding,

K. Gregor and Y . LeCun, “Learning fast approximations of sparse coding,” inProceedings of international conference on international conference on machine learning, pp. 399–406, 2010

work page 2010

[46] [46]

Learning step sizes for unfolded sparse coding,

P. Ablin, T. Moreau, M. Massias, and A. Gramfort, “Learning step sizes for unfolded sparse coding,” inProceedings of Advances in Neural Information Processing Systems, vol. 32, pp. 1– 11, 2019

work page 2019

[47] [47]

Understanding approximate and unrolled dictio- nary learning for pattern recovery,

B. Malézieux, T. Moreau, and M. Kowalski, “Understanding approximate and unrolled dictio- nary learning for pattern recovery,” inInternational Conference on Learning Representations, 2022

work page 2022

[48] [48]

Stable and interpretable unrolled dictionary learning,

B. Tolooshams and D. E. Ba, “Stable and interpretable unrolled dictionary learning,”Transac- tions on Machine Learning Research, 2022

work page 2022

[49] [49]

On the dynamics of gradient descent for autoen- coders,

T. V . Nguyen, R. K. Wong, and C. Hegde, “On the dynamics of gradient descent for autoen- coders,” inProceedings of International Conference on Artificial Intelligence and Statistics, pp. 2858–2867, PMLR, 2019

work page 2019

[50] [50]

Simple, efficient, and neural algorithms for sparse coding,

S. Arora, R. Ge, T. Ma, and A. Moitra, “Simple, efficient, and neural algorithms for sparse coding,” inProceedings of Conference on Learning Theory(P. Grünwald, E. Hazan, and S. Kale, eds.), vol. 40 ofProceedings of Machine Learning Research, (Paris, France), pp. 113–149, PMLR, 03–06 Jul 2015. 8

work page 2015

[51] [51]

Theoretical linear convergence of unfolded ista and its practical weights and thresholds,

X. Chen, J. Liu, Z. Wang, and W. Yin, “Theoretical linear convergence of unfolded ista and its practical weights and thresholds,” inProceedings of Advances in Neural Information Processing Systems, vol. 31, pp. 1–11, 2018

work page 2018

[52] [52]

Sparse coding and autoencoders,

A. Rangamani, A. Mukherjee, A. Basu, A. Arora, T. Ganapathi, S. Chin, and T. D. Tran, “Sparse coding and autoencoders,” inProceedings of IEEE International Symposium on Information Theory (ISIT), pp. 36–40, 2018

work page 2018

[53] [53]

Convolutional dictionary learning based auto-encoders for natural exponential-family distributions,

B. Tolooshams, A. Song, S. Temereanca, and D. Ba, “Convolutional dictionary learning based auto-encoders for natural exponential-family distributions,” inProceedings of the 37th Interna- tional Conference on Machine Learning(H. D. III and A. Singh, eds.), vol. 119 ofProceedings of Machine Learning Research, pp. 9493–9503, PMLR, 7 2020

work page 2020

[54] [54]

Noodl: Provable online dictionary learning and sparse coding,

S. Rambhatla, X. Li, and J. Haupt, “Noodl: Provable online dictionary learning and sparse coding,” inProceedings of International Conference on Learning Representations, pp. 1–11, 2018

work page 2018

[55] [55]

Projecting assumptions: The duality between sparse autoencoders and concept geometry,

S. S. R. Hindupur, E. S. Lubana, T. Fel, and D. Ba, “Projecting assumptions: The duality between sparse autoencoders and concept geometry,”arXiv preprint arXiv:2503.01822, 2025

work page arXiv 2025

[56] [56]

Elad,Sparse and redundant representations: from theory to applications in signal and image processing

M. Elad,Sparse and redundant representations: from theory to applications in signal and image processing. Springer Science & Business Media, 2010

work page 2010

[57] [57]

K-svd: An algorithm for designing overcomplete dictionaries for sparse representation,

M. Aharon, M. Elad, and A. Bruckstein, “K-svd: An algorithm for designing overcomplete dictionaries for sparse representation,”IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006

work page 2006

[58] [58]

Efficient generation of transcrip- tomic profiles by random composite measurements,

B. Cleary, L. Cong, A. Cheung, E. S. Lander, and A. Regev, “Efficient generation of transcrip- tomic profiles by random composite measurements,”Cell, vol. 171, no. 6, pp. 1424–1436.e18, 2017

work page 2017

[59] [59]

Compressed sensing for highly efficient imaging transcriptomics,

B. Cleary, B. Simonton, J. Bezney, E. Murray, S. Alam, A. Sinha, E. Habibi, J. Marshall, E. S. Lander, F. Chen,et al., “Compressed sensing for highly efficient imaging transcriptomics,” Nature Biotechnology, pp. 1–7, 2021

work page 2021

[60] [60]

Regression shrinkage and selection via the lasso,

R. Tibshirani, “Regression shrinkage and selection via the lasso,”Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996

work page 1996

[61] [61]

Atomic decomposition by basis pursuit,

S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,”SIAM review, vol. 43, no. 1, pp. 129–159, 2001

work page 2001

[62] N. Parikh and S. Boyd, “Proximal algorithms,” Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014.

[63] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413–1457, 2004.

[64] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.

[65] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, “Efficient learning of sparse representations with an energy-based model,” in Advances in Neural Information Processing Systems, vol. 19, MIT Press, 2007.

[66] M. Ranzato, Y.-L. Boureau, and Y. LeCun, “Sparse feature learning for deep belief networks,” in Proceedings of Advances in Neural Information Processing Systems (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), vol. 20, 2008.

[67] J. R. Hershey, J. Le Roux, and F. Weninger, “Deep unfolding: Model-based inspiration of novel deep architectures,” preprint arXiv:1409.2574, 2014.

[68] V. Monga, Y. Li, and Y. C. Eldar, “Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing,” IEEE Signal Processing Magazine, vol. 38, no. 2, pp. 18–44, 2021.

[69] V. Papyan, Y. Romano, and M. Elad, “Convolutional neural networks analyzed via convolutional sparse coding,” Journal of Machine Learning Research, vol. 18, no. 83, pp. 1–52, 2017.

[70] V. Papyan, J. Sulam, and M. Elad, “Working locally thinking globally: Theoretical guarantees for convolutional sparse coding,” IEEE Transactions on Signal Processing, vol. 65, no. 21, pp. 5687–5701, 2017.

[71] D. Ba, “Deeply-sparse signal representations (ds2p),” IEEE Transactions on Signal Processing, vol. 68, pp. 4727–4742, 2020.

[72] F. Doshi-Velez and B. Kim, “Towards a rigorous science of interpretable machine learning,” preprint arXiv:1702.08608, 2017.

A Appendix - Acknowledgments

A.S. conducted this work as a Dale and Suzanne Burger SURF Fellow through the Summer Undergraduate Research Fellowship (SURF) program at Caltech and gratefully acknowledges its funding. A.A. was supp...

From Proposition D.4, the architectural inference of an SAE-FNO is equivalent to that of an SAE-CNN.

From Proposition D.1, the architectural inference of an SAE-CNN is equivalent to that of an L-SAE-CNN.

From Proposition D.5, the architectural inference of an L-SAE-CNN is equivalent to that of an L-SAE-FNO.

By the transitive property of these equivalences, we can establish a direct architectural inference equivalence between SAE-FNO and L-SAE-FNO under the same lifting-projection conditions. ■
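The statements of Propositions D.1, D.4, and D.5 are not reproduced in this part of the appendix. As a rough illustration of the kind of fact that an FNO-to-CNN equivalence commonly rests on, the sketch below numerically checks the convolution theorem on a 1D periodic grid: multiplying two signals' discrete Fourier transforms pointwise gives the same result as circularly convolving them. This is an assumption about the flavor of the argument, not the paper's construction; the helper names `dft`, `circular_conv`, and `spectral_conv` and the test signals are hypothetical and chosen only for this example.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative, not optimized)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]
    # The inverse transform carries the 1/n normalization.
    return [v / n for v in out] if inverse else out

def circular_conv(x, w):
    """Spatial-domain circular convolution with periodic boundary conditions."""
    n = len(x)
    return [sum(w[j] * x[(i - j) % n] for j in range(n)) for i in range(n)]

def spectral_conv(x, w):
    """Same operation computed as a pointwise product in the Fourier domain."""
    X, W = dft(x), dft(w)
    product = [a * b for a, b in zip(X, W)]
    return [v.real for v in dft(product, inverse=True)]

# Hypothetical test signal and a short filter zero-padded to the signal length.
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
w = [0.5, -1.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]

# Largest pointwise difference between the two computations.
max_gap = max(abs(a - b) for a, b in zip(circular_conv(x, w), spectral_conv(x, w)))
print(max_gap)
```

The two computations agree up to floating-point rounding. The sketch uses the full set of frequencies and periodic boundaries; whether the paper's propositions require these or other conditions (for example on mode truncation or on the lifting and projection maps) is not stated in this section.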