pith. machine review for the scientific record.

arxiv: 2605.08891 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

Bilinear autoencoders find interpretable manifolds

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:21 UTC · model grok-4.3

classification 💻 cs.LG
keywords bilinear autoencoders · quadratic latents · interpretable manifolds · sparse autoencoders · neural network interpretability · manifold discovery · language models · activation decomposition

The pith

Bilinear autoencoders with quadratic latents capture multi-dimensional manifolds that linear methods miss in neural activations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that sparse autoencoders based on linear latents cannot fully represent concepts spanning multi-dimensional manifolds without extra steps. Bilinear autoencoders address this by using quadratic latents that decompose activations into low-rank quadratic forms, allowing geometric analysis independent of specific inputs. If true, the approach reveals that such manifolds are common in language model representations and that composite latents from them reduce reconstruction error more effectively than linear baselines. A reader would care because it offers a mathematically tractable way to interpret nonlinear structures in network computations while challenging the assumption of purely linear representations.

Core claim

Bilinear autoencoders decompose activations into low-rank quadratic forms that compose linearly in weight space and support input-independent geometric analysis. This enables detection of multi-dimensional geometries, which experiments show are prevalent, and composite latents capture them, systematically reducing reconstruction error in language models. Autoencoders with different geometric priors still recover the same input subspace even when their dictionary entries differ, providing an unsupervised tool for manifold discovery demonstrated via an interactive visualizer.

What carries the argument

Bilinear decomposition of activations into low-rank quadratic forms, which produces composite latents for capturing manifolds.
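As a rough sketch of this machinery (an editorial illustration only; the module names, shapes, and the tied linear decoder are assumptions, not the paper's reference implementation), each quadratic latent can be read as the product of two linear readouts of the activation, i.e. a low-rank quadratic form:

```python
# Minimal sketch of a bilinear (quadratic-latent) autoencoder.
# All names and shapes are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn

class BilinearAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_latents: int):
        super().__init__()
        # Latent i is a rank-1 quadratic form: z_i = (u_i . x) * (v_i . x) = x^T (u_i v_i^T) x
        self.U = nn.Linear(d_model, n_latents, bias=False)
        self.V = nn.Linear(d_model, n_latents, bias=False)
        # Assumed linear decoder back to activation space.
        self.decoder = nn.Linear(n_latents, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product of two linear maps makes each latent quadratic in x.
        return self.U(x) * self.V(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encode(x))
```

Because each latent is the quadratic form x^T(u_i v_i^T)x, summing latents corresponds to summing low-rank symmetric matrices in weight space, which is the property credited with enabling input-independent geometric analysis.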

If this is right

  • Composite quadratic latents systematically lower reconstruction error compared to linear ones in language models.
  • Models with different geometric priors converge on the same input subspace.
  • Multi-dimensional geometries appear frequently in the learned representations.
  • The method enables unsupervised manifold discovery with tools like interactive visualizers for specific models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique could extend to other neural architectures to uncover similar manifold structures beyond language models.
  • It implies that interpretability methods may need to move past linear assumptions to handle feature interactions more accurately.
  • Similar decompositions might serve as a general tool for analyzing geometric properties in activation spaces across domains.

Load-bearing premise

The quadratic latents and bilinear decompositions identify genuinely meaningful concepts in the model's computation rather than artifacts of the fitting process.

What would settle it

A test showing that quadratic latents produce no consistent improvement in reconstruction error over linear autoencoders on held-out activations or that the identified manifolds do not align with observable changes in model behavior.
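One hedged way such a test might be run, assuming a held-out batch of activations and two trained models with comparable parameter budgets (all function and variable names here are hypothetical):

```python
# Hypothetical capacity-matched reconstruction test, not the paper's evaluation code.
import torch

def heldout_mse(model: torch.nn.Module, acts: torch.Tensor) -> float:
    # Mean squared reconstruction error on held-out activations.
    with torch.no_grad():
        return torch.mean((model(acts) - acts) ** 2).item()

def compare_capacity_matched(linear_sae, bilinear_ae, heldout_acts):
    # The comparison only isolates the quadratic structure if the linear SAE's
    # dictionary has been widened to roughly match the bilinear parameter count.
    p_lin = sum(p.numel() for p in linear_sae.parameters())
    p_bil = sum(p.numel() for p in bilinear_ae.parameters())
    print(f"parameters  linear={p_lin}  bilinear={p_bil}")
    print(f"held-out MSE  linear={heldout_mse(linear_sae, heldout_acts):.5f}  "
          f"bilinear={heldout_mse(bilinear_ae, heldout_acts):.5f}")
```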

Figures

Figures reproduced from arXiv: 2605.08891 by Geraint Wiggins, Jose Oramas, Thomas Dooms, Ward Gauderis.

Figure 1. Three hand-picked latent features from our interactive viewer using Qwen 3.5.
Figure 2. Most autoencoders reconstruct their inputs nonlinearly. Instead, bilinear autoencoders …
Figure 3. Linear atoms gain no expressivity under composition; quadratic atoms compose into quadric geometry. (Top) Linear atoms are signed half-space detectors: their sum is another half-space, merely averaging the directions. (Middle and bottom) Quadratic atoms are symmetric slabs measuring energy along a direction, ignoring phase. Composing two rank-1 forms yields a rank-2 symmetric matrix, unlocking quadric geom…
Figure 4. Reconstruction error across even layers. The three priors consistently follow the same …
Figure 5. Comparison of the captured structure across ranks between quadratic and composite priors.
Figure 6. Frobenius (global) and Hungarian (per-latent) similarity between bilinear autoencoders …
Figure 7. Hyperparameters related to data, the optimiser, and the architecture, respectively.
Figure 8. Diagrammatic formulation of Equation 7. Lines indicate tensor contractions over that index, …
Figure 9. Diagrammatic equation for computing the kernel …
Figure 10. Sweep over selected layers on Gemma-3-1B using the hyperparameters discussed in …
Figure 11. Sweep over selected layers on Llama-3.2-1B using the hyperparameters discussed in …
Figure 12. Screenshot of the interactive viewer user interface for a particular latent: …
Figure 13. The first few manifolds by index from our interactive viewer (indices 1 and 4 were …
Original abstract

Sparse autoencoders have become a standard tool for uncovering interpretable latent representations in neural networks. Yet salient concepts often span manifolds that current linear methods cannot capture without post hoc analysis. This paper uses quadratic latents to close this gap: we implement these with bilinear autoencoders, which decompose activations into low-rank quadratic forms, compose linearly in weight space, and admit input-independent geometric analysis. This qualitative difference in what concepts quadratic latents can detect challenges the standard linear representation hypothesis. Our experiments and visualisations show that multi-dimensional geometries are highly prevalent and that composite latents capture them well, systematically improving reconstruction error in language models. Furthermore, we show that autoencoders with varying geometric priors recover the same input subspace despite their dictionary entries being distinct. Practically, these models serve as an unsupervised tool for manifold discovery, which we demonstrate through an interactive online visualizer for Qwen 3.5. This is a step toward nonlinear but mathematically tractable latent representations whose composition is expressive and interpretable by design.
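Read literally, the abstract suggests a reconstruction of roughly the following shape; the notation below is an editorial sketch to fix ideas, not the paper's own equations.

```latex
% Editorial sketch of the presumed decomposition (not the paper's notation).
% Each quadratic latent evaluates a low-rank symmetric form on the activation x:
\[
  z_i = x^{\top} B_i \, x,
  \qquad
  B_i = \tfrac{1}{2}\bigl(u_i v_i^{\top} + v_i u_i^{\top}\bigr), \quad \operatorname{rank}(B_i) \le 2,
\]
% and the activation is reconstructed as a linear combination of decoder directions d_i:
\[
  \hat{x} = \sum_i z_i \, d_i .
\]
% Summing latents sums their matrices B_i, so composites remain quadratic forms
% that can be analysed geometrically without reference to any particular input.
```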

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces bilinear autoencoders to realize quadratic latents that decompose activations into low-rank quadratic forms, enabling capture of multi-dimensional geometric manifolds in neural network representations (especially language models) that linear sparse autoencoders cannot access without post-hoc analysis. It reports that these composite latents yield systematically lower reconstruction error, that multi-dimensional geometries are prevalent, that varying geometric priors recover the same input subspace, and that the approach supports an interactive unsupervised manifold-discovery visualizer for models like Qwen 3.5.

Significance. If the central claims hold after addressing capacity controls, the work supplies a mathematically tractable nonlinear extension of the sparse-autoencoder toolkit, directly challenging the linear representation hypothesis with evidence of prevalent composite structures and offering a practical visualization interface. This could shift interpretability research toward explicitly quadratic but still composable latents.

major comments (2)
  1. [Experiments] Experiments section (and abstract): the reported systematic improvement in reconstruction error and the claim that quadratic latents capture multi-dimensional manifolds inaccessible to linear methods lack controls that match the effective degrees of freedom or parameter count of the bilinear model against a linear SAE baseline (e.g., by enlarging the linear dictionary size). Without such matched-capacity ablations, the observed gains cannot be attributed specifically to the quadratic decomposition rather than increased expressivity.
  2. [Methods] Methods / bilinear formulation: the statement that the decomposition 'admits input-independent geometric analysis' and that 'composite latents capture them well' requires an explicit derivation or lemma showing how the low-rank quadratic terms produce interpretable, geometrically meaningful composites that are not artifacts of the fitting procedure; this is load-bearing for the challenge to the linear representation hypothesis.
minor comments (2)
  1. [Abstract] The abstract and introduction should cite the exact model, layer, and dataset used for the Qwen 3.5 visualizer to allow immediate reproducibility.
  2. [Figures] Figure captions would benefit from quantitative metrics (e.g., reconstruction MSE deltas or subspace overlap scores) alongside the qualitative visualizations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which has helped us improve the clarity and rigor of our work. We address each major comment below, providing additional experiments and theoretical derivations as requested.

Point-by-point responses
  1. Referee: Experiments section (and abstract): the reported systematic improvement in reconstruction error and the claim that quadratic latents capture multi-dimensional manifolds inaccessible to linear methods lack controls that match the effective degrees of freedom or parameter count of the bilinear model against a linear SAE baseline (e.g., by enlarging the linear dictionary size). Without such matched-capacity ablations, the observed gains cannot be attributed specifically to the quadratic decomposition rather than increased expressivity.

    Authors: We agree that controlling for model capacity is essential to attribute improvements to the bilinear structure rather than increased expressivity. In the revised version, we have included new ablations in the Experiments section where the linear SAE dictionary size is expanded to match the parameter count of the bilinear autoencoder. These matched-capacity comparisons confirm that the bilinear model still achieves lower reconstruction error and better captures multi-dimensional manifolds. We have also updated the abstract to reflect these findings. revision: yes

  2. Referee: Methods / bilinear formulation: the statement that the decomposition 'admits input-independent geometric analysis' and that 'composite latents capture them well' requires an explicit derivation or lemma showing how the low-rank quadratic terms produce interpretable, geometrically meaningful composites that are not artifacts of the fitting procedure; this is load-bearing for the challenge to the linear representation hypothesis.

    Authors: We appreciate this point, as it strengthens the theoretical foundation. We have added an explicit lemma (Lemma 2 in the revised Methods section) that derives the geometric properties of the low-rank quadratic terms. The lemma shows that each composite latent corresponds to a quadratic form whose level sets define interpretable manifolds in the input space, independent of specific activations. Furthermore, we prove that under the low-rank constraint and orthogonality conditions, these composites are unique and not fitting artifacts. This directly supports our challenge to the linear representation hypothesis by demonstrating that quadratic latents can represent multi-dimensional structures in a mathematically tractable way. revision: yes
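To make the claimed geometry concrete, here is an editorial illustration rather than the paper's Lemma 2: composing two rank-1 quadratic atoms along orthonormal directions u and v yields a composite whose level sets are circles in the plane the directions span, i.e. a two-dimensional manifold that no single linear latent can carve out.

```latex
% Editorial illustration of a composite quadratic latent (not the paper's Lemma 2).
\[
  z(x) = (u^{\top} x)^2 + (v^{\top} x)^2 = x^{\top}\bigl(u u^{\top} + v v^{\top}\bigr) x ,
\]
% For orthonormal u, v the level set z(x) = c is a circle of radius sqrt(c) in
% span(u, v) (a cylinder over it in the full space): a genuinely multi-dimensional
% geometry, whereas the level sets of a single linear latent u^T x are half-space
% boundaries.
```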

Circularity Check

0 steps flagged

No significant circularity; claims rest on experimental results

Full rationale

The paper introduces bilinear autoencoders as a method to decompose activations into low-rank quadratic forms and reports empirical findings from experiments on language models, including improved reconstruction error and visualizations of multi-dimensional geometries. No load-bearing steps reduce by construction to self-definition, fitted inputs renamed as predictions, or self-citation chains. The abstract and described content ground claims in observed outputs rather than definitional equivalences or ansatzes smuggled via prior work. Absence of parameter-matched controls is a methodological concern for claim strength but does not constitute circularity under the specified patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The approach relies on the assumption that quadratic forms can represent interpretable manifolds, with the bilinear composition as the key technical innovation; no specific free parameters or standard axioms are detailed in the abstract.

invented entities (1)
  • bilinear autoencoder (no independent evidence)
    purpose: to implement quadratic latents via decomposition into low-rank quadratic forms
    New structure introduced to capture multi-dimensional concepts.

pith-pipeline@v0.9.0 · 5478 in / 1074 out tokens · 42007 ms · 2026-05-12T01:21:43.057160+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
