
arxiv: 2607.01336 · v1 · pith:YPDNR633new · submitted 2026-07-01 · quant-ph · cond-mat.dis-nn · cond-mat.str-el · cs.AI · cs.LG

Mechanistic Interpretability and Causal Feature Steering of Neural Quantum States via Sparse Autoencoders

Pith reviewed 2026-07-03 20:14 UTC · model grok-4.3

classification: quant-ph, cond-mat.dis-nn, cond-mat.str-el, cs.AI, cs.LG
keywords: neural quantum states, sparse autoencoders, mechanistic interpretability, causal feature intervention, quantum many-body systems, order parameters, variational wavefunctions

The pith

Sparse autoencoders extract features from neural quantum states that causally steer physical observables through single-feature interventions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that neural quantum states encode physical observables internally in a way that sparse autoencoders can uncover without any labels. These features correlate with quantities such as order parameters and magnetization across ground states and dynamics. Post-training edits to one extracted feature then adjust the corresponding observable in a smooth, monotonic manner while the variational energy remains nearly fixed. This shows that the networks learn structured representations of physics rather than opaque fits to data alone.

Core claim

Neural quantum states contain internal activations that sparse autoencoders decompose into features correlating with physical observables. Intervention on a single such feature after training produces smooth, monotonic changes to the matching observable while the variational energy stays nearly unchanged, confirming causal influence rather than mere correlation.
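
A minimal sketch of what a single-feature intervention could look like. It assumes the common SAE steering recipe in which rescaling feature j by a factor `scale` adds `(scale - 1) * f_j * d_j` to the residual-stream vector, where `d_j` is that feature's decoder direction. The function name `steer_residual`, its parameters, and the toy numbers are hypothetical; this page only indicates that the paper rescales feature activations, so this is an illustration, not the authors' code.

```python
def steer_residual(x, decoder_row, activation, scale):
    """Rescale one SAE feature inside a residual-stream vector.

    x           : residual-stream activation (list of floats)
    decoder_row : decoder direction d_j of the chosen feature (assumed layout)
    activation  : current activation f_j of that feature
    scale       : multiplicative factor; 1.0 leaves x unchanged
    """
    delta = (scale - 1.0) * activation
    return [xi + delta * di for xi, di in zip(x, decoder_row)]


# Toy numbers, invented purely for illustration.
x = [0.5, -1.0, 2.0]
d_j = [1.0, 0.0, -1.0]
f_j = 0.8

steered = steer_residual(x, d_j, f_j, 2.0)    # doubles the feature's contribution
unchanged = steer_residual(x, d_j, f_j, 1.0)  # identity intervention
```

Editing only the difference along `d_j` leaves the SAE's reconstruction error in place, which is one plausible reason an intervention like this can move a single observable while disturbing the rest of the network only slightly.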

What carries the argument

Sparse autoencoders applied to the residual stream activations of the neural quantum state network, which isolate sparse features that serve as direct controls for observables.
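
To make the mechanism concrete, here is a small sketch of a ReLU sparse autoencoder acting on one residual-stream vector. The weight layout (one row per feature in both `W_enc` and `W_dec`), the bias handling, and all numbers are assumptions chosen for illustration; the paper's actual SAE variant and hyperparameters are not given on this page.

```python
def matvec(W, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def sae_encode(x, W_enc, b_enc, b_dec):
    """Map a residual-stream vector to non-negative, mostly-zero features."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = matvec(W_enc, centered)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def sae_decode(f, W_dec, b_dec):
    """Rebuild the residual-stream vector as a sparse sum of decoder rows."""
    out = list(b_dec)
    for fj, row in zip(f, W_dec):
        if fj != 0.0:  # inactive features contribute nothing
            out = [o + fj * r for o, r in zip(out, row)]
    return out


# Toy overcomplete dictionary: 4 features for a 2-dimensional stream.
W_enc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
W_dec = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
b_dec = [0.0, 0.0]

x = [2.0, -1.0]
f = sae_encode(x, W_enc, b_enc, b_dec)  # only two of four features fire
x_hat = sae_decode(f, W_dec, b_dec)     # reconstruction of x
```

In this toy case the reconstruction is exact and half the features are inactive; in a trained SAE the reconstruction is approximate and the sparsity comes from the training objective.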

If this is right

  • NQS predictions for observables can be diagnosed and adjusted post-training without retraining the full network.
  • Physical information such as order parameters is represented internally even when the training objective only minimizes energy.
  • The same feature extraction and steering process works for both ground-state representations and real-time quantum dynamics.
  • Unsupervised feature discovery provides a general diagnostic tool for understanding how variational wavefunctions capture many-body physics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be tested on other variational ansätze to check whether similar causal features appear across different network architectures.
  • Steering might be combined with symmetry constraints to enforce conservation laws during inference.
  • Feature interventions could serve as a probe for how training data influences the emergence of specific physical representations.

Load-bearing premise

The features found by the autoencoders are the direct causes of the observed changes in physical quantities rather than incidental correlations that interventions happen to affect.

What would settle it

An experiment in which editing the identified feature alters the target observable in the predicted direction but also produces large unintended shifts in variational energy or unrelated observables would falsify the claim of isolated causal control.
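
The test described above could be scripted roughly as follows. The sketch assumes that a steering sweep has already produced, for each scale factor, the target observable and the variational energy; the function names, the 1% energy tolerance, and the numbers are all hypothetical placeholders rather than values from the paper.

```python
def is_monotonic(values):
    """True if the sequence never changes direction (ties allowed)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


def check_isolated_control(scales, observable, energy, max_rel_energy_shift=0.01):
    """Check monotonic steering and bounded energy drift over a sweep.

    scales must contain 1.0, which is taken to be the unsteered network.
    The tolerance is an arbitrary illustrative choice.
    """
    base = energy[scales.index(1.0)]
    worst = max(abs(e - base) / abs(base) for e in energy)
    monotonic = is_monotonic(observable)
    return {
        "monotonic": monotonic,
        "max_rel_energy_shift": worst,
        "passes": monotonic and worst <= max_rel_energy_shift,
    }


# Placeholder numbers, invented for illustration only (not from the paper).
scales = [0.0, 0.5, 1.0, 1.5, 2.0]
observable = [0.10, 0.22, 0.35, 0.47, 0.58]
energy_ok = [-1.270, -1.273, -1.274, -1.273, -1.269]
energy_bad = [-1.10, -1.20, -1.274, -1.15, -1.00]

good = check_isolated_control(scales, observable, energy_ok)
bad = check_isolated_control(scales, observable, energy_bad)
```

The same drift check could be run on each non-target observable in place of the energy to cover the "unrelated observables" part of the criterion.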

Figures

Figures reproduced from arXiv: 2607.01336 by Christopher Earls, Zihao Qi.

Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5. Figure 5(a) shows that the SAE identifies a sparse feature (f132) that is strongly correlated with the staggered magnetization, with a correlation coefficient |r| = 0.97 (the negative slope means that the feature is aligned with −Mstag instead of +Mstag). To probe whether the activation strengths causally control Mstag, we follow the same activation-steering procedure used in Sec. III B: rescaling the activatio…
Figure 6. Effective Sample Size (ESS) for feature-steering
Figure 7. Figure 7 shows the scatter plot between the top-correlated feature (f79) and the magnetization Mz, in the paramagnetic regime h = 1.5. The maximum correlation is only |r| = 0.16, which is in sharp contrast to the trained model, where the top feature tracks magnetization with |r| = 0.99 (…
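
The captions for Figures 5 and 7 refer to a "top-correlated feature" and a correlation coefficient |r|. A minimal sketch of that selection step, assuming a plain Pearson correlation between per-sample feature activations and a per-sample observable, is shown here. The helper names and the toy data are hypothetical; the paper's own estimator may differ.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def top_correlated_feature(activations, observable):
    """Return (feature_id, r) for the feature with the largest |r|.

    activations: dict mapping feature id -> list of activations per sample.
    """
    scores = {j: pearson_r(a, observable) for j, a in activations.items()}
    best = max(scores, key=lambda j: abs(scores[j]))
    return best, scores[best]


# Toy data, invented for illustration: feature 1 is anti-aligned with the observable.
observable = [-1.0, -0.5, 0.0, 0.5, 1.0]
activations = {
    0: [0.2, 0.1, 0.3, 0.1, 0.2],
    1: [2.0, 1.5, 1.0, 0.5, 0.0],
}
best, r = top_correlated_feature(activations, observable)
```

Ranking by |r| rather than r matters because a feature can be aligned with the negative of an observable, as the Figure 5 caption notes for the staggered magnetization.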
Original abstract

Neural Quantum States (NQS) are a remarkably expressive class of variational ansätze for quantum many-body wavefunctions, yet little is understood about their internal mechanisms: trained on variational objectives alone, how do NQS accurately capture physical observables that they have never been explicitly optimized for? In this work, we present a systematic approach to analyze the internal activations of NQS using sparse autoencoders. We extract features from the residual stream and demonstrate that these features strongly correlate with physical observables such as order parameters, staggered magnetization, and half-chain correlators, across both ground state representation and real-time dynamics. Remarkably, the discovery of these features is entirely unsupervised, with no physical labels provided. We further establish that such features causally affect the corresponding observables predicted by NQS, by showing that targeted, post-training intervention on a single feature smoothly and monotonically steers the corresponding observable, while leaving the variational energy nearly unchanged. These results demonstrate that NQS are not merely functional approximators, but encode rich, interpretable internal representations of physical information. Our approach provides both a diagnostic and an intervention tool for NQS, and serves as a foundation for using mechanistic interpretability towards more reliable, transparent NQS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper applies sparse autoencoders to the residual stream activations of Neural Quantum States (NQS) to extract unsupervised features. These features are reported to correlate strongly with physical observables (order parameters, staggered magnetization, half-chain correlators) in both ground-state and real-time dynamics settings. The central empirical claim is that post-training intervention on a single extracted feature produces smooth, monotonic steering of the corresponding observable while leaving the variational energy nearly unchanged, establishing causal influence and showing that NQS encode interpretable physical representations.

Significance. If the intervention results hold under rigorous controls, the work supplies both a diagnostic tool and a causal intervention method for NQS, moving the field from purely variational optimization toward mechanistic understanding. The unsupervised discovery of physically meaningful features and the energy-invariance check are concrete strengths that could improve reliability and transparency of neural ansätze in quantum many-body problems.

major comments (2)
  1. [Abstract] Abstract and methods description: the central claim of smooth monotonic steering via single-feature intervention is presented without any reported datasets (Hamiltonians, lattice sizes), number of independent runs, error bars, or quantitative metrics (e.g., Pearson coefficients or p-values for monotonicity). These details are load-bearing for evaluating whether the observed steering is robust or an artifact of particular choices.
  2. The causal interpretation rests on the intervention experiments isolating the target observable. Without explicit controls (e.g., random-feature baselines, ablation of multiple features, or checks that other observables remain unaffected), it remains unclear whether the reported energy invariance fully rules out unintended side effects on the network.
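
One way the random-feature baseline mentioned in comment 2 might be set up is sketched here. It assumes a callable `effect_of(feature_id)` that returns the change in the target observable when that feature is steered by a fixed amount; that callable, the function name, and the toy effect sizes are hypothetical stand-ins, since the paper reports no such baseline.

```python
import random


def random_feature_baseline(effect_of, target_id, feature_ids, n_random=20, seed=0):
    """Compare the target feature's steering effect with random features.

    Returns (target_effect, baseline_effects, fraction_at_least_as_large).
    """
    rng = random.Random(seed)
    pool = [j for j in feature_ids if j != target_id]
    chosen = rng.sample(pool, min(n_random, len(pool)))
    target = abs(effect_of(target_id))
    baseline = [abs(effect_of(j)) for j in chosen]
    frac = sum(b >= target for b in baseline) / len(baseline)
    return target, baseline, frac


# Toy stand-in for a real steering experiment (numbers are invented).
def toy_effect(j):
    return 0.5 if j == 132 else 0.01


target, baseline, frac = random_feature_baseline(toy_effect, 132, range(200))
```

A small `frac` would indicate that the steering effect is specific to the identified feature rather than something any perturbation of the residual stream produces.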
minor comments (2)
  1. Notation for the residual stream and SAE reconstruction loss should be defined explicitly on first use, including the sparsity penalty coefficient.
  2. Figure captions should state the precise system sizes and Hamiltonian parameters used for each panel to allow direct comparison with the text.
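
For orientation on minor comment 1, a commonly used form of the SAE objective is written out here. This is the standard L1-penalized formulation, stated as an assumption; the symbols and the choice of penalty are not taken from the paper, which may instead use a k-sparse (top-k) constraint as in reference [73].

```latex
% x: residual-stream activation, f(x): feature activations, \hat{x}: reconstruction
% \lambda: sparsity penalty coefficient (assumed notation)
\begin{aligned}
f(x)       &= \mathrm{ReLU}\!\left(W_{\mathrm{enc}}\,(x - b_{\mathrm{dec}}) + b_{\mathrm{enc}}\right), \\
\hat{x}    &= W_{\mathrm{dec}}\, f(x) + b_{\mathrm{dec}}, \\
\mathcal{L}(x) &= \lVert x - \hat{x} \rVert_2^2 + \lambda\, \lVert f(x) \rVert_1 .
\end{aligned}
```

Larger values of the coefficient trade reconstruction fidelity for sparser features, which is why the referee asks for it to be reported.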

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which highlight important aspects of clarity and rigor in presenting our results. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] Abstract and methods description: the central claim of smooth monotonic steering via single-feature intervention is presented without any reported datasets (Hamiltonians, lattice sizes), number of independent runs, error bars, or quantitative metrics (e.g., Pearson coefficients or p-values for monotonicity). These details are load-bearing for evaluating whether the observed steering is robust or an artifact of particular choices.

    Authors: The Methods section and figure captions of the manuscript specify the Hamiltonians (transverse-field Ising and related models), lattice sizes, training procedures, and number of independent runs, while error bars and correlation metrics appear in the results figures. The abstract was written at a high level for brevity. We agree that incorporating key experimental parameters into the abstract will improve evaluability and have revised the abstract accordingly to reference the systems studied and the quantitative metrics employed.

    revision: yes

  2. Referee: The causal interpretation rests on the intervention experiments isolating the target observable. Without explicit controls (e.g., random-feature baselines, ablation of multiple features, or checks that other observables remain unaffected), it remains unclear whether the reported energy invariance fully rules out unintended side effects on the network.

    Authors: The near-invariance of the variational energy under single-feature intervention serves as our primary control, indicating that the change is localized rather than a global disruption of the network's optimization. We also examined the impact on non-target observables in our analyses. We did not include random-feature baselines or systematic multi-feature ablations. We will revise the text to make these controls and their scope more explicit, add a limitations discussion, and include additional checks on other observables where feasible.

    revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes an empirical workflow: unsupervised SAE training on NQS residual-stream activations, followed by correlation measurements and single-feature intervention experiments that measure observable steering while monitoring variational energy. No equations, uniqueness theorems, or fitted parameters are presented that reduce the central claims (feature-observable causality) to the inputs by construction. The intervention results are reported as direct experimental outcomes rather than predictions derived from the same data used to fit the SAE. The central claims are therefore not circular by construction, and the check reports no finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields limited visibility into assumptions; no explicit free parameters or invented entities are described.

axioms (1)
  • Domain assumption: sparse autoencoders recover physically meaningful features from neural activations in variational quantum models.
    Central to the unsupervised extraction and intervention claims.

pith-pipeline@v0.9.1-grok · 5762 in / 1094 out tokens · 28851 ms · 2026-07-03T20:14:45.379009+00:00 · methodology


Reference graph

Works this paper leans on

73 extracted references · 45 canonical work pages · 16 internal anchors

  1. [1]

    H. Lange, A. V. de Walle, A. Abedinnia, and A. Bohrdt, From architectures to applications: A review of neural quantum states (2024), arXiv:2402.09402 [cond-mat.dis-nn]

  2. [2]

    Z. Jia, B. Yi, R. Zhai, Y. Wu, G. Guo, and G. Guo, Quantum neural network states: A brief review of methods and applications, Advanced Quantum Technologies 2, 10.1002/qute.201800077 (2019)

  3. [3]

    M. Medvidović and J. R. Moreno, Neural-network quantum states for many-body physics, The European Physical Journal Plus 139, 10.1140/epjp/s13360-024-05311-y (2024)

  4. [4]

    G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602–606 (2017)

  5. [5]

    I. L. Gutiérrez and C. B. Mendl, Real time evolution with neural-network quantum states, Quantum 6, 627 (2022)

  6. [6]

    M. Schmitt and M. Heyl, Quantum many-body dynamics in two dimensions with artificial neural networks, Physical Review Letters 125, 10.1103/physrevlett.125.100503 (2020)

  7. [7]

    A. V. de Walle, M. Schmitt, and A. Bohrdt, Many-body dynamics with explicitly time-dependent neural quantum states (2024), arXiv:2412.11830 [quant-ph]

  8. [8]

    A. Sinibaldi, D. Hendry, F. Vicentini, and G. Carleo, Time-dependent neural Galerkin method for quantum dynamics, Physical Review Letters 136, 10.1103/kqvx-dl54 (2026)

  9. [9]

    M. Hibat-Allah, M. Ganahl, L. E. Hayward, R. G. Melko, and J. Carrasquilla, Recurrent neural network wave functions, Phys. Rev. Res. 2, 023358 (2020)

  10. [10]

    L. L. Viteritti, R. Rende, and F. Becca, Transformer variational wave functions for frustrated quantum spin systems, Phys. Rev. Lett. 130, 236401 (2023)

  11. [11]

    Y.-H. Zhang and M. Di Ventra, Transformer quantum state: A multipurpose model for quantum many-body problems, Phys. Rev. B 107, 075147 (2023)

  12. [12]

    A. Sinibaldi, A. F. Mello, M. Collura, and G. Carleo, Nonstabilizerness of neural quantum states, Physical Review Research 7, 10.1103/v5tw-yn1f (2025)

  13. [13]

    D.-L. Deng, X. Li, and S. Das Sarma, Quantum entanglement in neural network states, Phys. Rev. X 7, 021021 (2017)

  14. [14]

    L. L. Viteritti, R. Rende, C. Roth, A. Sengupta, G. Carleo, and A. Georges, Beyond variational bias: Resolving intertwined orders in the Hubbard model (2026), arXiv:2604.21978 [cond-mat.str-el]

  15. [15]

    F. Döschl, F. A. Palm, H. Lange, F. Grusdt, and A. Bohrdt, Neural network quantum states for the interacting Hofstadter model with higher local occupations and long-range interactions, Phys. Rev. B 111, 045408 (2025)

  16. [16]

    M. A. Shamim, E. A. F. Reinhardt, T. A. Chowdhury, S. Gleyzer, and P. T. Araujo, Probing quantum spin systems with Kolmogorov-Arnold neural network quantum states, Phys. Rev. B 113, 045157 (2026)

  17. [17]

    E. Ibarra-García-Padilla, H. Lange, R. G. Melko, R. T. Scalettar, J. Carrasquilla, A. Bohrdt, and E. Khatami, Autoregressive neural quantum states of Fermi Hubbard models, Phys. Rev. Res. 7, 013122 (2025)

  18. [18]

    O. Sharir, Y. Levine, N. Wies, G. Carleo, and A. Shashua, Deep autoregressive models for the efficient variational simulation of many-body quantum systems, Phys. Rev. Lett. 124, 020503 (2020)

  19. [19]

    L. L. Viteritti, R. Rende, S. Sachdev, and G. Carleo, Approaching the thermodynamic limit with neural-network quantum states (2026), arXiv:2602.02665 [cond-mat.str-el]

  20. [20]

    D. Luo, T. Zaklama, and L. Fu, Solving fractional electron states in twisted MoTe2 with deep neural network (2025), arXiv:2503.13585 [cond-mat.str-el]

  21. [21]

    D. Luo, D. D. Dai, and L. Fu, Pairing-based graph neural network for simulating quantum materials, Phys. Rev. B 113, 165107 (2026)

  22. [22]

    M. Reh, M. Schmitt, and M. Gärttner, Optimizing design choices for neural quantum states, Phys. Rev. B 107, 195115 (2023)

  23. [23]

    D. Luo, Z. Chen, K. Hu, Z. Zhao, V. M. Hur, and B. K. Clark, Gauge-invariant and anyonic-symmetric autoregressive neural network for quantum lattice models, Phys. Rev. Res. 5, 013216 (2023)

  24. [24]

    K. Choo, G. Carleo, N. Regnault, and T. Neupert, Symmetries and many-body excitations with neural-network quantum states, Physical Review Letters 121, 10.1103/physrevlett.121.167204 (2018)

  25. [25]

    D. Luo, G. Carleo, B. K. Clark, and J. Stokes, Gauge equivariant neural networks for quantum lattice gauge theories, Phys. Rev. Lett. 127, 276402 (2021)

  26. [26]

    F. Döschl and A. Bohrdt, Towards interpretability of neural quantum states (2026), arXiv:2508.14152 [quant-ph]

  27. [27]

    A. Valenti, E. Greplova, N. H. Lindner, and S. D. Huber, Correlation-enhanced neural networks as interpretable variational quantum states, Phys. Rev. Res. 4, L012010 (2022)

  28. [28]

    J. A. Sobral, M. Perle, and M. S. Scheurer, Physics-informed transformers for electronic quantum states, Nature Communications 16, 10.1038/s41467-025-66844-z (2025)

  29. [29]

    A. Malyshev, J. M. Arrazola, and A. I. Lvovsky, Autoregressive neural quantum states with quantum number symmetries (2023), arXiv:2310.04166 [quant-ph]

  30. [30]

    D. S. Kufel, J. Kemp, D. Vu, S. M. Linsel, C. R. Laumann, and N. Y. Yao, Approximately symmetric neural networks for quantum spin liquids, Physical Review Letters 135, 10.1103/pgnx-11ph (2025)

  31. [31]

    T.-H. Yang, M. Soleimanifar, T. Bergamaschi, and J. Preskill, When can classical neural networks represent quantum states? (2024), arXiv:2410.23152 [quant-ph]

  32. [32]

    N. Paul, Bound on entanglement in neural quantum states, Phys. Rev. Lett. 136, 120403 (2026)

  33. [33]

    G. Passetti, D. Hofmann, P. Neitemeier, L. Grunwald, M. A. Sentef, and D. M. Kennes, Can neural quantum states learn volume-law ground states?, Phys. Rev. Lett. 131, 036502 (2023)

  34. [34]

    Z. Denis, A. Sinibaldi, and G. Carleo, Comment on “can neural quantum states learn volume-law ground states?”, Phys. Rev. Lett. 134, 079701 (2025)

  35. [35]

    X. Gao and L.-M. Duan, Efficient representation of quantum many-body states with deep neural networks, Nature Communications 8, 10.1038/s41467-017-00705-2 (2017)

  36. [36]

    I. Glasser, N. Pancotti, M. August, I. D. Rodriguez, and J. I. Cirac, Neural-network quantum states, string-bond states, and chiral topological states, Physical Review X 8, 10.1103/physrevx.8.011006 (2018)

  37. [37]

    R. Rende, A. Sinibaldi, L. L. Viteritti, R. Wiersema, A. Georges, and G. Carleo, Scaling laws for neural-network quantum states (2026), arXiv:2606.02794 [cond-mat.dis-nn]

  38. [38]

    V. Hernandes, T. Spriggs, S. Khaleefah, and E. Greplova, Adiabatic fine-tuning of neural quantum states enables detection of phase transitions in weight space (2025), arXiv:2503.17140 [quant-ph]

  39. [39]

    B. Barton, J. Carrasquilla, C. Roth, and A. Valenti, Connectivity determines the capability of sparse neural network quantum states (2026), arXiv:2505.22734 [quant-ph]

  40. [40]

    A. Golubeva and R. G. Melko, Pruning a restricted Boltzmann machine for quantum state reconstruction, Phys. Rev. B 105, 125124 (2022)

  41. [41]

    M. S. Moss, A. Orfi, C. Roth, A. M. Sengupta, A. Georges, D. Sels, A. Dawid, and A. Valenti, Double descent: When do neural quantum states generalize?, Phys. Rev. E 113, 045303 (2026)

  42. [42]

    S. Dash, L. Gravina, F. Vicentini, M. Ferrero, and A. Georges, Efficiency of neural quantum states in light of the quantum geometric tensor, Communications Physics 8, 10.1038/s42005-025-02005-4 (2025)

  43. [43]

    D. Rai, Y. Zhou, S. Feng, A. Saparov, and Z. Yao, A practical review of mechanistic interpretability for transformer-based language models (2025), arXiv:2407.02646 [cs.AI]

  44. [44]

    L. Bereska and E. Gavves, Mechanistic interpretability for AI safety – a review (2024), arXiv:2404.14082 [cs.AI]

  45. [45]

    L. Sharkey, B. Chughtai, J. Batson, J. Lindsey, J. Wu, L. Bushnaq, N. Goldowsky-Dill, S. Heimersheim, A. Ortega, J. Bloom, S. Biderman, A. Garriga-Alonso, A. Conmy, N. Nanda, J. Rumbelow, M. Wattenberg, N. Schoots, J. Miller, E. J. Michaud, S. Casper, M. Tegmark, W. Saunders, D. Bau, E. Todd, A. Geiger, M. Geva, J. Hoogland, D. Murfet, and T. McGrath, O...

  46. [46]

    T. Bricken, A. Templeton, J. Batson, B. Chen, A. Jermyn, T. Conerly, N. Turner, C. Anil, C. Denison, A. Askell, et al., Towards monosemanticity: Decomposing language models with dictionary learning, Transformer Circuits Thread 2, 6 (2023)

  47. [47]

    H. Cunningham, A. Ewart, L. Riggs, R. Huben, and L. Sharkey, Sparse autoencoders find highly interpretable features in language models (2023), arXiv:2309.08600 [cs.LG]

  48. [48]

    L. Gao, T. D. la Tour, H. Tillman, G. Goh, R. Troll, A. Radford, I. Sutskever, J. Leike, and J. Wu, Scaling and evaluating sparse autoencoders (2024), arXiv:2406.04093 [cs.LG]

  49. [49]

    A. Templeton, T. Conerly, J. Marcus, J. Lindsey, T. Bricken, B. Chen, A. Pearce, C. Citro, E. Ameisen, A. Jones, H. Cunningham, N. L. Turner, C. McDougall, M. MacDiarmid, A. Tamkin, E. Durmus, T. Hume, F. Mosconi, C. D. Freeman, T. R. Sumers, E. Rees, J. Batson, A. Jermyn, S. Carter, C. Olah, and T. Henighan, Scaling monosemanticity: Extracting int...

  50. [50]

    B. A. Olshausen and D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research 37, 3311 (1997)

  51. [51]

    E. Simon and J. Zou, InterPLM: Discovering interpretable features in protein language models via sparse autoencoders (2024), arXiv:2412.12101 [q-bio.BM]

  52. [52]

    E. Adams, L. Bai, M. Lee, Y. Yu, and M. AlQuraishi, From mechanistic interpretability to mechanistic biology: Training, evaluating, and interpreting sparse autoencoders on protein language models, bioRxiv (2025)

  53. [53]

    O. Gujral, M. Bafna, E. Alm, and B. Berger, Sparse autoencoders uncover biologically interpretable features in protein language model representations, Proceedings of the National Academy of Sciences 122, e2506316122 (2025), https://www.pnas.org/doi/pdf/10.1073/pnas.2506316122

  54. [54]

    E. N. V. Garcia and A. Ansuini, Interpreting and steering protein language models through sparse autoencoders (2025), arXiv:2502.09135 [cs.LG]

  55. [55]

    T. MacMillan and N. T. Ouellette, Towards mechanistic understanding in a data-driven weather model: internal activations reveal interpretable physical features (2025), arXiv:2512.24440 [physics.ao-ph]

  56. [56]

    K. Rosenfeld and M. Sonnewald, Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics (2026), arXiv:2606.11657 [cs.LG]

  57. [57]

    N. Elhage, T. Hume, C. Olsson, N. Schiefer, T. Henighan, S. Kravec, Z. Hatfield-Dodds, R. Lasenby, D. Drain, C. Chen, R. Grosse, S. McCandlish, J. Kaplan, D. Amodei, M. Wattenberg, and C. Olah, Toy models of superposition (2022), arXiv:2209.10652 [cs.LG]

  58. [58]

    D. Bank, N. Koenigstein, and R. Giryes, Autoencoders (2021), arXiv:2003.05991 [cs.LG]

  59. [59]

    P. Pfeuty, The one-dimensional Ising model with a transverse field, Annals of Physics 57, 79 (1970)

  60. [60]

    A. Geiger, H. Lu, T. Icard, and C. Potts, Causal abstractions of neural networks (2021), arXiv:2106.02997 [cs.AI]

  61. [61]

    K. Meng, D. Bau, A. Andonian, and Y. Belinkov, Locating and editing factual associations in GPT (2023), arXiv:2202.05262 [cs.CL]

  62. [62]

    A. Zou, L. Phan, S. Chen, J. Campbell, P. Guo, R. Ren, A. Pan, X. Yin, M. Mazeika, A.-K. Dombrowski, S. Goel, N. Li, M. J. Byun, Z. Wang, A. Mallen, S. Basart, S. Koyejo, D. Song, M. Fredrikson, J. Z. Kolter, and D. Hendrycks, Representation engineering: A top-down approach to AI transparency (2025), arXiv:2310.01405 [cs.LG]

  63. [63]

    A. M. Turner, L. Thiergart, G. Leech, D. Udell, J. J. Vazquez, U. Mini, and M. MacDiarmid, Steering language models with activation engineering (2024), arXiv:2308.10248 [cs.CL]

  64. [64]

    R. Rende, S. Goldt, F. Becca, and L. L. Viteritti, Fine-tuning neural network quantum states, Phys. Rev. Res. 6, 043280 (2024)

  65. [65]

    Z. Qi, C. Earls, and Y. Peng, Neural operator quantum state: A foundation model for quantum dynamics (2026), arXiv:2603.25066 [quant-ph]

  66. [66]

    Z. Qi, C. Earls, and Y. Peng, Universal neural propagator: Learning time evolution in many-body quantum systems (2026), arXiv:2605.05299 [quant-ph]

  67. [67]

    A. V. Chubukov, S. Sachdev, and J. Ye, Theory of two-dimensional quantum Heisenberg antiferromagnets with a nearly critical ground state, Physical Review B 49, 11919–11961 (1994)

  68. [68]

    E. Manousakis, The spin-½ Heisenberg antiferromagnet on a square lattice and its application to the cuprous oxides, Rev. Mod. Phys. 63, 1 (1991)

  69. [69]

    A. W. Sandvik, Finite-size scaling of the ground-state parameters of the two-dimensional Heisenberg model, Physical Review B 56, 11678–11690 (1997)

  70. [70]

    R. Rende, L. L. Viteritti, F. Becca, A. Scardicchio, A. Laio, and G. Carleo, Foundation neural-networks quantum states as a unified ansatz for multiple Hamiltonians, Nature Communications 16, 10.1038/s41467-025-62098-x (2025)

  71. [71]

    T. Zaklama, D. Guerci, and L. Fu, Attention-based foundation model for quantum states (2026), arXiv:2512.11962 [cond-mat.str-el]

  72. [72]

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need (2023), arXiv:1706.03762 [cs.CL]

  73. [73]

    A. Makhzani and B. Frey, k-sparse autoencoders (2014), arXiv:1312.5663 [cs.LG]

Appendix A: Details of Transformer Architecture

The transformer architecture [72] has recently become a powerful ansatz for quantum wavefunctions [10, 11, 70, 71]. Its self-attention mechanism allows the wavefunction representation to capture long-range correlations in many ...