
arxiv: 2606.26760 · v1 · pith:VUJ7PBMDnew · submitted 2026-06-25 · ⚛️ physics.chem-ph · quant-ph

An Iterative Dual-Channel Neural Quantum State Algorithm for Selected Configuration Interaction

Pith reviewed 2026-06-26 02:46 UTC · model grok-4.3

classification ⚛️ physics.chem-ph quant-ph
keywords selected configuration interaction · neural quantum states · transformer · strongly correlated electrons · electronic structure · quantum chemistry · autoregressive sampling

The pith

A dual-channel Transformer neural quantum state achieves chemical accuracy in selected configuration interaction with more favorable determinant scaling than CIPSI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Handover Iterative Neural Quantum State (HI-NQS) algorithm that places a classically trained autoregressive Transformer inside the iterative sample-diagonalize-update loop of Sample-Based Quantum Diagonalization. A dual-channel architecture with spin-up and spin-down cross-attention encodes fermionic spin structure as an inductive bias, and after each subspace diagonalization the resulting eigenvector is distilled back into the network through a factorized spin-marginal teacher signal. This creates a closed feedback loop in which generative sampling becomes more efficient at locating chemically important configurations. Benchmarks on small molecules and a nitrogen active-space series show chemical accuracy on every system tested together with determinant-count scaling that is substantially better than conventional CIPSI-based selected configuration interaction for all but the smallest active spaces. A sympathetic reader would care because the exponential growth of configuration space has long limited exact solutions of the electronic Schrödinger equation for strongly correlated molecules, and this approach operates entirely on classical GPU hardware.

Core claim

The central claim is that embedding an autoregressive Transformer neural quantum state within the iterative framework of Sample-Based Quantum Diagonalization, using a dual-channel architecture with explicit spin cross-attention and distilling the subspace eigenvector via a factorized spin-marginal teacher signal, produces determinantal expansions that reach chemical accuracy while exhibiting substantially more favorable determinant-count scaling than CIPSI-based selected configuration interaction on the tested systems.

What carries the argument

The dual-channel autoregressive Transformer with spin-up/spin-down cross-attention, combined with the handover mechanism that distills the exact eigenvector into the network through a factorized spin-marginal teacher signal after each subspace diagonalization.
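
The control flow of the sample-diagonalize-update loop can be sketched in a few lines. This is a toy under loud assumptions, not the paper's implementation: a tabular weight vector stands in for the dual-channel Transformer, a small dense symmetric matrix stands in for the molecular Hamiltonian, and the names `lowest_eigpair` and `handover_loop` are hypothetical. The update step only moves sampling weights toward |c_x|^2 from the subspace eigenvector, so it shows the direction of the handover but cannot generalize to unseen configurations the way a network might.

```python
import math
import random


def lowest_eigpair(h, iters=3000):
    """Lowest eigenpair of a small symmetric matrix by shifted power iteration.

    Assumes the all-ones start vector overlaps the ground state (true for
    the toy matrix below, not guaranteed in general).
    """
    n = len(h)
    # Gershgorin bound plus a margin, so shift*I - H has a strictly positive spectrum.
    shift = 1.0 + max(
        h[i][i] + sum(abs(h[i][j]) for j in range(n) if j != i) for i in range(n)
    )
    v = [1.0] * n
    for _ in range(iters):
        w = [shift * v[i] - sum(h[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    energy = sum(v[i] * h[i][j] * v[j] for i in range(n) for j in range(n))
    return energy, v


def handover_loop(h_full, n_iters=4, samples_per_iter=3, seed=0):
    """Toy sample-diagonalize-update loop; returns (basis, energy per iteration)."""
    rng = random.Random(seed)
    n = len(h_full)
    weights = [1.0] * n  # ASSUMPTION: tabular stand-in for the generative network
    basis, history = [], []
    for _ in range(n_iters):
        # 1) Sample candidate configurations and grow the basis without duplicates.
        for x in rng.choices(range(n), weights=weights, k=samples_per_iter):
            if x not in basis:
                basis.append(x)
        # 2) Diagonalize the Hamiltonian projected onto the current basis.
        sub = [[h_full[a][b] for b in basis] for a in basis]
        energy, vec = lowest_eigpair(sub)
        history.append(energy)
        # 3) Update: move the weight of each basis configuration halfway toward
        #    the teacher value |c_x|^2 (plus a small floor that keeps it positive).
        for x, c in zip(basis, vec):
            weights[x] = 0.5 * weights[x] + 0.5 * (c * c + 0.05)
    return basis, history


# Toy Hamiltonian with placeholder numbers (not taken from the paper).
H_TOY = [[float(i) if i == j else -0.1 for j in range(6)] for i in range(6)]
BASIS, HISTORY = handover_loop(H_TOY)
```

Because the basis only grows, the subspace energy in `HISTORY` cannot increase from one iteration to the next, which is the variational property any selected-CI loop of this shape inherits.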

If this is right

  • HI-NQS reaches chemical accuracy on all small-molecule and nitrogen active-space systems tested.
  • Determinant-count scaling is substantially more favorable than CIPSI-based SCI for all but the smallest active spaces.
  • All calculations run on classical GPU hardware with no quantum computing resources required.
  • The closed feedback loop between generative sampling and exact diagonalization improves configuration selection efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The distillation step could be generalized to other iterative selected-configuration methods that already perform subspace diagonalizations.
  • The same dual-channel architecture might reduce sampling cost in related variational Monte Carlo approaches that lack an exact diagonalization step.
  • If the scaling advantage persists at larger active spaces, the method could become competitive with density-matrix renormalization group for quasi-one-dimensional strongly correlated systems.

Load-bearing premise

The factorized spin-marginal teacher signal obtained after each subspace diagonalization is sufficient to distill the exact eigenvector back into the autoregressive Transformer so that subsequent generative sampling identifies chemically important configurations more efficiently.
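
A minimal sketch of what a factorized spin-marginal teacher could look like, assuming (this page does not give the paper's definition) that it is the product of the spin-up and spin-down marginals of |c|^2. The function names and the two-determinant example are hypothetical.

```python
from collections import defaultdict


def spin_marginals(amplitudes):
    """Marginals of |c|^2 over spin-up and spin-down occupation strings.

    amplitudes: dict mapping (up_string, down_string) -> CI coefficient.
    """
    norm = sum(abs(c) ** 2 for c in amplitudes.values())
    p_up, p_down = defaultdict(float), defaultdict(float)
    for (up, down), c in amplitudes.items():
        p = abs(c) ** 2 / norm
        p_up[up] += p
        p_down[down] += p
    return dict(p_up), dict(p_down)


def factorized_teacher(amplitudes):
    """ASSUMPTION: teacher(up, down) = p_up(up) * p_down(down)."""
    p_up, p_down = spin_marginals(amplitudes)
    return {(up, down): p_up[up] * p_down[down] for up in p_up for down in p_down}


# Placeholder two-determinant state with perfectly correlated spin strings.
TOY_STATE = {("10", "10"): 0.8, ("01", "01"): 0.6}
TEACHER = factorized_teacher(TOY_STATE)
```

In this toy state the determinants ("10", "01") and ("01", "10") have zero amplitude, yet the product of marginals assigns them nonzero weight. That is the kind of gap the premise needs the rest of the loop to close.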

What would settle it

A benchmark on a larger active space or molecule in which HI-NQS either needs more determinants than CIPSI-based SCI to reach chemical accuracy or fails to reach chemical accuracy altogether would falsify the reported scaling advantage.
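
The test can be written down directly. The sketch below assumes each method is summarized as a sweep of (determinant count, absolute energy error in Hartree) pairs and uses the 1.6 mHa chemical-accuracy threshold quoted in the Figure 4 caption. The function names and all numbers are illustrative placeholders, not results from the paper.

```python
CHEMICAL_ACCURACY_HA = 1.6e-3  # 1.6 mHa, the threshold quoted in the Figure 4 caption


def min_dets_for_accuracy(sweep, threshold=CHEMICAL_ACCURACY_HA):
    """Smallest determinant count whose |E - Eref| is within the threshold, else None."""
    reaching = [n for n, err in sweep if abs(err) <= threshold]
    return min(reaching) if reaching else None


def scaling_claim_falsified(hi_nqs_sweep, cipsi_sweep):
    """True if HI-NQS never reaches the threshold, or needs more determinants than CIPSI."""
    n_nqs = min_dets_for_accuracy(hi_nqs_sweep)
    n_cipsi = min_dets_for_accuracy(cipsi_sweep)
    if n_nqs is None:
        return True
    return n_cipsi is not None and n_nqs > n_cipsi


# PLACEHOLDER sweeps: (determinant count, |E - Eref| in Hartree). Not paper data.
NQS_SWEEP = [(1_000, 5.0e-3), (4_000, 1.2e-3), (16_000, 0.3e-3)]
CIPSI_SWEEP = [(1_000, 8.0e-3), (4_000, 3.0e-3), (16_000, 1.0e-3)]
FALSIFIED = scaling_claim_falsified(NQS_SWEEP, CIPSI_SWEEP)
```

A single system where this returns `True` would not by itself settle the scaling question, but a run of them at increasing active-space size would.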

Figures

Figures reproduced from arXiv: 2606.26760 by En-Jui Kuo, Hsiu-Chi Tsai, Jen-Yu Chang, Ming-Chun Yang, Nan Yow Chen, Tai-Yue Li, Tsung-Wei Huang, Yi-Chun Chang, Yu-Jui Lin.

Figure 1: Dual-channel autoregressive Transformer NQS architecture. The spin-…
Figure 2: The iterative HI-NQS handover loop. Iteration t = 0 (cold start). No eigenvector Ψ0 is yet available. Candidates are ranked by their diagonal Hamiltonian matrix element Hxx = ⟨x|Ĥ|x⟩. The K0 lowest-energy candidates form the initial basis B, with the cold-start budget K0 chosen no larger than the per-iteration budget K used in later iterations. Iteration t = 1 (full rescore). Once the first eigenvector Ψ…
Figure 3: Determinant count at natural convergence vs. qubit count for the N…
Figure 4: Accuracy–cost Pareto fronts: |E − Eref| vs. determinant count for four representative active spaces. CIPSI-SCI sweep (orange) and NQS variational (green, median ± IQR over ten seeds); the dotted line marks chemical accuracy (1.6 mHa). References are exact PySCF FCI except for CAS(14,20), where a Dice-SHCI+det-PT2 estimate (ε = 10⁻⁶) serves as an FCI proxy.
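
The cold-start step described in the Figure 2 caption (rank candidates by the diagonal element Hxx, keep the K0 lowest, with K0 no larger than K) can be sketched as follows. The function name, the `diag_energy` callable, and the toy Hxx values are hypothetical stand-ins. How the candidates are generated at t = 0 is not stated on this page and is left to the caller.

```python
import heapq


def cold_start_basis(candidates, diag_energy, k0, k):
    """Return the k0 candidates with the lowest diagonal element H_xx.

    diag_energy: callable mapping a configuration to H_xx = <x|H|x>.
    k: per-iteration budget used later in the loop; k0 must not exceed it.
    """
    if k0 > k:
        raise ValueError("cold-start budget K0 must not exceed per-iteration budget K")
    return heapq.nsmallest(k0, candidates, key=diag_energy)


# PLACEHOLDER diagonal energies keyed by occupation string (not paper data).
TOY_HXX = {"1100": -1.10, "1010": -0.45, "0110": -0.80, "1001": -0.30, "0011": 0.25}
INITIAL_BASIS = cold_start_basis(list(TOY_HXX), TOY_HXX.get, k0=2, k=3)
```

Ranking by Hxx needs no eigenvector, which is why it is usable at t = 0 before any diagonalization has been performed.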
Original abstract

Accurately solving the electronic Schrödinger equation for strongly correlated systems remains a central challenge in quantum chemistry, where the exponential growth of configuration space limits the applicability of exact methods. Selected Configuration Interaction (SCI) algorithms address this challenge by adaptively constructing compact determinantal expansions, yet their efficiency depends critically on the quality of the sampling strategy used to identify chemically important configurations. Here we introduce the Handover Iterative Neural Quantum State (HI-NQS) algorithm, which embeds a classically trained autoregressive Transformer neural quantum state within the iterative sample-diagonalize-update framework of Sample-Based Quantum Diagonalization. A dual-channel Transformer architecture with explicit spin-up/spin-down cross-attention encodes fermionic spin structure as an architectural inductive bias, enabling expressive and physically informed wavefunction representations. After each subspace diagonalization, the resulting eigenvector is distilled back into the network through a factorized spin-marginal teacher signal, establishing a closed feedback loop between generative sampling and exact diagonalization. Benchmarks across a range of small molecules and a systematic nitrogen active-space series demonstrate that HI-NQS achieves chemical accuracy on all systems tested, with determinant-count scaling substantially more favorable than conventional CIPSI-based SCI for all but the smallest active spaces. All calculations are performed on GPU hardware without quantum computing resources, establishing HI-NQS as an efficient and scalable purely classical approach to the selected configuration interaction problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Handover Iterative Neural Quantum State (HI-NQS) algorithm, which embeds a dual-channel autoregressive Transformer neural quantum state (with explicit spin-up/spin-down cross-attention) into the iterative sample-diagonalize-update loop of Sample-Based Quantum Diagonalization. After each subspace diagonalization, the exact eigenvector is distilled back into the network via a factorized spin-marginal teacher signal. The central claim is that this closed-loop procedure achieves chemical accuracy across small molecules and a systematic nitrogen active-space series while exhibiting substantially more favorable determinant-count scaling than conventional CIPSI-based SCI for all but the smallest active spaces.

Significance. If the reported chemical accuracy and scaling advantage are robustly supported by the benchmarks, the work would constitute a meaningful advance in classical selected-configuration-interaction methods for strongly correlated electrons. The architectural inductive bias for fermionic spin structure and the use of an exact diagonalization teacher signal independent of the network are positive features that distinguish the approach from purely variational neural quantum states.

major comments (2)
  1. [Abstract and method description of the distillation step] The central performance claims (chemical accuracy and determinant scaling superior to CIPSI on the nitrogen series) rest on the assumption that the factorized spin-marginal teacher signal is informationally sufficient to distill the full eigenvector into the autoregressive Transformer. Because the factorization separates the up and down marginals before feedback, any joint spin correlations not captured by the marginals are lost; the manuscript provides no explicit analysis, ablation, or information-theoretic argument showing that the dual-channel cross-attention compensates for this loss in multi-reference regimes.
  2. [Benchmarks on the nitrogen active-space series] The scaling advantage is asserted to hold for all but the smallest active spaces, yet the strength of this claim cannot be evaluated without the concrete determinant counts, energy errors, and error bars for the largest active spaces tested. If the factorized teacher signal is incomplete, the generative sampling would not preferentially identify the chemically important determinants that CIPSI already locates, undermining the reported scaling comparison.
minor comments (1)
  1. [Computational details] The abstract states that all calculations are performed on GPU hardware without quantum resources; this is a useful clarification but should be repeated with hardware specifications in the computational-details section for reproducibility.
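
Major comment 1 can be stated quantitatively. Assuming, as a reading of the abstract rather than a confirmed definition, that the teacher is the product of the spin marginals of the normalized eigenvector's probability distribution, the information discarded by the factorization is exactly the mutual information between the spin-up and spin-down occupation strings:

```latex
\begin{align}
p(x_\uparrow, x_\downarrow) &= |c_{x_\uparrow x_\downarrow}|^2, \qquad
p_\uparrow(x_\uparrow) = \sum_{x_\downarrow} p(x_\uparrow, x_\downarrow), \qquad
p_\downarrow(x_\downarrow) = \sum_{x_\uparrow} p(x_\uparrow, x_\downarrow), \\
D_{\mathrm{KL}}\!\left(p \,\middle\|\, p_\uparrow p_\downarrow\right)
&= \sum_{x_\uparrow, x_\downarrow} p(x_\uparrow, x_\downarrow)
   \log \frac{p(x_\uparrow, x_\downarrow)}{p_\uparrow(x_\uparrow)\, p_\downarrow(x_\downarrow)}
 = I(X_\uparrow ; X_\downarrow) \;\ge\; 0 .
\end{align}
```

The divergence vanishes only when p is itself a product distribution. An ablation of the kind the referee requests could therefore report I(X↑; X↓) of each subspace eigenvector next to the energy error, which would show how much joint structure the cross-attention has to recover on its own.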

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. We address each major comment below, providing clarifications based on the manuscript content and indicating where revisions will strengthen the presentation.

point-by-point responses
  1. Referee: Abstract and method description of the distillation step: the central performance claims (chemical accuracy and determinant scaling superior to CIPSI on the nitrogen series) rest on the assumption that the factorized spin-marginal teacher signal is informationally sufficient to distill the full eigenvector into the autoregressive Transformer. Because the factorization separates the up and down marginals before feedback, any joint spin correlations not captured by the marginals are lost; the manuscript provides no explicit analysis, ablation, or information-theoretic argument showing that the dual-channel cross-attention compensates for this loss in multi-reference regimes.

    Authors: We agree that the manuscript does not contain an explicit ablation study or information-theoretic quantification of information loss from the factorized marginals. The dual-channel Transformer with cross-attention is constructed precisely to allow the model to learn inter-spin correlations from the separate marginal teacher signals, and the exact diagonalization step supplies a complete target eigenvector at each iteration. Nevertheless, to directly address the concern, we will add a dedicated paragraph in the Methods section explaining the architectural inductive bias and how the closed feedback loop mitigates potential loss of joint correlations. This revision will be textual and will not require new calculations.
    Revision: partial

  2. Referee: Benchmarks on the nitrogen active-space series: the scaling advantage is asserted to hold for all but the smallest active spaces, yet the strength of this claim cannot be evaluated without the concrete determinant counts, energy errors, and error bars for the largest active spaces tested. If the factorized teacher signal is incomplete, the generative sampling would not preferentially identify the chemically important determinants that CIPSI already locates, undermining the reported scaling comparison.

    Authors: Section 4 and the associated figures report chemical accuracy and improved determinant scaling for the nitrogen series relative to CIPSI. To make the quantitative basis of the scaling claim fully transparent, we will add a supplementary table listing, for each active space tested, the number of determinants retained, the energy error relative to the reference (FCI or DMRG where available), and any statistical error bars from multiple runs. These data already exist in our internal records and will allow readers to verify that the generative sampling preferentially recovers chemically relevant determinants.
    Revision: yes
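
The Figure 4 caption reports NQS results as median ± IQR over ten seeds, so the promised table could plausibly summarize per-seed errors the same way. A minimal sketch using the Python standard library follows. The helper name and the ten error values are placeholders, not data from the paper, and the quartile convention (the `statistics.quantiles` default) is an assumption.

```python
import statistics


def median_iqr(values):
    """Median and interquartile range of per-seed energy errors."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return statistics.median(values), q3 - q1


# PLACEHOLDER |E - Eref| values in mHa for ten seeds (not paper data).
SEED_ERRORS_MHA = [0.9, 1.1, 1.0, 1.3, 0.8, 1.2, 1.0, 1.4, 0.7, 1.1]
MEDIAN_MHA, IQR_MHA = median_iqr(SEED_ERRORS_MHA)
```

Reporting the quartile convention alongside the numbers matters at ten samples, since different interpolation rules give visibly different IQRs.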

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent external diagonalization

full rationale

The HI-NQS method embeds an autoregressive Transformer within an iterative sample-diagonalize-update loop, where the eigenvector from each subspace diagonalization serves as an external teacher signal distilled via factorized spin-marginals. This signal is generated by exact diagonalization independent of the network parameters, and the reported chemical accuracy plus determinant scaling advantages are benchmarked against conventional CIPSI on external molecular systems. No equations or steps reduce the claimed predictions to fitted inputs by construction, nor do any load-bearing claims rest on self-citations or ansatzes imported from prior author work. The architecture and feedback loop remain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central performance claims rest on the assumption that a classically trained autoregressive Transformer can serve as an effective generative model for fermionic configurations and that the iterative distillation loop converges to chemically accurate eigenvectors; the network weights constitute a large set of fitted parameters whose values are not reported.

free parameters (1)
  • Transformer network weights
    All parameters of the dual-channel autoregressive Transformer are trained on each molecular system to represent the wavefunction amplitudes.
axioms (1)
  • [domain assumption] The electronic wavefunction must be antisymmetric under particle exchange.
    Standard requirement for fermionic systems invoked when encoding spin structure via the dual-channel architecture.
invented entities (1)
  • Dual-channel Transformer with explicit spin-up/spin-down cross-attention (no independent evidence)
    purpose: To embed fermionic spin structure directly into the network architecture as an inductive bias for sampling.
    New architectural component introduced to improve physical fidelity of the generative model.


