An Iterative Dual-Channel Neural Quantum State Algorithm for Selected Configuration Interaction

En-Jui Kuo; Hsiu-Chi Tsai; Jen-Yu Chang; Ming-Chun Yang; Nan Yow Chen; Tai-Yue Li; Tsung-Wei Huang; Yi-Chun Chang; Yu-Jui Lin

arxiv: 2606.26760 · v1 · pith:VUJ7PBMDnew · submitted 2026-06-25 · ⚛️ physics.chem-ph · quant-ph

Pith reviewed 2026-06-26 02:46 UTC · model grok-4.3

keywords selected configuration interaction · neural quantum states · transformer · strongly correlated electrons · electronic structure · quantum chemistry · autoregressive sampling

The pith

A dual-channel Transformer neural quantum state achieves chemical accuracy in selected configuration interaction with more favorable determinant scaling than CIPSI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Handover Iterative Neural Quantum State (HI-NQS) algorithm that places a classically trained autoregressive Transformer inside the iterative sample-diagonalize-update loop of Sample-Based Quantum Diagonalization. A dual-channel architecture with spin-up and spin-down cross-attention encodes fermionic spin structure as an inductive bias, and after each subspace diagonalization the resulting eigenvector is distilled back into the network through a factorized spin-marginal teacher signal. This creates a closed feedback loop in which generative sampling becomes more efficient at locating chemically important configurations. Benchmarks on small molecules and a nitrogen active-space series show chemical accuracy on every system tested together with determinant-count scaling that is substantially better than conventional CIPSI-based selected configuration interaction for all but the smallest active spaces. A sympathetic reader would care because the exponential growth of configuration space has long limited exact solutions of the electronic Schrödinger equation for strongly correlated molecules, and this approach operates entirely on classical GPU hardware.

Core claim

The central claim is that embedding an autoregressive Transformer neural quantum state within the iterative framework of Sample-Based Quantum Diagonalization, using a dual-channel architecture with explicit spin cross-attention and distilling the subspace eigenvector via a factorized spin-marginal teacher signal, produces determinantal expansions that reach chemical accuracy while exhibiting substantially more favorable determinant-count scaling than CIPSI-based selected configuration interaction on the tested systems.
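The sample-diagonalize-update loop named in the claim can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: a random dense symmetric matrix `H` stands in for the molecular Hamiltonian, a plain probability vector `p` stands in for the autoregressive Transformer, and the "update" step simply mixes the squared eigenvector with a uniform floor instead of training a network. All names are hypothetical.

```python
import numpy as np

def sample_diagonalize_update(H, n_iter=5, n_samples=20, seed=0):
    """Toy sample-diagonalize-update loop on a dense symmetric matrix H.

    Hypothetical stand-ins: H replaces the molecular Hamiltonian in a
    determinant basis, and the vector `p` replaces the neural sampler.
    """
    rng = np.random.default_rng(seed)
    dim = H.shape[0]
    p = np.full(dim, 1.0 / dim)            # untrained sampler: uniform
    basis = {int(np.argmin(np.diag(H)))}   # start from the lowest diagonal entry
    energies = []
    for _ in range(n_iter):
        # 1) sample: draw candidate configurations from the current model
        draws = rng.choice(dim, size=n_samples, p=p)
        basis.update(int(i) for i in draws)
        idx = np.array(sorted(basis))
        # 2) diagonalize: exact ground state inside the selected subspace
        evals, evecs = np.linalg.eigh(H[np.ix_(idx, idx)])
        energies.append(float(evals[0]))
        # 3) update: hand the eigenvector back as the next sampling target
        target = np.zeros(dim)
        target[idx] = evecs[:, 0] ** 2
        p = 0.5 * target + 0.5 / dim       # keep some exploration mass
    return energies, idx

# Toy usage with a random symmetric matrix (not a chemical Hamiltonian).
rng = np.random.default_rng(1)
A = rng.normal(size=(64, 64))
H = (A + A.T) / 2
energies, idx = sample_diagonalize_update(H)
```

Because the basis only grows in this sketch, the subspace energy cannot increase from one iteration to the next and never drops below the exact ground-state energy of `H`.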

What carries the argument

The dual-channel autoregressive Transformer with spin-up/spin-down cross-attention, combined with the handover mechanism that distills the exact eigenvector into the network through a factorized spin-marginal teacher signal after each subspace diagonalization.
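To make "spin-up/spin-down cross-attention" concrete, here is a minimal single-head sketch in NumPy in which one spin channel forms the queries and the other channel supplies keys and values. The sizes, the random weights, and the sharing of weights between the two directions are illustrative assumptions; the page does not give the paper's head count, masking scheme, or parameter layout, and the causal masking an autoregressive model needs is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries_from, keys_values_from, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention.

    queries_from:     (n_q, d) hidden states of one spin channel
    keys_values_from: (n_kv, d) hidden states of the other spin channel
    """
    Q = queries_from @ Wq
    K = keys_values_from @ Wk
    V = keys_values_from @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
n_orb, d = 6, 8                               # hypothetical sizes
h_up = rng.normal(size=(n_orb, d))            # spin-up channel hidden states
h_dn = rng.normal(size=(n_orb, d))            # spin-down channel hidden states
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

up_reads_dn, w_ud = cross_attention(h_up, h_dn, Wq, Wk, Wv)
dn_reads_up, w_du = cross_attention(h_dn, h_up, Wq, Wk, Wv)
```

The point of the design is that each channel's representation is conditioned on the other spin sector, so inter-spin correlations have an explicit pathway rather than having to emerge from a single mixed token stream.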

If this is right

HI-NQS reaches chemical accuracy on all small-molecule and nitrogen active-space systems tested.
Determinant-count scaling is substantially more favorable than CIPSI-based SCI for all but the smallest active spaces.
All calculations run on classical GPU hardware with no quantum computing resources required.
The closed feedback loop between generative sampling and exact diagonalization improves configuration selection efficiency.
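"Chemical accuracy" is conventionally 1 kcal/mol, which is about 1.6 mHa, the threshold marked in the Figure 4 caption. A small check of an energy against that threshold might look like the following; the function name and the example energies are hypothetical and are not results from the paper.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095                 # standard conversion factor
CHEM_ACC_MHA = 1000.0 / HARTREE_TO_KCAL_PER_MOL    # 1 kcal/mol in millihartree

def within_chemical_accuracy(e_calc_ha, e_ref_ha, tol_mha=CHEM_ACC_MHA):
    """Return True if |E - E_ref| (inputs in hartree) is within tol_mha."""
    return abs(e_calc_ha - e_ref_ha) * 1000.0 <= tol_mha

# Hypothetical energies in hartree, for illustration only.
ok = within_chemical_accuracy(-1.000, -1.001)       # 1 mHa error
too_far = within_chemical_accuracy(-1.000, -1.002)  # 2 mHa error
```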

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The distillation step could be generalized to other iterative selected-configuration methods that already perform subspace diagonalizations.
The same dual-channel architecture might reduce sampling cost in related variational Monte Carlo approaches that lack an exact diagonalization step.
If the scaling advantage persists at larger active spaces, the method could become competitive with density-matrix renormalization group for quasi-one-dimensional strongly correlated systems.

Load-bearing premise

The factorized spin-marginal teacher signal obtained after each subspace diagonalization is sufficient to distill the exact eigenvector back into the autoregressive Transformer so that subsequent generative sampling identifies chemically important configurations more efficiently.
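One natural reading of a "factorized spin-marginal teacher signal" is written out below. This is an assumption for illustration, since the page does not reproduce the paper's formula. Here $c_{\alpha\beta}$ denotes the normalized subspace eigenvector coefficient on the determinant with spin-up string $\alpha$ and spin-down string $\beta$.

```latex
\begin{aligned}
P(\alpha,\beta) &= |c_{\alpha\beta}|^{2}, \\
p_{\uparrow}(\alpha) &= \sum_{\beta} |c_{\alpha\beta}|^{2}, \qquad
p_{\downarrow}(\beta) = \sum_{\alpha} |c_{\alpha\beta}|^{2}, \\
q(\alpha,\beta) &= p_{\uparrow}(\alpha)\, p_{\downarrow}(\beta), \\
I(\uparrow;\downarrow) &= \sum_{\alpha,\beta} P(\alpha,\beta)\,
  \ln\frac{P(\alpha,\beta)}{p_{\uparrow}(\alpha)\, p_{\downarrow}(\beta)} \;\ge\; 0 .
\end{aligned}
```

Under this reading, the factorized target $q$ equals the joint distribution $P$ only when the mutual information $I$ vanishes, so the premise amounts to saying that whatever $I$ discards is either small or recovered elsewhere in the loop.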

What would settle it

A benchmark on a larger active space or molecule in which the number of determinants required to reach chemical accuracy exceeds that of CIPSI-based SCI or fails to reach chemical accuracy altogether would falsify the reported scaling advantage.

Figures

Figures reproduced from arXiv: 2606.26760 by En-Jui Kuo, Hsiu-Chi Tsai, Jen-Yu Chang, Ming-Chun Yang, Nan Yow Chen, Tai-Yue Li, Tsung-Wei Huang, Yi-Chun Chang, Yu-Jui Lin.

**Figure 2.** The iterative HI-NQS handover loop. Iteration t = 0 (cold start). No eigenvector Ψ0 is yet available. Candidates are ranked by their diagonal Hamiltonian matrix element Hxx = ⟨x|Ĥ|x⟩. The K0 lowest-energy candidates form the initial basis B, with the cold-start budget K0 chosen no larger than the per-iteration budget K used in later iterations. Iteration t = 1 (full rescore). Once the first eigenvector Ψ…

**Figure 3.** Determinant count at natural convergence vs. qubit count for the N…

**Figure 4.** Accuracy–cost Pareto fronts: |E − Eref| vs. determinant count for four representative active spaces. CIPSI-SCI sweep (orange) and NQS variational (green, median ± IQR over ten seeds); the dotted line marks chemical accuracy (1.6 mHa). References are exact PySCF FCI except for CAS(14,20), where a Dice-SHCI+det-PT2 estimate (ε = 10⁻⁶) serves as an FCI proxy.
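The cold-start step described in the Figure 2 caption (rank candidates by the diagonal element Hxx and keep the K0 lowest) can be sketched as follows. The function name and the six diagonal values are hypothetical placeholders, not data from the paper.

```python
import numpy as np

def cold_start_basis(h_diag, k0):
    """Return indices of the k0 candidates with the lowest diagonal energy H_xx."""
    h_diag = np.asarray(h_diag, dtype=float)
    k0 = min(k0, h_diag.size)                  # cannot select more than exist
    order = np.argsort(h_diag, kind="stable")  # ascending energy, ties keep input order
    return order[:k0]

# Hypothetical diagonal elements for six candidate determinants (arbitrary units).
h_xx = [-1.2, -0.3, -1.5, 0.4, -0.9, -1.1]
basis = cold_start_basis(h_xx, k0=3)
```

Ranking by Hxx needs no eigenvector, which is why it can serve at iteration t = 0 before any diagonalization has been performed.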

Abstract

Accurately solving the electronic Schrödinger equation for strongly correlated systems remains a central challenge in quantum chemistry, where the exponential growth of configuration space limits the applicability of exact methods. Selected Configuration Interaction (SCI) algorithms address this challenge by adaptively constructing compact determinantal expansions, yet their efficiency depends critically on the quality of the sampling strategy used to identify chemically important configurations. Here we introduce the Handover Iterative Neural Quantum State (HI-NQS) algorithm, which embeds a classically trained autoregressive Transformer neural quantum state within the iterative sample-diagonalize-update framework of Sample-Based Quantum Diagonalization. A dual-channel Transformer architecture with explicit spin-up/spin-down cross-attention encodes fermionic spin structure as an architectural inductive bias, enabling expressive and physically informed wavefunction representations. After each subspace diagonalization, the resulting eigenvector is distilled back into the network through a factorized spin-marginal teacher signal, establishing a closed feedback loop between generative sampling and exact diagonalization. Benchmarks across a range of small molecules and a systematic nitrogen active-space series demonstrate that HI-NQS achieves chemical accuracy on all systems tested, with determinant-count scaling substantially more favorable than conventional CIPSI-based SCI for all but the smallest active spaces. All calculations are performed on GPU hardware without quantum computing resources, establishing HI-NQS as an efficient and scalable purely classical approach to the selected configuration interaction problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HI-NQS adds a dual-channel Transformer with spin cross-attention and iterative distillation to the SCI loop, but the abstract gives no numbers to back the scaling and accuracy claims.

The main new piece is the HI-NQS loop: an autoregressive Transformer that samples determinants, hands them to exact diagonalization, then distills the eigenvector back via factorized spin-marginal targets. The dual-channel architecture with explicit up/down cross-attention is a concrete architectural choice that bakes in fermionic spin structure rather than learning it from scratch.

That setup is coherent on paper and sits on top of existing NQS and SCI ideas without obvious circularity, since the teacher signal comes from independent diagonalization. The nitrogen active-space series is the right test bed for checking whether the method actually beats CIPSI scaling.

The soft spot is exactly the one the stress-test flags. Factorizing the teacher into separate spin marginals before feeding it back risks dropping joint correlations that matter in multi-reference cases. If that loss is real, the generative model will not reliably surface the important determinants that CIPSI already finds, and the claimed scaling edge disappears. The abstract states chemical accuracy and favorable determinant scaling but supplies none of the supporting numbers, error bars, or system sizes, so it is impossible to tell whether the loop actually works as described.
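The concern about dropped joint correlations can be made concrete with the smallest multi-reference example. The sketch below assumes the factorized teacher is the product of the two spin marginals of the squared amplitudes, which is an interpretation and not a formula quoted from the paper, and it uses a hypothetical two-determinant state.

```python
import numpy as np

# Hypothetical two-determinant state: equal weight on (a0, b0) and (a1, b1).
# Rows index the spin-up string, columns index the spin-down string.
c = np.array([[1.0, 0.0],
              [0.0, 1.0]]) / np.sqrt(2.0)

joint = c ** 2                    # P(a, b) = |c_ab|^2
p_up = joint.sum(axis=1)          # marginal over spin-down strings
p_dn = joint.sum(axis=0)          # marginal over spin-up strings
product = np.outer(p_up, p_dn)    # factorized approximation to P

# Probability mass the product places on determinants with zero amplitude.
leaked = product[joint == 0].sum()
```

In this toy case both marginals are uniform, so the product spreads weight evenly over all four determinants and half of its mass lands on the two that have zero amplitude. Whether cross-attention and repeated diagonalization recover that lost structure in practice is exactly what the letter says the benchmarks need to show.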

This is the kind of paper that belongs in a reading group for people who already follow neural quantum states or selected CI. A serious referee should see the full benchmarks and implementation details before any stronger judgment. I would send it to review rather than desk-reject, mainly to get the numerical evidence on the table.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Handover Iterative Neural Quantum State (HI-NQS) algorithm, which embeds a dual-channel autoregressive Transformer neural quantum state (with explicit spin-up/spin-down cross-attention) into the iterative sample-diagonalize-update loop of Sample-Based Quantum Diagonalization. After each subspace diagonalization, the exact eigenvector is distilled back into the network via a factorized spin-marginal teacher signal. The central claim is that this closed-loop procedure achieves chemical accuracy across small molecules and a systematic nitrogen active-space series while exhibiting substantially more favorable determinant-count scaling than conventional CIPSI-based SCI for all but the smallest active spaces.

Significance. If the reported chemical accuracy and scaling advantage are robustly supported by the benchmarks, the work would constitute a meaningful advance in classical selected-configuration-interaction methods for strongly correlated electrons. The architectural inductive bias for fermionic spin structure and the use of an exact diagonalization teacher signal independent of the network are positive features that distinguish the approach from purely variational neural quantum states.

major comments (2)

[Abstract and method description of the distillation step] The central performance claims (chemical accuracy and determinant scaling superior to CIPSI on the nitrogen series) rest on the assumption that the factorized spin-marginal teacher signal is informationally sufficient to distill the full eigenvector into the autoregressive Transformer. Because the factorization separates the up and down marginals before feedback, any joint spin correlations not captured by the marginals are lost; the manuscript provides no explicit analysis, ablation, or information-theoretic argument showing that the dual-channel cross-attention compensates for this loss in multi-reference regimes.
[Benchmarks on the nitrogen active-space series] The scaling advantage is asserted to hold for all but the smallest active spaces, yet the strength of this claim cannot be evaluated without the concrete determinant counts, energy errors, and error bars for the largest active spaces tested. If the factorized teacher signal is incomplete, the generative sampling would not preferentially identify the chemically important determinants that CIPSI already locates, undermining the reported scaling comparison.

minor comments (1)

[Computational details] The abstract states that all calculations are performed on GPU hardware without quantum resources; this is a useful clarification but should be repeated with hardware specifications in the computational-details section for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. We address each major comment below, providing clarifications based on the manuscript content and indicating where revisions will strengthen the presentation.

Referee: Abstract and method description of the distillation step: the central performance claims (chemical accuracy and determinant scaling superior to CIPSI on the nitrogen series) rest on the assumption that the factorized spin-marginal teacher signal is informationally sufficient to distill the full eigenvector into the autoregressive Transformer. Because the factorization separates the up and down marginals before feedback, any joint spin correlations not captured by the marginals are lost; the manuscript provides no explicit analysis, ablation, or information-theoretic argument showing that the dual-channel cross-attention compensates for this loss in multi-reference regimes.

Authors: We agree that the manuscript does not contain an explicit ablation study or information-theoretic quantification of information loss from the factorized marginals. The dual-channel Transformer with cross-attention is constructed precisely to allow the model to learn inter-spin correlations from the separate marginal teacher signals, and the exact diagonalization step supplies a complete target eigenvector at each iteration. Nevertheless, to directly address the concern, we will add a dedicated paragraph in the Methods section explaining the architectural inductive bias and how the closed feedback loop mitigates potential loss of joint correlations. This revision will be textual and will not require new calculations. revision: partial

Referee: Benchmarks on the nitrogen active-space series: the scaling advantage is asserted to hold for all but the smallest active spaces, yet the strength of this claim cannot be evaluated without the concrete determinant counts, energy errors, and error bars for the largest active spaces tested. If the factorized teacher signal is incomplete, the generative sampling would not preferentially identify the chemically important determinants that CIPSI already locates, undermining the reported scaling comparison.

Authors: Section 4 and the associated figures report chemical accuracy and improved determinant scaling for the nitrogen series relative to CIPSI. To make the quantitative basis of the scaling claim fully transparent, we will add a supplementary table listing, for each active space tested, the number of determinants retained, the energy error relative to the reference (FCI or DMRG where available), and any statistical error bars from multiple runs. These data already exist in our internal records and will allow readers to verify that the generative sampling preferentially recovers chemically relevant determinants. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent external diagonalization

The HI-NQS method embeds an autoregressive Transformer within an iterative sample-diagonalize-update loop, where the eigenvector from each subspace diagonalization serves as an external teacher signal distilled via factorized spin-marginals. This signal is generated by exact diagonalization independent of the network parameters, and the reported chemical accuracy plus determinant scaling advantages are benchmarked against conventional CIPSI on external molecular systems. No equations or steps reduce the claimed predictions to fitted inputs by construction, nor do any load-bearing claims rest on self-citations or ansatzes imported from prior author work. The architecture and feedback loop remain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central performance claims rest on the assumption that a classically trained autoregressive Transformer can serve as an effective generative model for fermionic configurations and that the iterative distillation loop converges to chemically accurate eigenvectors; the network weights constitute a large set of fitted parameters whose values are not reported.

free parameters (1)

Transformer network weights
All parameters of the dual-channel autoregressive Transformer are trained on each molecular system to represent the wavefunction amplitudes.

axioms (1)

domain assumption: The electronic wavefunction must be antisymmetric under particle exchange.
Standard requirement for fermionic systems invoked when encoding spin structure via the dual-channel architecture.

invented entities (1)

Dual-channel Transformer with explicit spin-up/spin-down cross-attention (no independent evidence)
purpose: To embed fermionic spin structure directly into the network architecture as an inductive bias for sampling.
New architectural component introduced to improve physical fidelity of the generative model.


[1] [1]

S.Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory; Macmillan: New York, 1982

Szabo, A.; Ostlund, N. S.Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory; Macmillan: New York, 1982

1982

[2] [2]

P.; Rancurel, P

Huron, B.; Malrieu, J. P.; Rancurel, P. Iterative perturbation calculations of ground and excited state energies from multiconfigurational zeroth-order wavefunctions.J. Chem. Phys.1973,58, 5745–5759

1973

[3] Holmes, A. A.; Tubman, N. M.; Umrigar, C. J. Heat-bath configuration interaction: An efficient selected configuration interaction algorithm inspired by heat-bath sampling. J. Chem. Theory Comput. 2016, 12, 3674–3680

[4] Sharma, S.; Holmes, A. A.; Jeanmairet, G.; Alavi, A.; Umrigar, C. J. Semistochastic heat-bath configuration interaction method: Selected configuration interaction with semistochastic perturbation theory. J. Chem. Theory Comput. 2017, 13, 1595–1604

[5] Tubman, N. M.; Freeman, C. D.; Levine, D. S.; Hait, D.; Head-Gordon, M.; Whaley, K. B. Modern Approaches to Exact Diagonalization and Selected Configuration Interaction with the Adaptive Sampling CI Method. J. Chem. Theory Comput. 2020, 16, 2139–2159

[6] Schriber, J. B.; Evangelista, F. A. Communication: An adaptive configuration interaction approach for strongly correlated electrons with tunable accuracy. J. Chem. Phys. 2016, 144, 161106

[7] Chan, G. K.-L.; Sharma, S. The density matrix renormalization group in quantum chemistry. Annu. Rev. Phys. Chem. 2011, 62, 465–481

[8] Booth, G. H.; Thom, A. J. W.; Alavi, A. Fermion Monte Carlo without fixed nodes: a game of life, death, and annihilation in Slater determinant space. J. Chem. Phys. 2009, 131, 054106

[9] Peruzzo, A.; McClean, J.; Shadbolt, P.; Yung, M.-H.; Zhou, X.-Q.; Love, P. J.; Aspuru-Guzik, A.; O’Brien, J. L. A variational eigenvalue solver on a photonic quantum processor. Nat. Commun. 2014, 5, 4213

[10] McClean, J. R.; Romero, J.; Babbush, R.; Aspuru-Guzik, A. The theory of variational hybrid quantum-classical algorithms. New J. Phys. 2016, 18, 023023

[11] Grimsley, H. R.; Economou, S. E.; Barnes, E.; Mayhall, N. J. An adaptive variational algorithm for exact molecular simulations on a quantum computer. Nat. Commun. 2019, 10, 3007

[12] McClean, J. R.; Boixo, S.; Smelyanskiy, V. N.; Babbush, R.; Neven, H. Barren plateaus in quantum neural network training landscapes. Nat. Commun. 2018, 9, 4812

[13] Larocca, M.; Thanasilp, S.; Wang, S.; Sharma, K.; Biamonte, J.; Coles, P. J.; Cincio, L.; McClean, J. R.; Holmes, Z.; Cerezo, M. Barren plateaus in variational quantum computing. Nat. Rev. Phys. 2025, 7, 174–189

[14] Wecker, D.; Hastings, M. B.; Troyer, M. Progress towards practical quantum variational algorithms. Phys. Rev. A 2015, 92, 042303

[15] Gonthier, J. F.; Radin, M. D.; Buda, C.; Doskocil, E. J.; Abuan, C. M.; Romero, J. Measurements as a roadblock to near-term practical quantum advantage in chemistry: Resource analysis. Phys. Rev. Research 2022, 4, 033154

[16] Robledo-Moreno, J. et al. Chemistry beyond the scale of exact diagonalization on a quantum-centric supercomputer. Sci. Adv. 2025, 11, eadu9991, arXiv:2405.05068

[17] Yu, J. et al. Quantum-centric algorithm for sample-based Krylov diagonalization. arXiv 2025, arXiv:2501.09702

[18] Kanno, K.; Kohda, M.; Imai, R.; Koh, S.; Mitarai, K.; Mizukami, W.; Nakagawa, Y. O. Quantum-selected configuration interaction: classical diagonalization of Hamiltonians in subspaces selected by quantum computers. Phys. Rev. Research 2026, 8, 023268, arXiv:2302.11320

[19] Pellow-Jarman, A.; McFarthing, S.; Kang, D. H.; Yoo, P.; Elala, E. E.; Pellow-Jarman, R.; Nakliang, P. M.; Kim, J.; Rhee, J.-K. K. HIVQE: handover iterative variational quantum eigensolver for efficient quantum chemistry calculations. arXiv 2025, arXiv:2503.06292

[20] Yoo, P. et al. Extending the handover-iterative VQE to challenging strongly correlated systems: N2 and Fe–S cluster. arXiv 2026, arXiv:2601.06935

[21] Carleo, G.; Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 2017, 355, 602–606

[22] Pfau, D.; Spencer, J. S.; Matthews, A. G. D. G.; Foulkes, W. M. C. Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Phys. Rev. Research 2020, 2, 033429

[23] Hermann, J.; Schätzle, Z.; Noé, F. Deep-neural-network solution of the electronic Schrödinger equation. Nat. Chem. 2020, 12, 891–897

[24] Choo, K.; Mezzacapo, A.; Carleo, G. Fermionic neural-network states for ab-initio electronic structure. Nat. Commun. 2020, 11, 2368

[25] Lange, H.; Van de Walle, A.; Abedinnia, A.; Bohrdt, A. From architectures to applications: a review of neural quantum states. Quantum Sci. Technol. 2024, 9, 040501

[26] Sorella, S. Green function Monte Carlo with stochastic reconfiguration. Phys. Rev. Lett. 1998, 80, 4558–4561

[27] Li, X.; Huang, J.-C.; Zhang, G.-Z.; Li, H.-E.; Cao, C.-s.; Lv, D.; Hu, H.-S. A nonstochastic optimization algorithm for neural-network quantum states. J. Chem. Theory Comput. 2023, 19, 8156–8165

[28] Chen, A.; Heyl, M. Empowering deep neural quantum states through efficient optimization. Nat. Phys. 2024, 20, 1476–1481

[29] Schmerwitz, Y. L. A.; Thirion, L.; Levi, G.; Jónsson, E. O.; Bilous, P.; Jónsson, H.; Hansmann, P. Neural-Network-Based Selective Configuration Interaction Approach to Molecular Electronic Structure. J. Chem. Theory Comput. 2025, 21, 2301–2310, arXiv:2406.08154

[30] Bilous, P.; Thirion, L.; Menke, H.; Haverkort, M. W.; Pálffy, A.; Hansmann, P. Neural-network-supported basis optimizer for the configuration interaction problem in quantum many-body clusters: Feasibility study and numerical proof. Phys. Rev. B 2025, 111, 035124, arXiv:2406.00151

[31] Thirion, L.; Schmerwitz, Y. L. A.; Kroesbergen, M.; Levi, G.; Jónsson, E. O.; Bilous, P.; Jónsson, H.; Hansmann, P. Natural-orbital-based neural network configuration interaction. arXiv 2025, arXiv:2510.27665

[32] Coe, J. P. Machine Learning Configuration Interaction. J. Chem. Theory Comput. 2018, 14, 5739–5749

[33] Herzog, B.; Casier, B.; Lebègue, S.; Rocca, D. Solving the Schrödinger Equation in the Configuration Space with Generative Machine Learning. J. Chem. Theory Comput. 2023, 19, 2484–2490

[34] Sun, D. et al. A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States. arXiv 2026, arXiv:2604.15768

[35] Wu, Y.; Guo, C.; Fan, Y.; Zhou, P.; Shang, H. NNQS-Transformer: An Efficient and Scalable Neural Network Quantum States Approach for Ab initio Quantum Chemistry. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’23). 2023

[36] Shang, H.; Guo, C.; Wu, Y.; Li, Z.; Yang, J. Solving the many-electron Schrödinger equation with a transformer-based framework. Nat. Commun. 2025, 16, 8464

[37] Solanki, M. J.; Ding, L.; Reiher, M. Neural Quantum States Based on Selected Configurations. arXiv 2026, arXiv:2602.12993

[38] Zhang, H.; Zeng, X.; Li, Z.; Zhou, Y. Hamiltonian-guided autoregressive selected-configuration interaction achieves chemical accuracy in strongly correlated systems. J. Chem. Theory Comput. 2025, 21, 12622–12633

[39] Kool, W.; van Hoof, H.; Welling, M. Stochastic beams and where to find them: The Gumbel-top-k trick for sampling sequences without replacement. Proceedings of the 36th International Conference on Machine Learning (ICML). 2019; pp 3499–3508, arXiv:1903.06059

[40] Thompson, S.; Gunlycke, D. Auto-regressive neural quantum state sampling for selected configuration interaction. arXiv 2026, arXiv:2603.24728

[41] Germain, M.; Gregor, K.; Murray, I.; Larochelle, H. MADE: Masked autoencoder for distribution estimation. Proceedings of the 32nd International Conference on Machine Learning (ICML). 2015; pp 881–889

[42] Kan, B.; Shang, H. Accelerating many-body quantum chemistry via generative Transformer-enhanced configuration interaction. J. Chem. Theory Comput. 2025, 21, 11989–12000

[43] Barrett, T. D.; Malyshev, A.; Lvovsky, A. I. Autoregressive neural-network wavefunctions for ab initio quantum chemistry. Nat. Mach. Intell. 2022, 4, 351–358, arXiv:2109.12606

[44] Sharir, O.; Levine, Y.; Wies, N.; Carleo, G.; Shashua, A. Deep autoregressive models for the efficient variational simulation of many-body quantum systems. Phys. Rev. Lett. 2020, 124, 020503

[45] Malyshev, A.; Schmitt, M.; Lvovsky, A. I. Neural quantum states and peaked molecular wave functions: Curse or blessing? arXiv 2024, arXiv:2408.07625

[46] Epstein, P. S. The Stark effect from the point of view of Schroedinger’s quantum theory. Phys. Rev. 1926, 28, 695–710

[47] Nesbet, R. K. Configuration interaction in orbital theories. Proc. R. Soc. A 1955, 230, 312–321

[48] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Advances in Neural Information Processing Systems 30 (NIPS 2017). 2017; arXiv:1706.03762

[49] von Glehn, I.; Spencer, J. S.; Pfau, D. A self-attention ansatz for ab-initio quantum chemistry. The Eleventh International Conference on Learning Representations (ICLR). 2023; arXiv:2211.13672

[50] Xiong, R.; Yang, Y.; He, D.; Zheng, K.; Zheng, S.; Xing, C.; Zhang, H.; Lan, Y.; Wang, L.; Liu, T.-Y. On layer normalization in the Transformer architecture. Proceedings of the 37th International Conference on Machine Learning (ICML). 2020; pp 10524–10533

[51] Davidson, E. R. The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices. J. Comput. Phys. 1975, 17, 87–94

[52] Sun, Q. et al. Recent developments in the PySCF program package. J. Chem. Phys. 2020, 153, 024109

[53] Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256

[54] Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980

[55] Gao, H.; Imamura, S.; Kasagi, A.; Yoshida, E. Distributed Implementation of Full Configuration Interaction for One Trillion Determinants. J. Chem. Theory Comput. 2024, 20, 1185–1192

[56] Zhai, H.; Li, C.; Zhang, X.; Li, Z.; Lee, S.; Chan, G. K.-L. Classical computational simulation of the FeMo-cofactor model to chemical accuracy and its implications. arXiv 2026, arXiv:2601.04621

[57] Reiher, M.; Wiebe, N.; Svore, K. M.; Wecker, D.; Troyer, M. Elucidating reaction mechanisms on quantum computers. Proc. Natl. Acad. Sci. USA 2017, 114, 7555–7560

[58] Knitter, O.; Zhao, D.; Leichenauer, S.; Veerapaneni, S. Large language model scaling laws for neural quantum states in quantum chemistry. Mach. Learn.: Sci. Technol. 2026, 7, 025033, arXiv:2509.12679