Is Variational Monte Carlo Robust? Sharp Moment Thresholds and Heavy-tailed Stochastic Optimization

Davide Nobile; Philipp Grohs

arxiv: 2606.26009 · v1 · pith:EY7KNLTCnew · submitted 2026-06-24 · 💻 cs.LG

Is Variational Monte Carlo Robust? Sharp Moment Thresholds and Heavy-tailed Stochastic Optimization

Philipp Grohs , Davide Nobile This is my paper

Pith reviewed 2026-06-25 19:37 UTC · model grok-4.3

classification 💻 cs.LG

keywords variational monte carlonodal setsheavy-tailed estimatorsstochastic optimizationclipped VMCwave function ansatzeelectronic structure

0 comments

The pith

Nodal geometry of trial wave functions determines integrability thresholds for the stochastic estimators that drive Variational Monte Carlo.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Variational Monte Carlo minimizes the Rayleigh quotient using stochastic estimates of local energy and gradients, yet the statistical behavior of these estimates is controlled by where the trial wave function vanishes. The paper proves that for Slater-Jastrow ansatze with variable-exponent Slater-type orbitals the estimators are generically heavy-tailed and lack moments of order greater than one. For general analytic ansatze the same analysis yields precise weak-moment regimes whose thresholds shift according to whether the nodal set is generic or degenerate. A clipped variant called PS-Clip-VMC is introduced and shown to converge both in expectation and with high probability inside the low-moment regime. This matters because modern neural-network ansatze are trained with exactly these estimators, and their tail behavior directly affects training stability.

Core claim

The paper claims that properties of the nodal set determine the integrability of the local energy and gradient estimators that drive VMC. For broad and practically relevant ansatz classes, including Slater-Jastrow wave functions with variable-exponent Slater-type orbitals, these estimators are generically heavy-tailed and fail to admit higher moments. At the same time, for general analytic ansatze, weak moment bounds are established and precise low-moment regimes are identified, showing how generic and degenerate nodal structures lead to different integrability thresholds. Building on this analysis, PS-Clip-VMC, which clips both the local energy and the gradient random variable, is proved to

What carries the argument

The nodal set of the wave function ansatz, whose local geometry controls the order of singularities in the local energy and thereby fixes the moment thresholds of the Monte Carlo estimators.

If this is right

Local energy and gradient estimators for Slater-Jastrow ansatze with variable-exponent STOs generically lack moments beyond order one.
PS-Clip-VMC converges both in expectation and with high probability whenever VMC operates in its weak-moment regime.
Generic versus degenerate nodal structures produce distinct integrability thresholds for analytic ansatze.
Preliminary numerical tests indicate that clipping improves robustness when training FermiNet on atoms with up to 18 electrons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Neural-network ansatze may inherit similar heavy-tail behavior unless their learned nodal sets happen to be degenerate in the favorable sense.
Ansatz engineering that deliberately enforces more degenerate nodes could raise the moment thresholds without clipping.
The same nodal-analysis approach may apply to other local-energy-based quantum Monte Carlo methods that rely on the same estimators.

Load-bearing premise

The nodal structures of the specific ansatz classes examined are either generic or degenerate exactly as required to produce the stated integrability thresholds.

What would settle it

A direct Monte Carlo sampling experiment on a simple Slater-Jastrow wave function with variable-exponent STOs that measures whether the sample variance of the local energy remains finite or diverges as sample size grows.

Figures

Figures reproduced from arXiv: 2606.26009 by Davide Nobile, Philipp Grohs.

read the original abstract

Variational Monte Carlo (VMC) is a central algorithm in electronic structure theory and has gained renewed importance through modern neural-network ans\"atze such as FermiNet. At its core, VMC seeks ground states by minimizing the Rayleigh quotient by stochastic optimization. In this work, we show that the resulting stochastic optimization problem is intrinsically governed by the nodal geometry of the underlying wave function. More precisely, we establish that properties of the nodal set determine the integrability of the local energy and gradient estimators that drive VMC. For broad and practically relevant ansatz classes, including Slater-Jastrow wave functions with variable-exponent Slater-type orbitals, we prove that these estimators are generically heavy-tailed and fail to admit higher moments. At the same time, for general analytic ans\"atze, we prove weak moment bounds for the relevant estimators and identify precise low-moment regimes, showing how generic and degenerate nodal structures lead to different integrability thresholds. Building on this analysis, we introduce a new robust variant of VMC $\unicode{x2013}$ coined PS-Clip-VMC $\unicode{x2013}$ which is based on clipping both the local energy and the gradient random variable. We prove that PS-Clip-VMC converges both in expectation and with high probability in the weak moment regime of VMC. Preliminary experiments for training FermiNet on Atoms with up to 18 electrons suggest that PS-Clip-VMC is significantly more robust than standard methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Nodal geometry sets VMC moment thresholds and PS-Clip-VMC converges under weak moments, but experiments stay preliminary.

read the letter

The main point is that the integrability of local energy and gradient estimators in VMC is governed by the nodal set of the wave function. For Slater-Jastrow ansatze with variable-exponent STOs the estimators are generically heavy-tailed with no higher moments, while analytic ansatze admit weak moment bounds that depend on whether the nodes are generic or degenerate. From there the authors construct PS-Clip-VMC, which clips both quantities, and prove convergence in expectation and with high probability in the weak-moment regime.

What the paper does cleanly is make the nodal-geometry link explicit and supply the corresponding moment thresholds plus the convergence argument for the clipped variant. That combination is not in the earlier VMC literature referenced in the abstract, and the proofs rest on external integrability results rather than circular fitting.

The soft spot is the experimental section. The abstract only calls the FermiNet runs on atoms up to 18 electrons “preliminary” and gives no baselines, quantitative gains, or failure-mode details, so it is hard to judge practical impact yet. The theory is tied to the specific ansatz families considered, which is stated but narrows immediate generality.

This is for researchers who train neural or Slater-Jastrow wave functions with stochastic optimization and care about robustness when moments are low. A reader who wants the moment analysis or the new algorithm will find usable material.

It deserves a serious referee because the central claims are sharp, formally stated, and address a real bottleneck in VMC. I would send it to peer review, expecting the experiments to be expanded and the proofs checked in detail.

Referee Report

0 major / 2 minor

Summary. The paper claims that Variational Monte Carlo (VMC) stochastic optimization is governed by the nodal geometry of the wave-function ansatz. Properties of the nodal set control the integrability of the local-energy and gradient estimators. For Slater-Jastrow ansatze with variable-exponent Slater-type orbitals the estimators are shown to be generically heavy-tailed and to lack higher moments; for general analytic ansatze weak-moment bounds are derived and precise low-moment regimes are identified according to whether nodal structures are generic or degenerate. A clipped variant (PS-Clip-VMC) is introduced and proved to converge in expectation and with high probability in the weak-moment regime. Preliminary experiments on FermiNet for atoms up to 18 electrons indicate improved robustness over standard VMC.

Significance. If the nodal-set analysis and convergence proofs hold, the work supplies a rigorous explanation for the origin of instability in VMC and supplies a provably convergent robust variant. The explicit moment-threshold results for practically relevant ansatz classes and the identification of the weak-moment regime constitute a substantive theoretical contribution to stochastic optimization in quantum Monte Carlo. The preliminary numerical evidence on neural-network ansatze is consistent with the theory but remains limited in scope.

minor comments (2)

The abstract states that 'preliminary experiments' suggest improved robustness, yet no quantitative baselines, variance estimates, or statistical significance tests are referenced in the provided summary; adding these details would strengthen the empirical section.
Notation for the clipping thresholds and the precise definition of the 'weak moment regime' should be introduced earlier and used consistently throughout the convergence statements.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the theoretical contribution on nodal-set controlled moment thresholds, and the recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper establishes its central claims via explicit mathematical proofs that nodal geometry controls integrability thresholds for local-energy and gradient estimators, with separate results for Slater-Jastrow variable-exponent STOs (heavy tails) and general analytic ansatze (weak-moment regimes), followed by a convergence argument for the clipped PS-Clip-VMC variant. No step reduces a derived quantity to a fitted parameter, self-definition, or load-bearing self-citation; the dependence on the declared ansatz classes is stated outright rather than smuggled in. The derivation is therefore self-contained against its own stated assumptions and external mathematical analysis of integrability.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the claims rest on unstated mathematical assumptions about wave-function classes and nodal sets that cannot be audited here.

pith-pipeline@v0.9.1-grok · 5792 in / 1139 out tokens · 18938 ms · 2026-06-25T19:37:55.534028+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 12 canonical work pages

[1]

Convergence of variational Monte Carlo simulation and scale-invariant pre-training

Nilin Abrahamsen et al. “Convergence of variational Monte Carlo simulation and scale-invariant pre-training”. In:Journal of Computational Physics513 (2024), p. 113140.doi:10 . 1016 / j . jcp . 2024.113140

arXiv 2024
[2]

Bounds on exponential decay of eigenfunctions of Schr¨ odinger operators

Shmuel Agmon. “Bounds on exponential decay of eigenfunctions of Schr¨ odinger operators”. In:Schr¨ odinger Operators: Lectures given at the 2nd 1984 Session of the Centro Internationale Matematico Estivo (CIME) held at Como, Italy, Aug. 26–Sept. 4, 1984. Springer, 2006, pp. 1–38

1984
[3]

Bernstein operators for exponential polynomi- als

JM Aldaz, Ognyan Kounchev, and Hermann Render. “Bernstein operators for exponential polynomi- als”. In:Constructive Approximation29.3 (2009), pp. 345–367

2009
[4]

Functional Neural Wavefunction Optimization

Victor Armegioiu et al. “Functional Neural Wavefunction Optimization”. In:arXiv preprint arXiv:2507.10835 (2025)

arXiv 2025
[5]

Cam- bridge University Press, 2017

Federico Becca and Sandro Sorella.Quantum Monte Carlo approaches for correlated systems. Cam- bridge University Press, 2017

2017
[6]

Bandits With Heavy Tail

Sebastien Bubeck, Nicolo Cesa-Bianchi, and Gabor Lugosi. “Bandits With Heavy Tail”. In:IEEE Trans. Inf. Theor.59.11 (2013), pp. 7711–7717

2013
[7]

Ground-state correlation energies for atomic ions with 3 to 18 electrons

Subhas J. Chakravorty et al. “Ground-state correlation energies for atomic ions with 3 to 18 electrons”. In:Physical Review A47.5 (1993), pp. 3649–3670

1993
[8]

Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions

Nicholas Gao and Stephan G¨ unnemann. “Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions”. In:International Conference on Learning Representations. 2022.doi:10. 48550/arXiv.2110.05064. arXiv:2110.05064

arXiv 2022
[9]

Generalizing Neural Wave Functions

Nicholas Gao and Stephan G¨ unnemann. “Generalizing Neural Wave Functions”. In:Proceedings of the 40th International Conference on Machine Learning. Vol. 202. Proceedings of Machine Learning Research. PMLR, 2023, pp. 10708–10726.doi:10.48550/arXiv.2302.04168. arXiv:2302.04168

work page doi:10.48550/arxiv.2302.04168 2023
[10]

Sampling-free Inference for Ab-Initio Potential Energy Sur- face Networks

Nicholas Gao and Stephan G¨ unnemann. “Sampling-free Inference for Ab-Initio Potential Energy Sur- face Networks”. In:The Eleventh International Conference on Learning Representations. 2023.doi: 10.48550/arXiv.2205.14962. arXiv:2205.14962

work page doi:10.48550/arxiv.2205.14962 2023
[11]

Deep learning variational Monte Carlo for solving the electronic Schr¨ odinger equation

Leon Gerard, Philipp Grohs, and Michael Scherbela. “Deep learning variational Monte Carlo for solving the electronic Schr¨ odinger equation”. In:Numerical Analysis Meets Machine Learning. Vol. 25. Hand- book of Numerical Analysis. North-Holland, 2024, pp. 231–292.doi:10.1016/bs.hna.2024.05.010

work page doi:10.1016/bs.hna.2024.05.010 2024
[12]

Gold-standard solutions to the Schr¨ odinger equation using deep learning: How much physics do we need?

Leon Gerard et al. “Gold-standard solutions to the Schr¨ odinger equation using deep learning: How much physics do we need?” In:Advances in Neural Information Processing Systems. Vol. 35. 2022, pp. 10282–10294.doi:10.48550/arXiv.2205.09438. arXiv:2205.09438

work page doi:10.48550/arxiv.2205.09438 2022
[13]

Nature Computational Science5(12), 1147–1157 (2025) https://doi.org/10.1038/s43588-025-00872-z

Leon Gerard et al. “Transferable neural wavefunctions for solids”. In:Nature Computational Science 5 (2025), pp. 1147–1157.doi:10.1038/s43588-025-00872-z. arXiv:2405.07599

work page doi:10.1038/s43588-025-00872-z 2025
[14]

A Self-Attention Ansatz for Ab-initio Quantum Chemistry

Ingrid von Glehn, James S. Spencer, and David Pfau. “A Self-Attention Ansatz for Ab-initio Quantum Chemistry”. In:The Eleventh International Conference on Learning Representations. 2023.doi:10. 48550/arXiv.2211.13672. arXiv:2211.13672

arXiv 2023
[15]

GitHub repository

Google DeepMind.FermiNet. GitHub repository. 2020.url:https://github.com/google-deepmind/ ferminet

2020
[16]

Stable Gabor phase retrieval for multivariate functions

Philipp Grohs and Martin Rathmair. “Stable Gabor phase retrieval for multivariate functions”. In: Journal of the European Mathematical Society24.5 (2021), pp. 1593–1615

2021
[17]

Stephen J Gustafson et al.Mathematical concepts of quantum mechanics. Vol. 33. Springer, 2003

2003
[18]

John Wiley & Sons, 2013

Trygve Helgaker, Poul Jorgensen, and Jeppe Olsen.Molecular electronic-structure theory. John Wiley & Sons, 2013

2013
[19]

Deep-neural-network solution of the electronic Schr¨ odinger equation

Jan Hermann, Zeno Sch¨ atzle, and Frank No´ e. “Deep-neural-network solution of the electronic Schr¨ odinger equation”. In:Nature Chemistry12.10 (2020), pp. 891–897

2020
[20]

Ab initio quantum chemistry with neural-network wavefunctions

Jan Hermann et al. “Ab initio quantum chemistry with neural-network wavefunctions”. In:Nature Reviews Chemistry7 (2023), pp. 692–709.doi:10.1038/s41570-023-00516-8. 30

work page doi:10.1038/s41570-023-00516-8 2023
[21]

Springer Science & Business Media, 2012

Morris W Hirsch.Differential topology. Springer Science & Business Media, 2012

2012
[22]

On the eigenfunctions of many-particle systems in quantum mechanics

Tosio Kato. “On the eigenfunctions of many-particle systems in quantum mechanics”. In:Communi- cations on Pure and Applied Mathematics10.2 (1957), pp. 151–177

1957
[23]

Tosio Kato.Perturbation theory for linear operators. Second. Grundlehren der Mathematischen Wis- senschaften, Band 132. Springer-Verlag, Berlin-New York, 1976, pp. xxi+619

1976
[24]

Sub-sampled cubic regularization for non-convex opti- mization

Jonas Moritz Kohler and Aurelien Lucchi. “Sub-sampled cubic regularization for non-convex opti- mization”. In:Proceedings of the 34th International Conference on Machine Learning - Volume 70. ICML’17. 2017, pp. 1895–1904

2017
[25]

Korteweg-de Vries Institute for Math- ematics Amsterdam, 2017

Jacob Korevaar and Jan Wiegerinck.Several complex variables. Korteweg-de Vries Institute for Math- ematics Amsterdam, 2017

2017
[26]

Tianyou Li et al.Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators. 2024. arXiv:2303.10599

arXiv 2024
[27]

Explicitly antisymmetrized neural network layers for vari- ational Monte Carlo simulation

Jeffmin Lin, Gil Goldshlager, and Lin Lin. “Explicitly antisymmetrized neural network layers for vari- ational Monte Carlo simulation”. In:Journal of Computational Physics474 (2023), p. 111765

2023
[28]

The preparation theorem for differentiable functions

Bernard Malgrange. “The preparation theorem for differentiable functions”. In:Differential Analysis, Bombay Colloq. 1964, pp. 203–208

1964
[29]

Robust and Fast Training via Per-Sample Clipping

Davide Nobile and Philipp Grohs. “Robust and Fast Training via Per-Sample Clipping”. In:arXiv preprint arXiv:2605.02701(2026)

Pith/arXiv arXiv 2026
[30]

2022.doi:10.48550/arXiv.2205.13205

Tianyu Pang, Shuicheng Yan, and Min Lin.O(N 2)Universal Antisymmetry in Fermionic Neural Networks. 2022.doi:10.48550/arXiv.2205.13205. arXiv:2205.13205

work page doi:10.48550/arxiv.2205.13205 2022
[31]

Ab initio solution of the many-electron Schr¨ odinger equation with deep neural networks

David Pfau et al. “Ab initio solution of the many-electron Schr¨ odinger equation with deep neural networks”. In:Physical review research2.3 (2020), p. 033429

2020
[32]

Springer Science & Business Media, 2012

Allan Pinkus.N-widths in Approximation Theory. Springer Science & Business Media, 2012

2012
[33]

R Michael Range.Holomorphic functions and integral representations in several complex variables. Vol. 108. Springer Science & Business Media, 1998

1998
[34]

QMCTorch: Molecular Wave Function with Neural Components for Energy and Force Calculations

Nicolas Renaud. “QMCTorch: Molecular Wave Function with Neural Components for Energy and Force Calculations”. In:Methods4 (2025), p. 4

2025
[35]

Preprint title/version of the transferable molecular neural-wavefunction work

Michael Scherbela, Leon Gerard, and Philipp Grohs.Towards a Foundation Model for Neural Network Wavefunctions. Preprint title/version of the transferable molecular neural-wavefunction work. 2023. doi:10.48550/arXiv.2303.09949. arXiv:2303.09949 [physics.chem-ph]

work page doi:10.48550/arxiv.2303.09949 2023
[36]

Space and Space-Time Topologies in a Type-II Hyperbolic Lattice

Michael Scherbela, Leon Gerard, and Philipp Grohs. “Towards a transferable fermionic neural wave- function for molecules”. In:Nature Communications15 (2024), p. 120.doi:10.1038/s41467- 023- 44216-9

work page doi:10.1038/s41467- 2024
[37]

Variational Monte Carlo on a Budget: Fine- tuning pre-trained Neural Wavefunctions

Michael Scherbela, Leon Gerard, and Philipp Grohs. “Variational Monte Carlo on a Budget: Fine- tuning pre-trained Neural Wavefunctions”. In:Advances in Neural Information Processing Systems. Vol. 36. 2023.doi:10.48550/arXiv.2307.09337. arXiv:2307.09337

work page doi:10.48550/arxiv.2307.09337 2023
[38]

Michael Scherbela et al.Accurate Ab-initio Neural-network Solutions to Large-Scale Electronic Struc- ture Problems. 2025. arXiv:2504.06087 [physics.comp-ph]

arXiv 2025
[39]

Nature Computational Science2(5), 331–341 (2022) https://doi.org/10.1038/s43588-022-00228-x

Michael Scherbela et al. “Solving the electronic Schr¨ odinger equation for multiple nuclear geometries with weight-sharing deep neural networks”. In:Nature Computational Science2.5 (2022), pp. 331–341. doi:10.1038/s43588-022-00228-x. arXiv:2105.08351

work page doi:10.1038/s43588-022-00228-x 2022
[40]

Spencer et al.Better, Faster Fermionic Neural Networks

James S. Spencer et al.Better, Faster Fermionic Neural Networks. 2020.doi:10.48550/arXiv.2011. 07125. arXiv:2011.07125 [physics.chem-ph]

work page doi:10.48550/arxiv.2011 2020
[41]

Courier Corporation, 2012

Attila Szabo and Neil S Ostlund.Modern quantum chemistry: introduction to advanced electronic structure theory. Courier Corporation, 2012

2012
[42]

Gerald Teschl.Mathematical methods in quantum mechanics. Vol. 157. American Mathematical Soc., 2014. 31

2014
[43]

Introduction to the variational and diffusion Monte Carlo methods

Julien Toulouse, Roland Assaraf, and Cyrus J Umrigar. “Introduction to the variational and diffusion Monte Carlo methods”. In:Advances in quantum chemistry. Vol. 73. Elsevier, 2016, pp. 285–314

2016
[44]

Heavy-tailed random error in quantum Monte Carlo

JR Trail. “Heavy-tailed random error in quantum Monte Carlo”. In:Physical Review E—Statistical, Nonlinear, and Soft Matter Physics77.1 (2008), p. 016703

2008
[45]

Holger Wendland.Scattered data approximation. Vol. 17. Cambridge university press, 2004. A Additional Technical Results Lemma A.1(Vector Bernstein Inequality, see [24, Lemma 18]).LetX 1, . . . , Xn be independent vector-valued random variables with common dimensiond, satisfying E[Xi] = 0,|X i| ≤candE |Xi|2 ≤σ 2, for alli= 1, . . . , nand somec, σ >0. Then...

2004

[1] [1]

Convergence of variational Monte Carlo simulation and scale-invariant pre-training

Nilin Abrahamsen et al. “Convergence of variational Monte Carlo simulation and scale-invariant pre-training”. In:Journal of Computational Physics513 (2024), p. 113140.doi:10 . 1016 / j . jcp . 2024.113140

arXiv 2024

[2] [2]

Bounds on exponential decay of eigenfunctions of Schr¨ odinger operators

Shmuel Agmon. “Bounds on exponential decay of eigenfunctions of Schr¨ odinger operators”. In:Schr¨ odinger Operators: Lectures given at the 2nd 1984 Session of the Centro Internationale Matematico Estivo (CIME) held at Como, Italy, Aug. 26–Sept. 4, 1984. Springer, 2006, pp. 1–38

1984

[3] [3]

Bernstein operators for exponential polynomi- als

JM Aldaz, Ognyan Kounchev, and Hermann Render. “Bernstein operators for exponential polynomi- als”. In:Constructive Approximation29.3 (2009), pp. 345–367

2009

[4] [4]

Functional Neural Wavefunction Optimization

Victor Armegioiu et al. “Functional Neural Wavefunction Optimization”. In:arXiv preprint arXiv:2507.10835 (2025)

arXiv 2025

[5] [5]

Cam- bridge University Press, 2017

Federico Becca and Sandro Sorella.Quantum Monte Carlo approaches for correlated systems. Cam- bridge University Press, 2017

2017

[6] [6]

Bandits With Heavy Tail

Sebastien Bubeck, Nicolo Cesa-Bianchi, and Gabor Lugosi. “Bandits With Heavy Tail”. In:IEEE Trans. Inf. Theor.59.11 (2013), pp. 7711–7717

2013

[7] [7]

Ground-state correlation energies for atomic ions with 3 to 18 electrons

Subhas J. Chakravorty et al. “Ground-state correlation energies for atomic ions with 3 to 18 electrons”. In:Physical Review A47.5 (1993), pp. 3649–3670

1993

[8] [8]

Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions

Nicholas Gao and Stephan G¨ unnemann. “Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions”. In:International Conference on Learning Representations. 2022.doi:10. 48550/arXiv.2110.05064. arXiv:2110.05064

arXiv 2022

[9] [9]

Generalizing Neural Wave Functions

Nicholas Gao and Stephan G¨ unnemann. “Generalizing Neural Wave Functions”. In:Proceedings of the 40th International Conference on Machine Learning. Vol. 202. Proceedings of Machine Learning Research. PMLR, 2023, pp. 10708–10726.doi:10.48550/arXiv.2302.04168. arXiv:2302.04168

work page doi:10.48550/arxiv.2302.04168 2023

[10] [10]

Sampling-free Inference for Ab-Initio Potential Energy Sur- face Networks

Nicholas Gao and Stephan G¨ unnemann. “Sampling-free Inference for Ab-Initio Potential Energy Sur- face Networks”. In:The Eleventh International Conference on Learning Representations. 2023.doi: 10.48550/arXiv.2205.14962. arXiv:2205.14962

work page doi:10.48550/arxiv.2205.14962 2023

[11] [11]

Deep learning variational Monte Carlo for solving the electronic Schr¨ odinger equation

Leon Gerard, Philipp Grohs, and Michael Scherbela. “Deep learning variational Monte Carlo for solving the electronic Schr¨ odinger equation”. In:Numerical Analysis Meets Machine Learning. Vol. 25. Hand- book of Numerical Analysis. North-Holland, 2024, pp. 231–292.doi:10.1016/bs.hna.2024.05.010

work page doi:10.1016/bs.hna.2024.05.010 2024

[12] [12]

Gold-standard solutions to the Schr¨ odinger equation using deep learning: How much physics do we need?

Leon Gerard et al. “Gold-standard solutions to the Schr¨ odinger equation using deep learning: How much physics do we need?” In:Advances in Neural Information Processing Systems. Vol. 35. 2022, pp. 10282–10294.doi:10.48550/arXiv.2205.09438. arXiv:2205.09438

work page doi:10.48550/arxiv.2205.09438 2022

[13] [13]

Nature Computational Science5(12), 1147–1157 (2025) https://doi.org/10.1038/s43588-025-00872-z

Leon Gerard et al. “Transferable neural wavefunctions for solids”. In:Nature Computational Science 5 (2025), pp. 1147–1157.doi:10.1038/s43588-025-00872-z. arXiv:2405.07599

work page doi:10.1038/s43588-025-00872-z 2025

[14] [14]

A Self-Attention Ansatz for Ab-initio Quantum Chemistry

Ingrid von Glehn, James S. Spencer, and David Pfau. “A Self-Attention Ansatz for Ab-initio Quantum Chemistry”. In:The Eleventh International Conference on Learning Representations. 2023.doi:10. 48550/arXiv.2211.13672. arXiv:2211.13672

arXiv 2023

[15] [15]

GitHub repository

Google DeepMind.FermiNet. GitHub repository. 2020.url:https://github.com/google-deepmind/ ferminet

2020

[16] [16]

Stable Gabor phase retrieval for multivariate functions

Philipp Grohs and Martin Rathmair. “Stable Gabor phase retrieval for multivariate functions”. In: Journal of the European Mathematical Society24.5 (2021), pp. 1593–1615

2021

[17] [17]

Stephen J Gustafson et al.Mathematical concepts of quantum mechanics. Vol. 33. Springer, 2003

2003

[18] [18]

John Wiley & Sons, 2013

Trygve Helgaker, Poul Jorgensen, and Jeppe Olsen.Molecular electronic-structure theory. John Wiley & Sons, 2013

2013

[19] [19]

Deep-neural-network solution of the electronic Schr¨ odinger equation

Jan Hermann, Zeno Sch¨ atzle, and Frank No´ e. “Deep-neural-network solution of the electronic Schr¨ odinger equation”. In:Nature Chemistry12.10 (2020), pp. 891–897

2020

[20] [20]

Ab initio quantum chemistry with neural-network wavefunctions

Jan Hermann et al. “Ab initio quantum chemistry with neural-network wavefunctions”. In:Nature Reviews Chemistry7 (2023), pp. 692–709.doi:10.1038/s41570-023-00516-8. 30

work page doi:10.1038/s41570-023-00516-8 2023

[21] [21]

Springer Science & Business Media, 2012

Morris W Hirsch.Differential topology. Springer Science & Business Media, 2012

2012

[22] [22]

On the eigenfunctions of many-particle systems in quantum mechanics

Tosio Kato. “On the eigenfunctions of many-particle systems in quantum mechanics”. In:Communi- cations on Pure and Applied Mathematics10.2 (1957), pp. 151–177

1957

[23] [23]

Tosio Kato.Perturbation theory for linear operators. Second. Grundlehren der Mathematischen Wis- senschaften, Band 132. Springer-Verlag, Berlin-New York, 1976, pp. xxi+619

1976

[24] [24]

Sub-sampled cubic regularization for non-convex opti- mization

Jonas Moritz Kohler and Aurelien Lucchi. “Sub-sampled cubic regularization for non-convex opti- mization”. In:Proceedings of the 34th International Conference on Machine Learning - Volume 70. ICML’17. 2017, pp. 1895–1904

2017

[25] [25]

Korteweg-de Vries Institute for Math- ematics Amsterdam, 2017

Jacob Korevaar and Jan Wiegerinck.Several complex variables. Korteweg-de Vries Institute for Math- ematics Amsterdam, 2017

2017

[26] [26]

Tianyou Li et al.Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators. 2024. arXiv:2303.10599

arXiv 2024

[27] [27]

Explicitly antisymmetrized neural network layers for vari- ational Monte Carlo simulation

Jeffmin Lin, Gil Goldshlager, and Lin Lin. “Explicitly antisymmetrized neural network layers for vari- ational Monte Carlo simulation”. In:Journal of Computational Physics474 (2023), p. 111765

2023

[28] [28]

The preparation theorem for differentiable functions

Bernard Malgrange. “The preparation theorem for differentiable functions”. In:Differential Analysis, Bombay Colloq. 1964, pp. 203–208

1964

[29] [29]

Robust and Fast Training via Per-Sample Clipping

Davide Nobile and Philipp Grohs. “Robust and Fast Training via Per-Sample Clipping”. In:arXiv preprint arXiv:2605.02701(2026)

Pith/arXiv arXiv 2026

[30] [30]

2022.doi:10.48550/arXiv.2205.13205

Tianyu Pang, Shuicheng Yan, and Min Lin.O(N 2)Universal Antisymmetry in Fermionic Neural Networks. 2022.doi:10.48550/arXiv.2205.13205. arXiv:2205.13205

work page doi:10.48550/arxiv.2205.13205 2022

[31] [31]

Ab initio solution of the many-electron Schr¨ odinger equation with deep neural networks

David Pfau et al. “Ab initio solution of the many-electron Schr¨ odinger equation with deep neural networks”. In:Physical review research2.3 (2020), p. 033429

2020

[32] [32]

Springer Science & Business Media, 2012

Allan Pinkus.N-widths in Approximation Theory. Springer Science & Business Media, 2012

2012

[33] [33]

R Michael Range.Holomorphic functions and integral representations in several complex variables. Vol. 108. Springer Science & Business Media, 1998

1998

[34] [34]

QMCTorch: Molecular Wave Function with Neural Components for Energy and Force Calculations

Nicolas Renaud. “QMCTorch: Molecular Wave Function with Neural Components for Energy and Force Calculations”. In:Methods4 (2025), p. 4

2025

[35] [35]

Preprint title/version of the transferable molecular neural-wavefunction work

Michael Scherbela, Leon Gerard, and Philipp Grohs.Towards a Foundation Model for Neural Network Wavefunctions. Preprint title/version of the transferable molecular neural-wavefunction work. 2023. doi:10.48550/arXiv.2303.09949. arXiv:2303.09949 [physics.chem-ph]

work page doi:10.48550/arxiv.2303.09949 2023

[36] [36]

Space and Space-Time Topologies in a Type-II Hyperbolic Lattice

Michael Scherbela, Leon Gerard, and Philipp Grohs. “Towards a transferable fermionic neural wave- function for molecules”. In:Nature Communications15 (2024), p. 120.doi:10.1038/s41467- 023- 44216-9

work page doi:10.1038/s41467- 2024

[37] [37]

Variational Monte Carlo on a Budget: Fine- tuning pre-trained Neural Wavefunctions

Michael Scherbela, Leon Gerard, and Philipp Grohs. “Variational Monte Carlo on a Budget: Fine- tuning pre-trained Neural Wavefunctions”. In:Advances in Neural Information Processing Systems. Vol. 36. 2023.doi:10.48550/arXiv.2307.09337. arXiv:2307.09337

work page doi:10.48550/arxiv.2307.09337 2023

[38] [38]

Michael Scherbela et al.Accurate Ab-initio Neural-network Solutions to Large-Scale Electronic Struc- ture Problems. 2025. arXiv:2504.06087 [physics.comp-ph]

arXiv 2025

[39] [39]

Nature Computational Science2(5), 331–341 (2022) https://doi.org/10.1038/s43588-022-00228-x

Michael Scherbela et al. “Solving the electronic Schr¨ odinger equation for multiple nuclear geometries with weight-sharing deep neural networks”. In:Nature Computational Science2.5 (2022), pp. 331–341. doi:10.1038/s43588-022-00228-x. arXiv:2105.08351

work page doi:10.1038/s43588-022-00228-x 2022

[40] [40]

Spencer et al.Better, Faster Fermionic Neural Networks

James S. Spencer et al.Better, Faster Fermionic Neural Networks. 2020.doi:10.48550/arXiv.2011. 07125. arXiv:2011.07125 [physics.chem-ph]

work page doi:10.48550/arxiv.2011 2020

[41] [41]

Courier Corporation, 2012

Attila Szabo and Neil S Ostlund.Modern quantum chemistry: introduction to advanced electronic structure theory. Courier Corporation, 2012

2012

[42] [42]

Gerald Teschl.Mathematical methods in quantum mechanics. Vol. 157. American Mathematical Soc., 2014. 31

2014

[43] [43]

Introduction to the variational and diffusion Monte Carlo methods

Julien Toulouse, Roland Assaraf, and Cyrus J Umrigar. “Introduction to the variational and diffusion Monte Carlo methods”. In:Advances in quantum chemistry. Vol. 73. Elsevier, 2016, pp. 285–314

2016

[44] [44]

Heavy-tailed random error in quantum Monte Carlo

JR Trail. “Heavy-tailed random error in quantum Monte Carlo”. In:Physical Review E—Statistical, Nonlinear, and Soft Matter Physics77.1 (2008), p. 016703

2008

[45] [45]

Holger Wendland.Scattered data approximation. Vol. 17. Cambridge university press, 2004. A Additional Technical Results Lemma A.1(Vector Bernstein Inequality, see [24, Lemma 18]).LetX 1, . . . , Xn be independent vector-valued random variables with common dimensiond, satisfying E[Xi] = 0,|X i| ≤candE |Xi|2 ≤σ 2, for alli= 1, . . . , nand somec, σ >0. Then...

2004