Is Variational Monte Carlo Robust? Sharp Moment Thresholds and Heavy-tailed Stochastic Optimization
Pith reviewed 2026-06-25 19:37 UTC · model grok-4.3
The pith
Nodal geometry of trial wave functions determines integrability thresholds for the stochastic estimators that drive Variational Monte Carlo.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that properties of the nodal set determine the integrability of the local energy and gradient estimators that drive VMC. For broad and practically relevant ansatz classes, including Slater-Jastrow wave functions with variable-exponent Slater-type orbitals, these estimators are generically heavy-tailed and fail to admit higher moments. At the same time, for general analytic ansatze, weak moment bounds are established and precise low-moment regimes are identified, showing how generic and degenerate nodal structures lead to different integrability thresholds. Building on this analysis, PS-Clip-VMC, which clips both the local energy and the gradient random variable, is proved to converge both in expectation and with high probability in the weak-moment regime of VMC.
What carries the argument
The nodal set of the wave function ansatz, whose local geometry controls the order of singularities in the local energy and thereby fixes the moment thresholds of the Monte Carlo estimators.
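A back-of-the-envelope version of this mechanism can be written down as follows. It is a heuristic sketch only: the symbols s, a, b and p are illustrative and are not the paper's notation, the nodal set is assumed to be locally a smooth hypersurface, and the thresholds the paper actually proves for its ansatz classes are not reproduced here.

```latex
% Heuristic, not the paper's statement. s(x) is the distance from x to the
% nodal set; F is an estimator (local energy or a gradient term).
% Assume power-law behaviour near the node:
\[
  |F(x)| \sim s(x)^{-a}, \qquad |\psi(x)|^{2} \sim s(x)^{b}, \qquad a > 0,\; b \ge 0 .
\]
% Integrating across the node in the normal direction, the local contribution
% to the p-th absolute moment behaves like
\[
  \int_{0}^{\varepsilon} s^{\,b - a p}\,\mathrm{d}s ,
\]
% which is finite if and only if b - a p > -1, that is,
\[
  p < \frac{b + 1}{a} .
\]
```

Under these assumptions, a stronger singularity (larger a) lowers the moment threshold, while a sampling density that vanishes faster at the node (larger b) raises it.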
If this is right
- Local energy and gradient estimators for Slater-Jastrow ansatze with variable-exponent STOs generically lack moments beyond order one.
- PS-Clip-VMC converges both in expectation and with high probability whenever VMC operates in its weak-moment regime.
- Generic versus degenerate nodal structures produce distinct integrability thresholds for analytic ansatze.
- Preliminary numerical tests indicate that clipping improves robustness when training FermiNet on atoms with up to 18 electrons.
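The clipping idea behind PS-Clip-VMC can be pictured with a minimal sketch. It assumes the standard VMC gradient identity, in which each sample contributes 2 (E_loc - mean) times the gradient of log|psi|, a clipping window for the local energy centred on the sample median, and norm clipping of each per-sample gradient term. The function names, the choice of centre, and the thresholds `energy_width` and `grad_tau` are illustrative assumptions, not the paper's definitions.

```python
import math
import statistics
from typing import List, Sequence


def clip_scalar(x: float, center: float, width: float) -> float:
    """Clamp x into the window [center - width, center + width]."""
    return max(center - width, min(center + width, x))


def clip_vector(v: Sequence[float], tau: float) -> List[float]:
    """Rescale v so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(c * c for c in v))
    scale = 1.0 if norm <= tau else tau / norm
    return [scale * c for c in v]


def clipped_gradient(
    local_energies: Sequence[float],
    grad_log_psi: Sequence[Sequence[float]],
    energy_width: float,  # hypothetical clipping half-width for E_loc
    grad_tau: float,      # hypothetical norm threshold for per-sample terms
) -> List[float]:
    """Average of per-sample gradient terms after clipping both quantities."""
    # Step 1: clip the local energies around their median (assumed centre).
    center = statistics.median(local_energies)
    e_clip = [clip_scalar(e, center, energy_width) for e in local_energies]
    e_bar = statistics.fmean(e_clip)

    # Step 2: form each per-sample term, clip its norm, then average.
    dim = len(grad_log_psi[0])
    total = [0.0] * dim
    for e, g in zip(e_clip, grad_log_psi):
        term = clip_vector([2.0 * (e - e_bar) * c for c in g], grad_tau)
        total = [t + c for t, c in zip(total, term)]
    return [t / len(e_clip) for t in total]


if __name__ == "__main__":
    # One outlier local energy; the clipped estimate stays bounded by grad_tau.
    energies = [0.0, 0.0, 1e9]
    grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(clipped_gradient(energies, grads, energy_width=1.0, grad_tau=0.5))
```

Because every per-sample term has norm at most `grad_tau`, so does their average, which is the property that makes a clipped estimator insensitive to a single extreme sample.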
Where Pith is reading between the lines
- Neural-network ansatze may inherit similar heavy-tail behavior unless their learned nodal sets happen to be degenerate in the favorable sense.
- Ansatz engineering that deliberately enforces more degenerate nodes could raise the moment thresholds without clipping.
- The same nodal-analysis approach may apply to other local-energy-based quantum Monte Carlo methods that rely on the same estimators.
Load-bearing premise
The nodal structures of the specific ansatz classes examined are either generic or degenerate exactly as required to produce the stated integrability thresholds.
What would settle it
A direct Monte Carlo sampling experiment on a simple Slater-Jastrow wave function with variable-exponent STOs that measures whether the sample variance of the local energy remains finite or diverges as sample size grows.
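The shape of such an experiment can be sketched on a toy problem. The example below is not the Slater-Jastrow setting of the paper: it assumes a one-dimensional harmonic oscillator, H = -1/2 d^2/dx^2 + x^2/2, with the hypothetical trial function psi(x) = (x - a) exp(-x^2/2), for which the local energy works out to 3/2 + a/(x - a). In this toy the singular part scales like 1/|x - a| while the sampling density vanishes like (x - a)^2, so absolute moments are finite only for orders below 3: the sample variance should settle, whereas the fourth central moment should not. All function names and parameters are illustrative.

```python
import math
import random
import statistics
from typing import Dict, Iterable, List


def local_energy(x: float, a: float) -> float:
    """E_loc = (H psi)/psi for psi(x) = (x - a) * exp(-x^2 / 2)."""
    return 1.5 + a / (x - a)


def ratio(x: float, a: float) -> float:
    """Target density (x - a)^2 exp(-x^2) over proposal exp(-x^2 / 2)."""
    return (x - a) ** 2 * math.exp(-0.5 * x * x)


def ratio_bound(a: float) -> float:
    """Maximum of ratio(x, a); the maximisers solve x^2 - a*x - 2 = 0."""
    roots = [(a + s * math.sqrt(a * a + 8.0)) / 2.0 for s in (-1.0, 1.0)]
    return max(ratio(x, a) for x in roots)


def sample_x(a: float, bound: float, rng: random.Random) -> float:
    """Rejection sampling from |psi|^2 with a standard normal proposal."""
    while True:
        x = rng.gauss(0.0, 1.0)
        # Strict inequality: the node x == a (ratio 0) is never accepted.
        if rng.random() * bound < ratio(x, a):
            return x


def moment_trace(
    a: float, checkpoints: Iterable[int], seed: int = 0
) -> Dict[int, Dict[str, float]]:
    """Sample variance and fourth central moment at growing sample sizes."""
    rng = random.Random(seed)
    bound = ratio_bound(a)
    energies: List[float] = []
    trace: Dict[int, Dict[str, float]] = {}
    for n in sorted(checkpoints):
        while len(energies) < n:
            energies.append(local_energy(sample_x(a, bound, rng), a))
        mean = statistics.fmean(energies)
        trace[n] = {
            "variance": statistics.fmean([(e - mean) ** 2 for e in energies]),
            "fourth": statistics.fmean([(e - mean) ** 4 for e in energies]),
        }
    return trace


if __name__ == "__main__":
    for n, stats in moment_trace(0.5, [10**3, 10**4, 10**5]).items():
        print(n, stats)
```

Repeating the run over several seeds and comparing how the two columns behave as the sample size grows is the kind of evidence the proposed experiment would collect for the paper's ansatz classes.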
Original abstract
Variational Monte Carlo (VMC) is a central algorithm in electronic structure theory and has gained renewed importance through modern neural-network ansätze such as FermiNet. At its core, VMC seeks ground states by minimizing the Rayleigh quotient by stochastic optimization. In this work, we show that the resulting stochastic optimization problem is intrinsically governed by the nodal geometry of the underlying wave function. More precisely, we establish that properties of the nodal set determine the integrability of the local energy and gradient estimators that drive VMC. For broad and practically relevant ansatz classes, including Slater-Jastrow wave functions with variable-exponent Slater-type orbitals, we prove that these estimators are generically heavy-tailed and fail to admit higher moments. At the same time, for general analytic ansätze, we prove weak moment bounds for the relevant estimators and identify precise low-moment regimes, showing how generic and degenerate nodal structures lead to different integrability thresholds. Building on this analysis, we introduce a new robust variant of VMC, coined PS-Clip-VMC, which is based on clipping both the local energy and the gradient random variable. We prove that PS-Clip-VMC converges both in expectation and with high probability in the weak moment regime of VMC. Preliminary experiments for training FermiNet on atoms with up to 18 electrons suggest that PS-Clip-VMC is significantly more robust than standard methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that Variational Monte Carlo (VMC) stochastic optimization is governed by the nodal geometry of the wave-function ansatz. Properties of the nodal set control the integrability of the local-energy and gradient estimators. For Slater-Jastrow ansatze with variable-exponent Slater-type orbitals the estimators are shown to be generically heavy-tailed and to lack higher moments; for general analytic ansatze weak-moment bounds are derived and precise low-moment regimes are identified according to whether nodal structures are generic or degenerate. A clipped variant (PS-Clip-VMC) is introduced and proved to converge in expectation and with high probability in the weak-moment regime. Preliminary experiments on FermiNet for atoms up to 18 electrons indicate improved robustness over standard VMC.
Significance. If the nodal-set analysis and convergence proofs hold, the work supplies a rigorous explanation for the origin of instability in VMC and supplies a provably convergent robust variant. The explicit moment-threshold results for practically relevant ansatz classes and the identification of the weak-moment regime constitute a substantive theoretical contribution to stochastic optimization in quantum Monte Carlo. The preliminary numerical evidence on neural-network ansatze is consistent with the theory but remains limited in scope.
minor comments (2)
- The abstract states that 'preliminary experiments' suggest improved robustness, yet no quantitative baselines, variance estimates, or statistical significance tests are referenced in the provided summary; adding these details would strengthen the empirical section.
- Notation for the clipping thresholds and the precise definition of the 'weak moment regime' should be introduced earlier and used consistently throughout the convergence statements.
Simulated Author's Rebuttal
We thank the referee for the positive summary, recognition of the theoretical contribution on nodal-set controlled moment thresholds, and the recommendation of minor revision. No specific major comments were provided in the report.
Circularity Check
No significant circularity identified
full rationale
The paper establishes its central claims via explicit mathematical proofs that nodal geometry controls integrability thresholds for local-energy and gradient estimators, with separate results for Slater-Jastrow variable-exponent STOs (heavy tails) and general analytic ansatze (weak-moment regimes), followed by a convergence argument for the clipped PS-Clip-VMC variant. No step reduces a derived quantity to a fitted parameter, self-definition, or load-bearing self-citation; the dependence on the declared ansatz classes is stated outright rather than smuggled in. The derivation is therefore self-contained against its own stated assumptions and external mathematical analysis of integrability.