pith.

arxiv: 2606.26009 · v1 · pith:EY7KNLTC · submitted 2026-06-24 · 💻 cs.LG

Is Variational Monte Carlo Robust? Sharp Moment Thresholds and Heavy-tailed Stochastic Optimization

Pith reviewed 2026-06-25 19:37 UTC · model grok-4.3

classification 💻 cs.LG
keywords variational Monte Carlo · nodal sets · heavy-tailed estimators · stochastic optimization · clipped VMC · wave function ansatze · electronic structure

The pith

Nodal geometry of trial wave functions determines integrability thresholds for the stochastic estimators that drive Variational Monte Carlo.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Variational Monte Carlo minimizes the Rayleigh quotient using stochastic estimates of the local energy and of the energy gradient, yet the statistical behavior of these estimates is controlled by where the trial wave function vanishes. The paper proves that for Slater-Jastrow ansatze with variable-exponent Slater-type orbitals the estimators are generically heavy-tailed and fail to admit higher moments. For general analytic ansatze the same analysis yields precise weak-moment regimes whose thresholds shift according to whether the nodal set is generic or degenerate. A clipped variant called PS-Clip-VMC is introduced and shown to converge both in expectation and with high probability inside the low-moment regime. This matters because modern neural-network ansatze are trained with exactly these estimators, and their tail behavior directly affects training stability.
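For reference, the quantities named in this paragraph can be written out with the textbook VMC identities for a real-valued trial function $\psi_\theta$ with parameters $\theta$ and a Hermitian Hamiltonian $H$. The notation is ours and may differ from the paper's.

```latex
E(\theta) = \frac{\langle \psi_\theta, H \psi_\theta \rangle}{\langle \psi_\theta, \psi_\theta \rangle}
          = \mathbb{E}_{x \sim p_\theta}\!\left[ E_L(x) \right],
\qquad
p_\theta(x) = \frac{\psi_\theta(x)^2}{\int \psi_\theta^2},
\qquad
E_L(x) = \frac{(H\psi_\theta)(x)}{\psi_\theta(x)},

\nabla_\theta E(\theta)
  = 2\,\mathbb{E}_{x \sim p_\theta}\!\left[ \bigl(E_L(x) - E(\theta)\bigr)\,
      \nabla_\theta \log\lvert\psi_\theta(x)\rvert \right].
```

Both the local energy and the score $\nabla_\theta \log\lvert\psi_\theta\rvert$ divide by $\psi_\theta$, which is why the tails of the Monte Carlo estimators depend on the set where $\psi_\theta$ vanishes.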

Core claim

The paper claims that properties of the nodal set determine the integrability of the local energy and gradient estimators that drive VMC. For broad and practically relevant ansatz classes, including Slater-Jastrow wave functions with variable-exponent Slater-type orbitals, these estimators are generically heavy-tailed and fail to admit higher moments. At the same time, for general analytic ansatze, weak moment bounds are established and precise low-moment regimes are identified, showing how generic and degenerate nodal structures lead to different integrability thresholds. Building on this analysis, PS-Clip-VMC, which clips both the local energy and the gradient random variable, is proved to converge both in expectation and with high probability in the weak moment regime of VMC.
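The idea of clipping both the local energy and the gradient random variable can be sketched in NumPy as follows. This is one plausible reading of per-sample clipping, not the paper's algorithm: the centering at the batch median, the norm-based clip of each sample's gradient contribution, and the fixed thresholds `tau_e` and `tau_g` are all assumptions for illustration, and the helper names are hypothetical.

```python
import numpy as np


def clip_by_norm(v, tau):
    """Rescale each row of v so that its Euclidean norm is at most tau (tau > 0)."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    # tau / max(norm, tau) equals 1 for short rows and tau / norm for long ones.
    return v * (tau / np.maximum(norms, tau))


def clipped_vmc_gradient(e_loc, dlogpsi, tau_e, tau_g):
    """Hypothetical per-sample clipped VMC estimate (illustration only).

    e_loc:   shape (n,), local energies E_L(x_i) for samples x_i ~ |psi|^2
    dlogpsi: shape (n, d), per-sample gradients of log|psi(x_i)| w.r.t. parameters
    tau_e:   assumed clipping radius for local energies around the batch median
    tau_g:   assumed per-sample norm bound for the gradient contributions
    """
    center = np.median(e_loc)
    # 1) Clip local energies to [center - tau_e, center + tau_e].
    e_clip = np.clip(e_loc, center - tau_e, center + tau_e)
    energy = e_clip.mean()
    # 2) Per-sample gradient random variables 2 (E_L - E) grad log|psi|.
    per_sample = 2.0 * (e_clip - energy)[:, None] * dlogpsi
    # 3) Clip each sample's contribution in norm, then average over the batch.
    grad = clip_by_norm(per_sample, tau_g).mean(axis=0)
    return energy, grad
```

With very large `tau_e` and `tau_g` this reduces to the plain estimator; shrinking them trades bias for a bounded per-sample influence. How the thresholds should scale with batch size or iteration count is what a convergence analysis has to settle, and the fixed values in this sketch carry no such guarantee.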

What carries the argument

The nodal set of the wave function ansatz, whose local geometry controls the order of singularities in the local energy and thereby fixes the moment thresholds of the Monte Carlo estimators.
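The mechanism described above can be made concrete with a back-of-the-envelope computation near a regular (generic) point of the nodal set. This is a heuristic sketch under stated assumptions (linear vanishing of $\psi$, $H\psi \neq 0$ on the node, and a parameter $\theta$ whose variation moves the node), not the paper's theorem; it ignores Coulomb singularities, and the paper's precise thresholds, especially for degenerate nodes, may differ. Here $d(x)$ is the signed distance to the node and $q$ is the moment order.

```latex
% Assumptions: \psi(x) \approx c\, d(x) with c \neq 0, (H\psi)(x) \to h_0 \neq 0
% and \partial_\theta \psi(x) \to g_0 \neq 0 as x approaches the node.
E_L(x) = \frac{(H\psi)(x)}{\psi(x)} \sim \frac{h_0}{c\, d(x)},
\qquad
\partial_\theta \log\lvert\psi(x)\rvert = \frac{\partial_\theta \psi(x)}{\psi(x)} \sim \frac{g_0}{c\, d(x)},
\qquad
p_\theta(x) \propto \psi(x)^2 \sim c^2 d(x)^2 .

\mathbb{E}\,\lvert E_L\rvert^{q} \;\approx\; C \int_0^{\delta} t^{2-q}\,dt < \infty
  \iff q < 3,
\qquad
\mathbb{E}\,\bigl\lvert (E_L - E)\,\partial_\theta \log\lvert\psi\rvert \bigr\rvert^{q}
  \;\approx\; C' \int_0^{\delta} t^{2-2q}\,dt < \infty
  \iff q < \tfrac{3}{2}.
```

Under these assumptions the local energy keeps a finite variance but no third moment, while the gradient random variable has no finite variance, so an unclipped stochastic gradient would operate in an infinite-variance regime. The local-energy tail is consistent with the heavy-tailed local-energy errors discussed by Trail [44].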

If this is right

  • Local energy and gradient estimators for Slater-Jastrow ansatze with variable-exponent STOs are generically heavy-tailed and fail to admit higher moments.
  • PS-Clip-VMC converges both in expectation and with high probability whenever VMC operates in its weak-moment regime.
  • Generic versus degenerate nodal structures produce distinct integrability thresholds for analytic ansatze.
  • Preliminary numerical tests indicate that clipping improves robustness when training FermiNet on atoms with up to 18 electrons.
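Why clipping helps when only low moments exist can be seen from a generic bias/variance bound for norm clipping, in the spirit of truncated-mean estimators for heavy-tailed bandits [6]. This is a standard argument under the assumption $\mathbb{E}\lVert X\rVert^{q} \le M$ for some $q \in (1, 2]$, with $\operatorname{clip}_\tau(X) = X \min\{1, \tau/\lVert X\rVert\}$; it is not the paper's convergence proof.

```latex
\bigl\lVert \mathbb{E}[X] - \mathbb{E}[\operatorname{clip}_\tau(X)] \bigr\rVert
  \le \mathbb{E}\bigl[\lVert X\rVert\,\mathbf{1}\{\lVert X\rVert > \tau\}\bigr]
  \le \frac{\mathbb{E}\lVert X\rVert^{q}}{\tau^{q-1}}
  \le M\,\tau^{1-q},

\mathbb{E}\lVert \operatorname{clip}_\tau(X)\rVert^{2}
  = \mathbb{E}\bigl[\min\{\lVert X\rVert, \tau\}^{2}\bigr]
  \le \tau^{2-q}\,\mathbb{E}\lVert X\rVert^{q}
  \le M\,\tau^{2-q}.

% Averaging n i.i.d. clipped samples and choosing \tau \asymp n^{1/q} balances the two
% terms: both the bias and the standard deviation are of order n^{-(q-1)/q}.
```

For $q = 2$ this recovers the usual $n^{-1/2}$ rate, and the rate degrades as $q \downarrow 1$; this is presumably the kind of regime the paper's "weak moment regime" refers to.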

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Neural-network ansatze may inherit similar heavy-tail behavior unless their learned nodal sets happen to be degenerate in the favorable sense.
  • Ansatz engineering that deliberately enforces more degenerate nodes could raise the moment thresholds without clipping.
  • The same nodal-analysis approach may apply to other local-energy-based quantum Monte Carlo methods that rely on the same estimators.

Load-bearing premise

The nodal structures of the specific ansatz classes examined are either generic or degenerate exactly as required to produce the stated integrability thresholds.

What would settle it

A direct Monte Carlo sampling experiment on a simple Slater-Jastrow wave function with variable-exponent STOs that measures whether the sample variance of the local energy settles to a finite value or keeps growing as the sample size increases.
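Such an experiment can be prototyped on a one-dimensional toy before moving to Slater-Jastrow wave functions. The sketch below is our own illustration, not an experiment from the paper: it assumes the harmonic-oscillator Hamiltonian $H = -\tfrac12 \partial_x^2 + \tfrac12 x^2$ and the trial function $\psi_a(x) = (x-a)e^{-x^2/2}$, whose node at $x = a$ is misplaced for $a \neq 0$. A short calculation gives $E_L(x) = 3/2 + a/(x-a)$, and near the node $\lvert\psi_a\rvert^2 \propto (x-a)^2$, so $E_L$ has moments only below order 3 (finite variance), while the gradient sample $2(E_L - E)\,\partial_a \log\lvert\psi_a\rvert \approx -2a/(x-a)^2$ has moments only below order 3/2 (infinite variance). The helper names `local_energy`, `sample_psi_squared`, `exact_energy` and `exact_grad` are hypothetical.

```python
import numpy as np

# Hypothetical toy (not from the paper): H = -1/2 d^2/dx^2 + x^2/2 on the real line,
# trial function psi_a(x) = (x - a) * exp(-x^2 / 2). For a = 0 this is the exact
# first excited state (E = 3/2); for a != 0 the node at x = a is misplaced.


def local_energy(x, a):
    """Closed-form local energy H psi_a / psi_a for the toy above."""
    return 1.5 + a / (x - a)


def sample_psi_squared(n, a, rng):
    """Exact rejection sampling from p(x) proportional to (x - a)^2 exp(-x^2).

    Proposal: N(0, 1). The ratio p / proposal is proportional to
    h(x) = (x - a)^2 exp(-x^2 / 2), maximised at the roots of x^2 - a x - 2 = 0.
    """
    roots = (a + np.array([-1.0, 1.0]) * np.sqrt(a * a + 8.0)) / 2.0
    h_max = np.max((roots - a) ** 2 * np.exp(-roots ** 2 / 2.0))
    batches, total = [], 0
    while total < n:
        x = rng.standard_normal(2 * n)
        u = rng.uniform(size=2 * n)
        accepted = x[u * h_max <= (x - a) ** 2 * np.exp(-x ** 2 / 2.0)]
        batches.append(accepted)
        total += accepted.size
    return np.concatenate(batches)[:n]


def exact_energy(a):
    """Variational energy <psi_a, H psi_a> / <psi_a, psi_a>."""
    return (1.5 + a * a) / (1.0 + 2.0 * a * a)


def exact_grad(a):
    """Derivative of exact_energy with respect to a."""
    return -4.0 * a / (1.0 + 2.0 * a * a) ** 2


rng = np.random.default_rng(0)
a = 0.5
x = sample_psi_squared(400_000, a, rng)
e_loc = local_energy(x, a)
energy = e_loc.mean()
# Per-sample gradient random variable 2 (E_L - E) d/da log|psi_a|, with d/da log|psi_a| = -1/(x - a).
g = 2.0 * (e_loc - energy) * (-1.0 / (x - a))

for n in (10_000, 100_000, 400_000):
    print(f"n={n:>7}  var(E_L)={np.var(e_loc[:n]):.4f}  var(grad)={np.var(g[:n]):.4g}")
print(f"energy {energy:.4f} (exact {exact_energy(a):.4f})")
print(f"grad   {g.mean():.4f} (exact {exact_grad(a):.4f})")
```

With $a = 0.5$ the exact variational energy is $7/6$ and the exact variance of $E_L$ is $2/9$. The printed variance of `e_loc` should drift toward that value (slowly, since the fourth moment is infinite), whereas the variance of `g` has no finite limit and should tend to keep growing with `n`. Exact rejection sampling is used instead of Metropolis so that the tail behavior is not confounded by autocorrelation.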

Figures

Figures reproduced from arXiv: 2606.26009 by Davide Nobile, Philipp Grohs.

Figure 1. Training trajectories of Sulfur and Argon using a batch size of 2048. The values are smoothed using …
Original abstract

Variational Monte Carlo (VMC) is a central algorithm in electronic structure theory and has gained renewed importance through modern neural-network ansätze such as FermiNet. At its core, VMC seeks ground states by minimizing the Rayleigh quotient by stochastic optimization. In this work, we show that the resulting stochastic optimization problem is intrinsically governed by the nodal geometry of the underlying wave function. More precisely, we establish that properties of the nodal set determine the integrability of the local energy and gradient estimators that drive VMC. For broad and practically relevant ansatz classes, including Slater-Jastrow wave functions with variable-exponent Slater-type orbitals, we prove that these estimators are generically heavy-tailed and fail to admit higher moments. At the same time, for general analytic ansätze, we prove weak moment bounds for the relevant estimators and identify precise low-moment regimes, showing how generic and degenerate nodal structures lead to different integrability thresholds. Building on this analysis, we introduce a new robust variant of VMC, coined PS-Clip-VMC, which is based on clipping both the local energy and the gradient random variable. We prove that PS-Clip-VMC converges both in expectation and with high probability in the weak moment regime of VMC. Preliminary experiments for training FermiNet on atoms with up to 18 electrons suggest that PS-Clip-VMC is significantly more robust than standard methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims that Variational Monte Carlo (VMC) stochastic optimization is governed by the nodal geometry of the wave-function ansatz. Properties of the nodal set control the integrability of the local-energy and gradient estimators. For Slater-Jastrow ansatze with variable-exponent Slater-type orbitals the estimators are shown to be generically heavy-tailed and to lack higher moments; for general analytic ansatze weak-moment bounds are derived and precise low-moment regimes are identified according to whether nodal structures are generic or degenerate. A clipped variant (PS-Clip-VMC) is introduced and proved to converge in expectation and with high probability in the weak-moment regime. Preliminary experiments on FermiNet for atoms with up to 18 electrons indicate improved robustness over standard VMC.

Significance. If the nodal-set analysis and convergence proofs hold, the work supplies both a rigorous explanation for the origin of instability in VMC and a provably convergent robust variant. The explicit moment-threshold results for practically relevant ansatz classes and the identification of the weak-moment regime constitute a substantive theoretical contribution to stochastic optimization in quantum Monte Carlo. The preliminary numerical evidence on neural-network ansatze is consistent with the theory but remains limited in scope.

minor comments (2)
  1. The abstract states that 'preliminary experiments' suggest improved robustness, yet no quantitative baselines, variance estimates, or statistical significance tests are referenced in the provided summary; adding these details would strengthen the empirical section.
  2. Notation for the clipping thresholds and the precise definition of the 'weak moment regime' should be introduced earlier and used consistently throughout the convergence statements.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the theoretical contribution on nodal-set controlled moment thresholds, and the recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper establishes its central claims via explicit mathematical proofs that nodal geometry controls integrability thresholds for local-energy and gradient estimators, with separate results for Slater-Jastrow variable-exponent STOs (heavy tails) and general analytic ansatze (weak-moment regimes), followed by a convergence argument for the clipped PS-Clip-VMC variant. No step reduces a derived quantity to a fitted parameter, self-definition, or load-bearing self-citation; the dependence on the declared ansatz classes is stated outright rather than smuggled in. The derivation is therefore self-contained against its own stated assumptions and external mathematical analysis of integrability.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the claims rest on unstated mathematical assumptions about wave-function classes and nodal sets that cannot be audited here.

pith-pipeline@v0.9.1-grok · 5792 in / 1139 out tokens · 18938 ms · 2026-06-25T19:37:55.534028+00:00 · methodology


Reference graph

Works this paper leans on

45 extracted references · 12 canonical work pages

  1. [1]

    Convergence of variational Monte Carlo simulation and scale-invariant pre-training

    Nilin Abrahamsen et al. “Convergence of variational Monte Carlo simulation and scale-invariant pre-training”. In: Journal of Computational Physics 513 (2024), p. 113140. doi:10.1016/j.jcp.2024.113140

  2. [2]

    Bounds on exponential decay of eigenfunctions of Schrödinger operators

    Shmuel Agmon. “Bounds on exponential decay of eigenfunctions of Schrödinger operators”. In: Schrödinger Operators: Lectures given at the 2nd 1984 Session of the Centro Internazionale Matematico Estivo (CIME) held at Como, Italy, Aug. 26–Sept. 4, 1984. Springer, 2006, pp. 1–38

  3. [3]

    Bernstein operators for exponential polynomials

    JM Aldaz, Ognyan Kounchev, and Hermann Render. “Bernstein operators for exponential polynomials”. In: Constructive Approximation 29.3 (2009), pp. 345–367

  4. [4]

    Functional Neural Wavefunction Optimization

    Victor Armegioiu et al. “Functional Neural Wavefunction Optimization”. In: arXiv preprint arXiv:2507.10835 (2025)

  5. [5]

    Quantum Monte Carlo approaches for correlated systems

    Federico Becca and Sandro Sorella. Quantum Monte Carlo approaches for correlated systems. Cambridge University Press, 2017

  6. [6]

    Bandits With Heavy Tail

    Sebastien Bubeck, Nicolo Cesa-Bianchi, and Gabor Lugosi. “Bandits With Heavy Tail”. In: IEEE Trans. Inf. Theor. 59.11 (2013), pp. 7711–7717

  7. [7]

    Ground-state correlation energies for atomic ions with 3 to 18 electrons

    Subhas J. Chakravorty et al. “Ground-state correlation energies for atomic ions with 3 to 18 electrons”. In: Physical Review A 47.5 (1993), pp. 3649–3670

  8. [8]

    Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions

    Nicholas Gao and Stephan Günnemann. “Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions”. In: International Conference on Learning Representations. 2022. doi:10.48550/arXiv.2110.05064. arXiv:2110.05064

  9. [9]

    Generalizing Neural Wave Functions

    Nicholas Gao and Stephan Günnemann. “Generalizing Neural Wave Functions”. In: Proceedings of the 40th International Conference on Machine Learning. Vol. 202. Proceedings of Machine Learning Research. PMLR, 2023, pp. 10708–10726. doi:10.48550/arXiv.2302.04168. arXiv:2302.04168

  10. [10]

    Sampling-free Inference for Ab-Initio Potential Energy Surface Networks

    Nicholas Gao and Stephan Günnemann. “Sampling-free Inference for Ab-Initio Potential Energy Surface Networks”. In: The Eleventh International Conference on Learning Representations. 2023. doi:10.48550/arXiv.2205.14962. arXiv:2205.14962

  11. [11]

    Deep learning variational Monte Carlo for solving the electronic Schrödinger equation

    Leon Gerard, Philipp Grohs, and Michael Scherbela. “Deep learning variational Monte Carlo for solving the electronic Schrödinger equation”. In: Numerical Analysis Meets Machine Learning. Vol. 25. Handbook of Numerical Analysis. North-Holland, 2024, pp. 231–292. doi:10.1016/bs.hna.2024.05.010

  12. [12]

    Gold-standard solutions to the Schrödinger equation using deep learning: How much physics do we need?

    Leon Gerard et al. “Gold-standard solutions to the Schrödinger equation using deep learning: How much physics do we need?” In: Advances in Neural Information Processing Systems. Vol. 35. 2022, pp. 10282–10294. doi:10.48550/arXiv.2205.09438. arXiv:2205.09438

  13. [13]

    Transferable neural wavefunctions for solids

    Leon Gerard et al. “Transferable neural wavefunctions for solids”. In: Nature Computational Science 5 (2025), pp. 1147–1157. doi:10.1038/s43588-025-00872-z. arXiv:2405.07599

  14. [14]

    A Self-Attention Ansatz for Ab-initio Quantum Chemistry

    Ingrid von Glehn, James S. Spencer, and David Pfau. “A Self-Attention Ansatz for Ab-initio Quantum Chemistry”. In: The Eleventh International Conference on Learning Representations. 2023. doi:10.48550/arXiv.2211.13672. arXiv:2211.13672

  15. [15]

    FermiNet

    Google DeepMind. FermiNet. GitHub repository. 2020. url: https://github.com/google-deepmind/ferminet

  16. [16]

    Stable Gabor phase retrieval for multivariate functions

    Philipp Grohs and Martin Rathmair. “Stable Gabor phase retrieval for multivariate functions”. In: Journal of the European Mathematical Society 24.5 (2021), pp. 1593–1615

  17. [17]

    Stephen J Gustafson et al. Mathematical concepts of quantum mechanics. Vol. 33. Springer, 2003

  18. [18]

    Molecular electronic-structure theory

    Trygve Helgaker, Poul Jorgensen, and Jeppe Olsen. Molecular electronic-structure theory. John Wiley & Sons, 2013

  19. [19]

    Deep-neural-network solution of the electronic Schrödinger equation

    Jan Hermann, Zeno Schätzle, and Frank Noé. “Deep-neural-network solution of the electronic Schrödinger equation”. In: Nature Chemistry 12.10 (2020), pp. 891–897

  20. [20]

    Ab initio quantum chemistry with neural-network wavefunctions

    Jan Hermann et al. “Ab initio quantum chemistry with neural-network wavefunctions”. In: Nature Reviews Chemistry 7 (2023), pp. 692–709. doi:10.1038/s41570-023-00516-8

  21. [21]

    Differential topology

    Morris W Hirsch. Differential topology. Springer Science & Business Media, 2012

  22. [22]

    On the eigenfunctions of many-particle systems in quantum mechanics

    Tosio Kato. “On the eigenfunctions of many-particle systems in quantum mechanics”. In: Communications on Pure and Applied Mathematics 10.2 (1957), pp. 151–177

  23. [23]

    Tosio Kato. Perturbation theory for linear operators. Second. Grundlehren der Mathematischen Wissenschaften, Band 132. Springer-Verlag, Berlin-New York, 1976, pp. xxi+619

  24. [24]

    Sub-sampled cubic regularization for non-convex optimization

    Jonas Moritz Kohler and Aurelien Lucchi. “Sub-sampled cubic regularization for non-convex optimization”. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. ICML’17. 2017, pp. 1895–1904

  25. [25]

    Several complex variables

    Jacob Korevaar and Jan Wiegerinck. Several complex variables. Korteweg-de Vries Institute for Mathematics Amsterdam, 2017

  26. [26]

    Tianyou Li et al. Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators. 2024. arXiv:2303.10599

  27. [27]

    Explicitly antisymmetrized neural network layers for variational Monte Carlo simulation

    Jeffmin Lin, Gil Goldshlager, and Lin Lin. “Explicitly antisymmetrized neural network layers for variational Monte Carlo simulation”. In: Journal of Computational Physics 474 (2023), p. 111765

  28. [28]

    The preparation theorem for differentiable functions

    Bernard Malgrange. “The preparation theorem for differentiable functions”. In: Differential Analysis, Bombay Colloq. 1964, pp. 203–208

  29. [29]

    Robust and Fast Training via Per-Sample Clipping

    Davide Nobile and Philipp Grohs. “Robust and Fast Training via Per-Sample Clipping”. In: arXiv preprint arXiv:2605.02701 (2026)

  30. [30]

    O(N²) Universal Antisymmetry in Fermionic Neural Networks

    Tianyu Pang, Shuicheng Yan, and Min Lin. O(N²) Universal Antisymmetry in Fermionic Neural Networks. 2022. doi:10.48550/arXiv.2205.13205. arXiv:2205.13205

  31. [31]

    Ab initio solution of the many-electron Schrödinger equation with deep neural networks

    David Pfau et al. “Ab initio solution of the many-electron Schrödinger equation with deep neural networks”. In: Physical Review Research 2.3 (2020), p. 033429

  32. [32]

    N-widths in Approximation Theory

    Allan Pinkus. N-widths in Approximation Theory. Springer Science & Business Media, 2012

  33. [33]

    R Michael Range. Holomorphic functions and integral representations in several complex variables. Vol. 108. Springer Science & Business Media, 1998

  34. [34]

    QMCTorch: Molecular Wave Function with Neural Components for Energy and Force Calculations

    Nicolas Renaud. “QMCTorch: Molecular Wave Function with Neural Components for Energy and Force Calculations”. In: Methods 4 (2025), p. 4

  35. [35]

    Towards a Foundation Model for Neural Network Wavefunctions

    Michael Scherbela, Leon Gerard, and Philipp Grohs. Towards a Foundation Model for Neural Network Wavefunctions. Preprint title/version of the transferable molecular neural-wavefunction work. 2023. doi:10.48550/arXiv.2303.09949. arXiv:2303.09949 [physics.chem-ph]

  36. [36]

    Towards a transferable fermionic neural wavefunction for molecules

    Michael Scherbela, Leon Gerard, and Philipp Grohs. “Towards a transferable fermionic neural wavefunction for molecules”. In: Nature Communications 15 (2024), p. 120. doi:10.1038/s41467-023-44216-9

  37. [37]

    Variational Monte Carlo on a Budget: Fine-tuning pre-trained Neural Wavefunctions

    Michael Scherbela, Leon Gerard, and Philipp Grohs. “Variational Monte Carlo on a Budget: Fine-tuning pre-trained Neural Wavefunctions”. In: Advances in Neural Information Processing Systems. Vol. 36. 2023. doi:10.48550/arXiv.2307.09337. arXiv:2307.09337

  38. [38]

    Michael Scherbela et al. Accurate Ab-initio Neural-network Solutions to Large-Scale Electronic Structure Problems. 2025. arXiv:2504.06087 [physics.comp-ph]

  39. [39]

    Solving the electronic Schrödinger equation for multiple nuclear geometries with weight-sharing deep neural networks

    Michael Scherbela et al. “Solving the electronic Schrödinger equation for multiple nuclear geometries with weight-sharing deep neural networks”. In: Nature Computational Science 2.5 (2022), pp. 331–341. doi:10.1038/s43588-022-00228-x. arXiv:2105.08351

  40. [40]

    Better, Faster Fermionic Neural Networks

    James S. Spencer et al. Better, Faster Fermionic Neural Networks. 2020. doi:10.48550/arXiv.2011.07125. arXiv:2011.07125 [physics.chem-ph]

  41. [41]

    Modern quantum chemistry: introduction to advanced electronic structure theory

    Attila Szabo and Neil S Ostlund. Modern quantum chemistry: introduction to advanced electronic structure theory. Courier Corporation, 2012

  42. [42]

    Gerald Teschl. Mathematical methods in quantum mechanics. Vol. 157. American Mathematical Soc., 2014

  43. [43]

    Introduction to the variational and diffusion Monte Carlo methods

    Julien Toulouse, Roland Assaraf, and Cyrus J Umrigar. “Introduction to the variational and diffusion Monte Carlo methods”. In: Advances in quantum chemistry. Vol. 73. Elsevier, 2016, pp. 285–314

  44. [44]

    Heavy-tailed random error in quantum Monte Carlo

    JR Trail. “Heavy-tailed random error in quantum Monte Carlo”. In: Physical Review E 77.1 (2008), p. 016703

  45. [45]

    Holger Wendland. Scattered data approximation. Vol. 17. Cambridge University Press, 2004.

A Additional Technical Results

Lemma A.1 (Vector Bernstein Inequality, see [24, Lemma 18]). Let $X_1, \dots, X_n$ be independent vector-valued random variables with common dimension $d$, satisfying $\mathbb{E}[X_i] = 0$, $|X_i| \le c$ and $\mathbb{E}|X_i|^2 \le \sigma^2$ for all $i = 1, \dots, n$ and some $c, \sigma > 0$. Then...