
arxiv: 2603.15460 · v2 · pith:RS27PI6Qnew · submitted 2026-03-16 · ⚛️ physics.comp-ph

Introduction to the artificial neural network-based variational Monte Carlo method

Pith reviewed 2026-05-21 11:07 UTC · model grok-4.3

classification ⚛️ physics.comp-ph
keywords variational Monte Carlo · neural networks · trial wave functions · unsupervised learning · quantum ground states · Yukawa potential · hydrogen molecule · variational optimization

The pith

Neural networks can represent quantum wave functions so that variational Monte Carlo optimization behaves as a stable unsupervised learning process.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to construct trial wave functions for quantum systems by using artificial neural networks inside the variational Monte Carlo framework. It lays out the mathematical mapping from network parameters to quantum states and explains why this representation brings advantages for optimization and insight. A central point is that the variational energy minimization can be understood as unsupervised learning, where the existence of many local minima helps rather than hinders the search for good approximations. Feature extraction from the trained network then supplies physical understanding of the system. The ideas are demonstrated on the Yukawa potential and the hydrogen molecule.

Core claim

The variational method with neural-network trial states functions as an unsupervised learning algorithm. The landscape of multiple minima in the variational energy is treated as an asset that produces stable optimization rather than trapping the procedure. The network parameters encode the quantum state, and the learned features inside the network allow extraction of physical information from the optimized trial function.
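
The learning analogy can be made concrete with the standard VMC expressions. The notation below (parameters \theta, configurations R, M samples, local energy E_L) is assumed for illustration and may differ from the paper's own; the gradient formula as written holds for a real-valued trial function.

```latex
\begin{align}
E(\theta) &= \frac{\langle \psi_\theta | \hat{H} | \psi_\theta \rangle}{\langle \psi_\theta | \psi_\theta \rangle}
           = \int \frac{|\psi_\theta(R)|^2}{\int |\psi_\theta(R')|^2 \, dR'} \, E_L(R) \, dR
           \approx \frac{1}{M} \sum_{i=1}^{M} E_L(R_i),
\qquad
E_L(R) = \frac{\hat{H} \psi_\theta(R)}{\psi_\theta(R)}, \\
\frac{\partial E}{\partial \theta_k}
          &= 2 \left\langle \bigl( E_L(R) - E(\theta) \bigr)
             \frac{\partial \ln \psi_\theta(R)}{\partial \theta_k} \right\rangle_{|\psi_\theta|^2} .
\end{align}
```

Read this way, E(\theta) plays the role of a loss function evaluated on configurations R_i that the model generates itself by sampling |\psi_\theta|^2, with no labelled targets. That is the sense in which the procedure can be called unsupervised.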

What carries the argument

Representation of a quantum many-body wave function as the output of an artificial neural network whose parameters are variationally optimized to minimize the energy expectation value.
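
A minimal sketch of this idea, under loudly stated assumptions: the system is a one-dimensional harmonic oscillator (not one of the paper's examples), and the "network" is a hypothetical one-hidden-layer ansatz ln psi(x) = -sum_j v_j ln cosh(w_j x) chosen because its derivatives are analytic. The function names and the Metropolis settings are illustrative, not the author's implementation.

```python
import math
import random


def log_psi(x, v, w):
    """ln psi(x) = -sum_j v_j * ln cosh(w_j * x): a toy one-hidden-layer ansatz."""
    return -sum(vj * math.log(math.cosh(wj * x)) for vj, wj in zip(v, w))


def local_energy(x, v, w):
    """E_L = -(1/2) psi''/psi + x^2/2, with psi''/psi = f'' + (f')^2 and f = ln psi."""
    d1 = -sum(vj * wj * math.tanh(wj * x) for vj, wj in zip(v, w))
    d2 = -sum(vj * wj ** 2 / math.cosh(wj * x) ** 2 for vj, wj in zip(v, w))
    return -0.5 * (d2 + d1 * d1) + 0.5 * x * x


def vmc_energy(v, w, n_samples=50000, n_burn=2000, step=1.5, seed=0):
    """Sample |psi|^2 with the Metropolis algorithm and average the local energy."""
    rng = random.Random(seed)
    x = 0.0
    logp = 2.0 * log_psi(x, v, w)
    total = 0.0
    for i in range(n_burn + n_samples):
        x_new = x + step * (2.0 * rng.random() - 1.0)
        logp_new = 2.0 * log_psi(x_new, v, w)
        # Accept with probability min(1, |psi_new|^2 / |psi_old|^2).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = x_new, logp_new
        if i >= n_burn:
            total += local_energy(x, v, w)
    return total / n_samples
```

Calling `vmc_energy(v=[1.0], w=[1.0])` evaluates the trial function psi(x) = 1/cosh(x), whose exact variational energy is 1/6 + pi^2/24 (about 0.578), above the exact ground-state value 0.5 as the variational principle requires. Optimizing `v` and `w` to lower this number is the step the summary describes as learning.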

If this is right

  • Optimization remains stable even when the energy surface contains many local minima.
  • Internal network features can be examined to extract physical properties of the modeled system.
  • The same construction applies to model problems such as the Yukawa potential and to simple molecules such as H2.
  • Standard machine-learning training procedures integrate directly with variational quantum calculations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may extend to larger systems where conventional trial functions become impractical.
  • Similar network representations could be tested on time-dependent or excited-state problems.
  • Connections to other neural-network ansätze in quantum simulation might allow hybrid constructions.

Load-bearing premise

A neural network can capture the essential features of the true ground-state wave function without introducing uncontrolled bias into the computed variational energy.

What would settle it

For the hydrogen molecule, compare the neural-network variational energy against the known exact ground-state energy; a deviation that is statistically significant relative to the Monte Carlo sampling error would falsify the claim of faithful representation.
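
A hedged sketch of how such a comparison might be scored. The helper names, the three-sigma threshold, and the numbers in the usage note are hypothetical; none are taken from the paper. The standard error passed in is assumed to already account for autocorrelation in the Markov chain.

```python
def deviation_in_sigma(e_vmc, stderr, e_ref):
    """Signed deviation of the VMC energy from the reference, in standard errors."""
    if stderr <= 0.0:
        raise ValueError("stderr must be positive")
    return (e_vmc - e_ref) / stderr


def classify(e_vmc, stderr, e_ref, n_sigma=3.0):
    """Label the result relative to the reference energy.

    'above'      : significant residual variational bias in the trial state.
    'below'      : violates the variational bound; suspect sampling or code errors.
    'consistent' : agreement within n_sigma standard errors.
    """
    z = deviation_in_sigma(e_vmc, stderr, e_ref)
    if z > n_sigma:
        return "above"
    if z < -n_sigma:
        return "below"
    return "consistent"
```

For example, with made-up values, `classify(-0.995, 0.002, -1.0)` returns `"consistent"` (2.5 standard errors), while `classify(-0.980, 0.002, -1.0)` returns `"above"`. Because the variational energy is an upper bound, only the `"above"` outcome speaks to representation quality; `"below"` points to a problem in the calculation itself.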

Figures

Figures reproduced from arXiv: 2603.15460 by William Freitas.

Figure 1. Schematic representations of an artificial neuron (left) …
Figure 2. Analogies between the variational method and ma…
Figure 4. Minimization process for the Morse oscillator. The …
Figure 6. ANN-based trial state description of the Yukawa po…
Figure 8. The minimization process for the H…
Original abstract

The construction of trial wave functions based on neural networks combined with the variational Monte Carlo method is discussed. The mathematical formulation for representing quantum states as artificial neural networks is introduced. The advantages of employing such trial states and how machine learning works are discussed. It is shown that the variational method is a kind of unsupervised learning algorithm, where the multiple minima landscape is used as an asset that leads to a stable optimization procedure. The feature representation plays an important role on interpretability and on extracting physical insights from nontrivial trial wave functions. The algorithm is illustrated for the Yukawa potential and the hydrogen molecule.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the use of artificial neural networks to represent trial wave functions in the variational Monte Carlo (VMC) method for quantum systems. It presents the mathematical formulation for ANN-based quantum states, frames the variational optimization as an unsupervised learning procedure that treats the multiple-minima energy landscape as an asset for stable convergence, discusses advantages and the role of feature representation for interpretability and physical insight, and illustrates the approach on the Yukawa potential and the hydrogen molecule.

Significance. If the central claims are substantiated, the manuscript provides a pedagogical entry point to neural-network VMC that could help researchers in computational physics integrate machine-learning representations into variational calculations. The emphasis on unsupervised learning and interpretability through features offers a distinct perspective that may facilitate extraction of physical insights from nontrivial trial states, though the overall impact depends on whether the stability argument is demonstrated beyond the standard VMC setup.

major comments (2)
  1. [illustrations for the Yukawa potential and hydrogen molecule] The central claim that the variational method functions as an unsupervised learning algorithm where the multiple-minima landscape serves as an asset for stable optimization is not supported by the illustrations. The examples for the Yukawa potential and hydrogen molecule supply no energy-landscape visualizations, no ensemble statistics from varied initial conditions, and no comparisons to single-minimum or regularized baselines that would show multiplicity conferring stability rather than variability or trapping.
  2. [mathematical formulation and advantages of ANN trial states] The assumption that the chosen neural-network architectures faithfully capture essential features of the ground-state wave function without introducing uncontrolled variational bias is stated but not quantified. No systematic checks (e.g., comparison of variational energies against known exact or high-accuracy benchmarks, or variation of network depth/width) are reported to bound the representation error for the two example systems.
minor comments (2)
  1. [mathematical formulation] Notation for the neural-network parameters and the Monte Carlo sampling procedure should be introduced with explicit definitions and consistent symbols across the text and any equations.
  2. [illustrations] The manuscript would benefit from a short table summarizing the network architectures, hyperparameters, and obtained variational energies for the two illustrated systems to improve reproducibility.
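
Major comment 1 asks for ensemble statistics from varied initial conditions. The sketch below shows one way such a check could look on a toy problem. Everything here is an assumption made for illustration: a one-dimensional harmonic oscillator, a hypothetical two-unit trial function psi(x) = prod_j 1/cosh(w_j x), a deterministic grid quadrature in place of Monte Carlo sampling, and plain gradient descent with finite-difference gradients.

```python
import math
import random


def energy(w, half_width=6.0, n=121):
    """Variational energy of psi = prod_j sech(w_j x) for H = -(1/2) d^2/dx^2 + x^2/2."""
    num = den = 0.0
    for i in range(n):
        x = -half_width + 2.0 * half_width * i / (n - 1)
        d1 = d2 = 0.0
        p2 = 1.0
        for wj in w:
            t = math.tanh(wj * x)
            s2 = 1.0 - t * t          # sech^2(w_j x)
            d1 -= wj * t              # derivative of ln psi
            d2 -= wj * wj * s2        # second derivative of ln psi
            p2 *= s2                  # |psi|^2
        num += p2 * (-0.5 * (d2 + d1 * d1) + 0.5 * x * x)
        den += p2
    return num / den


def minimize(w0, lr=0.1, steps=200, h=1e-4):
    """Gradient descent on the energy using central finite differences."""
    w = list(w0)
    for _ in range(steps):
        grad = []
        for j in range(len(w)):
            wp, wm = list(w), list(w)
            wp[j] += h
            wm[j] -= h
            grad.append((energy(wp) - energy(wm)) / (2.0 * h))
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w, energy(w)


def multi_start(n_starts=4, seed=1):
    """Run the optimizer from random initial weights of either sign."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_starts):
        w0 = [rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 2.0) for _ in range(2)]
        runs.append(minimize(w0))
    return runs
```

Collecting `[e for _, e in multi_start()]` and reporting the mean and spread gives the kind of ensemble statistic the referee requests. In this toy ansatz the energy is unchanged under w_j → -w_j and under exchange of the two units, so distinct minima in parameter space can describe the same state. Whether that is the mechanism behind the paper's stability claim is not established by this summary.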

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We value the assessment that the manuscript offers a pedagogical entry point to neural-network VMC and the emphasis on unsupervised learning and interpretability. We address the two major comments below, indicating the revisions we will make.

Point-by-point responses
  1. Referee: [illustrations for the Yukawa potential and hydrogen molecule] The central claim that the variational method functions as an unsupervised learning algorithm where the multiple-minima landscape serves as an asset for stable optimization is not supported by the illustrations. The examples for the Yukawa potential and hydrogen molecule supply no energy-landscape visualizations, no ensemble statistics from varied initial conditions, and no comparisons to single-minimum or regularized baselines that would show multiplicity conferring stability rather than variability or trapping.

    Authors: We agree that the current illustrations do not contain energy-landscape visualizations, ensemble statistics over varied initial conditions, or explicit comparisons against single-minimum or regularized baselines. The manuscript presents the multiple-minima landscape as an asset for stable convergence as a conceptual framing of the variational principle, drawing on the general structure of the energy functional rather than on new empirical demonstrations within the two examples. The Yukawa and hydrogen-molecule calculations are intended to illustrate the implementation and basic convergence behavior. We will revise the relevant sections to make this distinction explicit, add a concise discussion of why multiple random starts are commonly employed in VMC to reduce the risk of local-minimum trapping, and, if space allows, include a brief qualitative remark on the optimization trajectories observed in the reported runs. revision: partial

  2. Referee: [mathematical formulation and advantages of ANN trial states] The assumption that the chosen neural-network architectures faithfully capture essential features of the ground-state wave function without introducing uncontrolled variational bias is stated but not quantified. No systematic checks (e.g., comparison of variational energies against known exact or high-accuracy benchmarks, or variation of network depth/width) are reported to bound the representation error for the two example systems.

    Authors: We acknowledge that the manuscript does not provide systematic comparisons of the obtained variational energies to exact or high-accuracy reference values, nor does it explore variations in network depth or width to quantify representation error. Because the work is framed as an introduction, the emphasis lies on the mathematical formulation and the conceptual advantages of ANN trial states rather than on exhaustive benchmarking. For the hydrogen molecule the exact ground-state energy is known, and for the Yukawa potential accurate numerical references exist. We will revise the manuscript to report the variational energies obtained for both systems alongside the corresponding literature or exact values, and we will add a short paragraph discussing the architectural choices and the expected magnitude of the representation bias for these simple cases. revision: yes

Circularity Check

0 steps flagged

No circularity: variational principle invoked externally; reframing is interpretive

Full rationale

The paper introduces ANN representations for trial wave functions in VMC and illustrates them on the Yukawa potential and H2. The central statement that the variational method acts as unsupervised learning with multiple minima as an asset is presented as a perspective on the standard variational principle from quantum mechanics, which is cited as an independent external benchmark rather than derived from the paper's own fits or definitions. No equations reduce to self-definition, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The examples function as demonstrations without claiming to derive the optimization stability from the multiplicity by construction. The derivation chain therefore remains self-contained against external quantum-mechanical foundations.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard quantum mechanics and machine-learning concepts without introducing new fitted parameters, axioms beyond established variational principles, or invented entities.

axioms (2)
  • [standard math] The variational principle provides an upper bound to the ground-state energy.
    Invoked as the foundation for optimizing trial wave functions.
  • [domain assumption] Neural networks are universal function approximators capable of representing quantum states.
    Central premise for using ANNs as trial wave functions.
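
For reference, the first axiom follows from a short textbook argument, sketched here in assumed notation (eigenstates \phi_n of \hat{H} with energies E_0 \le E_1 \le \dots, expansion coefficients c_n):

```latex
\begin{align}
|\psi\rangle &= \sum_n c_n |\phi_n\rangle, \qquad \hat{H} |\phi_n\rangle = E_n |\phi_n\rangle, \\
E[\psi] &= \frac{\langle \psi | \hat{H} | \psi \rangle}{\langle \psi | \psi \rangle}
         = \frac{\sum_n |c_n|^2 E_n}{\sum_n |c_n|^2}
         \ge \frac{\sum_n |c_n|^2 E_0}{\sum_n |c_n|^2} = E_0 .
\end{align}
```

Equality holds only when \psi lies entirely in the ground-state subspace. The second axiom is of a different kind: universal approximation guarantees that a sufficiently large network can represent the state, not that a given finite network or optimizer will find it.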

pith-pipeline@v0.9.0 · 5610 in / 1286 out tokens · 32985 ms · 2026-05-21T11:07:38.190683+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 3 internal anchors

  1. Introduction to the artificial neural network-based variational Monte Carlo method. Extracted text: and particle physics [2] up to many-body quantum systems [3], quantum chemistry [4, 5], statistical mechanics [6], and materials [7]. The absorption of artificial neural networks (ANN) and other artificial intelligence (AI) tools in the physics research is natural in some respects. The reason is that those tools are built with the purpose of recogni...
  2. M. Ntampaka, H. Trac, D. J. Sutherland, N. Battaglia, B. Póczos, and J. Schneider, The Astrophysical Journal 803, 50 (2015)
  3. D. Guest, K. Cranmer, and D. Whiteson, Annual Review of Nuclear and Particle Science 68, 161 (2018)
  4. D.-L. Deng, X. Li, and S. Das Sarma, Phys. Rev. X 7, 021021 (2017)
  5. D. Pfau, J. S. Spencer, A. G. D. G. Matthews, and W. M. C. Foulkes, Phys. Rev. Res. 2, 033429 (2020)
  6. J. S. Spencer, D. Pfau, A. Botev, and W. M. C. Foulkes, “Better, faster fermionic neural networks,” (2020), arXiv:2011.07125 [physics.comp-ph]
  7. D. Wu, L. Wang, and P. Zhang, Phys. Rev. Lett. 122, 080602 (2019)
  8. T. Wen, L. Zhang, H. Wang, W. E, and D. J. Srolovitz, Materials Futures 1, 022601 (2022)
  9. J. Bardeen, L. N. Cooper, and J. R. Schrieffer, Phys. Rev. 108, 1175 (1957)
  10. P. Anderson, Materials Research Bulletin 8, 153 (1973)
  11. R. B. Laughlin, Phys. Rev. Lett. 50, 1395 (1983)
  12. G. Cybenko, Mathematics of Control, Signals and Systems 2, 303 (1989)
  13. A. Nagy and V. Savona, Physical Review Letters 122, 250501 (2019)
  14. C. Roth, A. Szabó, and A. H. MacDonald, Physical Review B 108, 054410 (2023)
  15. L. Yang, Z. Leng, G. Yu, A. Patel, W.-J. Hu, and H. Pu, Physical Review Research 2, 012039 (2020)
  16. Y. Qian, W. Fu, W. Ren, and J. Chen, The Journal of Chemical Physics 157, 164104 (2022)
  17. D. Pfau, S. Axelrod, H. Sutterud, I. von Glehn, and J. S. Spencer, Science 385, eadn0137 (2024)
  18. G. Pescia, J. Nys, J. Kim, A. Lovato, and G. Carleo, Phys. Rev. B 110, 035108 (2024)
  19. G. Cassella, H. Sutterud, S. Azadi, N. Drummond, D. Pfau, J. S. Spencer, and W. Foulkes, Phys. Rev. Lett. 130, 036401 (2023)
  20. W. T. Lou, H. Sutterud, G. Cassella, W. Foulkes, J. Knolle, D. Pfau, and J. S. Spencer, Phys. Rev. X 14, 021030 (2024)
  21. W. Freitas and S. A. Vitiello, Quantum 7, 1209 (2023)
  22. W. Freitas, B. Abreu, and S. A. Vitiello, Journal of Low Temperature Physics (2024), 10.1007/s10909-024-03061-w
  23. G. Pescia, J. Han, A. Lovato, J. Lu, and G. Carleo, Phys. Rev. Research 4, 023138 (2022)
  24. H. H. Goldstine, The computer from Pascal to von Neumann (Princeton University Press, 1993)
  25. A. A. Lovelace, Taylor’s Scientific Memoirs 3, 666 (1842)
  26. G. Welchman, The Hut Six Story: Breaking the Enigma Codes (McGraw-Hill, 1982)
  27. B. Randell, in A History of Computing in the Twentieth Century, edited by N. Metropolis, J. Howlett, and G.-C. Rota (Academic Press, San Diego, 1980) pp. 47–92
  28. H. H. Goldstine and A. Goldstine, in The Origins of Digital Computers: Selected Papers (Springer, 1946) pp. 359–373
  29. J. Von Neumann, IEEE Annals of the History of Computing 15, 27 (1993)
  30. A. M. Turing, Mind LIX, 433 (1950), https://academic.oup.com/mind/article-pdf/LIX/236/433/30123314/lix-236-433.pdf
  31. S. Bubeck, C. Coester, R. Eldan, T. Gowers, Y. T. Lee, A. Lupsasca, M. Sawhney, R. Scherrer, M. Sellke, B. K. Spears, D. Unutmaz, K. Weil, S. Yin, and N. Zhivotovskiy, “Early science acceleration experiments with gpt-5,” (2025), arXiv:2511.16072 [cs.CL]
  32. DeepSeek-AI, “Deepseek-v3 technical report,” (2025), arXiv:2412.19437 [cs.CL]
  33. A. Uchendu, Z. Ma, T. Le, R. Zhang, and D. Lee, CoRR abs/2109.13296 (2021)
  34. S. Dargan, S. Bansal, M. Kumar, A. Mittal, and K. Kumar, Archives of Computational Methods in Engineering 30, 1057 (2023)
  35. Z. Wang, J. Zhan, C. Duan, X. Guan, P. Lu, and K. Yang, IEEE Transactions on Neural Networks and Learning Systems 34, 3811 (2023)
  36. U. Raucci, A. Valentini, E. Pieri, H. Weir, S. Seritan, and T. J. Martinez, Nature Computational Science 1, 42 (2021)
  37. L. Li, B. Lei, and C. Mao, Journal of Industrial Information Integration 26 (2022)
  38. G. Moore, Proceedings of the IEEE 86, 82 (1998)
  39. M. Roser, H. Ritchie, and E. Mathieu, Our World in Data (2023), https://ourworldindata.org/moores-law
  40. I. Goodfellow, Y. Bengio, and A. Courville, Deep learning (MIT press, 2016)
  41. W. S. McCulloch and W. Pitts, The bulletin of mathematical biophysics 5, 115 (1943)
  42. F. Rosenblatt, Psychological review 65, 386 (1958)
  43. K. Fukushima, Biological Cybernetics 20, 121 (1975)
  44. D. E. Rumelhart, J. L. McClelland, and P. R. Group, Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations (The MIT Press, 1986)
  45. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Nature 323, 533 (1986)
  46. G. E. Hinton, Progress in brain research 165, 535 (2007)
  47. Y. LeCun, Y. Bengio, et al., Large-scale kernel machines 5, 127 (2007)
  48. J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017)
  49. G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Rev. Mod. Phys. 91, 045002 (2019)
  50. G. Györgyi, Phys. Rev. A 41, 7097 (1990)
  51. J. Albert and R. H. Swendsen, Physics Procedia 57, 99 (2014), proceedings of the 27th Workshop on Computer Simulation Studies in Condensed Matter Physics (CSP2014)
  52. C. Cohen-Tannoudji, B. Diu, F. Laloe, and B. Dui, Quantum Mechanics (2 vol. set) (Wiley-Interscience, 2006)
  53. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, The Journal of Chemical Physics 21, 1087 (1953)
  54. M. Mascagni and A. Srinivasan, ACM Trans. Math. Softw. 26, 436–461 (2000)
  55. J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang, “JAX: composable transformations of Python+NumPy programs,” (2018)
  56. T. Mitchell (McGraw Hill, 1997)
  57. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” (2017), arXiv:1412.6980 [cs.LG]
  58. W. Freitas, “Artificial Neural Network-based Variational Monte Carlo Introduction,” (2025)
  59. M. Napsuciale and S. Rodríguez, Physics Letters B 816, 136218 (2021)
  60. I. Kosztin, B. Faber, and K. Schulten, American Journal of Physics 64, 633 (1996)
  61. M. Ruggeri, S. Moroni, and M. Holzmann, Phys. Rev. Lett. 120, 205302 (2018)
  62. W. Freitas, B. Abreu, and S. A. Vitiello, Phys. Rev. B 112, 165109 (2025)

Appendix A: Blocking method and standard error estimation

Since the Monte Carlo integration is based on a Markov chain, successive samples are correlated. Therefore, the effective number of samples is smaller than the actual number of samples M. Consequently, the usual standard erro...
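
The Appendix A fragment is cut off before the method is stated, so the following is only a minimal sketch of a common fixed-block-size variant of the blocking idea, not necessarily the paper's procedure. The function names and the synthetic AR(1) chain used as stand-in data are assumptions for illustration.

```python
import math
import random


def standard_error(values):
    """Naive standard error of the mean, valid for uncorrelated samples."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var / n)


def blocked_standard_error(samples, block_size):
    """Average consecutive samples into blocks, then treat block means as independent."""
    n_blocks = len(samples) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    block_means = [
        sum(samples[b * block_size:(b + 1) * block_size]) / block_size
        for b in range(n_blocks)
    ]
    return standard_error(block_means)


def correlated_chain(n, rho=0.9, seed=0):
    """Synthetic AR(1) series with unit variance, standing in for Markov-chain output."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n):
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain
```

With `chain = correlated_chain(40000)`, comparing `standard_error(chain)` against `blocked_standard_error(chain, 400)` shows the naive estimate coming out several times too small, which is the effect the fragment attributes to the reduced effective number of samples. In practice the block size is increased until the blocked estimate stops growing.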