
arxiv: 2605.29683 · v1 · pith:5J2FULSInew · submitted 2026-05-28 · ⚛️ physics.comp-ph

WF-Bench: A Benchmark for Neural Network WaveFunction Expressivity and Scaling Laws

Pith reviewed 2026-06-29 00:11 UTC · model grok-4.3

classification ⚛️ physics.comp-ph
keywords neural network wavefunctions · benchmark dataset · many-body systems · scaling laws · wavefunction fidelity · topological states · Wigner crystals · superconducting wavefunctions

The pith

WF-Bench supplies a dataset of exact many-body wavefunctions and a matching protocol that lets researchers compare neural network expressivity across quantum regimes using a single fidelity metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates WF-Bench, a collection of target wavefunctions drawn from topological states, Wigner crystals, and superconducting systems. It defines a reproducible protocol that trains neural networks to match these targets and measures success by wavefunction fidelity. From the resulting data the authors extract empirical scaling relations that link representability to system size, number of determinants, and model depth. When the same protocol is run on Psiformer and Ferminet, the benchmark produces consistent rankings and highlights how architecture choices affect performance. The central goal is to replace ad-hoc comparisons with a shared, dataset-driven standard for designing future neural wavefunction models.

Core claim

WF-Bench assembles target wavefunctions from multiple strongly correlated regimes and supplies a uniform matching protocol that uses fidelity to quantify how well a neural network reproduces each target; the resulting measurements yield scaling laws that describe how expressivity grows with system size and with model parameters such as determinant count and network depth.

What carries the argument

The WF-Bench dataset together with its fidelity-based matching protocol, which converts any neural wavefunction architecture into a numerical score against each target state.
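
As a minimal sketch of what a fidelity-based score can look like, the snippet below assumes the standard normalized overlap F = |⟨ψ_T|ψ_θ⟩|² / (⟨ψ_T|ψ_T⟩⟨ψ_θ|ψ_θ⟩) and estimates it from configurations sampled from |ψ_T|². The function names, the choice of sampling distribution, and this exact definition of fidelity are assumptions made for illustration; this page does not state the paper's precise estimator. The −log F loss follows the Figure 1 caption.

```python
import numpy as np


def fidelity_from_samples(log_psi_model, log_psi_target):
    """Estimate the fidelity F from configurations sampled from |psi_target|^2.

    Both inputs are complex log-amplitudes evaluated at the same samples
    (hypothetical interface, not taken from the paper).
    """
    log_ratio = np.asarray(log_psi_model) - np.asarray(log_psi_target)
    # F is invariant under a global rescaling of either wavefunction, so
    # shifting by the largest real part avoids overflow without changing F.
    log_ratio = log_ratio - np.max(log_ratio.real)
    ratio = np.exp(log_ratio)
    # |E[r]|^2 / E[|r|^2] with r = psi_model / psi_target
    return np.abs(ratio.mean()) ** 2 / np.mean(np.abs(ratio) ** 2)


def neg_log_fidelity(log_psi_model, log_psi_target):
    """Loss of the form -log F, which is zero when the two states coincide."""
    return -np.log(fidelity_from_samples(log_psi_model, log_psi_target))
```

Because the ratio keeps its complex phase, a model that reproduces the target amplitude but not its phase structure still scores a low fidelity, which is consistent with the caption's statement that both phase and amplitude are matched.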

If this is right

  • Architectures can be ranked on a common scale instead of separate ad-hoc tests.
  • Scaling laws give quantitative guidance on how many determinants or layers are needed for a given system size.
  • Future model design can be driven by performance gaps revealed on specific regimes such as topological or crystalline targets.
  • The protocol can be reapplied to any new neural ansatz without changing the evaluation metric.
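
To make the scaling-law point concrete, the Figure 4 caption reports empirical fits that read as F = 1 − α(N_e − 2)^β. The sketch below shows one way such a fit could be done, by linear regression of log(1 − F) against log(N_e − 2). The fitting procedure, the function name, and the data are all assumptions: the numbers are synthetic and are not results from the paper.

```python
import numpy as np


def fit_infidelity_power_law(n_e, fidelity):
    """Fit F = 1 - alpha * (N_e - 2)**beta by least squares in log-log space.

    Hypothetical helper; the paper's actual fitting procedure is not given here.
    """
    n_e = np.asarray(n_e, dtype=float)
    infidelity = 1.0 - np.asarray(fidelity, dtype=float)
    # The logarithms are only defined for N_e > 2 and F < 1.
    mask = (n_e > 2) & (infidelity > 0)
    beta, log_alpha = np.polyfit(
        np.log(n_e[mask] - 2.0), np.log(infidelity[mask]), 1
    )
    return np.exp(log_alpha), beta


# Synthetic example data generated from a known power law (not paper data).
n_e = np.array([4, 6, 8, 10, 12, 14])
f = 1.0 - 0.002 * (n_e - 2.0) ** 1.5
alpha, beta = fit_infidelity_power_law(n_e, f)
print(f"alpha = {alpha:.4g}, beta = {beta:.4g}")
```

Fitting in log-log space weights small infidelities as heavily as large ones; a direct nonlinear fit of F would weight the data differently, so the two approaches need not agree on noisy data.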

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same benchmark could be used to test whether hybrid classical-quantum circuits obey the same scaling trends.
  • If the fidelity metric correlates with variational energies on larger systems, the benchmark could shorten the search for good initial parameters.
  • Extending the dataset to time-dependent or open-system targets would test whether the scaling laws remain valid outside equilibrium ground states.

Load-bearing premise

The chosen quantum regimes and the single fidelity metric together form a representative test bed for general neural network wavefunction expressivity.

What would settle it

A new neural architecture that scores high on all WF-Bench targets yet shows systematically lower fidelity on an independent set of many-body states drawn from the same physical regimes would falsify the claim that the benchmark captures general expressivity.

Figures

Figures reproduced from arXiv: 2605.29683 by Di Luo, Guijing Duan, Lixing Zhang.

Figure 1
Figure 1: Schematic of the WF-Bench workflow. The dataset consists of three categories of wavefunctions: topological states, Wigner crystals, and superconducting wavefunctions. By using −log F as the loss function, NN wavefunctions are optimized to match both the phase and the amplitude of the target wavefunctions. This enables uniform representation power benchmarks across key system and model parameters, including nu…
Figure 2
Figure 2: Feature plots of target wavefunctions. (a) Amplitude and phase of topological states obtained by scanning the position of one electron while fixing all others. Blue dots indicate fixed electrons, and red crosses denote quasiholes. (b) Charge density patterns of different Wigner crystals. (c) Real-space structure of the pairing functions for different superconducting states. … passing networks for homogeneous…
Figure 3
Figure 3: The value of F(N_e = 8) for all 31 wavefunctions included in the dataset. … via transfer learning. However, for topological wavefunctions with complex phase windings, even transfer learning fails. One possible remedy is to match probability currents instead of fidelity, as proposed in (Nazaryan et al., 2025). However, we find the proposed protocol suffers from self-trapping during the early stage of training. S…
Figure 4
Figure 4: Fidelity scaling of 9 representative wavefunctions from superconductors (blue), topological states (red), and Wigner crystals (green). (a–c) Fidelity versus N_e for Ferminet with N_det = 8 and N_layer = 2. (d–f) Corresponding results for Psiformer with N_det = 8 and N_layer = 2. Dotted lines indicate empirical fits of the form F = 1 − α(N_e − 2)^β. … target wavefunctions for consistency). To impose physics prior t…
Figure 5
Figure 5: Fidelity scaling of bcs s and moore m2 with respect to N_det. Ferminet (red squares) is evaluated at N_e = 10, and Psiformer (blue triangles) is evaluated at N_e = 14. Both Psiformer and Ferminet are set to N_layer = 2. … the overall weak correlations, we observe a faster fidelity decay for wc moire v1A due to the increased structural complexity of the moiré orbital, as can be seen in…
Figure 6
Figure 6: Fidelity scaling of bcs s (a) and moore m2 (b) as a function of the number of layers N_layer. Ferminet (red squares) is evaluated at N_e = 10, and Psiformer (blue triangles) is evaluated at N_e = 14. Both Psiformer and Ferminet are set to N_det = 8. … moore m2 as a function of the number of layers N_layer. When N_layer = 1, we observe that the fidelity achieved by the network is very low. In particular, for moore m2, both…
Original abstract

We present a comprehensive benchmarking dataset and empirical scaling law analysis for neural network wavefunctions by matching them to a wide spectrum of famous many-body target wavefunctions. The dataset, WF-Bench, spans multiple distinct regimes of strongly correlated quantum matter, including topological states, Wigner crystals, and superconducting wavefunctions, providing a diverse and challenging test bed for neural network wavefunction expressivity. We introduce a systematic and reproducible benchmarking protocol for target wavefunction matching, enabling consistent performance evaluation across different neural network wavefunction architectures. By using wavefunction fidelity as the uniform metric, we discover empirical scaling laws that characterize how representability depends on system size and key model parameters, including the number of determinants and model depth. By applying our benchmark protocol to Psiformer and Ferminet, we show that WF-Bench establishes a unified, dataset-driven framework for evaluating and comparing neural network wavefunctions and for guiding the design of future architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript presents WF-Bench, a benchmarking dataset spanning topological states, Wigner crystals, and superconducting wavefunctions, together with a reproducible protocol for matching neural network wavefunctions (e.g., Psiformer, Ferminet) to target many-body states. It uses wavefunction fidelity as the uniform metric, reports empirical scaling laws relating representability to system size and model parameters (number of determinants, depth), and positions the benchmark as a unified, dataset-driven framework for evaluating and guiding neural-network ansatz design.

Significance. A well-validated, reproducible benchmark with documented scaling relations would be a useful contribution to the neural-network wavefunction literature, providing a common test bed and quantitative guidance on architecture choices. The paper's emphasis on multiple strongly correlated regimes and a single fidelity metric is a reasonable starting point, but the actual significance cannot be assessed without the methods, dataset construction details, and quantitative results.

minor comments (2)
  1. The abstract states that scaling laws are discovered, but the manuscript provides no concrete functional forms, fitting procedures, or error bars on the reported relations; this should be clarified in the results section.
  2. The choice of target wavefunctions and the precise definition of fidelity (overlap, log-overlap, or other) are central to the benchmark but are not described at a level that allows independent reproduction.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review of our manuscript on WF-Bench. The report notes that significance cannot be fully assessed without methods, dataset details, and quantitative results; we address this point directly below. No other major comments were enumerated in the report.

Point-by-point responses
  1. Referee: the actual significance cannot be assessed without the methods, dataset construction details, and quantitative results.

    Authors: The manuscript provides a full description of the reproducible benchmarking protocol for matching neural network wavefunctions (Psiformer, Ferminet) to target states, the construction of the WF-Bench dataset across topological states, Wigner crystals, and superconducting wavefunctions, and quantitative results including empirical scaling laws relating representability to system size, determinant count, and model depth, all evaluated under a uniform wavefunction fidelity metric. These elements are presented to support assessment of the benchmark as a dataset-driven framework. revision: no

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper introduces WF-Bench as an empirical benchmarking dataset spanning quantum matter regimes and applies a fidelity-based protocol to architectures such as Psiformer and Ferminet. Scaling laws are presented as discovered empirical relations between representability, system size, determinant count, and model depth. No load-bearing derivation, uniqueness theorem, ansatz, or fitted parameter is shown to reduce by construction to the paper's own inputs or self-citations; the central claims rest on external target wavefunctions and observed performance metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the chosen target wavefunctions adequately test expressivity and that the scaling laws generalize beyond the tested models (Psiformer and Ferminet). No free parameters or invented entities are mentioned.

axioms (1)
  • domain assumption Wavefunction fidelity serves as an appropriate uniform metric for evaluating expressivity across different neural network architectures and quantum regimes.
    The abstract states 'By using wavefunction fidelity as the uniform metric'.



Reference graph

Works this paper leans on

14 extracted references · 13 canonical work pages · 1 internal anchor

  1. Jiang, D., Wen, X., Chen, Y., Li, R., Fu, W., Pham, H. Q., Chen, J., He, D., Goddard III, W. A., Wang, L., et al. Neural scaling laws surpass chemical accuracy for the many-electron Schrödinger equation. arXiv preprint arXiv:2508.02570.

  2. Levine, Y., Yakira, D., Cohen, N., and Shashua, A. Deep learning and quantum entanglement: Fundamental connections with implications to network design. arXiv preprint arXiv:1704.01552.

  3. Li, Z., Lu, Z., Li, R., Wen, X., Li, X., Wang, L., Chen, J., and Ren, W. Symmetry enforced solution of the many-body Schrödinger equation with deep neural network. arXiv preprint arXiv:2406.01222.

  4. Lin, J., Luo, D., Yao, X., and Shanahan, P. E. Real-time dynamics of the Schwinger model as an open quantum system with neural density operators. Journal of High Energy Physics, 2024(6):1–23.

  5. Lu, Y., Bharadwaj, S., Rathore, D., and Luo, D. Information-theoretic scaling laws of neural quantum states. arXiv preprint arXiv:2603.23468.

  6. Luo, D., Chen, Z., Carrasquilla, J., and Clark, B. K. Autoregressive neural network for simulating open quantum systems via a probabilistic formulation. Physical Review Letters, 128(9):090501, 2022a. Luo, D., Yuan, S., Stokes, J., and Clark, B. K. Gauge equivariant neural networks for 2+1D U(1) gauge theory simulations in Hamiltonian formulation. arXiv ...

  7. Luo, D., Dai, D. D., and Fu, L. Simulating moiré quantum matter with neural network. arXiv preprint arXiv:2406.17645.

  8. Luo, D., Zaklama, T., and Fu, L. Solving fractional electron states in twisted MoTe2 with deep neural network. arXiv preprint arXiv:2503.13585.

  9. Nazaryan, K., Gaggioli, F., Teng, Y., and Fu, L. Artificial intelligence for quantum matter: Finding a needle in a haystack. arXiv preprint arXiv:2507.13322.

  10. Paul, N. Bound on entanglement in neural quantum states. arXiv preprint arXiv:2510.11797.

  11. von Glehn, I., Spencer, J. S., and Pfau, D. A self-attention ansatz for ab-initio quantum chemistry. arXiv preprint arXiv:2211.13672.

  12. Yang, T.-H., Soleimanifar, M., Bergamaschi, T., and Preskill, J. When can classical neural networks represent quantum states? arXiv preprint arXiv:2410.23152.

  13. Zhang, L. and Luo, D. Neural transformer backflow for solving momentum-resolved ground states of strongly correlated materials. arXiv preprint arXiv:2509.09275.

  14. Zhdanov, M., Welling, M., and van de Meent, J.-W. Erwin: A tree-based hierarchical transformer for large-scale physical systems. arXiv preprint arXiv:2502.17019.