WF-Bench: A Benchmark for Neural Network WaveFunction Expressivity and Scaling Laws

Di Luo; Guijing Duan; Lixing Zhang

arxiv: 2605.29683 · v1 · pith:5J2FULSInew · submitted 2026-05-28 · ⚛️ physics.comp-ph

Lixing Zhang, Guijing Duan, Di Luo

Pith reviewed 2026-06-29 00:11 UTC · model grok-4.3

classification ⚛️ physics.comp-ph

keywords neural network wavefunctions · benchmark dataset · many-body systems · scaling laws · wavefunction fidelity · topological states · Wigner crystals · superconducting wavefunctions

The pith

WF-Bench supplies a dataset of exact many-body wavefunctions and a matching protocol that lets researchers compare neural network expressivity across quantum regimes using a single fidelity metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates WF-Bench, a collection of target wavefunctions drawn from topological states, Wigner crystals, and superconducting systems. It defines a reproducible protocol that trains neural networks to match these targets and measures success by wavefunction fidelity. From the resulting data the authors extract empirical scaling relations that link representability to system size, number of determinants, and model depth. When the same protocol is run on Psiformer and Ferminet, the benchmark produces consistent rankings and highlights how architecture choices affect performance. The central goal is to replace ad-hoc comparisons with a shared, dataset-driven standard for designing future neural wavefunction models.

Core claim

WF-Bench assembles target wavefunctions from multiple strongly correlated regimes and supplies a uniform matching protocol that uses fidelity to quantify how well a neural network reproduces each target; the resulting measurements yield scaling laws that describe how expressivity grows with system size and with model parameters such as determinant count and network depth.

What carries the argument

The WF-Bench dataset together with its fidelity-based matching protocol, which converts any neural wavefunction architecture into a numerical score against each target state.
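
A minimal sketch of what such a fidelity score could look like, assuming wavefunctions are given as lists of complex amplitudes on a shared, finite set of configurations. Real neural wavefunctions live on continuous coordinates and would be scored by sampling instead. The function names and the four-configuration toy state are illustrative and not taken from the paper; the −log F form follows the loss named in the Figure 1 caption.

```python
import cmath
import math


def fidelity(psi_target, psi_model):
    """Normalized squared overlap |<t|m>|^2 / (<t|t><m|m>) of two amplitude lists."""
    overlap = sum(t.conjugate() * m for t, m in zip(psi_target, psi_model))
    norm_t = sum(abs(t) ** 2 for t in psi_target)
    norm_m = sum(abs(m) ** 2 for m in psi_model)
    return abs(overlap) ** 2 / (norm_t * norm_m)


def neg_log_fidelity(psi_target, psi_model):
    """Loss that is zero at a perfect match and grows as the overlap shrinks."""
    return -math.log(fidelity(psi_target, psi_model))


# Hypothetical toy target: four configurations with a phase winding (1, i, -1, -i).
target = [cmath.exp(1j * k * math.pi / 2) for k in range(4)]

# Same state up to an overall scale and global phase: fidelity stays at 1.
same_up_to_phase = [3.0 * cmath.exp(0.7j) * a for a in target]

# Correct moduli but all phase information discarded: overlap cancels to ~0.
amplitude_only = [complex(abs(a)) for a in target]

f_same = fidelity(target, same_up_to_phase)
f_amp = fidelity(target, amplitude_only)
loss_same = neg_log_fidelity(target, same_up_to_phase)
```

In this toy case the rescaled, phase-rotated copy scores a fidelity of one, while the amplitude-only copy scores essentially zero, which is why a fidelity target tests phase structure as well as density.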

If this is right

Architectures can be ranked on a common scale instead of separate ad-hoc tests.
Scaling laws give quantitative guidance on how many determinants or layers are needed for a given system size.
Future model design can be driven by performance gaps revealed on specific regimes such as topological or crystalline targets.
The protocol can be reapplied to any new neural ansatz without changing the evaluation metric.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same benchmark could be used to test whether hybrid classical-quantum circuits obey the same scaling trends.
If the fidelity metric correlates with variational energies on larger systems, the benchmark could shorten the search for good initial parameters.
Extending the dataset to time-dependent or open-system targets would test whether the scaling laws remain valid outside equilibrium ground states.

Load-bearing premise

The chosen quantum regimes and the single fidelity metric together form a representative test bed for general neural network wavefunction expressivity.

What would settle it

A new neural architecture that scores high on all WF-Bench targets yet shows systematically lower fidelity on an independent set of many-body states drawn from the same physical regimes would falsify the claim that the benchmark captures general expressivity.

Figures

Figures reproduced from arXiv: 2605.29683 by Di Luo, Guijing Duan, Lixing Zhang.

**Figure 1.** Schematic of the WF-Bench workflow. The dataset consists of three categories of wavefunctions: topological states, Wigner crystals, and superconducting wavefunctions. By using −log F as the loss function, NN wavefunctions are optimized to match both the phase and the amplitude of the target wavefunctions. This enables uniform representation power benchmarks across key system and model parameters, including nu…

**Figure 2.** Feature plots of target wavefunctions. (a) Amplitude and phase of topological states obtained by scanning the position of one electron while fixing all others. Blue dots indicate fixed electrons, and red crosses denote quasiholes. (b) Charge density patterns of different Wigner crystals. (c) Real-space structure of the pairing functions for different superconducting states.

**Figure 3.** The value of F(N_e = 8) for all 31 wavefunctions included in the dataset. … via transfer learning. However, for topological wavefunctions with complex phase windings, even transfer learning fails. One possible remedy is to match probability currents instead of fidelity, as proposed in (Nazaryan et al., 2025). However, we find the proposed protocol suffers from self-trapping during the early stage of training. S…

**Figure 4.** Fidelity scaling of 9 representative wavefunctions from superconductors (blue), topological states (red), and Wigner crystals (green). (a–c) Fidelity versus N_e for Ferminet with N_det = 8 and N_layer = 2. (d–f) Corresponding results for Psiformer with N_det = 8 and N_layer = 2. Dotted lines indicate empirical fits of the form F = 1 − α(N_e − 2)^β.

**Figure 5.** Fidelity scaling of bcs s and moore m2 with respect to N_det. Ferminet (red squares) is evaluated at N_e = 10, and Psiformer (blue triangles) is evaluated at N_e = 14. Both Psiformer and Ferminet use N_layer = 2. … the overall weak correlations, we observe a faster fidelity decay for wc moire v1A due to the increased structural complexity of the moiré orbital, as can be seen in …

**Figure 6.** Fidelity scaling of bcs s (a) and moore m2 (b) as a function of the number of layers N_layer. Ferminet (red squares) is evaluated at N_e = 10, and Psiformer (blue triangles) is evaluated at N_e = 14. Both Psiformer and Ferminet use N_det = 8. … moore m2 as a function of the number of layers N_layer. When N_layer = 1, we observe that the fidelity achieved by the network is very low. In particular, for moore m2, both…
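
The Figure 4 caption quotes a fit form, F = 1 − α(N_e − 2)^β, reading the trailing β as an exponent. The sketch below shows one way such a fit could be done: a log-linear least-squares fit of log(1 − F) against log(N_e − 2). The fitting procedure, the function names, and the data points are assumptions for illustration; the data are synthetic values generated from the fit form itself, not results from the paper.

```python
import math


def fit_power_law(n_e_values, fidelities):
    """Fit log(1 - F) = log(alpha) + beta * log(N_e - 2) by ordinary least squares."""
    # Points with N_e <= 2 or F >= 1 have no finite logarithm and are skipped.
    points = [
        (math.log(n - 2), math.log(1.0 - f))
        for n, f in zip(n_e_values, fidelities)
        if n > 2 and f < 1.0
    ]
    if len(points) < 2:
        raise ValueError("need at least two usable points to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    beta = s_xy / s_xx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


def predicted_fidelity(alpha, beta, n_e):
    """Evaluate F = 1 - alpha * (N_e - 2) ** beta."""
    return 1.0 - alpha * (n_e - 2) ** beta


# Synthetic illustration only: generate points from known (alpha, beta), then refit.
true_alpha, true_beta = 0.01, 1.5
n_values = [4, 6, 8, 10, 12]
synthetic = [predicted_fidelity(true_alpha, true_beta, n) for n in n_values]

alpha, beta = fit_power_law(n_values, synthetic)
```

A log-linear fit weights small infidelities more heavily than a direct nonlinear fit would, so the two can disagree on noisy data; which one the authors used is not stated on this page.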

Original abstract

We present a comprehensive benchmarking dataset and empirical scaling law analysis for neural network wavefunctions by matching them to a wide spectrum of famous many-body target wavefunctions. The dataset, WF-Bench, spans multiple distinct regimes of strongly correlated quantum matter, including topological states, Wigner crystals, and superconducting wavefunctions, providing a diverse and challenging test bed for neural network wavefunction expressivity. We introduce a systematic and reproducible benchmarking protocol for target wavefunction matching, enabling consistent performance evaluation across different neural network wavefunction architectures. By using wavefunction fidelity as the uniform metric, we discover empirical scaling laws that characterize how representability depends on system size and key model parameters, including the number of determinants and model depth. By applying our benchmark protocol to Psiformer and Ferminet, we show that WF-Bench establishes a unified, dataset-driven framework for evaluating and comparing neural network wavefunctions and for guiding the design of future architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

WF-Bench creates a new dataset and matching protocol for neural wavefunctions across several quantum regimes and reports scaling trends on two existing models, but the supporting details are thin.

The main contribution here is WF-Bench itself: a collection of target wavefunctions from topological states, Wigner crystals, and superconducting regimes, paired with a protocol that scores neural ansatze by fidelity. They run the protocol on Psiformer and Ferminet and extract empirical relations between representability, system size, determinant count, and model depth.

This is useful because the subfield has had no shared test cases for comparing expressivity. A reproducible protocol with a single metric lets people put different architectures on the same footing, which is a practical step.

The soft spots are the missing specifics. The abstract does not describe how the target wavefunctions are generated or validated, which matters when the benchmark is meant to be challenging. The scaling laws are presented as discoveries, yet only two models are shown; it is unclear whether the trends survive other architectures or different hyperparameter choices. The assumption that fidelity on these particular states is representative of general expressivity is stated but not stress-tested in the summary.

The work is aimed at people who build or compare neural wavefunction methods for many-body problems. A reader who needs a common reference set would find the dataset and protocol worth looking at, provided the targets and code are released.

It deserves peer review. A benchmark paper can organize evaluation even if the scaling claims need more runs and clearer documentation to hold up.

Referee Report

0 major / 2 minor

Summary. The manuscript presents WF-Bench, a benchmarking dataset spanning topological states, Wigner crystals, and superconducting wavefunctions, together with a reproducible protocol for matching neural network wavefunctions (e.g., Psiformer, Ferminet) to target many-body states. It uses wavefunction fidelity as the uniform metric, reports empirical scaling laws relating representability to system size and model parameters (number of determinants, depth), and positions the benchmark as a unified, dataset-driven framework for evaluating and guiding neural-network ansatz design.

Significance. A well-validated, reproducible benchmark with documented scaling relations would be a useful contribution to the neural-network wavefunction literature, providing a common test bed and quantitative guidance on architecture choices. The paper's emphasis on multiple strongly correlated regimes and a single fidelity metric is a reasonable starting point, but the actual significance cannot be assessed without the methods, dataset construction details, and quantitative results.

minor comments (2)

The abstract states that scaling laws are discovered, but the manuscript provides no concrete functional forms, fitting procedures, or error bars on the reported relations; this should be clarified in the results section.
The choice of target wavefunctions and the precise definition of fidelity (overlap, log-overlap, or other) are central to the benchmark but are not described at a level that allows independent reproduction.
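
To make the second comment concrete, the block below writes out one conventional reading of "fidelity": the normalized squared overlap between a target $\Psi_T$ and a network wavefunction $\Psi_\theta$, together with an estimator built from configurations $x_i$ sampled from $|\Psi_T|^2$. This is offered as a reference convention only; this page does not confirm that WF-Bench uses exactly this form. The symbols $\Psi_T$, $\Psi_\theta$, $r$, and the sample count $M$ are notation introduced here.

```latex
F = \frac{\left|\langle \Psi_T | \Psi_\theta \rangle\right|^2}
         {\langle \Psi_T | \Psi_T \rangle \, \langle \Psi_\theta | \Psi_\theta \rangle},
\qquad
r(x) = \frac{\Psi_\theta(x)}{\Psi_T(x)},
\qquad
F = \frac{\left| \mathbb{E}_{x \sim |\Psi_T|^2}\!\left[ r(x) \right] \right|^2}
         {\mathbb{E}_{x \sim |\Psi_T|^2}\!\left[ |r(x)|^2 \right]}
  \approx
  \frac{\left| \frac{1}{M} \sum_{i=1}^{M} r(x_i) \right|^2}
       {\frac{1}{M} \sum_{i=1}^{M} |r(x_i)|^2}.
```

The second form follows because $\langle \Psi_T | \Psi_\theta \rangle$ and $\langle \Psi_\theta | \Psi_\theta \rangle$ are both integrals against $|\Psi_T|^2$, so the target's normalization cancels. Under this convention $F$ is unchanged by rescaling or a global phase on either wavefunction, and the log-overlap variant is simply $-\log F$.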

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review of our manuscript on WF-Bench. The report notes that significance cannot be fully assessed without methods, dataset details, and quantitative results; we address this point directly below. No other major comments were enumerated in the report.

Referee: the actual significance cannot be assessed without the methods, dataset construction details, and quantitative results.

Authors: The manuscript provides a full description of the reproducible benchmarking protocol for matching neural network wavefunctions (Psiformer, Ferminet) to target states, the construction of the WF-Bench dataset across topological states, Wigner crystals, and superconducting wavefunctions, and quantitative results including empirical scaling laws relating representability to system size, determinant count, and model depth, all evaluated under a uniform wavefunction fidelity metric. These elements are presented to support assessment of the benchmark as a dataset-driven framework.

Revision: no

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces WF-Bench as an empirical benchmarking dataset spanning quantum matter regimes and applies a fidelity-based protocol to architectures such as Psiformer and Ferminet. Scaling laws are presented as discovered empirical relations between representability, system size, determinant count, and model depth. No load-bearing derivation, uniqueness theorem, ansatz, or fitted parameter is shown to reduce by construction to the paper's own inputs or self-citations; the central claims rest on external target wavefunctions and observed performance metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the chosen target wavefunctions adequately test expressivity and that the scaling laws generalize beyond the tested models (Psiformer and Ferminet). No free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: Wavefunction fidelity serves as an appropriate uniform metric for evaluating expressivity across different neural network architectures and quantum regimes.
The abstract states 'By using wavefunction fidelity as the uniform metric'.

Reference graph

Works this paper leans on

14 extracted references · 13 canonical work pages · 1 internal anchor

[1] Jiang, D., Wen, X., Chen, Y., Li, R., Fu, W., Pham, H. Q., Chen, J., He, D., Goddard III, W. A., Wang, L., et al. Neural scaling laws surpass chemical accuracy for the many-electron Schrödinger equation. arXiv preprint arXiv:2508.02570.

[2] Levine, Y., Yakira, D., Cohen, N., and Shashua, A. Deep learning and quantum entanglement: Fundamental connections with implications to network design. arXiv preprint arXiv:1704.01552.

[3] Li, Z., Lu, Z., Li, R., Wen, X., Li, X., Wang, L., Chen, J., and Ren, W. Symmetry enforced solution of the many-body Schrödinger equation with deep neural network. arXiv preprint arXiv:2406.01222.

[4] Lin, J., Luo, D., Yao, X., and Shanahan, P. E. Real-time dynamics of the Schwinger model as an open quantum system with neural density operators. Journal of High Energy Physics, 2024(6):1–23, 2024.

[5] Lu, Y., Bharadwaj, S., Rathore, D., and Luo, D. Information-theoretic scaling laws of neural quantum states. arXiv preprint arXiv:2603.23468.

[6] Luo, D., Chen, Z., Carrasquilla, J., and Clark, B. K. Autoregressive neural network for simulating open quantum systems via a probabilistic formulation. Physical Review Letters, 128(9):090501, 2022a. Luo, D., Yuan, S., Stokes, J., and Clark, B. K. Gauge equivariant neural networks for 2+1D U(1) gauge theory simulations in Hamiltonian formulation. arXiv ...

[7] Luo, D., Dai, D. D., and Fu, L. Simulating moiré quantum matter with neural network. arXiv preprint arXiv:2406.17645.

[8] Luo, D., Zaklama, T., and Fu, L. Solving fractional electron states in twisted MoTe2 with deep neural network. arXiv preprint arXiv:2503.13585.

[9] Nazaryan, K., Gaggioli, F., Teng, Y., and Fu, L. Artificial intelligence for quantum matter: Finding a needle in a haystack. arXiv preprint arXiv:2507.13322.

[10] Paul, N. Bound on entanglement in neural quantum states. arXiv preprint arXiv:2510.11797.

[11] von Glehn, I., Spencer, J. S., and Pfau, D. A self-attention ansatz for ab-initio quantum chemistry. arXiv preprint arXiv:2211.13672.

[12] Yang, T.-H., Soleimanifar, M., Bergamaschi, T., and Preskill, J. When can classical neural networks represent quantum states? arXiv preprint arXiv:2410.23152.

[13] Zhang, L. and Luo, D. Neural transformer backflow for solving momentum-resolved ground states of strongly correlated materials. arXiv preprint arXiv:2509.09275.

[14] Zhdanov, M., Welling, M., and van de Meent, J.-W. Erwin: A tree-based hierarchical transformer for large-scale physical systems. arXiv preprint arXiv:2502.17019.
