WF-Bench: A Benchmark for Neural Network WaveFunction Expressivity and Scaling Laws
Pith reviewed 2026-06-29 00:11 UTC · model grok-4.3
The pith
WF-Bench supplies a dataset of exact many-body wavefunctions and a matching protocol that lets researchers compare neural network expressivity across quantum regimes using a single fidelity metric.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WF-Bench assembles target wavefunctions from multiple strongly correlated regimes and supplies a uniform matching protocol that uses fidelity to quantify how well a neural network reproduces each target; the resulting measurements yield scaling laws that describe how expressivity grows with system size and with model parameters such as determinant count and network depth.
What carries the argument
The WF-Bench dataset together with its fidelity-based matching protocol, which converts any neural wavefunction architecture into a numerical score against each target state.
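As a rough illustration of how a fidelity score can be computed from samples, the sketch below estimates F = |<phi|psi>|^2 / (<phi|phi><psi|psi>) using configurations drawn from the target density |phi|^2. This is a standard Monte Carlo identity, not necessarily the estimator used in the paper, whose precise fidelity definition is not given on this page. The function names `fidelity_from_samples` and `toy_gaussian_fidelity` and the 1D Gaussian toy states are hypothetical and serve only as a self-check.

```python
import cmath
import math
import random


def fidelity_from_samples(log_psi, log_phi):
    """Estimate F = |<phi|psi>|^2 / (<phi|phi><psi|psi>).

    log_psi, log_phi: complex log-amplitudes of the network state and the
    target state, evaluated at the same configurations. The configurations
    are ASSUMED to be drawn from the target density |phi|^2.
    """
    # Log of the amplitude ratio psi/phi at each sampled configuration.
    log_ratio = [complex(a) - complex(b) for a, b in zip(log_psi, log_phi)]
    # Subtract the largest real part for numerical stability; the constant
    # cancels between numerator and denominator.
    offset = max(z.real for z in log_ratio)
    ratio = [cmath.exp(z - offset) for z in log_ratio]
    n = len(ratio)
    mean_ratio = sum(ratio) / n
    mean_abs_sq = sum(abs(r) ** 2 for r in ratio) / n
    return abs(mean_ratio) ** 2 / mean_abs_sq


def toy_gaussian_fidelity(displacement=1.0, n_samples=200_000, seed=0):
    """Toy check (not a WF-Bench target): two unnormalized 1D Gaussians.

    phi(x) = exp(-x^2 / 2), psi(x) = exp(-(x - displacement)^2 / 2).
    The exact fidelity is exp(-displacement^2 / 2).
    """
    rng = random.Random(seed)
    # |phi|^2 = exp(-x^2) is a normal density with variance 1/2.
    xs = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(n_samples)]
    log_phi = [-0.5 * x * x for x in xs]
    log_psi = [-0.5 * (x - displacement) ** 2 for x in xs]
    return fidelity_from_samples(log_psi, log_phi)
```

Because the estimator only uses amplitude ratios, it is insensitive to the normalization and global phase of either state, which is one reason a ratio-based form is convenient for unnormalized neural ansatzes.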
If this is right
- Architectures can be ranked on a common scale instead of separate ad-hoc tests.
- Scaling laws give quantitative guidance on how many determinants or layers are needed for a given system size.
- Future model design can be driven by performance gaps revealed on specific regimes such as topological or crystalline targets.
- The protocol can be reapplied to any new neural ansatz without changing the evaluation metric.
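To make the idea of quantitative guidance concrete, here is a minimal sketch of how a scaling relation could be fitted and then inverted. The functional forms of the paper's scaling laws are not given on this page, so the power law 1 - F = a * K^(-alpha) in the determinant count K is purely an assumption, and the data points in the usage note are synthetic. The names `fit_power_law` and `required_model_size` are hypothetical.

```python
import math


def fit_power_law(sizes, infidelities):
    """Least-squares fit of infidelity = a * size**(-alpha) in log-log space.

    ASSUMED functional form; returns the pair (a, alpha).
    """
    log_x = [math.log(x) for x in sizes]
    log_y = [math.log(y) for y in infidelities]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    s_xx = sum((u - mean_x) ** 2 for u in log_x)
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def required_model_size(a, alpha, target_infidelity):
    """Invert the assumed power law for the size that reaches a target infidelity."""
    return (a / target_infidelity) ** (1.0 / alpha)


# Synthetic illustration only; these are not results from the paper.
determinant_counts = [1, 2, 4, 8, 16]
synthetic_infidelity = [0.2 * k ** -0.5 for k in determinant_counts]
a_fit, alpha_fit = fit_power_law(determinant_counts, synthetic_infidelity)
```

With the synthetic data above, the fit recovers the generating parameters, and `required_model_size(a_fit, alpha_fit, 0.05)` gives the determinant count at which the assumed law reaches an infidelity of 0.05. Real benchmark data would also need uncertainty estimates on the fitted exponent.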
Where Pith is reading between the lines
- The same benchmark could be used to test whether hybrid classical-quantum circuits obey the same scaling trends.
- If the fidelity metric correlates with variational energies on larger systems, the benchmark could shorten the search for good initial parameters.
- Extending the dataset to time-dependent or open-system targets would test whether the scaling laws remain valid outside equilibrium ground states.
Load-bearing premise
The chosen quantum regimes and the single fidelity metric together form a representative test bed for general neural network wavefunction expressivity.
What would settle it
A new neural architecture that scores high on all WF-Bench targets yet shows systematically lower fidelity on an independent set of many-body states drawn from the same physical regimes would falsify the claim that the benchmark captures general expressivity.
Original abstract
We present a comprehensive benchmarking dataset and empirical scaling law analysis for neural network wavefunctions by matching them to a wide spectrum of well-known many-body target wavefunctions. The dataset, WF-Bench, spans multiple distinct regimes of strongly correlated quantum matter, including topological states, Wigner crystals, and superconducting wavefunctions, providing a diverse and challenging test bed for neural network wavefunction expressivity. We introduce a systematic and reproducible benchmarking protocol for target wavefunction matching, enabling consistent performance evaluation across different neural network wavefunction architectures. By using wavefunction fidelity as the uniform metric, we discover empirical scaling laws that characterize how representability depends on system size and key model parameters, including the number of determinants and model depth. By applying our benchmark protocol to Psiformer and FermiNet, we show that WF-Bench establishes a unified, dataset-driven framework for evaluating and comparing neural network wavefunctions and for guiding the design of future architectures.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents WF-Bench, a benchmarking dataset spanning topological states, Wigner crystals, and superconducting wavefunctions, together with a reproducible protocol for matching neural network wavefunctions (e.g., Psiformer, FermiNet) to target many-body states. It uses wavefunction fidelity as the uniform metric, reports empirical scaling laws relating representability to system size and model parameters (number of determinants, depth), and positions the benchmark as a unified, dataset-driven framework for evaluating and guiding neural-network ansatz design.
Significance. A well-validated, reproducible benchmark with documented scaling relations would be a useful contribution to the neural-network wavefunction literature, providing a common test bed and quantitative guidance on architecture choices. The paper's emphasis on multiple strongly correlated regimes and a single fidelity metric is a reasonable starting point, but the actual significance cannot be assessed without the methods, dataset construction details, and quantitative results.
minor comments (2)
- The abstract states that scaling laws are discovered, but the manuscript provides no concrete functional forms, fitting procedures, or error bars on the reported relations; this should be clarified in the results section.
- The choice of target wavefunctions and the precise definition of fidelity (overlap, log-overlap, or other) are central to the benchmark but are not described at a level that allows independent reproduction.
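For reference, the candidate conventions named in the second comment can be written out as follows. These are standard definitions for a network state $\psi$ and a target state $\phi$; which of them the paper actually adopts is not stated on this page, so the block is a reading aid rather than a description of the authors' method.

```latex
% Normalized fidelity between network state \psi and target state \phi
F(\psi,\phi) = \frac{\lvert\langle\phi|\psi\rangle\rvert^{2}}
                    {\langle\phi|\phi\rangle\,\langle\psi|\psi\rangle},
\qquad 0 \le F \le 1 .

% Overlap magnitude and negative log-fidelity (one "log-overlap" convention)
O(\psi,\phi) = \sqrt{F(\psi,\phi)},
\qquad
\mathcal{L}(\psi,\phi) = -\log F(\psi,\phi) .
```

All three quantities are invariant under rescaling and global phase changes of either state, but they weight small deviations from perfect overlap differently, so reported scaling exponents can depend on which one is fitted.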
Simulated Author's Rebuttal
We thank the referee for their review of our manuscript on WF-Bench. The report notes that significance cannot be fully assessed without methods, dataset details, and quantitative results; we address this point directly below. No other major comments were enumerated in the report.
Point-by-point responses
- Referee: The actual significance cannot be assessed without the methods, dataset construction details, and quantitative results.
  Authors: The manuscript provides a full description of the reproducible benchmarking protocol for matching neural network wavefunctions (Psiformer, FermiNet) to target states, the construction of the WF-Bench dataset across topological states, Wigner crystals, and superconducting wavefunctions, and quantitative results including empirical scaling laws relating representability to system size, determinant count, and model depth, all evaluated under a uniform wavefunction fidelity metric. These elements are presented to support assessment of the benchmark as a dataset-driven framework.
  Revision: no
Circularity Check
No significant circularity detected
The paper introduces WF-Bench as an empirical benchmarking dataset spanning quantum matter regimes and applies a fidelity-based protocol to architectures such as Psiformer and FermiNet. Scaling laws are presented as discovered empirical relations between representability, system size, determinant count, and model depth. No load-bearing derivation, uniqueness theorem, ansatz, or fitted parameter is shown to reduce by construction to the paper's own inputs or self-citations; the central claims rest on external target wavefunctions and observed performance metrics.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Wavefunction fidelity serves as an appropriate uniform metric for evaluating expressivity across different neural network architectures and quantum regimes.