arxiv: 2604.03294 · v2 · submitted 2026-03-27 · ❄️ cond-mat.str-el · cond-mat.dis-nn· cs.LG· quant-ph

Recognition: no theorem link

Expressibility of neural quantum states: a Walsh-complexity perspective

Taige Wang

Authors on Pith no claims yet

Pith reviewed 2026-05-14 23:00 UTC · model grok-4.3

classification ❄️ cond-mat.str-el cond-mat.dis-nncs.LGquant-ph

keywords neural quantum statesWalsh complexityexpressibilityfeed-forward networksmany-body wavefunctionsvariational methodsshort-range entanglement

0 comments

The pith

Shallow additive networks cannot efficiently represent quantum states with uniform Walsh spectra even when entanglement is short-ranged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines Walsh complexity as the spread of a wavefunction across parity patterns and shows that nearly uniform spectra force any accurate approximant to pay an exponential cost in parameters. It proves that shallow additive feed-forward networks with tame activations such as polynomials and subexponential scaling cannot meet this cost. A dimerized state built from a single layer of disjoint controlled-Z gates achieves maximal Walsh complexity while remaining short-range entangled and tensor-network simple. Fitting experiments across system sizes confirm that polynomial activations succeed only once depth reaches logarithmic scaling in N, whereas tanh activations produce an abrupt jump at depth three.

Core claim

Walsh complexity measures how broadly a many-body wavefunction distributes its amplitude over parity (Walsh) basis functions. States whose Walsh spectrum is close to uniform require exponentially large resources from any good neural approximant. Shallow additive feed-forward networks operating in the tame regime, such as polynomial activations with subexponential parameter counts, cannot generate this resource; depth must grow at least logarithmically with system size N. The dimerized controlled-Z state provides an explicit counter-example that is simple by entanglement and tensor-network standards yet maximal by Walsh complexity.

What carries the argument

Walsh complexity, the basis-dependent measure of spread over parity patterns that forces exponential cost on shallow tame networks for uniform spectra.

If this is right

Any state with a nearly uniform Walsh spectrum demands either exponential parameters or at least logarithmic depth from additive networks.
The dimerized controlled-Z state serves as a minimal, short-range-entangled witness that depth is required beyond what entanglement measures detect.
Polynomial activations succeed only when depth scales as log N; tanh activations cross a sharp threshold already at depth three.
Walsh complexity supplies an expressibility axis orthogonal to entanglement for judging variational wavefunctions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Architectures that add non-additive layers or use different activations may evade the logarithmic-depth barrier identified here.
Walsh complexity could be applied to other variational families to reveal similar hidden depth costs in low-entanglement regimes.
The same parity-spread measure might illuminate expressivity limits in classical learning tasks that rely on parity functions.

Load-bearing premise

Additive feed-forward networks with tame activations and subexponential parameter counts are the relevant class whose limitations bound the expressibility of neural quantum states.

What would settle it

A shallow additive network using polynomial activations and fewer than exponentially many parameters that reaches high fidelity on the dimerized controlled-Z state for large N would falsify the claimed depth requirement.

Figures

Figures reproduced from arXiv: 2604.03294 by Taige Wang.

**Figure 2.** Figure 2: FIG. 2 [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

read the original abstract

Neural quantum states are powerful variational wavefunctions, but it remains unclear which many-body states can be represented efficiently by modern additive architectures. We introduce Walsh complexity, a basis-dependent measure of how broadly a wavefunction is spread over parity patterns. States with an almost uniform Walsh spectrum require exponentially large Walsh complexity from any good approximant. We show that shallow additive feed-forward networks cannot generate such complexity in the tame regime, e.g. polynomial activations with subexponential parameter scaling. As a concrete example, we construct a simple dimerized state prepared by a single layer of disjoint controlled-$Z$ gates. Although it has only short-range entanglement and a simple tensor-network description, its Walsh complexity is maximal. Full-cube fits across system size and depth are consistent with the complexity bound: for polynomial activations, successful fitting appears only once depth reaches a logarithmic scale in $N$, whereas activation saturation in $\tanh$ produces a sharp threshold-like jump already at depth $3$. Walsh complexity therefore provides an expressibility axis complementary to entanglement and clarifies when depth becomes an essential resource for additive neural quantum states.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Walsh complexity gives a clean new axis for when shallow additive NQS hit expressibility walls, even on low-entanglement states.

read the letter

The paper defines Walsh complexity as the spread of a state over the Walsh (parity) basis and shows that states with nearly flat spectra force exponential resources from any good approximant built from shallow additive feed-forward networks with tame activations. The key example is a dimerized state made by one layer of disjoint controlled-Z gates: it has only short-range entanglement and a trivial tensor-network form, yet its Walsh spectrum is maximal. Full-cube numerical fits then track how polynomial activations require logarithmic depth in N while tanh jumps at depth 3, matching the predicted bound inside the stated regime. That construction and the numerical consistency are the parts that actually land. The argument stays inside additive networks with subexponential parameters, so the claim is scoped rather than universal. The abstract leaves the full derivations and fitting details implicit, which is the main soft spot; without seeing the error analysis and exact network definitions it is hard to judge how tight the thresholds really are. Still, nothing in the stated scope looks circular or contradictory. This is for people who build or analyze variational wavefunctions for condensed-matter problems and want a handle on depth beyond entanglement. It is worth sending to a serious referee because the new measure is well-motivated, the example is concrete and checkable, and the numerics give something falsifiable to test.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces Walsh complexity as a basis-dependent measure quantifying how broadly a many-body wavefunction spreads over parity patterns in the Walsh basis. It argues that shallow additive feed-forward networks with tame activations (polynomials with subexponential parameter counts) cannot achieve high Walsh complexity, using an explicit dimerized state prepared by a single layer of disjoint controlled-Z gates as a counterexample: this state has only short-range entanglement and a simple tensor-network representation yet possesses maximal Walsh complexity. Numerical full-cube fits across system sizes and network depths are reported to be consistent with the bound, showing that polynomial activations require logarithmic depth in N while tanh activations exhibit a sharp threshold at depth 3.

Significance. If the central claims hold, Walsh complexity supplies a useful complementary axis to entanglement entropy for diagnosing expressibility limits of additive neural quantum states. The concrete dimerized-CZ construction demonstrates that low-entanglement states can still demand depth, and the numerical consistency across sizes provides falsifiable support for the scaling predictions. This strengthens the case for architecture-specific expressibility diagnostics in variational quantum many-body methods.

major comments (2)

[§4] §4, dimerized-state construction: the claim that the Walsh spectrum is maximal (uniform) is load-bearing for the expressibility argument, yet the manuscript provides only the preparation circuit without an explicit Walsh-transform calculation or closed-form spectrum; this step must be supplied to confirm the state indeed saturates the bound independently of entanglement measures.
[Numerical fits] Numerical fits section (around the full-cube results): the reported consistency with the logarithmic-depth bound for polynomials and the depth-3 jump for tanh lacks tabulated error metrics, fit residuals, or the precise functional form used for the complexity scaling; without these, it is difficult to assess whether the numerics genuinely corroborate the analytic limitation or merely show qualitative agreement.

minor comments (3)

[Introduction] Introduction: the phrase 'tame regime' (polynomial activations, subexponential parameters) is used before being defined; add an explicit definition and contrast with 'wild' activations at first use.
[§3] Notation: the definition of Walsh complexity (Eq. (3) or equivalent) should include a brief reminder that it is basis-dependent, to avoid reader confusion with entanglement-based measures.
[Figures] Figure captions: the Walsh-spectrum plots would benefit from an overlay of the uniform (maximal-complexity) distribution for direct visual comparison with the dimerized state.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and positive assessment of the work. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

read point-by-point responses

Referee: [§4] §4, dimerized-state construction: the claim that the Walsh spectrum is maximal (uniform) is load-bearing for the expressibility argument, yet the manuscript provides only the preparation circuit without an explicit Walsh-transform calculation or closed-form spectrum; this step must be supplied to confirm the state indeed saturates the bound independently of entanglement measures.

Authors: We agree that an explicit derivation is required. In the revised manuscript we will insert a new subsection in §4 that computes the Walsh transform of the dimerized state in closed form. The calculation proceeds by observing that each disjoint CZ gate multiplies selected computational-basis states by −1, which in the Walsh (parity) basis corresponds to a uniform phase shift across all 2^N patterns; the resulting spectrum is exactly flat with magnitude 2^{−N/2}. This derivation is independent of the short-range entanglement structure and directly confirms saturation of the complexity bound. revision: yes
Referee: [Numerical fits] Numerical fits section (around the full-cube results): the reported consistency with the logarithmic-depth bound for polynomials and the depth-3 jump for tanh lacks tabulated error metrics, fit residuals, or the precise functional form used for the complexity scaling; without these, it is difficult to assess whether the numerics genuinely corroborate the analytic limitation or merely show qualitative agreement.

Authors: We acknowledge the need for quantitative documentation. The revised version will add a supplementary table (and corresponding text) reporting mean-squared errors, maximum residuals, and the exact functional forms fitted: for polynomial activations we use C(N,d) = a log_2(N) + b with fitted coefficients and R^2 values; for tanh we document the step-like transition at depth 3 together with the associated error metrics. These additions will allow readers to judge the strength of the numerical support for the analytic bounds. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation introduces Walsh complexity directly from the Walsh-Hadamard basis expansion of the wavefunction and proves an expressivity bound for shallow additive feed-forward networks by analyzing how polynomial or tanh activations compose under addition and limited depth. The key supporting construction is an explicit dimerized state generated by disjoint CZ gates whose Walsh spectrum is computed to be maximal, providing an independent counter-example rather than a fitted or self-referential result. Numerical full-cube fits are described only as consistency checks with the analytic bound, not as the source of the bound itself. No self-citations, ansatzes, or redefinitions reduce the central claim to its own inputs; the argument remains self-contained within the restricted class of additive architectures and tame activations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the newly introduced definition of Walsh complexity and standard assumptions about feed-forward network architectures and activation functions.

axioms (1)

standard math Properties of the Walsh transform as an orthogonal basis for functions over the hypercube
Invoked to define the spread of the wavefunction over parity patterns.

invented entities (1)

Walsh complexity no independent evidence
purpose: Quantify breadth of wavefunction over parity patterns
Newly defined measure introduced in the paper.

pith-pipeline@v0.9.0 · 5492 in / 1315 out tokens · 99353 ms · 2026-05-14T23:00:29.786000+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 5 internal anchors

[1]

Solving the Quantum Many-Body Problem with Artificial Neural Networks

G. Carleo and M. Troyer, Solving the quantum many- body problem with artificial neural networks, Science 355, 602 (2017), arXiv:1606.02318 [cond-mat.dis-nn]

work page internal anchor Pith review Pith/arXiv arXiv 2017
[2]

D.-L. Deng, X. Li, and S. Das Sarma, Quantum entanglement in neural network states, Physical Review X7, 021021 (2017), arXiv:1701.04844 [cond-mat.dis-nn]

work page internal anchor Pith review Pith/arXiv arXiv 2017
[3]

Sharir, A

O. Sharir, A. Shashua, and G. Carleo, Neural tensor contractions and the expressive power of deep neural quantum states, Physical Review B106, 205136 (2022), arXiv:2103.10293 [quant-ph]

work page arXiv 2022
[4]

Efficient Representation of Quantum Many-body States with Deep Neural Networks

X. Gao and L.-M. Duan, Efficient representation of quantum many-body states with deep neural networks, Nature Communications8, 662 (2017), arXiv:1701.05039 [cond-mat.dis-nn]

work page internal anchor Pith review Pith/arXiv arXiv 2017
[5]

Jia, Y.-H

Z.-A. Jia, Y.-H. Zhang, Y.-C. Wu, L. Kong, G.- C. Guo, and G.-P. Guo, Efficient machine-learning representations of a surface code with boundaries, defects, domain walls, and twists, Physical Review A99, 012307 (2019)

work page 2019
[6]

S. Lu, X. Gao, and L.-M. Duan, Efficient representa- tion of topologically ordered states with restricted boltz- mann machines, Physical Review B99, 155136 (2019), arXiv:1810.02352 [quant-ph]

work page internal anchor Pith review Pith/arXiv arXiv 2019
[7]

Sprague and S

K. Sprague and S. Czischek, Variational monte carlo with large patched transformers, Communications Physics7, 90 (2024)

work page 2024
[8]

Near coefficient zeros the logarithm is singular and can worsen numerical stability during training, so the rewriting is not innocuous from the optimization viewpoint either [3, 24]

work page
[9]

Paul, Bound on entanglement in neural quantum states (2025), arXiv:2510.11797 [quant-ph]

N. Paul, Bound on entanglement in neural quantum states (2025), arXiv:2510.11797 [quant-ph]

work page arXiv 2025
[10]

O’Donnell,Analysis of Boolean Functions(Cambridge University Press, 2014)

R. O’Donnell,Analysis of Boolean Functions(Cambridge University Press, 2014)

work page 2014
[11]

Viola,The Complexity of Hardness Amplification and Derandomization, Ph.D

E. Viola,The Complexity of Hardness Amplification and Derandomization, Ph.D. thesis, Harvard University (2006), ph.D. thesis

work page 2006
[12]

E. Allender, Circuit complexity before the dawn of the new millennium, inFoundations of Software Technology and Theoretical Computer Science: 16th Conference, FSTTCS 1996, Hyderabad, India, December 18–20, 1996, Proceedings, Lecture Notes in Computer Science, Vol. 1180 (Springer, 1996) pp. 1–18, survey

work page 1996
[13]

L. Chen, Z. Lu, X. Lyu, and I. C. Oliveira, Majority vs. approximate linear sum and average-case complexity be- lowN C 1, in48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), Leibniz In- ternational Proceedings in Informatics (LIPIcs), Vol. 198, edited by N. Bansal, E. Merelli, and J. Worrell (Schloss Dagstuhl – Leibniz-Ze...

work page 2021
[14]

Parham, Quantum circuit lower bounds in the magic hierarchy, arXiv preprint arXiv:2504.19966 (2025), arXiv:2504.19966 [quant-ph]

N. Parham, Quantum circuit lower bounds in the magic hierarchy, arXiv preprint arXiv:2504.19966 (2025), arXiv:2504.19966 [quant-ph]

work page arXiv 2025
[15]

A. A. Razborov and S. Rudich, Natural proofs, Journal of Computer and System Sciences55, 24 (1997)

work page 1997
[16]

M. Hein, J. Eisert, and H. J. Briegel, Multiparty entanglement in graph states, Physical Review A69, 062311 (2004), arXiv:quant-ph/0307130

work page internal anchor Pith review Pith/arXiv arXiv 2004
[17]

O. S. Rothaus, On “bent” functions, Journal of Combi- natorial Theory, Series A20, 300 (1976)

work page 1976
[18]

Canteaut and P

A. Canteaut and P. Charpin, Decomposing bent func- tions, IEEE Transactions on Information Theory49, 2004 (2003)

work page 2004
[19]

A. R. Barron, Universal approximation bounds for super- positions of a sigmoidal function, IEEE Transactions on Information Theory39, 930 (1993)

work page 1993
[20]

Bach, Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research18, 1 (2017)

F. Bach, Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research18, 1 (2017)

work page 2017
[21]

Hajnal, W

A. Hajnal, W. Maass, P. Pudl´ ak, M. Szegedy, and G. Tur´ an, Threshold circuits of bounded depth, Journal of Computer and System Sciences46, 129 (1993)

work page 1993
[22]

Amano, On the size of depth-two threshold circuits for the inner product mod 2 function, inLanguage and Automata Theory and Applications, Lecture Notes in Computer Science, Vol

K. Amano, On the size of depth-two threshold circuits for the inner product mod 2 function, inLanguage and Automata Theory and Applications, Lecture Notes in Computer Science, Vol. 12038 (Springer, 2020) pp. 235– 247

work page 2020
[23]

Krause and S

M. Krause and S. Lucks, Pseudorandom functions inT C 0 and cryptographic limitations to proving lower bounds, Computational Complexity10, 297 (2001)

work page 2001
[24]

Goodfellow, Y

I. Goodfellow, Y. Bengio, and A. Courville,Deep Learning(The MIT Press, 2016)

work page 2016