Recognition: no theorem link
Expressibility of neural quantum states: a Walsh-complexity perspective
Pith reviewed 2026-05-14 23:00 UTC · model grok-4.3
The pith
Shallow additive networks cannot efficiently represent quantum states with uniform Walsh spectra even when entanglement is short-ranged.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Walsh complexity measures how broadly a many-body wavefunction distributes its amplitude over parity (Walsh) basis functions. States whose Walsh spectrum is close to uniform require exponentially large resources from any good neural approximant. Shallow additive feed-forward networks operating in the tame regime, such as polynomial activations with subexponential parameter counts, cannot generate this resource; depth must grow at least logarithmically with system size N. The dimerized controlled-Z state provides an explicit counter-example that is simple by entanglement and tensor-network standards yet maximal by Walsh complexity.
What carries the argument
Walsh complexity, the basis-dependent measure of spread over parity patterns that forces exponential cost on shallow tame networks for uniform spectra.
If this is right
- Any state with a nearly uniform Walsh spectrum demands either exponential parameters or at least logarithmic depth from additive networks.
- The dimerized controlled-Z state serves as a minimal, short-range-entangled witness that depth is required beyond what entanglement measures detect.
- Polynomial activations succeed only when depth scales as log N; tanh activations cross a sharp threshold already at depth three.
- Walsh complexity supplies an expressibility axis orthogonal to entanglement for judging variational wavefunctions.
Where Pith is reading between the lines
- Architectures that add non-additive layers or use different activations may evade the logarithmic-depth barrier identified here.
- Walsh complexity could be applied to other variational families to reveal similar hidden depth costs in low-entanglement regimes.
- The same parity-spread measure might illuminate expressivity limits in classical learning tasks that rely on parity functions.
Load-bearing premise
Additive feed-forward networks with tame activations and subexponential parameter counts are the relevant class whose limitations bound the expressibility of neural quantum states.
What would settle it
A shallow additive network using polynomial activations and fewer than exponentially many parameters that reaches high fidelity on the dimerized controlled-Z state for large N would falsify the claimed depth requirement.
Figures
read the original abstract
Neural quantum states are powerful variational wavefunctions, but it remains unclear which many-body states can be represented efficiently by modern additive architectures. We introduce Walsh complexity, a basis-dependent measure of how broadly a wavefunction is spread over parity patterns. States with an almost uniform Walsh spectrum require exponentially large Walsh complexity from any good approximant. We show that shallow additive feed-forward networks cannot generate such complexity in the tame regime, e.g. polynomial activations with subexponential parameter scaling. As a concrete example, we construct a simple dimerized state prepared by a single layer of disjoint controlled-$Z$ gates. Although it has only short-range entanglement and a simple tensor-network description, its Walsh complexity is maximal. Full-cube fits across system size and depth are consistent with the complexity bound: for polynomial activations, successful fitting appears only once depth reaches a logarithmic scale in $N$, whereas activation saturation in $\tanh$ produces a sharp threshold-like jump already at depth $3$. Walsh complexity therefore provides an expressibility axis complementary to entanglement and clarifies when depth becomes an essential resource for additive neural quantum states.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Walsh complexity as a basis-dependent measure quantifying how broadly a many-body wavefunction spreads over parity patterns in the Walsh basis. It argues that shallow additive feed-forward networks with tame activations (polynomials with subexponential parameter counts) cannot achieve high Walsh complexity, using an explicit dimerized state prepared by a single layer of disjoint controlled-Z gates as a counterexample: this state has only short-range entanglement and a simple tensor-network representation yet possesses maximal Walsh complexity. Numerical full-cube fits across system sizes and network depths are reported to be consistent with the bound, showing that polynomial activations require logarithmic depth in N while tanh activations exhibit a sharp threshold at depth 3.
Significance. If the central claims hold, Walsh complexity supplies a useful complementary axis to entanglement entropy for diagnosing expressibility limits of additive neural quantum states. The concrete dimerized-CZ construction demonstrates that low-entanglement states can still demand depth, and the numerical consistency across sizes provides falsifiable support for the scaling predictions. This strengthens the case for architecture-specific expressibility diagnostics in variational quantum many-body methods.
major comments (2)
- [§4] §4, dimerized-state construction: the claim that the Walsh spectrum is maximal (uniform) is load-bearing for the expressibility argument, yet the manuscript provides only the preparation circuit without an explicit Walsh-transform calculation or closed-form spectrum; this step must be supplied to confirm the state indeed saturates the bound independently of entanglement measures.
- [Numerical fits] Numerical fits section (around the full-cube results): the reported consistency with the logarithmic-depth bound for polynomials and the depth-3 jump for tanh lacks tabulated error metrics, fit residuals, or the precise functional form used for the complexity scaling; without these, it is difficult to assess whether the numerics genuinely corroborate the analytic limitation or merely show qualitative agreement.
minor comments (3)
- [Introduction] Introduction: the phrase 'tame regime' (polynomial activations, subexponential parameters) is used before being defined; add an explicit definition and contrast with 'wild' activations at first use.
- [§3] Notation: the definition of Walsh complexity (Eq. (3) or equivalent) should include a brief reminder that it is basis-dependent, to avoid reader confusion with entanglement-based measures.
- [Figures] Figure captions: the Walsh-spectrum plots would benefit from an overlay of the uniform (maximal-complexity) distribution for direct visual comparison with the dimerized state.
Simulated Author's Rebuttal
We thank the referee for the constructive report and positive assessment of the work. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.
read point-by-point responses
-
Referee: [§4] §4, dimerized-state construction: the claim that the Walsh spectrum is maximal (uniform) is load-bearing for the expressibility argument, yet the manuscript provides only the preparation circuit without an explicit Walsh-transform calculation or closed-form spectrum; this step must be supplied to confirm the state indeed saturates the bound independently of entanglement measures.
Authors: We agree that an explicit derivation is required. In the revised manuscript we will insert a new subsection in §4 that computes the Walsh transform of the dimerized state in closed form. The calculation proceeds by observing that each disjoint CZ gate multiplies selected computational-basis states by −1, which in the Walsh (parity) basis corresponds to a uniform phase shift across all 2^N patterns; the resulting spectrum is exactly flat with magnitude 2^{−N/2}. This derivation is independent of the short-range entanglement structure and directly confirms saturation of the complexity bound. revision: yes
-
Referee: [Numerical fits] Numerical fits section (around the full-cube results): the reported consistency with the logarithmic-depth bound for polynomials and the depth-3 jump for tanh lacks tabulated error metrics, fit residuals, or the precise functional form used for the complexity scaling; without these, it is difficult to assess whether the numerics genuinely corroborate the analytic limitation or merely show qualitative agreement.
Authors: We acknowledge the need for quantitative documentation. The revised version will add a supplementary table (and corresponding text) reporting mean-squared errors, maximum residuals, and the exact functional forms fitted: for polynomial activations we use C(N,d) = a log_2(N) + b with fitted coefficients and R^2 values; for tanh we document the step-like transition at depth 3 together with the associated error metrics. These additions will allow readers to judge the strength of the numerical support for the analytic bounds. revision: yes
Circularity Check
No significant circularity detected
full rationale
The derivation introduces Walsh complexity directly from the Walsh-Hadamard basis expansion of the wavefunction and proves an expressivity bound for shallow additive feed-forward networks by analyzing how polynomial or tanh activations compose under addition and limited depth. The key supporting construction is an explicit dimerized state generated by disjoint CZ gates whose Walsh spectrum is computed to be maximal, providing an independent counter-example rather than a fitted or self-referential result. Numerical full-cube fits are described only as consistency checks with the analytic bound, not as the source of the bound itself. No self-citations, ansatzes, or redefinitions reduce the central claim to its own inputs; the argument remains self-contained within the restricted class of additive architectures and tame activations.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Properties of the Walsh transform as an orthogonal basis for functions over the hypercube
invented entities (1)
-
Walsh complexity
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Solving the Quantum Many-Body Problem with Artificial Neural Networks
G. Carleo and M. Troyer, Solving the quantum many- body problem with artificial neural networks, Science 355, 602 (2017), arXiv:1606.02318 [cond-mat.dis-nn]
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[2]
D.-L. Deng, X. Li, and S. Das Sarma, Quantum entanglement in neural network states, Physical Review X7, 021021 (2017), arXiv:1701.04844 [cond-mat.dis-nn]
work page internal anchor Pith review Pith/arXiv arXiv 2017
- [3]
-
[4]
Efficient Representation of Quantum Many-body States with Deep Neural Networks
X. Gao and L.-M. Duan, Efficient representation of quantum many-body states with deep neural networks, Nature Communications8, 662 (2017), arXiv:1701.05039 [cond-mat.dis-nn]
work page internal anchor Pith review Pith/arXiv arXiv 2017
- [5]
-
[6]
S. Lu, X. Gao, and L.-M. Duan, Efficient representa- tion of topologically ordered states with restricted boltz- mann machines, Physical Review B99, 155136 (2019), arXiv:1810.02352 [quant-ph]
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[7]
K. Sprague and S. Czischek, Variational monte carlo with large patched transformers, Communications Physics7, 90 (2024)
work page 2024
-
[8]
Near coefficient zeros the logarithm is singular and can worsen numerical stability during training, so the rewriting is not innocuous from the optimization viewpoint either [3, 24]
-
[9]
Paul, Bound on entanglement in neural quantum states (2025), arXiv:2510.11797 [quant-ph]
N. Paul, Bound on entanglement in neural quantum states (2025), arXiv:2510.11797 [quant-ph]
-
[10]
O’Donnell,Analysis of Boolean Functions(Cambridge University Press, 2014)
R. O’Donnell,Analysis of Boolean Functions(Cambridge University Press, 2014)
work page 2014
-
[11]
Viola,The Complexity of Hardness Amplification and Derandomization, Ph.D
E. Viola,The Complexity of Hardness Amplification and Derandomization, Ph.D. thesis, Harvard University (2006), ph.D. thesis
work page 2006
-
[12]
E. Allender, Circuit complexity before the dawn of the new millennium, inFoundations of Software Technology and Theoretical Computer Science: 16th Conference, FSTTCS 1996, Hyderabad, India, December 18–20, 1996, Proceedings, Lecture Notes in Computer Science, Vol. 1180 (Springer, 1996) pp. 1–18, survey
work page 1996
-
[13]
L. Chen, Z. Lu, X. Lyu, and I. C. Oliveira, Majority vs. approximate linear sum and average-case complexity be- lowN C 1, in48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), Leibniz In- ternational Proceedings in Informatics (LIPIcs), Vol. 198, edited by N. Bansal, E. Merelli, and J. Worrell (Schloss Dagstuhl – Leibniz-Ze...
work page 2021
-
[14]
N. Parham, Quantum circuit lower bounds in the magic hierarchy, arXiv preprint arXiv:2504.19966 (2025), arXiv:2504.19966 [quant-ph]
-
[15]
A. A. Razborov and S. Rudich, Natural proofs, Journal of Computer and System Sciences55, 24 (1997)
work page 1997
-
[16]
M. Hein, J. Eisert, and H. J. Briegel, Multiparty entanglement in graph states, Physical Review A69, 062311 (2004), arXiv:quant-ph/0307130
work page internal anchor Pith review Pith/arXiv arXiv 2004
-
[17]
O. S. Rothaus, On “bent” functions, Journal of Combi- natorial Theory, Series A20, 300 (1976)
work page 1976
-
[18]
A. Canteaut and P. Charpin, Decomposing bent func- tions, IEEE Transactions on Information Theory49, 2004 (2003)
work page 2004
-
[19]
A. R. Barron, Universal approximation bounds for super- positions of a sigmoidal function, IEEE Transactions on Information Theory39, 930 (1993)
work page 1993
-
[20]
F. Bach, Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research18, 1 (2017)
work page 2017
- [21]
-
[22]
K. Amano, On the size of depth-two threshold circuits for the inner product mod 2 function, inLanguage and Automata Theory and Applications, Lecture Notes in Computer Science, Vol. 12038 (Springer, 2020) pp. 235– 247
work page 2020
-
[23]
M. Krause and S. Lucks, Pseudorandom functions inT C 0 and cryptographic limitations to proving lower bounds, Computational Complexity10, 297 (2001)
work page 2001
-
[24]
I. Goodfellow, Y. Bengio, and A. Courville,Deep Learning(The MIT Press, 2016)
work page 2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.