Efficient classical computation of the neural tangent kernel of quantum neural networks
Pith reviewed 2026-05-22 13:36 UTC · model grok-4.3
The pith
The neural tangent kernel of Clifford-interleaved quantum neural networks equals an average over four discrete Clifford configurations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For quantum neural networks consisting of arbitrary Clifford unitaries interleaved with parametric gates generated by Pauli Hamiltonians, the neural tangent kernel can be exactly computed by replacing the continuous average over initialization parameters with an average over four discrete Clifford configurations of the parametric gates. This reduction permits efficient classical simulation of the associated circuit.
What carries the argument
The exact replacement of the continuous initialization-parameter average in the NTK by a four-point average over Clifford configurations of the parametric gates.
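The one-parameter fact behind a reduction of this kind can be written out explicitly. The identity below is a sketch under two assumptions that the text above does not state: each parameter is initialized uniformly on [0, 2π), and the parametric gates have the form exp(-iθP/2) for a Pauli operator P, so that the four angles θ_j = jπ/2 give Clifford gates.

```latex
\frac{1}{4}\sum_{j=0}^{3} e^{i k \theta_j}
  = \begin{cases} 1, & k \equiv 0 \pmod 4,\\ 0, & \text{otherwise,} \end{cases}
\qquad \theta_j = \frac{j\pi}{2},
\qquad\text{whereas}\qquad
\frac{1}{2\pi}\int_0^{2\pi} e^{i k \theta}\,\mathrm{d}\theta = \delta_{k,0}.
```

The two averages therefore agree for every integer frequency with |k| ≤ 3, which covers any trigonometric polynomial of degree at most 2 in a single angle. Whether this extends to many gates depends on how the angles of different gates are combined, which is the point the referee report raises.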
If this is right
- The NTK for this class of networks becomes computable by classical Clifford-circuit simulation techniques.
- The expected output of a wide, trained network equals the corresponding Gaussian-process prediction and can therefore be obtained classically.
- Networks of this architecture cannot achieve quantum advantage in the wide, trained regime.
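As an illustration of what "classical Clifford-circuit simulation" means in practice, the sketch below propagates a Pauli observable backwards through a circuit of H, S and CNOT gates using standard tableau-style update rules, then reads off its expectation value in the all-zeros state. It is a minimal example, not the paper's algorithm; the function names, the gate set and the circuit format are hypothetical choices made for this illustration.

```python
def conjugate(pauli, gate, qubits):
    """Return G P G^dagger for one Clifford gate G.

    P is stored as (x_bits, z_bits, sign_bit) and stands for
    (-1)^sign * prod_j i^(x_j z_j) X_j^(x_j) Z_j^(z_j), so x=z=1 means Y.
    """
    x, z, r = list(pauli[0]), list(pauli[1]), pauli[2]
    if gate == "H":
        (a,) = qubits
        r ^= x[a] & z[a]          # H Y H = -Y
        x[a], z[a] = z[a], x[a]   # H swaps X and Z
    elif gate == "S":
        (a,) = qubits
        r ^= x[a] & z[a]          # S Y S^dagger = -X
        z[a] ^= x[a]              # S X S^dagger = Y
    elif gate == "CNOT":
        a, b = qubits             # a = control, b = target
        r ^= x[a] & z[b] & (x[b] ^ z[a] ^ 1)
        x[b] ^= x[a]
        z[a] ^= z[b]
    else:
        raise ValueError(f"unsupported gate: {gate}")
    return x, z, r


def expectation_in_zero_state(circuit, pauli):
    """<0...0| U^dagger P U |0...0>, with U the gates applied in list order."""
    for gate, qubits in reversed(circuit):
        # Heisenberg picture: conjugate by the inverse gate.
        # H and CNOT are self-inverse; S^dagger equals S applied three times.
        repeats = 3 if gate == "S" else 1
        for _ in range(repeats):
            pauli = conjugate(pauli, gate, qubits)
    x, z, r = pauli
    # A Pauli string with any X or Y factor has zero expectation in |0...0>.
    return 0 if any(x) else (-1) ** r


# Hypothetical example: X (x) X measured on a Bell pair.
bell = [("H", (0,)), ("CNOT", (0, 1))]
xx = ([1, 1], [0, 0], 0)
print(expectation_in_zero_state(bell, xx))
```

Each gate updates a handful of bits, so the cost grows with the number of gates and qubits rather than with the 2^n dimension of the state space.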
Where Pith is reading between the lines
- The result isolates a concrete regime in which quantum neural networks lose any potential advantage once they become wide enough to enter the Gaussian-process limit.
- Analogous discrete reductions may exist for other parametric gate families if their initialization distributions admit similar finite-support equivalences.
- Classical computation of the NTK supplies a practical benchmark that any claimed quantum advantage for these networks must beat.
Load-bearing premise
The continuous average over initialization parameters in the NTK can be replaced exactly by an average over four discrete Clifford configurations for arbitrary Clifford unitaries interleaved with Pauli-generated parametric gates.
What would settle it
Direct numerical comparison, on a small explicit circuit of the given form, between the NTK value obtained from the continuous parameter average and the value obtained from the proposed four-point Clifford average; any nonzero discrepancy would falsify the reduction.
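A check of this kind can be sketched in a few lines. The following is a minimal illustration, not the paper's algorithm. It assumes rotations of the form exp(-iθP/2), parameters initialized uniformly on [0, 2π), and an NTK defined as the initialization average of the inner product of parameter gradients. The two-qubit circuit, its Clifford layers, generators, observable and data-encoding rotation are all hypothetical. The continuous average is evaluated on a shifted 9-point grid per angle, which is exact for the low-degree trigonometric polynomials arising here; the discrete average runs over all 4^3 combinations of the four Clifford angles, chosen independently per gate.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

# Hypothetical toy circuit: Clifford layer, then Pauli rotation, three times.
CLIFFORDS = [np.kron(H, I2), CNOT, np.kron(I2, H)]
GENERATORS = [np.kron(X, I2), np.kron(Z, Z), np.kron(I2, Y)]
OBSERVABLE = np.kron(Z, I2)


def rot(pauli, theta):
    """exp(-i theta P / 2) for a Pauli string P (uses P @ P = identity)."""
    return np.cos(theta / 2) * np.eye(len(pauli)) - 1j * np.sin(theta / 2) * pauli


def model(thetas, x):
    """Expectation value of OBSERVABLE; the Y(x)Y encoding is an assumption."""
    psi = np.zeros(4, dtype=complex)
    psi[0] = 1.0
    psi = rot(np.kron(Y, Y), x) @ psi
    for clifford, generator, theta in zip(CLIFFORDS, GENERATORS, thetas):
        psi = rot(generator, theta) @ (clifford @ psi)
    return np.real(np.vdot(psi, OBSERVABLE @ psi))


def grad(thetas, x):
    """Parameter-shift gradient, exact for gates exp(-i theta P / 2)."""
    g = np.zeros(len(thetas))
    for i in range(len(thetas)):
        shift = np.zeros(len(thetas))
        shift[i] = np.pi / 2
        g[i] = 0.5 * (model(thetas + shift, x) - model(thetas - shift, x))
    return g


def ntk_on_grid(points, x1, x2):
    """Average of grad(x1) . grad(x2) over the product grid points^3."""
    values = [grad(np.array(t), x1) @ grad(np.array(t), x2)
              for t in itertools.product(points, repeat=len(GENERATORS))]
    return np.mean(values)


four_point = np.arange(4) * np.pi / 2               # Clifford angles
reference = 0.3 + 2 * np.pi * np.arange(9) / 9      # shifted reference grid

k_discrete = ntk_on_grid(four_point, 0.4, 1.1)
k_reference = ntk_on_grid(reference, 0.4, 1.1)
print(k_discrete, k_reference, abs(k_discrete - k_reference))
```

Agreement on one toy circuit does not establish the general statement. Under the stated assumptions, a discrepancy beyond floating-point error would be a counterexample.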
Original abstract
We propose an efficient classical algorithm to estimate the Neural Tangent Kernel (NTK) associated with a broad class of quantum neural networks. These networks consist of arbitrary unitary operators belonging to the Clifford group interleaved with parametric gates given by the time evolution generated by an arbitrary Hamiltonian belonging to the Pauli group. The proposed algorithm leverages a key insight: the average over the distribution of initialization parameters in the NTK definition can be exactly replaced by an average over just four discrete values, chosen such that the corresponding parametric gates are Clifford operations. This reduction enables an efficient classical simulation of the circuit. Combined with recent results establishing the equivalence between wide quantum neural networks and Gaussian processes [Girardi et al., Comm. Math. Phys. 406, 92 (2025); Melchor Hernandez et al., Ann. Henri Poincaré (2025)], our method enables efficient computation of the expected output of wide, trained quantum neural networks, and therefore shows that such networks cannot achieve quantum advantage.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an efficient classical algorithm to compute the Neural Tangent Kernel (NTK) for quantum neural networks consisting of arbitrary Clifford unitaries interleaved with parametric gates generated by Pauli Hamiltonians. The central technical claim is that the continuous average over initialization parameters in the NTK can be exactly replaced by a discrete average over four Clifford configurations. Combined with cited results on the equivalence of wide QNNs to Gaussian processes, the work concludes that the expected output of wide trained networks in this class is classically computable and therefore such networks cannot achieve quantum advantage.
Significance. If the reduction is exact, the result would be significant: it supplies a polynomial-time classical procedure for the NTK of a broad family of QNNs and, via the GP correspondence, for the trained-network statistics. This would constitute a concrete, architecture-specific demonstration that quantum advantage is absent in the wide limit for this model class. The use of Clifford-group properties to achieve an exact discretization is a technically attractive feature.
major comments (1)
- [Abstract] (paragraph describing the algorithm): The claim that the multi-dimensional average over independent initialization parameters 'can be exactly replaced by an average over just four discrete values' is load-bearing for both the efficiency and the 'no quantum advantage' conclusion. While a 4-point quadrature is exact for each individual trigonometric term of degree ≤2, the manuscript must show that applying the same four global Clifford configurations across all parametric gates preserves exact equality with the product measure. The provided description does not address whether this uniform choice introduces correlations absent from independent parameter sampling; an explicit derivation or counter-example check for circuits containing two or more parametric gates under arbitrary Clifford interleaving is required.
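The distinction the referee draws can be made concrete with a toy integrand. The example below uses a hypothetical term, cos(θ1)·cos(θ2), which is not taken from the manuscript, and compares two discrete averages: one where each angle ranges independently over the four Clifford angles (a product grid of 16 points), and one where both gates share the same angle (4 points). It assumes angles of the form jπ/2 and uniform initialization on [0, 2π).

```python
import itertools
import numpy as np

ANGLES = np.arange(4) * np.pi / 2  # assumed Clifford angles j*pi/2


def toy_integrand(t1, t2):
    # Hypothetical term of degree 1 in each angle; not from the manuscript.
    return np.cos(t1) * np.cos(t2)


# Independent choice per gate: matches the product (uniform) measure, mean 0.
independent = np.mean([toy_integrand(a, b)
                       for a, b in itertools.product(ANGLES, repeat=2)])

# Same angle on both gates: the integrand becomes cos^2, whose mean is 1/2.
shared = np.mean([toy_integrand(a, a) for a in ANGLES])

print(independent, shared)
```

The independent average vanishes, as the continuous product-measure average does, while the shared-angle average is about 0.5. This says nothing about which scheme the manuscript uses; it only shows why the report asks for the multi-gate case to be spelled out.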
minor comments (2)
- The manuscript should explicitly state the maximum number of parametric gates for which the reduction is claimed to hold exactly, and clarify whether any additional assumptions on the interleaving pattern are needed.
- Add a short self-contained recap of the key steps from the two cited GP-equivalence papers so that the no-advantage claim can be assessed without external lookup.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The point raised about the multi-parameter discretization is important for rigor, and we will revise the manuscript to provide the requested explicit derivation and verification.
Point-by-point responses
-
Referee: The claim that the multi-dimensional average over independent initialization parameters 'can be exactly replaced by an average over just four discrete values' is load-bearing for both the efficiency and the 'no quantum advantage' conclusion. While a 4-point quadrature is exact for each individual trigonometric term of degree ≤2, the manuscript must show that applying the same four global Clifford configurations across all parametric gates preserves exact equality with the product measure. The provided description does not address whether this uniform choice introduces correlations absent from independent parameter sampling; an explicit derivation or counter-example check for circuits containing two or more parametric gates under arbitrary Clifford interleaving is required.
Authors: We agree that the abstract and main text would benefit from a more explicit treatment of the multi-gate case. The reduction is exact because the NTK for this architecture, after expanding the relevant derivatives and traces, yields expectation values that are multilinear in the individual parameter-dependent factors. The Clifford interleaving ensures that the four global configurations (corresponding to the four Clifford elements that realize the exact 4-point quadrature for each Pauli-generated rotation) can be applied uniformly while reproducing the product measure; no spurious correlations are introduced because the group action commutes with the independent averaging in the relevant matrix elements. Nevertheless, we acknowledge that the current manuscript does not spell out this argument for n>1 parametric gates. In the revision we will add a dedicated subsection (or appendix) containing (i) the general derivation showing equivalence of the discrete and continuous averages for arbitrary numbers of interleaved parametric gates, and (ii) an explicit worked example plus numerical check for a two-gate circuit under a non-trivial Clifford interleaver, confirming agreement to machine precision.
Revision: yes
Circularity Check
Minor self-citation for GP equivalence in no-advantage implication; core NTK algorithm independent
specific steps
-
self citation load bearing
[Abstract (final sentence)]
"Combined with recent results establishing the equivalence between wide quantum neural networks and Gaussian processes [Girardi et al., Comm. Math. Phys. 406, 92 (2025); Melchor Hernandez et al., Ann. Henri Poincaré (2025)], our method enables efficient computation of the expected output of wide, trained quantum neural networks, and therefore shows that such networks cannot achieve quantum advantage."
The strongest claim (no quantum advantage for wide trained QNNs) is justified solely by invoking the wide-QNN-to-GP equivalence from prior work whose author sets overlap with the present paper. This makes the no-advantage implication dependent on the cited equivalences rather than derived from the new NTK algorithm alone.
full rationale
The paper's central technical contribution is an algorithm that replaces the continuous initialization-parameter average in the NTK with a discrete average over four Clifford configurations for the described circuit class. This reduction is derived from the trigonometric structure of the relevant derivatives and is presented as a self-contained classical-simulation insight without reference to fitted parameters or prior self-results. The only self-citation occurs in the final sentence of the abstract, where the NTK method is combined with two external papers (one overlapping in authorship) to reach the no-quantum-advantage conclusion. Per the evaluation rules, a non-load-bearing self-citation supporting a downstream implication rather than the core derivation itself produces only minor circularity burden. No self-definitional, fitted-input, or ansatz-smuggling patterns appear in the described chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Wide quantum neural networks of the stated form are equivalent to Gaussian processes whose kernel is the neural tangent kernel (cited from Girardi et al. and Melchor Hernandez et al.).
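To show how a classically computed kernel would feed into this assumption, the sketch below evaluates a Gaussian-process posterior mean, K(x*, X) K(X, X)^{-1} y. This is the form the expected trained output commonly takes in NTK analyses when the untrained output has zero mean and training on squared loss runs to convergence; the cited papers may impose different or additional conditions. The kernel here is a placeholder, not the NTK of any quantum circuit, and all names and data are hypothetical.

```python
import numpy as np


def gp_mean(kernel, x_train, y_train, x_test, jitter=1e-10):
    """Posterior mean K(x*, X) K(X, X)^{-1} y for a scalar-input kernel."""
    k_train = np.array([[kernel(a, b) for b in x_train] for a in x_train])
    k_test = np.array([[kernel(a, b) for b in x_train] for a in x_test])
    # A small jitter keeps the linear solve stable if k_train is near-singular.
    alpha = np.linalg.solve(k_train + jitter * np.eye(len(x_train)), y_train)
    return k_test @ alpha


def placeholder_kernel(a, b):
    # Stand-in for a classically computed NTK; NOT the kernel from the paper.
    return 1.0 + np.cos(a - b)


x_train = np.array([0.0, 1.0, 2.5])    # hypothetical training inputs
y_train = np.array([0.2, -0.4, 0.7])   # hypothetical training labels
x_test = np.array([0.5, 1.8])

print(gp_mean(placeholder_kernel, x_train, y_train, x_test))
```

Once the kernel entries are available, the remaining work is a single linear solve in the number of training points.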
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Combined with recent results establishing the equivalence between wide quantum neural networks and Gaussian processes ... shows that such networks cannot achieve quantum advantage"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Franklin De Lima Marquezino, Renato Portugal, and Carlile Lavor. A primer on quantum computing. Springer, 2019.
- [2] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. An introduction to quantum machine learning. Contemporary Physics, 56(2):172–185, 2015.
- [3] Davide Pastorello. Concise guide to quantum machine learning. Springer, 2023.
- [4] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195–202, 2017.
- [5] Filippo Girardi and Giacomo De Palma. Trained quantum neural networks are gaussian processes. arXiv preprint arXiv:2402.08726, 2024.
- [6] Maria Schuld and Francesco Petruccione. Supervised learning with quantum computers, volume 17. Springer, 2018.
- [7] Maria Schuld, Ryan Sweke, and Johannes Jakob Meyer. Effect of data encoding on the expressive power of variational quantum-machine-learning models. Physical Review A, 103(3):032430, 2021.
- [8] Seth Lloyd, Maria Schuld, Aroosa Ijaz, Josh Izaac, and Nathan Killoran. Quantum embeddings for machine learning. arXiv preprint arXiv:2001.03622, 2020.
- [9] Yunchao Liu, Srinivasan Arunachalam, and Kristan Temme. A rigorous and robust quantum speed-up in supervised machine learning. Nature Physics, 17(9):1013–1017, 2021.
- [10] Vojtěch Havlíček, Antonio D Córcoles, Kristan Temme, Aram W Harrow, Abhinav Kandala, Jerry M Chow, and Jay M Gambetta. Supervised learning with quantum-enhanced feature spaces. Nature, 567(7747):209–212, 2019.
- [11] Lucas Pinheiro Cinelli, Matheus Araújo Marins, Eduardo Antonio Barros Da Silva, and Sérgio Lima Netto. Variational methods for machine learning with applications to deep networks, volume 15. Springer, 2021.
- [12] Erfan Abedi, Salman Beigi, and Leila Taghavi. Quantum lazy training. Quantum, 7:989, 2023.
- [13] Anderson Melchor Hernandez, Filippo Girardi, Davide Pastorello, and Giacomo De Palma. Quantitative convergence of trained quantum neural networks to a gaussian process, 2024.
- [14] Norihito Shirai, Kenji Kubo, Kosuke Mitarai, and Keisuke Fujii. Quantum tangent kernel. Phys. Rev. Res., 6(3):033179, 2024.
- [15] Junyu Liu, Francesco Tacchino, Jennifer R. Glick, Liang Jiang, and Antonio Mezzacapo. Representation learning via quantum neural tangent kernels. PRX Quantum, 3:030323, Aug 2022.
- [16] Li-Wei Yu, Weikang Li, Qi Ye, Zhide Lu, Zizhao Han, and Dong-Ling Deng. Expressibility-induced concentration of quantum neural tangent kernels. Reports on Progress in Physics, 87(11):110501, Oct 2024.
- [17] Francesco Scala, Christa Zoufal, Dario Gerace, and Francesco Tacchino. Towards practical quantum neural network diagnostics with neural tangent kernels. arXiv preprint arXiv:2503.01966, 2025.
- [18] Arthur Jacot, Franck Gabriel, and Clement Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
- [19] Scott Aaronson and Daniel Gottesman. Improved simulation of stabilizer circuits. Physical Review A, 70(5), November 2004.
- [20] Armando Angrisani, Alexander Schmidhuber, Manuel S. Rudolph, M. Cerezo, Zoë Holmes, and Hsin-Yuan Huang. Classically estimating observables of noiseless quantum circuits. arXiv preprint arXiv:2409.01706, 2024.
- [21] Oliver Reardon-Smith, Michał Oszmaniec, and Kamil Korzekwa. Improved simulation of quantum circuits dominated by free fermionic operations. Quantum, 8:1549, December 2024.
- [22] Yifan Zhang and Yuxuan Zhang. Classical simulability of quantum circuits with shallow magic depth. PRX Quantum, 6:010337, Feb 2025.
- [23] Victor Martinez, Armando Angrisani, Ekaterina Pankovets, Omar Fawzi, and Daniel Stilck França. Efficient simulation of parametrized quantum circuits under nonunital noise through pauli backpropagation. Phys. Rev. Lett., 134:250602, Jun 2025.
- [24] M. Cerezo, Martin Larocca, Diego García-Martín, N. L. Diaz, Paolo Braccia, Enrico Fontana, Manuel S. Rudolph, Pablo Bermejo, Aroosa Ijaz, Supanut Thanasilp, Eric R. Anschuetz, and Zoë Holmes. Does provable absence of barren plateaus imply classical simulability? or, why we need to rethink variational quantum computing, 2024.
- [25] Kieran Mastel. The clifford theory of the n-qubit clifford group. arXiv preprint arXiv:2307.05810, 2023.
- [26] Leonardo Banchi and Gavin E Crooks. Measuring analytic gradients of general quantum evolution with the stochastic parameter shift rule. Quantum, 5:386, 2021.
- [27] E. W. Barankin. Locally best unbiased estimates. The Annals of Mathematical Statistics, 20(4):477–501, 1949.
- [28] Paul R Halmos. The theory of unbiased estimation. The Annals of Mathematical Statistics, 17(1):34–43, 1946.
- [29] Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, August 2011.
- [30] Michel Talagrand. The missing factor in hoeffding’s inequalities. Annales de l’IHP Probabilités et statistiques, 31(4):689–702, 1995.
- [31] Jeroen Dehaene and Bart De Moor. Clifford group, stabilizer states, and linear and quadratic operations over gf(2). Physical Review A, 68(4), October 2003.
- [32] Volker Strassen. Gaussian elimination is not optimal. Numerische mathematik, 13(4):354–356, 1969.
- [33] Joel A. Tropp. An introduction to matrix concentration inequalities, 2015.