arxiv: 2508.04498 · v2 · pith:DAQ5JBATnew · submitted 2025-08-06 · 🪐 quant-ph · math-ph · math.MP

Efficient classical computation of the neural tangent kernel of quantum neural networks

Pith reviewed 2026-05-22 13:36 UTC · model grok-4.3

classification 🪐 quant-ph · math-ph · math.MP
keywords neural tangent kernel · quantum neural networks · classical simulation · Clifford group · Gaussian processes · quantum advantage · Pauli Hamiltonians

The pith

The neural tangent kernel of Clifford-interleaved quantum neural networks equals an average over four discrete Clifford configurations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that for quantum neural networks built from arbitrary Clifford unitaries interleaved with parametric gates generated by Pauli Hamiltonians, the continuous average over random initialization parameters in the neural tangent kernel definition can be replaced exactly by an average over four specific Clifford gates. The replacement turns the kernel computation into an efficiently classically simulable task. When combined with prior results that equate wide quantum networks to Gaussian processes, the method yields the expected output of a wide trained network without running the quantum circuit. This directly implies that networks in this family cannot deliver a quantum advantage over classical computation in the wide limit.

Core claim

For quantum neural networks consisting of arbitrary Clifford unitaries interleaved with parametric gates generated by Pauli Hamiltonians, the neural tangent kernel can be exactly computed by replacing the continuous average over initialization parameters with an average over four discrete Clifford configurations of the parametric gates. This reduction permits efficient classical simulation of the associated circuit.
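
The reason discrete angles help can be checked directly on a small case. The sketch below assumes the gate convention exp(-i θ P / 2) (the summary does not state the paper's convention), under which the four angles 0, π/2, π, 3π/2 are expected to give Clifford gates. It verifies by brute force that conjugation maps every two-qubit Pauli string to a signed Pauli string at those angles, and not at a generic angle. The generator `XZ` and all helper names are illustrative choices, not taken from the paper.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
SINGLE = {"I": I2, "X": X, "Y": Y, "Z": Z}

def pauli_string(label):
    """Kronecker product of single-qubit Paulis, e.g. 'XZ' -> X (x) Z."""
    out = np.array([[1.0 + 0j]])
    for ch in label:
        out = np.kron(out, SINGLE[ch])
    return out

def rotation(generator, theta):
    """exp(-i * theta/2 * P) for a Pauli string P (assumed convention; uses P @ P = I)."""
    dim = generator.shape[0]
    return np.cos(theta / 2) * np.eye(dim) - 1j * np.sin(theta / 2) * generator

def is_signed_pauli(op, n_qubits):
    """True if op equals +P or -P for some Pauli string P on n_qubits."""
    for label in itertools.product("IXYZ", repeat=n_qubits):
        p = pauli_string(label)
        if np.allclose(op, p) or np.allclose(op, -p):
            return True
    return False

def maps_paulis_to_paulis(unitary, n_qubits):
    """Brute-force Clifford test: conjugate every Pauli string by the unitary."""
    return all(
        is_signed_pauli(unitary @ pauli_string(l) @ unitary.conj().T, n_qubits)
        for l in itertools.product("IXYZ", repeat=n_qubits)
    )

generator = pauli_string("XZ")  # illustrative Pauli generator
clifford_angles = [0.0, np.pi / 2, np.pi, 3 * np.pi / 2]
results = [maps_paulis_to_paulis(rotation(generator, t), 2) for t in clifford_angles]
generic = maps_paulis_to_paulis(rotation(generator, 0.3), 2)
print(results, generic)
```

The brute-force test scales exponentially and is only meant as a check; efficient simulation would instead track Pauli strings symbolically, as in stabilizer-circuit simulators.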

What carries the argument

The exact replacement of the continuous initialization-parameter average in the NTK by a four-point average over Clifford configurations of the parametric gates.
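
The single-angle identity behind this replacement can be written out as follows. This is a sketch under two assumptions not stated in this summary: each initialization angle is uniform on $[0, 2\pi)$, and each gate has the form $e^{-i\theta P/2}$, so that the Clifford angles are the multiples of $\pi/2$. The symbols $g$ and $c_k$ are illustrative.

```latex
g(\theta) = \sum_{k=-2}^{2} c_k e^{ik\theta}
\quad\Longrightarrow\quad
\frac{1}{4}\sum_{m=0}^{3} g\!\left(\tfrac{m\pi}{2}\right)
= \sum_{k=-2}^{2} c_k \,\frac{1}{4}\sum_{m=0}^{3} i^{km}
= c_0
= \frac{1}{2\pi}\int_0^{2\pi} g(\theta)\,d\theta ,
\qquad\text{since}\quad
\frac{1}{4}\sum_{m=0}^{3} i^{km} =
\begin{cases} 1, & k \equiv 0 \pmod 4,\\ 0, & \text{otherwise.} \end{cases}
```

Degree 2 is the relevant case because the NTK integrand is a product of two first derivatives, each containing only first harmonics in a given angle. The four-point rule first fails at $k = \pm 4$.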

If this is right

  • The NTK for this class of networks becomes computable by classical Clifford-circuit simulation techniques.
  • The expected output of a wide, trained network equals the corresponding Gaussian-process prediction and can therefore be obtained classically.
  • Networks of this architecture cannot achieve quantum advantage in the wide, trained regime.
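
Once a kernel is available classically, the Gaussian-process prediction in the second bullet reduces to linear algebra. The sketch below assumes the standard kernel-regression form of the mean, k(x*, X) K(X, X)^{-1} y, which holds in the usual lazy-training setting with squared loss and zero-mean initial output. The precise statement in the cited papers may differ. `toy_kernel` is a placeholder and is not the NTK of any quantum circuit.

```python
import numpy as np

def gp_mean_prediction(kernel_train, kernel_test_train, targets, ridge=1e-10):
    """Mean prediction k(x*, X) @ K(X, X)^{-1} @ y, with a small ridge for stability."""
    n = kernel_train.shape[0]
    weights = np.linalg.solve(kernel_train + ridge * np.eye(n), targets)
    return kernel_test_train @ weights

def toy_kernel(a, b):
    """Placeholder squared-exponential kernel standing in for a precomputed NTK."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

x_train = np.array([0.0, 1.0, 2.5])
y_train = np.array([0.2, -0.4, 0.9])
x_test = np.array([1.0, 1.7])  # the first test point coincides with a training point

mean = gp_mean_prediction(
    toy_kernel(x_train, x_train), toy_kernel(x_test, x_train), y_train
)
print(mean)
```

At a training input the predicted mean reproduces the training target up to the ridge term, which is a quick sanity check on any kernel plugged in here.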

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result isolates a concrete regime in which quantum neural networks lose any potential advantage once they become wide enough to enter the Gaussian-process limit.
  • Analogous discrete reductions may exist for other parametric gate families if their initialization distributions admit similar finite-support equivalences.
  • Classical computation of the NTK supplies a practical benchmark that any claimed quantum advantage for these networks must beat.

Load-bearing premise

The continuous average over initialization parameters in the NTK can be replaced exactly by an average over four discrete Clifford configurations for arbitrary Clifford unitaries interleaved with Pauli-generated parametric gates.

What would settle it

Direct numerical comparison, on a small explicit circuit of the given form, between the NTK value obtained from the continuous parameter average and the value obtained from the proposed four-point Clifford average; any nonzero discrepancy would falsify the reduction.
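
A comparison of this kind might look like the sketch below. The circuit is a toy built for illustration and is not taken from the paper: two qubits, Hadamards and a CNOT as the Clifford layers, two trainable Pauli rotations exp(-i θ P / 2), and a hypothetical input encoding through a Z rotation. The NTK is taken to be the average over angles of the inner product of gradients, and a dense offset grid of 32 points per angle stands in for the continuous uniform average. Both are assumptions.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

GENERATORS = [np.kron(X, I2), np.kron(Z, Z)]  # Pauli generators of the trainable gates
ENCODER = np.kron(Z, I2)                      # hypothetical input encoding generator
OBSERVABLE = np.kron(Y, I2)

def rot(pauli, angle):
    """exp(-i * angle/2 * P) for a two-qubit Pauli string P."""
    return np.cos(angle / 2) * np.eye(4) - 1j * np.sin(angle / 2) * pauli

def model(thetas, x):
    """Expectation value of OBSERVABLE after the toy circuit."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0
    state = np.kron(H, H) @ state
    state = rot(ENCODER, x) @ state
    state = rot(GENERATORS[0], thetas[0]) @ state
    state = CNOT @ state
    state = rot(GENERATORS[1], thetas[1]) @ state
    return float(np.real(np.vdot(state, OBSERVABLE @ state)))

def gradient(thetas, x):
    """Exact gradient via the parameter-shift rule for exp(-i theta P / 2) gates."""
    grads = []
    for i in range(len(thetas)):
        plus, minus = list(thetas), list(thetas)
        plus[i] += np.pi / 2
        minus[i] -= np.pi / 2
        grads.append(0.5 * (model(plus, x) - model(minus, x)))
    return np.array(grads)

def ntk_on_grid(x1, x2, angles):
    """Average gradient inner product over the product grid of per-gate angles."""
    configs = list(itertools.product(angles, repeat=len(GENERATORS)))
    total = sum(gradient(t, x1) @ gradient(t, x2) for t in configs)
    return total / len(configs)

x1, x2 = 0.7, 1.9
four_point = ntk_on_grid(x1, x2, np.arange(4) * np.pi / 2)
reference = ntk_on_grid(x1, x2, 0.123 + 2 * np.pi * np.arange(32) / 32)
print(four_point, reference)
```

For this toy circuit the model output works out to sin(x) cos(θ1) cos(θ2), so both averages should equal 0.5 sin(x1) sin(x2). Enumerating all 4^L configurations is feasible here only because L = 2; this summary does not describe how the paper handles larger L.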

Original abstract

We propose an efficient classical algorithm to estimate the Neural Tangent Kernel (NTK) associated with a broad class of quantum neural networks. These networks consist of arbitrary unitary operators belonging to the Clifford group interleaved with parametric gates given by the time evolution generated by an arbitrary Hamiltonian belonging to the Pauli group. The proposed algorithm leverages a key insight: the average over the distribution of initialization parameters in the NTK definition can be exactly replaced by an average over just four discrete values, chosen such that the corresponding parametric gates are Clifford operations. This reduction enables an efficient classical simulation of the circuit. Combined with recent results establishing the equivalence between wide quantum neural networks and Gaussian processes [Girardi et al., Comm. Math. Phys. 406, 92 (2025); Melchor Hernandez et al., Ann. Henri Poincaré (2025)], our method enables efficient computation of the expected output of wide, trained quantum neural networks, and therefore shows that such networks cannot achieve quantum advantage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes an efficient classical algorithm to compute the Neural Tangent Kernel (NTK) for quantum neural networks consisting of arbitrary Clifford unitaries interleaved with parametric gates generated by Pauli Hamiltonians. The central technical claim is that the continuous average over initialization parameters in the NTK can be exactly replaced by a discrete average over four Clifford configurations. Combined with cited results on the equivalence of wide QNNs to Gaussian processes, the work concludes that the expected output of wide trained networks in this class is classically computable and therefore such networks cannot achieve quantum advantage.

Significance. If the reduction is exact, the result would be significant: it supplies a polynomial-time classical procedure for the NTK of a broad family of QNNs and, via the GP correspondence, for the trained-network statistics. This would constitute a concrete, architecture-specific demonstration that quantum advantage is absent in the wide limit for this model class. The use of Clifford-group properties to achieve an exact discretization is a technically attractive feature.

major comments (1)
  1. [Abstract] Abstract (paragraph describing the algorithm): The claim that the multi-dimensional average over independent initialization parameters 'can be exactly replaced by an average over just four discrete values' is load-bearing for both the efficiency and the 'no quantum advantage' conclusion. While a 4-point quadrature is exact for each individual trigonometric term of degree ≤2, the manuscript must show that applying the same four global Clifford configurations across all parametric gates preserves exact equality with the product measure. The provided description does not address whether this uniform choice introduces correlations absent from independent parameter sampling; an explicit derivation or counter-example check for circuits containing two or more parametric gates under arbitrary Clifford interleaving is required.
minor comments (2)
  1. The manuscript should explicitly state the maximum number of parametric gates for which the reduction is claimed to hold exactly, and clarify whether any additional assumptions on the interleaving pattern are needed.
  2. Add a short self-contained recap of the key steps from the two cited GP-equivalence papers so that the no-advantage claim can be assessed without external lookup.
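
The distinction raised in the major comment can be made concrete with a toy integrand. The sketch below uses g(θ1, θ2) = sin²θ1 cos²θ2 + cos²θ1 sin²θ2, a degree-2 trigonometric polynomial of the kind that arises from products of first derivatives. It is an illustrative choice and not an expression from the manuscript. The code compares the four-point rule applied independently to each angle (16 configurations) with the rule applied to a single shared angle (4 configurations).

```python
import itertools
import numpy as np

def integrand(theta1, theta2):
    """Toy degree-2 integrand (illustrative, not taken from the manuscript)."""
    return (np.sin(theta1) ** 2 * np.cos(theta2) ** 2
            + np.cos(theta1) ** 2 * np.sin(theta2) ** 2)

angles = np.arange(4) * np.pi / 2  # 0, pi/2, pi, 3*pi/2

# Independent four-point rule on each angle: the full 4 x 4 product grid.
independent = np.mean([integrand(a, b) for a, b in itertools.product(angles, repeat=2)])

# One shared angle for both gates: only the 4 diagonal configurations.
shared = np.mean([integrand(a, a) for a in angles])

# Continuous product-measure average, computed analytically: 1/4 + 1/4.
continuous = 0.5

print(independent, shared, continuous)
```

The product grid reproduces the continuous value 1/2, while the shared-angle average is 0. This is why it matters whether "four discrete values" is meant per gate or globally. The example does not determine which reading the manuscript intends.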

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review. The point raised about the multi-parameter discretization is important for rigor, and we will revise the manuscript to provide the requested explicit derivation and verification.

Point-by-point responses
  1. Referee: The claim that the multi-dimensional average over independent initialization parameters 'can be exactly replaced by an average over just four discrete values' is load-bearing for both the efficiency and the 'no quantum advantage' conclusion. While a 4-point quadrature is exact for each individual trigonometric term of degree ≤2, the manuscript must show that applying the same four global Clifford configurations across all parametric gates preserves exact equality with the product measure. The provided description does not address whether this uniform choice introduces correlations absent from independent parameter sampling; an explicit derivation or counter-example check for circuits containing two or more parametric gates under arbitrary Clifford interleaving is required.

    Authors: We agree that the abstract and main text would benefit from a more explicit treatment of the multi-gate case. The reduction is exact because the NTK for this architecture, after expanding the relevant derivatives and traces, yields expectation values that are multilinear in the individual parameter-dependent factors. The Clifford interleaving ensures that the four global configurations (corresponding to the four Clifford elements that realize the exact 4-point quadrature for each Pauli-generated rotation) can be applied uniformly while reproducing the product measure; no spurious correlations are introduced because the group action commutes with the independent averaging in the relevant matrix elements. Nevertheless, we acknowledge that the current manuscript does not spell out this argument for n>1 parametric gates. In the revision we will add a dedicated subsection (or appendix) containing (i) the general derivation showing equivalence of the discrete and continuous averages for arbitrary numbers of interleaved parametric gates, and (ii) an explicit worked example plus numerical check for a two-gate circuit under a non-trivial Clifford interleaver, confirming agreement to machine precision. revision: yes

Circularity Check

1 step flagged

Minor self-citation for GP equivalence in no-advantage implication; core NTK algorithm independent

specific steps
  1. self citation load bearing [Abstract (final sentence)]
    "Combined with recent results establishing the equivalence between wide quantum neural networks and Gaussian processes [Girardi et al., Comm. Math. Phys. 406, 92 (2025); Melchor Hernandez et al., Ann. Henri Poincaré (2025)], our method enables efficient computation of the expected output of wide, trained quantum neural networks, and therefore shows that such networks cannot achieve quantum advantage."

    The strongest claim (no quantum advantage for wide trained QNNs) is justified solely by invoking the wide-QNN-to-GP equivalence from prior work whose author sets overlap with the present paper. This makes the no-advantage implication dependent on the cited equivalences rather than derived from the new NTK algorithm alone.

full rationale

The paper's central technical contribution is an algorithm that replaces the continuous initialization-parameter average in the NTK with a discrete average over four Clifford configurations for the described circuit class. This reduction is derived from the trigonometric structure of the relevant derivatives and is presented as a self-contained classical-simulation insight without reference to fitted parameters or prior self-results. The only self-citation occurs in the final sentence of the abstract, where the NTK method is combined with two external papers (one overlapping in authorship) to reach the no-quantum-advantage conclusion. Per the evaluation rules, a non-load-bearing self-citation supporting a downstream implication rather than the core derivation itself produces only minor circularity burden. No self-definitional, fitted-input, or ansatz-smuggling patterns appear in the described chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on two external results for the Gaussian-process equivalence and on the unstated but load-bearing claim that the four discrete points exactly reproduce the continuous average for any interleaving of Clifford and Pauli-parametric gates.

axioms (1)
  • domain assumption: Wide quantum neural networks of the stated form are equivalent to Gaussian processes whose kernel is the neural tangent kernel (cited from Girardi et al. and Melchor Hernandez et al.).
    Invoked in the final paragraph to translate NTK computation into a statement about trained-network outputs.



Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 1 internal anchor

  1. [1]

    A primer on quantum computing

    Franklin De Lima Marquezino, Renato Portugal, and Carlile Lavor. A primer on quantum computing. Springer, 2019

  2. [2]

    An introduction to quantum machine learning

    Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. An introduction to quantum machine learning. Contemporary Physics, 56(2):172–185, 2015

  3. [3]

    Concise guide to quantum machine learning

    Davide Pastorello. Concise guide to quantum machine learning. Springer, 2023

  4. [4]

    Quantum machine learning

    Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195–202, 2017

  5. [5]

    Trained quantum neural networks are Gaussian processes

    Filippo Girardi and Giacomo De Palma. Trained quantum neural networks are Gaussian processes. arXiv preprint arXiv:2402.08726, 2024

  6. [6]

    Supervised learning with quantum computers

    Maria Schuld and Francesco Petruccione. Supervised learning with quantum computers, volume 17. Springer, 2018

  7. [7]

    Effect of data encoding on the expressive power of variational quantum-machine-learning models

    Maria Schuld, Ryan Sweke, and Johannes Jakob Meyer. Effect of data encoding on the expressive power of variational quantum-machine-learning models. Physical Review A, 103(3):032430, 2021

  8. [8]

    Quantum embeddings for machine learning

    Seth Lloyd, Maria Schuld, Aroosa Ijaz, Josh Izaac, and Nathan Killoran. Quantum embeddings for machine learning. arXiv preprint arXiv:2001.03622, 2020

  9. [9]

    A rigorous and robust quantum speed-up in supervised machine learning

    Yunchao Liu, Srinivasan Arunachalam, and Kristan Temme. A rigorous and robust quantum speed-up in supervised machine learning. Nature Physics, 17(9):1013–1017, 2021

  10. [10]

    Supervised learning with quantum-enhanced feature spaces

    Vojtěch Havlíček, Antonio D Córcoles, Kristan Temme, Aram W Harrow, Abhinav Kandala, Jerry M Chow, and Jay M Gambetta. Supervised learning with quantum-enhanced feature spaces. Nature, 567(7747):209–212, 2019

  11. [11]

    Variational methods for machine learning with applications to deep networks

    Lucas Pinheiro Cinelli, Matheus Araújo Marins, Eduardo Antonio Barros Da Silva, and Sérgio Lima Netto. Variational methods for machine learning with applications to deep networks, volume 15. Springer, 2021

  12. [12]

    Quantum lazy training

    Erfan Abedi, Salman Beigi, and Leila Taghavi. Quantum lazy training. Quantum, 7:989, 2023

  13. [13]

    Quantitative convergence of trained quantum neural networks to a Gaussian process

    Anderson Melchor Hernandez, Filippo Girardi, Davide Pastorello, and Giacomo De Palma. Quantitative convergence of trained quantum neural networks to a Gaussian process, 2024

  14. [14]

    Quantum tangent kernel

    Norihito Shirai, Kenji Kubo, Kosuke Mitarai, and Keisuke Fujii. Quantum tangent kernel. Phys. Rev. Res., 6(3):033179, 2024

  15. [15]

    Representation learning via quantum neural tangent kernels

    Junyu Liu, Francesco Tacchino, Jennifer R. Glick, Liang Jiang, and Antonio Mezzacapo. Representation learning via quantum neural tangent kernels. PRX Quantum, 3:030323, Aug 2022

  16. [16]

    Expressibility-induced concentration of quantum neural tangent kernels

    Li-Wei Yu, Weikang Li, Qi Ye, Zhide Lu, Zizhao Han, and Dong-Ling Deng. Expressibility-induced concentration of quantum neural tangent kernels. Reports on Progress in Physics, 87(11):110501, Oct 2024

  17. [17]

    Towards practical quantum neural network diagnostics with neural tangent kernels

    Francesco Scala, Christa Zoufal, Dario Gerace, and Francesco Tacchino. Towards practical quantum neural network diagnostics with neural tangent kernels. arXiv preprint arXiv:2503.01966, 2025

  18. [18]

    Neural tangent kernel: Convergence and generalization in neural networks

    Arthur Jacot, Franck Gabriel, and Clement Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018

  19. [19]

    Improved simulation of stabilizer circuits

    Scott Aaronson and Daniel Gottesman. Improved simulation of stabilizer circuits. Physical Review A, 70(5), November 2004

  20. [20]

    Classically estimating observables of noiseless quantum circuits

    Armando Angrisani, Alexander Schmidhuber, Manuel S. Rudolph, M. Cerezo, Zoë Holmes, and Hsin-Yuan Huang. Classically estimating observables of noiseless quantum circuits. arXiv preprint arXiv:2409.01706, 2024

  21. [21]

    Improved simulation of quantum circuits dominated by free fermionic operations

    Oliver Reardon-Smith, Michał Oszmaniec, and Kamil Korzekwa. Improved simulation of quantum circuits dominated by free fermionic operations. Quantum, 8:1549, December 2024

  22. [22]

    Classical simulability of quantum circuits with shallow magic depth

    Yifan Zhang and Yuxuan Zhang. Classical simulability of quantum circuits with shallow magic depth. PRX Quantum, 6:010337, Feb 2025

  23. [23]

    Efficient simulation of parametrized quantum circuits under nonunital noise through pauli backpropagation

    Victor Martinez, Armando Angrisani, Ekaterina Pankovets, Omar Fawzi, and Daniel Stilck França. Efficient simulation of parametrized quantum circuits under nonunital noise through pauli backpropagation. Phys. Rev. Lett., 134:250602, Jun 2025

  24. [24]

    Does provable absence of barren plateaus imply classical simulability? or, why we need to rethink variational quantum computing

    M. Cerezo, Martin Larocca, Diego García-Martín, N. L. Diaz, Paolo Braccia, Enrico Fontana, Manuel S. Rudolph, Pablo Bermejo, Aroosa Ijaz, Supanut Thanasilp, Eric R. Anschuetz, and Zoë Holmes. Does provable absence of barren plateaus imply classical simulability? or, why we need to rethink variational quantum computing, 2024

  25. [25]

    The clifford theory of the n-qubit clifford group

    Kieran Mastel. The clifford theory of the n-qubit clifford group. arXiv preprint arXiv:2307.05810, 2023

  26. [26]

    Measuring analytic gradients of general quantum evolution with the stochastic parameter shift rule

    Leonardo Banchi and Gavin E Crooks. Measuring analytic gradients of general quantum evolution with the stochastic parameter shift rule. Quantum, 5:386, 2021

  27. [27]

    Locally best unbiased estimates

    E. W. Barankin. Locally best unbiased estimates. The Annals of Mathematical Statistics, 20(4):477–501, 1949

  28. [28]

    The theory of unbiased estimation

    Paul R Halmos. The theory of unbiased estimation. The Annals of Mathematical Statistics, 17(1):34–43, 1946

  29. [29]

    Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, August 2011

  30. [30]

    The missing factor in hoeffding’s inequalities

    Michel Talagrand. The missing factor in hoeffding’s inequalities. Annales de l’IHP Probabilités et statistiques, 31(4):689–702, 1995

  31. [31]

    Clifford group, stabilizer states, and linear and quadratic operations over gf(2)

    Jeroen Dehaene and Bart De Moor. Clifford group, stabilizer states, and linear and quadratic operations over gf(2). Physical Review A, 68(4), October 2003

  32. [32]

    Gaussian elimination is not optimal

    Volker Strassen. Gaussian elimination is not optimal. Numerische mathematik, 13(4):354– 356, 1969

  33. [33]

    Joel A. Tropp. An introduction to matrix concentration inequalities, 2015