On a Central Limit Theorem and Sanov's principle for quantum neural networks

Anderson Melchor Hernandez

arxiv: 2606.21721 · v1 · pith:X2TFVSP2new · submitted 2026-06-19 · 🪐 quant-ph · math-ph · math.MP · math.PR

Pith reviewed 2026-06-26 13:39 UTC · model grok-4.3

classification 🪐 quant-ph · math-ph · math.MP · math.PR

keywords: quantum neural networks · mixture of experts · central limit theorem · Sanov's principle · neural tangent kernel · gradient flow · large deviations · supervised learning

The pith

A quantum neural network's mixture of experts satisfies a central limit theorem and Sanov's principle as the number of experts diverges.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines the statistical fluctuations in a mixture of experts model produced by a quantum neural network trained with gradient flow on supervised learning tasks. It proves that in the limit of many experts, the empirical measure of the experts' parameters obeys both a central limit theorem for typical fluctuations and Sanov's principle for rare large deviations. These fluctuations are shown to obey a linear transport equation, while the overall model converges to a deterministic limit function whose time evolution is set by the network's neural tangent kernel. A sympathetic reader would care because the results give precise asymptotic control over how such quantum models scale, which could inform their design and reliability for large-scale applications.

Core claim

The paper establishes the Central Limit Theorem and Sanov's principle for an MoE generated by a quantum neural network as the number of experts diverges. The fluctuations of the empirical measure of its parameters around its corresponding limit probability measure solve a linear transport equation. As a byproduct, the MoE converges to a limit function which solves an evolution equation governed by the neural tangent kernel associated with the quantum neural network.
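
For orientation, the objects named in this claim can be written in the notation common to classical mean-field analyses of neural networks. Everything below is an assumption made for illustration: the symbols \phi, \theta_i, \mu^N_t, \eta^N_t, the 1/N normalization, and the unspecified rate function I are not taken from the paper, whose definitions may differ.

```latex
% Assumed notation, not the paper's:
%   \phi(\theta, x)  output of a single expert with parameters \theta on input x
%   \mu_t            limit probability measure of the parameters at training time t
%   I                large-deviation rate function (left unspecified here)
\begin{align*}
  f_N(x; t) &= \frac{1}{N} \sum_{i=1}^{N} \phi\bigl(\theta_i(t), x\bigr)
    && \text{(mixture of $N$ experts)} \\
  \mu^N_t &= \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i(t)}
    && \text{(empirical measure of the parameters)} \\
  \eta^N_t &= \sqrt{N}\,\bigl(\mu^N_t - \mu_t\bigr)
    && \text{(fluctuation field studied by a CLT)} \\
  \mathbb{P}\bigl(\mu^N \in A\bigr) &\approx \exp\Bigl(-N \inf_{\nu \in A} I(\nu)\Bigr)
    && \text{(Sanov-type large-deviation estimate)}
\end{align*}
```

In this reading, the CLT describes the limit of \eta^N_t as N grows, while the Sanov-type estimate controls how unlikely it is for \mu^N to land in a set A far from the limit measure.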

What carries the argument

The mixture of experts generated by the quantum neural network: its empirical parameter measure converges to a limit probability measure, and the fluctuations around that limit obey a linear transport equation.

If this is right

The fluctuations of the empirical measure solve a linear transport equation.
The mixture of experts converges to a limit function solving an evolution equation governed by the neural tangent kernel.
These limit theorems hold when the quantum neural network is trained via gradient flow on supervised learning problems.
Sanov's principle governs the large-deviation behavior of the empirical measure in this setting.
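
To give a feel for what a Sanov-type statement asserts, the sketch below checks the simplest classical case: i.i.d. fair coin flips, where the probability that the empirical frequency of heads reaches 0.7 decays like exp(-N * H), with H the relative entropy between Bernoulli(0.7) and Bernoulli(0.5). This is only a toy analogue. The coin model, the threshold 0.7, and all function names are assumptions for illustration and have nothing to do with the paper's quantum model or its rate function.

```python
import math


def log_binomial_tail(n: int, k_min: int, p: float) -> float:
    """Return log P(S_n >= k_min) for S_n ~ Binomial(n, p), computed in log space."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
        for k in range(k_min, n + 1)
    ]
    m = max(log_terms)  # log-sum-exp shift to avoid underflow
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def bernoulli_kl(a: float, p: float) -> float:
    """Relative entropy of Bernoulli(a) with respect to Bernoulli(p)."""
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))


def empirical_rate(n: int, k_min: int, p: float) -> float:
    """Return -(1/n) log P(S_n >= k_min), the finite-n decay rate."""
    return -log_binomial_tail(n, k_min, p) / n


p = 0.5
for n in (100, 500, 2000):
    k_min = (7 * n) // 10  # empirical frequency of heads at least 0.7
    print(n, round(empirical_rate(n, k_min, p), 4), round(bernoulli_kl(0.7, p), 4))
```

The printed finite-N rate should approach the relative entropy from above as N grows, which is the large-deviation behavior the principle describes in this toy setting.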

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar scaling limits could be derived for other quantum architectures or loss functions beyond supervised gradient flow.
The transport equation might be used to predict finite-expert corrections or generalization error in practical quantum models.
The results suggest a route to compare quantum neural networks with their classical counterparts through shared mean-field and kernel structures.

Load-bearing premise

The empirical measure of the experts' parameters admits a well-defined limit probability measure as the number of experts diverges.

What would settle it

A numerical experiment on a trained quantum neural network showing that the variance or distribution of parameter fluctuations fails to satisfy the predicted linear transport equation once the number of experts exceeds a few hundred.
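
A minimal sketch of what the simplest version of such a check could look like, under strong assumptions: the expert below is a classical stand-in (`cos(theta + x)` with a uniformly random angle), there is no training, and the function names are hypothetical. It tests only whether the mixture output, rescaled by sqrt(N), has a variance that stabilizes as N grows. It does not test the linear transport equation itself.

```python
import math
import random
import statistics


def expert(theta: float, x: float) -> float:
    """Hypothetical single-expert output; a classical stand-in, not a quantum circuit."""
    return math.cos(theta + x)


def scaled_moe(n_experts: int, x: float, rng: random.Random) -> float:
    """Return sqrt(N) * f_N(x), where f_N averages N experts with i.i.d. uniform angles."""
    total = sum(expert(rng.uniform(0.0, 2.0 * math.pi), x) for _ in range(n_experts))
    return total / math.sqrt(n_experts)


def fluctuation_variance(n_experts: int, n_trials: int, x: float, seed: int = 0) -> float:
    """Sample variance of sqrt(N) * f_N(x) over independent parameter draws."""
    rng = random.Random(seed)
    samples = [scaled_moe(n_experts, x, rng) for _ in range(n_trials)]
    return statistics.variance(samples)


for n in (10, 100, 400):
    print(n, round(fluctuation_variance(n, 2000, x=0.3), 3))
```

In this toy model each expert has mean zero and variance 1/2, so the rescaled variance should stay near 0.5 for every N. A real test of the paper's claim would replace `expert` with the trained quantum model and compare the time evolution of the fluctuations against the predicted equation.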

Original abstract

In this work, we study the fluctuations of a Mixture of Experts (MoE) generated by a quantum neural network trained via gradient flow on supervised learning problems. Our main results establish the Central Limit Theorem (CLT), and Sanov's principle for an MoE as the number of experts diverges. We demonstrate that the fluctuations of the empirical measure of its parameters close to its corresponding limit probability measure solve a linear transport equation. As a byproduct, we show that the MoE converges to a limit function which solves an evolution equation governed by the neural tangent kernel associated with the quantum neural network.
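
The last sentence of the abstract describes a limit function whose evolution is governed by the neural tangent kernel. As a hedged illustration of that kind of dynamics, the sketch below integrates the standard kernel gradient flow df/dt = -K (f - y) on two training points. The squared loss, the time-independent kernel matrix `K`, the targets `y`, and the function name `kernel_flow` are all assumptions; the paper's kernel may depend on the limit measure and its equation may take a different form.

```python
from typing import List


def kernel_flow(kernel: List[List[float]], targets: List[float],
                f0: List[float], dt: float, n_steps: int) -> List[float]:
    """Explicit Euler integration of df/dt = -K (f - y) on the training points."""
    f = list(f0)
    n = len(f)
    for _ in range(n_steps):
        residual = [f[j] - targets[j] for j in range(n)]
        f = [f[i] - dt * sum(kernel[i][j] * residual[j] for j in range(n))
             for i in range(n)]
    return f


# Hypothetical 2-point training set with a hand-picked positive-definite kernel.
K = [[1.0, 0.5], [0.5, 1.0]]
y = [1.0, -1.0]
f_final = kernel_flow(K, y, f0=[0.0, 0.0], dt=0.01, n_steps=2000)
print(f_final)
```

Because the assumed kernel is positive definite, the residual f - y decays exponentially and the model output converges to the targets on the training points.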

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper claims to extend the CLT and Sanov's principle to quantum neural network MoEs via gradient flow, but only the abstract is available so none of it can be checked.

The main thing here is the claim that a mixture of experts from a quantum neural network obeys a central limit theorem and Sanov's principle as the number of experts diverges, with parameter fluctuations solving a linear transport equation and the overall model converging under a quantum neural tangent kernel.

What is new is the direct transfer of these classical large-deviation and NTK ideas to the quantum setting for supervised learning. The abstract states the results and the byproduct evolution equation without extra fluff.

The paper does well in naming the key objects (empirical measure, limit probability measure, transport equation) and tying them to gradient flow training.

The soft spots are obvious and central: we have no derivations, no definition of the quantum neural network, and no check on whether the empirical measure actually converges to something usable. The assumption that such a limit probability measure exists is taken as given, but that step is usually where these arguments live or die, and nothing lets us see if the quantum case introduces extra issues. Soundness cannot be assessed at all.

This is for specialists in theoretical quantum machine learning who already know the classical CLT/Sanov/NTK literature and want to see whether the extension works. A reader gets almost no value from the current text.

Send the full manuscript with proofs to a serious referee if it arrives; the topic is worth checking even if the present version is too thin to evaluate.

Referee Report

1 major / 0 minor

Summary. The paper studies fluctuations of a Mixture of Experts (MoE) generated by a quantum neural network trained via gradient flow on supervised learning problems. It claims to establish the Central Limit Theorem (CLT) and Sanov's principle for the MoE as the number of experts diverges, showing that fluctuations of the empirical measure of parameters solve a linear transport equation, and that the MoE converges to a limit function solving an evolution equation governed by the neural tangent kernel associated with the quantum neural network.

Significance. If the claimed CLT, Sanov's principle, transport equation for fluctuations, and NTK-governed limit evolution hold with rigorous proofs, the work would contribute to the theoretical analysis of scaling limits and fluctuations in quantum neural networks and MoE architectures. This could inform understanding of convergence and generalization in quantum machine learning. However, with only the abstract available and no access to derivations, assumptions, or proofs, the actual significance cannot be evaluated.

major comments (1)

The full manuscript text is not available (only the abstract is provided), so no derivations, assumptions, or proofs can be checked for gaps, consistency with the stated claims, or validity of the CLT/Sanov application, linear transport equation, or NTK evolution. This prevents any technical assessment of the central results.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review of our manuscript. The primary concern is the apparent unavailability of the full text for technical assessment. We address this below and confirm that the complete paper with all derivations is accessible.

Referee: The full manuscript text is not available (only the abstract is provided), so no derivations, assumptions, or proofs can be checked for gaps, consistency with the stated claims, or validity of the CLT/Sanov application, linear transport equation, or NTK evolution. This prevents any technical assessment of the central results.

Authors: The complete manuscript, including all assumptions, derivations, and proofs of the CLT, Sanov's principle, the linear transport equation for fluctuations, and the NTK-governed limit, is publicly available on arXiv at arXiv:2606.21721. It appears the referee may have encountered an access limitation that restricted visibility to the abstract only. We are happy to provide the full PDF directly to the referee or editor to enable a full technical evaluation.

Revision: no

Circularity Check

0 steps flagged

No circularity identified; full text unavailable for analysis

Only the abstract was available for this audit; the full manuscript text was not present. Without the paper's equations, derivations, self-citations, or parameter-fitting steps, no load-bearing reductions to inputs can be quoted or exhibited. The abstract describes standard applications of the CLT and Sanov's principle to an MoE limit, with no visible self-definitional or fitted-input structure. This is the expected non-finding when the source material is absent.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no free parameters, axioms, or invented entities can be extracted.

pith-pipeline@v0.9.1-grok · 5621 in / 1038 out tokens · 21710 ms · 2026-06-26T13:39:36.948756+00:00 · methodology

Reference graph

Works this paper leans on

25 extracted references

[1] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows: in metric spaces and in the space of probability measures, Springer Science & Business Media, 2008. [3] D. Araújo, R. I. Oliveira, and D. Yukimura, A mean-field limit for certain deep neural networks, 2019.

[2] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature, 549 (2017), pp. 195–202.

[3] H. Brezis, Functional analysis, Sobolev spaces and partial differential equations, vol. 2, Springer, 2011.

[4] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Cost function dependent barren plateaus in shallow parametrized quantum circuits, Nature Communications, 12 (2021), p. 1791.

[5] L. P. Cinelli, M. A. Marins, E. A. B. Da Silva, and S. L. Netto, Variational methods for machine learning with applications to deep networks, vol. 15, Springer, 2021. [8] D. Cioranescu and P. Donato, An introduction to homogenization, Oxford University Press, 1999.

[6] D. A. Dawson and J. Gärtner, Large deviations from the McKean-Vlasov limit for weakly interacting diffusions, Stochastics, 20 (1987), pp. 247–308. [10] F. De Lima Marquezino, R. Portugal, and C. Lavor, A primer on quantum computing, Springer, 2019.

[7] A. Dembo and O. Zeitouni, Large deviations techniques and applications (1998), Applications of Mathematics, 38 (2011).

[8] R. Ferland, X. Fernique, and G. Giroux, Compactness of the fluctuations associated with some generalized nonlinear Boltzmann equations, Canadian Journal of Mathematics, 44 (1992), pp. 1192–1205.

[9] F. Girardi and G. De Palma, Trained quantum neural networks are Gaussian processes, Communications in Mathematical Physics, 406 (2025).

[10] C. Graham, McKean-Vlasov Itô-Skorohod equations, and nonlinear diffusions with discrete jump sets, Stochastic Processes and their Applications, 40 (1992), pp. 69–82.

[11] C. Graham, T. G. Kurtz, S. Méléard, P. E. Protter, M. Pulvirenti, D. Talay, and S. Méléard, Asymptotic behaviour of some interacting particle systems; McKean-Vlasov and Boltzmann models, Probabilistic Models for Nonlinear Partial Differential Equations: Lectures given at the 1st Session of the Centro Internazionale Matematico Estivo (CIME) held in Montecat... · 1995

[12] A. M. Hernandez, D. Pastorello, and G. De Palma, Mean-field limit from general mixtures of experts to quantum neural networks, Lett. Math. Phys., 116 (2026), pp. Paper No. 42, 23.

[13] B. T. Kiani, G. De Palma, M. Marvian, Z.-W. Liu, and S. Lloyd, Learning quantum data with the quantum earth mover's distance, Quantum Science and Technology, 7 (2022), p. 045002.

[14] A. V. Kolesnikov and M. Röckner, On continuity equations in infinite dimensions with non-Gaussian reference measure, Journal of Functional Analysis, 266 (2014), pp. 4490–4537.

[15] M. Larocca, S. Thanasilp, S. Wang, K. Sharma, J. Biamonte, P. J. Coles, L. Cincio, J. R. McClean, Z. Holmes, and M. Cerezo, A review of barren plateaus in variational quantum computing, arXiv preprint arXiv:2405.00781, (2024).

[16] S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran, Quantum embeddings for machine learning, arXiv preprint arXiv:2001.03622, (2020).

[17] Y. Lu, C. Ma, Y. Lu, J. Lu, and L. Ying, A mean-field analysis of deep ResNet and beyond: Towards provable optimization via overparameterization from depth, 2020.

[18] S. Mei, T. Misiakiewicz, and A. Montanari, Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit, 2019.

[19] A. Melchor Hernandez, F. Girardi, D. Pastorello, and G. De Palma, Quantitative convergence of trained quantum neural networks to a Gaussian process, in Annales Henri Poincaré, Springer, 2025, pp. 1–57.

[20] A. Melchor Hernandez, D. Pastorello, and G. De Palma, Efficient classical computation of the neural tangent kernel of quantum neural networks, Quantum, 10 (2026), p. 2118. [25] P.-M. Nguyen, Mean field limit of the learning dynamics of multilayer neural networks, 2019.

[21] P.-M. Nguyen and H. T. Pham, A rigorous framework for the mean field limit of multilayer neural networks, Mathematical Statistics and Learning, 6 (2023), pp. 201–357. [27] V. M. Panaretos and Y. Zemel, An invitation to statistics in Wasserstein space, Springer Nature, 2020. [28] D. Pastorello, Concise guide to quantum machine learning, Springer, 2023.

[22] G. Rotskoff and E. Vanden-Eijnden, Trainability and accuracy of artificial neural networks: An interacting particle system approach, Communications on Pure and Applied Mathematics, 75 (2022), pp. 1889–1935. [30] F. Santambrogio, Optimal transport for applied mathematicians, Birkhäuser, NY, 55 (2015), p. 94. [31] M. Schuld and F. Petruccione, Supervised learning...

[23] M. Schuld, I. Sinayskiy, and F. Petruccione, An introduction to quantum machine learning, Contemporary Physics, 56 (2015), pp. 172–185.

[24] M. Schuld, R. Sweke, and J. J. Meyer, Effect of data encoding on the expressive power of variational quantum-machine-learning models, Physical Review A, 103 (2021), p. 032430. [34] J. Sirignano and K. Spiliopoulos, Mean field analysis of deep neural networks, 2021.

[25] A.-S. Sznitman, Topics in propagation of chaos, Ecole d'été de probabilités de Saint-Flour XIX—1989, 1464 (1991), pp. 165–251.

Author address: (A. Melchor Hernandez) Dipartimento di Matematica, Via Zamboni, 33, 40126, Bologna (Italy). Email address: anderson.melchor@unibo.it
