Mean-field limit from general mixtures of experts to quantum neural networks

Anderson Melchor Hernandez; Davide Pastorello; Giacomo De Palma

arxiv: 2501.14660 · v2 · submitted 2025-01-24 · 🧮 math-ph · cs.LG· math.MP· math.PR

Mean-field limit from general mixtures of experts to quantum neural networks

Anderson Melchor Hernandez , Davide Pastorello , Giacomo De Palma This is my paper

Pith reviewed 2026-05-23 05:30 UTC · model grok-4.3

classification 🧮 math-ph cs.LGmath.MPmath.PR

keywords mixture of expertspropagation of chaosmean-field limitnonlinear continuity equationgradient flowquantum neural networkssupervised learning

0 comments

The pith

Mixtures of experts trained by gradient flow exhibit propagation of chaos, with parameter empirical measures converging to a nonlinear continuity equation at a rate depending only on expert count.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that for a mixture of experts in a supervised learning setting, as the number of experts grows large the individual expert parameters behave collectively like samples from a deterministic limiting measure. This limiting measure satisfies a nonlinear continuity equation that arises from the gradient-flow dynamics on the loss. The authors supply an explicit rate of closeness between the finite-expert empirical measure and the limiting measure, and they show the same limit holds when the experts are produced by a quantum neural network. A reader would care because the result supplies a tractable infinite-expert description that can be used to understand or approximate the behavior of ever-larger expert ensembles without tracking every parameter.

Core claim

Our main result establishes the propagation of chaos for a MoE as the number of experts diverges. We demonstrate that the corresponding empirical measure of their parameters is close to a probability measure that solves a nonlinear continuity equation, and we provide an explicit convergence rate that depends solely on the number of experts. We apply our results to a MoE generated by a quantum neural network.

What carries the argument

The nonlinear continuity equation satisfied by the limiting probability measure of the expert parameters under the gradient-flow dynamics of the supervised loss.

If this is right

Finite mixtures of experts can be approximated quantitatively by the deterministic mean-field PDE instead of by direct simulation of every expert.
The error bound between the finite system and the limit depends only on the number of experts, giving a uniform control independent of other model details.
The same mean-field description applies when the experts are realized by a quantum neural network, allowing the limit to be used for quantum-generated ensembles.
Training dynamics of large expert systems remain consistent with the infinite-expert continuity equation once the number of experts is sufficiently large.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Solving the continuity equation numerically could predict the collective behavior of mixtures with thousands of experts without ever training the full finite system.
The same propagation-of-chaos argument might be checked for other training methods such as stochastic gradient descent if the required regularity can be verified.
Direct comparison of the predicted mean-field trajectory against actual training runs on hardware with increasing expert counts would provide a practical test of the rate.

Load-bearing premise

The supervised learning loss and the expert functions satisfy sufficient regularity so that the empirical measure converges to a solution of the nonlinear continuity equation at the stated rate.

What would settle it

A numerical experiment in which the distance between the finite-expert empirical measure and the solution of the continuity equation fails to shrink at the explicit rate when the number of experts is increased while all other parameters are held fixed.

read the original abstract

In this work, we study the asymptotic behavior of Mixture of Experts (MoE) trained via gradient flow on supervised learning problems. Our main result establishes the propagation of chaos for a MoE as the number of experts diverges. We demonstrate that the corresponding empirical measure of their parameters is close to a probability measure that solves a nonlinear continuity equation, and we provide an explicit convergence rate that depends solely on the number of experts. We apply our results to a MoE generated by a quantum neural network.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proves propagation of chaos for MoE gradient flows with an explicit rate depending only on the number of experts and verifies the conditions hold for a quantum neural network construction.

read the letter

The main result here is a quantitative mean-field limit for mixtures of experts under gradient flow on supervised losses. As the number of experts grows, the empirical measure on parameters converges to a solution of a nonlinear continuity equation, and the rate depends only on the expert count rather than other parameters. They then check that a quantum neural network MoE satisfies the needed regularity (bounded derivatives, Lipschitz conditions) so the same limit applies directly to that setting. This is the part worth noting: an explicit rate plus the quantum specialization, both absent from the earlier mean-field neural network literature they cite. The derivation follows standard propagation-of-chaos techniques but is carried through cleanly for the general MoE case. The assumptions on the loss and expert maps are listed explicitly, and the quantum example is shown to meet them without extra fitting. That keeps the argument self-contained. The main limitation is that everything rests on the gradient-flow dynamics and the regularity hypotheses; if those fail in a concrete training run the rate does not apply. No circularity or hidden self-reference appears in the continuity equation or the estimate. The paper is for readers working on mean-field limits in machine learning or on quantum neural networks who want a scaling law they can cite. It is a solid, self-contained mathematical contribution that deserves a serious referee; the proofs and assumption checks are the kind of thing that should be verified in review rather than rejected outright.

Referee Report

0 major / 2 minor

Summary. The paper studies the asymptotic behavior of Mixture of Experts (MoE) trained via gradient flow on supervised learning problems. Its main result establishes propagation of chaos as the number of experts diverges: the empirical measure of expert parameters converges to a probability measure solving a nonlinear continuity equation, with an explicit convergence rate depending only on the number of experts. The result is applied to an MoE generated by a quantum neural network, after verifying that the quantum construction satisfies the required regularity hypotheses.

Significance. If the central derivation holds, the work supplies a quantitative mean-field limit for general MoE gradient flows together with an explicit rate that depends solely on the number of experts. The explicit verification that the quantum-neural-network construction meets the Lipschitz and bounded-derivative hypotheses is a concrete strength, as is the derivation of the limiting nonlinear continuity equation from the finite-expert gradient-flow dynamics.

minor comments (2)

§2 (or the statement of the main theorem): the precise list of regularity assumptions on the loss and expert maps should be collected in one place rather than scattered across the hypotheses of several lemmas, to make the applicability check for the quantum case easier to follow.
Notation for the empirical measure and the limiting measure is introduced without a dedicated table or list of symbols; adding one would improve readability for readers outside the immediate subfield.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and positive assessment of our manuscript on the mean-field limit and propagation of chaos for gradient-flow trained mixtures of experts, including the explicit rate and the verification for the quantum neural network construction. The recommendation of minor revision is noted. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; standard mean-field convergence result

full rationale

The paper's central claim is a quantitative propagation-of-chaos theorem: as the number of experts N diverges, the empirical measure of parameters converges to a solution of a nonlinear continuity equation at a rate depending only on N. This follows from standard techniques for gradient flows on interacting particle systems once explicit regularity hypotheses (Lipschitz continuity, bounded derivatives of loss and expert maps) are imposed. The quantum-neural-network application consists solely of verifying that those hypotheses hold for the given construction. No equation reduces to a fitted quantity by construction, no load-bearing step relies on a self-citation chain, and the derivation remains independent of any particular data set or parameter values. The result is therefore self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The result rests on standard regularity assumptions for gradient flow and continuity equations that are typical in mean-field analysis but not enumerated in the abstract.

axioms (1)

domain assumption The loss function and expert parameterizations are sufficiently regular for the gradient-flow dynamics to be well-defined and for the empirical measure to satisfy a nonlinear continuity equation in the limit.
Required for the propagation-of-chaos statement to hold.

pith-pipeline@v0.9.0 · 5612 in / 1192 out tokens · 34449 ms · 2026-05-23T05:30:10.010955+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 1 internal anchor

[1]

Araújo, R

D. Araújo, R. I. Oliveira, and D. Yukimura , A mean-ﬁeld limit for certain deep neural networks , 2019

work page 2019
[2]

Arjovsky, S

M. Arjovsky, S. Chintala, and L. Bottou , Wasserstein generative adversarial networks , in International conference on machine learning, PMLR, 2017, pp. 214–223

work page 2017
[3]

Berlyand and P.-E

L. Berlyand and P.-E. Jabin , Mathematics of deep learning: An introduction , de Gruyter, 2023

work page 2023
[4]

Biamonte, P

J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe , and S. Lloyd , Quantum machine learning , Nature, 549 (2017), pp. 195–202

work page 2017
[5]

Billingsley , Convergence of Probability Measures , Wiley Series in Probability and Statistics, John Wiley & So ns

P. Billingsley , Convergence of Probability Measures , Wiley Series in Probability and Statistics, John Wiley & So ns

work page
[6]

C. M. Bishop , Pattern recognition and machine learning , Springer google schola, 2 (2006), pp. 1122–1128

work page 2006
[7]

W. Cai, J. Jiang, F. W ang, J. Tang, S. Kim, and J. Huang , A survey on mixture of experts , 2024

work page 2024
[8]

Chaintron and A

L.-P. Chaintron and A. Diez , Propagation of chaos: A review of models, methods and applic ations. i. models and methods, Kinetic and Related Models, 15 (2022), p. 895

work page 2022
[9]

applications , Kinetic and Related Models, 15 (2022), p

, Propagation of chaos: A review of models, methods and applic ations ii. applications , Kinetic and Related Models, 15 (2022), p. 1017

work page 2022
[10]

Cheng, B

C. Cheng, B. Zhou, G. Ma, D. Wu, and Y. Yuan , Wasserstein distance based deep adversarial transfer lear ning for intelligent fault diagnosis with unlabeled or insuﬃcie nt labeled data , Neurocomputing, 409 (2020), pp. 35–45

work page 2020
[11]

L. P. Cinelli, M. A. Marins, E. A. B. Da Silva, and S. L. Netto , Variational methods for machine learning with applications to deep networks , vol. 15, Springer, 2021

work page 2021
[12]

De Lima Marquezino, R

F. De Lima Marquezino, R. Portugal, and C. Lavor , A primer on quantum computing , Springer, 2019

work page 2019
[13]

De Palma and D

G. De Palma and D. Trevisan , Quantum optimal transport with quantum channels , in Annales Henri Poincaré, vol. 22, Springer, 2021, pp. 3199–3234

work page 2021
[14]

Learning Factored Representations in a Deep Mixture of Experts

D. Eigen, M. Ranzato, and I. Sutskever , Learning factored representations in a deep mixture of expe rts, arXiv preprint arXiv:1312.4314, (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013
[15]

Erny , Well-posedness and propagation of chaos for mckean-vlasov equations with jumps and locally lipschitz coeﬃcients, 2022

X. Erny , Well-posedness and propagation of chaos for mckean-vlasov equations with jumps and locally lipschitz coeﬃcients, 2022

work page 2022
[16]

S. N. Evans and F. A. Matsen , The phylogenetic kantorovich–rubinstein metric for envir onmental sequence samples , Journal of the Royal Statistical Society Series B: Statisti cal Methodology, 74 (2012), pp. 569–592

work page 2012
[17]

Fournier and A

N. Fournier and A. Guillin , On the rate of convergence in wasserstein distance of the emp irical measure, 2013

work page 2013
[18]

Frogner, C

C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. A. Poggio , Learning with a wasserstein loss , Advances in neural information processing systems, 28 (2015)

work page 2015
[19]

Girardi and G

F. Girardi and G. De Palma , Trained quantum neural networks are gaussian processes , arXiv preprint arXiv:2402.08726, (2024)

work page arXiv 2024
[20]

Graham , Mckean-vlasov itô-skorohod equations, and nonlinear diﬀu sions with discrete jump sets , Stochastic pro- cesses and their applications, 40 (1992), pp

C. Graham , Mckean-vlasov itô-skorohod equations, and nonlinear diﬀu sions with discrete jump sets , Stochastic pro- cesses and their applications, 40 (1992), pp. 69–82

work page 1992
[21]

Graham, T

C. Graham, T. G. Kurtz, S. Méléard, P. E. Protter, M. Pulvirenti, D . Talay, and S. Méléard , Asymptotic behaviour of some interacting particle systems; mckean-vl asov and boltzmann models , Probabilistic Models for Nonlin- ear Partial Diﬀerential Equations: Lectures given at the 1s t Session of the Centro Internazionale Matematico Estivo (CIME) held in M...

work page 1995
[22]

Havlicek, A

V. Havlicek, A. D. Corcoles, K. Temme, A. W. Harrow, A. Kandala , J. M. Chow, and J. M. Gambetta , Supervised learning with quantum-enhanced feature spaces , Nature, 567 (2019), p. 209–212

work page 2019
[23]

A. M. Hernandez, F. Girardi, D. Pastorello, and G. D. Palma , Quantitative convergence of trained quantum neural networks to a gaussian process , 2024

work page 2024
[24]

R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton , Adaptive mixtures of local experts , Neural compu- tation, 3 (1991), pp. 79–87

work page 1991
[25]

L. V. Kantorovich , Mathematical methods of organizing and planning productio n, Management Science, 6 (1960), pp. 366–422

work page 1960
[26]

B. T. Kiani, G. De Palma, M. Marvian, Z.-W. Liu, and S. Lloyd , Learning quantum data with the quantum earth mover’s distance , Quantum Science and Technology, 7 (2022), p. 045002

work page 2022
[27]

Y. Liu, S. Arunachalam, and K. Temme , A rigorous and robust quantum speed-up in supervised machin e learning , Nature Physics, 17 (2021), pp. 1013–1017

work page 2021
[28]

Lloyd, M

S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran , Quantum embeddings for machine learning , arXiv preprint arXiv:2001.03622, (2020)

work page arXiv 2001
[29]

Y. Lu, C. Ma, Y. Lu, J. Lu, and L. Ying , A mean-ﬁeld analysis of deep resnet and beyond: Towards prov able optimization via overparameterization from depth , 2020

work page 2020
[30]

S. Mei, T. Misiakiewicz, and A. Montanari , Mean-ﬁeld theory of two-layers neural networks: dimension -free bounds and kernel limit , 2019

work page 2019
[31]

Nguyen , Mean ﬁeld limit of the learning dynamics of multilayer neura l networks , 2019

P.-M. Nguyen , Mean ﬁeld limit of the learning dynamics of multilayer neura l networks , 2019

work page 2019
[32]

Nguyen and H

P.-M. Nguyen and H. T. Pham , A rigorous framework for the mean ﬁeld limit of multilayer ne ural networks , Mathematical Statistics and Learning, 6 (2023), pp. 201–35 7

work page 2023
[33]

V. M. Panaretos and Y. Zemel , An invitation to statistics in Wasserstein space , Springer Nature, 2020

work page 2020
[34]

Pastorello, Concise guide to quantum machine learning , Springer, 2023

D. Pastorello, Concise guide to quantum machine learning , Springer, 2023

work page 2023
[35]

Peyré and M

G. Peyré and M. Cuturi , Computational optimal transport: With applications to dat a science , Foundations and Trends® in Machine Learning, 11 (2019), pp. 355–607. 14

work page 2019
[36]

S. T. Rachev, S. V. Stoyanov, and F. J. F abozzi , A probability metrics approach to ﬁnancial risk measures , John Wiley & Sons, 2011

work page 2011
[37]

Rasmussen and Z

C. Rasmussen and Z. Ghahramani , Inﬁnite mixtures of gaussian process experts , Advances in neural information processing systems, 14 (2001)

work page 2001
[38]

Rotskoff and E

G. Rotskoff and E. V anden-Eijnden, Trainability and accuracy of artiﬁcial neural networks: An interacting particle system approach, Communications on Pure and Applied Mathematics, 75 (2022) , p. 1889–1935

work page 2022
[39]

S. J. Russell and P. Norvig , Artiﬁcial intelligence: a modern approach , Pearson, 2016

work page 2016
[40]

Santambrogio , Optimal transport for applied mathematicians , Birkäuser, NY, 55 (2015), p

F. Santambrogio , Optimal transport for applied mathematicians , Birkäuser, NY, 55 (2015), p. 94

work page 2015
[41]

Schuld and F

M. Schuld and F. Petruccione , Supervised learning with quantum computers , vol. 17, Springer, 2018

work page 2018
[42]

Schuld, I

M. Schuld, I. Sinayskiy, and F. Petruccione , An introduction to quantum machine learning , Contemporary Physics, 56 (2015), pp. 172–185

work page 2015
[43]

Schuld, R

M. Schuld, R. Sweke, and J. J. Meyer , Eﬀect of data encoding on the expressive power of variationa l quantum- machine-learning models, Physical Review A, 103 (2021), p. 032430

work page 2021
[44]

Sirignano and K

J. Sirignano and K. Spiliopoulos , Mean ﬁeld analysis of deep neural networks , 2021

work page 2021
[45]

Sznitman , Topics in propagation of chaos , Ecole d’été de probabilités de Saint-Flour XIX—1989, 1464 (1991), pp

A.-S. Sznitman , Topics in propagation of chaos , Ecole d’été de probabilités de Saint-Flour XIX—1989, 1464 (1991), pp. 165–251

work page 1989
[46]

Villani , Optimal transport, vol

C. Villani , Optimal transport, vol. 338 of Grundlehren der mathematischen Wissenschafte n [Fundamental Principles of Mathematical Sciences], Springer-Verlag, Berlin, 2009 . Old and new. (A. Melchor Hernandez) Dipartimento di Matematica, Via Zamboni, 33, 40126, Bologn a (Italy) (D. Pastorello) Dipartimento di Matematica, Università di Bologna, Via Zamb on...

work page 2009

[1] [1]

Araújo, R

D. Araújo, R. I. Oliveira, and D. Yukimura , A mean-ﬁeld limit for certain deep neural networks , 2019

work page 2019

[2] [2]

Arjovsky, S

M. Arjovsky, S. Chintala, and L. Bottou , Wasserstein generative adversarial networks , in International conference on machine learning, PMLR, 2017, pp. 214–223

work page 2017

[3] [3]

Berlyand and P.-E

L. Berlyand and P.-E. Jabin , Mathematics of deep learning: An introduction , de Gruyter, 2023

work page 2023

[4] [4]

Biamonte, P

J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe , and S. Lloyd , Quantum machine learning , Nature, 549 (2017), pp. 195–202

work page 2017

[5] [5]

Billingsley , Convergence of Probability Measures , Wiley Series in Probability and Statistics, John Wiley & So ns

P. Billingsley , Convergence of Probability Measures , Wiley Series in Probability and Statistics, John Wiley & So ns

work page

[6] [6]

C. M. Bishop , Pattern recognition and machine learning , Springer google schola, 2 (2006), pp. 1122–1128

work page 2006

[7] [7]

W. Cai, J. Jiang, F. W ang, J. Tang, S. Kim, and J. Huang , A survey on mixture of experts , 2024

work page 2024

[8] [8]

Chaintron and A

L.-P. Chaintron and A. Diez , Propagation of chaos: A review of models, methods and applic ations. i. models and methods, Kinetic and Related Models, 15 (2022), p. 895

work page 2022

[9] [9]

applications , Kinetic and Related Models, 15 (2022), p

, Propagation of chaos: A review of models, methods and applic ations ii. applications , Kinetic and Related Models, 15 (2022), p. 1017

work page 2022

[10] [10]

Cheng, B

C. Cheng, B. Zhou, G. Ma, D. Wu, and Y. Yuan , Wasserstein distance based deep adversarial transfer lear ning for intelligent fault diagnosis with unlabeled or insuﬃcie nt labeled data , Neurocomputing, 409 (2020), pp. 35–45

work page 2020

[11] [11]

L. P. Cinelli, M. A. Marins, E. A. B. Da Silva, and S. L. Netto , Variational methods for machine learning with applications to deep networks , vol. 15, Springer, 2021

work page 2021

[12] [12]

De Lima Marquezino, R

F. De Lima Marquezino, R. Portugal, and C. Lavor , A primer on quantum computing , Springer, 2019

work page 2019

[13] [13]

De Palma and D

G. De Palma and D. Trevisan , Quantum optimal transport with quantum channels , in Annales Henri Poincaré, vol. 22, Springer, 2021, pp. 3199–3234

work page 2021

[14] [14]

Learning Factored Representations in a Deep Mixture of Experts

D. Eigen, M. Ranzato, and I. Sutskever , Learning factored representations in a deep mixture of expe rts, arXiv preprint arXiv:1312.4314, (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013

[15] [15]

Erny , Well-posedness and propagation of chaos for mckean-vlasov equations with jumps and locally lipschitz coeﬃcients, 2022

X. Erny , Well-posedness and propagation of chaos for mckean-vlasov equations with jumps and locally lipschitz coeﬃcients, 2022

work page 2022

[16] [16]

S. N. Evans and F. A. Matsen , The phylogenetic kantorovich–rubinstein metric for envir onmental sequence samples , Journal of the Royal Statistical Society Series B: Statisti cal Methodology, 74 (2012), pp. 569–592

work page 2012

[17] [17]

Fournier and A

N. Fournier and A. Guillin , On the rate of convergence in wasserstein distance of the emp irical measure, 2013

work page 2013

[18] [18]

Frogner, C

C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. A. Poggio , Learning with a wasserstein loss , Advances in neural information processing systems, 28 (2015)

work page 2015

[19] [19]

Girardi and G

F. Girardi and G. De Palma , Trained quantum neural networks are gaussian processes , arXiv preprint arXiv:2402.08726, (2024)

work page arXiv 2024

[20] [20]

Graham , Mckean-vlasov itô-skorohod equations, and nonlinear diﬀu sions with discrete jump sets , Stochastic pro- cesses and their applications, 40 (1992), pp

C. Graham , Mckean-vlasov itô-skorohod equations, and nonlinear diﬀu sions with discrete jump sets , Stochastic pro- cesses and their applications, 40 (1992), pp. 69–82

work page 1992

[21] [21]

Graham, T

C. Graham, T. G. Kurtz, S. Méléard, P. E. Protter, M. Pulvirenti, D . Talay, and S. Méléard , Asymptotic behaviour of some interacting particle systems; mckean-vl asov and boltzmann models , Probabilistic Models for Nonlin- ear Partial Diﬀerential Equations: Lectures given at the 1s t Session of the Centro Internazionale Matematico Estivo (CIME) held in M...

work page 1995

[22] [22]

Havlicek, A

V. Havlicek, A. D. Corcoles, K. Temme, A. W. Harrow, A. Kandala , J. M. Chow, and J. M. Gambetta , Supervised learning with quantum-enhanced feature spaces , Nature, 567 (2019), p. 209–212

work page 2019

[23] [23]

A. M. Hernandez, F. Girardi, D. Pastorello, and G. D. Palma , Quantitative convergence of trained quantum neural networks to a gaussian process , 2024

work page 2024

[24] [24]

R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton , Adaptive mixtures of local experts , Neural compu- tation, 3 (1991), pp. 79–87

work page 1991

[25] [25]

L. V. Kantorovich , Mathematical methods of organizing and planning productio n, Management Science, 6 (1960), pp. 366–422

work page 1960

[26] [26]

B. T. Kiani, G. De Palma, M. Marvian, Z.-W. Liu, and S. Lloyd , Learning quantum data with the quantum earth mover’s distance , Quantum Science and Technology, 7 (2022), p. 045002

work page 2022

[27] [27]

Y. Liu, S. Arunachalam, and K. Temme , A rigorous and robust quantum speed-up in supervised machin e learning , Nature Physics, 17 (2021), pp. 1013–1017

work page 2021

[28] [28]

Lloyd, M

S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran , Quantum embeddings for machine learning , arXiv preprint arXiv:2001.03622, (2020)

work page arXiv 2001

[29] [29]

Y. Lu, C. Ma, Y. Lu, J. Lu, and L. Ying , A mean-ﬁeld analysis of deep resnet and beyond: Towards prov able optimization via overparameterization from depth , 2020

work page 2020

[30] [30]

S. Mei, T. Misiakiewicz, and A. Montanari , Mean-ﬁeld theory of two-layers neural networks: dimension -free bounds and kernel limit , 2019

work page 2019

[31] [31]

Nguyen , Mean ﬁeld limit of the learning dynamics of multilayer neura l networks , 2019

P.-M. Nguyen , Mean ﬁeld limit of the learning dynamics of multilayer neura l networks , 2019

work page 2019

[32] [32]

Nguyen and H

P.-M. Nguyen and H. T. Pham , A rigorous framework for the mean ﬁeld limit of multilayer ne ural networks , Mathematical Statistics and Learning, 6 (2023), pp. 201–35 7

work page 2023

[33] [33]

V. M. Panaretos and Y. Zemel , An invitation to statistics in Wasserstein space , Springer Nature, 2020

work page 2020

[34] [34]

Pastorello, Concise guide to quantum machine learning , Springer, 2023

D. Pastorello, Concise guide to quantum machine learning , Springer, 2023

work page 2023

[35] [35]

Peyré and M

G. Peyré and M. Cuturi , Computational optimal transport: With applications to dat a science , Foundations and Trends® in Machine Learning, 11 (2019), pp. 355–607. 14

work page 2019

[36] [36]

S. T. Rachev, S. V. Stoyanov, and F. J. F abozzi , A probability metrics approach to ﬁnancial risk measures , John Wiley & Sons, 2011

work page 2011

[37] [37]

Rasmussen and Z

C. Rasmussen and Z. Ghahramani , Inﬁnite mixtures of gaussian process experts , Advances in neural information processing systems, 14 (2001)

work page 2001

[38] [38]

Rotskoff and E

G. Rotskoff and E. V anden-Eijnden, Trainability and accuracy of artiﬁcial neural networks: An interacting particle system approach, Communications on Pure and Applied Mathematics, 75 (2022) , p. 1889–1935

work page 2022

[39] [39]

S. J. Russell and P. Norvig , Artiﬁcial intelligence: a modern approach , Pearson, 2016

work page 2016

[40] [40]

Santambrogio , Optimal transport for applied mathematicians , Birkäuser, NY, 55 (2015), p

F. Santambrogio , Optimal transport for applied mathematicians , Birkäuser, NY, 55 (2015), p. 94

work page 2015

[41] [41]

Schuld and F

M. Schuld and F. Petruccione , Supervised learning with quantum computers , vol. 17, Springer, 2018

work page 2018

[42] [42]

Schuld, I

M. Schuld, I. Sinayskiy, and F. Petruccione , An introduction to quantum machine learning , Contemporary Physics, 56 (2015), pp. 172–185

work page 2015

[43] [43]

Schuld, R

M. Schuld, R. Sweke, and J. J. Meyer , Eﬀect of data encoding on the expressive power of variationa l quantum- machine-learning models, Physical Review A, 103 (2021), p. 032430

work page 2021

[44] [44]

Sirignano and K

J. Sirignano and K. Spiliopoulos , Mean ﬁeld analysis of deep neural networks , 2021

work page 2021

[45] [45]

Sznitman , Topics in propagation of chaos , Ecole d’été de probabilités de Saint-Flour XIX—1989, 1464 (1991), pp

A.-S. Sznitman , Topics in propagation of chaos , Ecole d’été de probabilités de Saint-Flour XIX—1989, 1464 (1991), pp. 165–251

work page 1989

[46] [46]

Villani , Optimal transport, vol

C. Villani , Optimal transport, vol. 338 of Grundlehren der mathematischen Wissenschafte n [Fundamental Principles of Mathematical Sciences], Springer-Verlag, Berlin, 2009 . Old and new. (A. Melchor Hernandez) Dipartimento di Matematica, Via Zamboni, 33, 40126, Bologn a (Italy) (D. Pastorello) Dipartimento di Matematica, Università di Bologna, Via Zamb on...

work page 2009