Control and optimization for Neural Partial Differential Equations in Supervised Learning

Alain Bensoussan; Bangjie Wang; Minh-Binh Tran

arxiv: 2506.20764 · v2 · pith:ZBPHRCCInew · submitted 2025-06-25 · 🧮 math.OC · cs.LG

Pith reviewed 2026-05-22 01:00 UTC · model grok-4.3

keywords: neural networks, supervised learning, parabolic PDE control, coefficient optimization, dual formulation, existence of minimizers, hyperbolic approximation

The pith

Neural networks viewed as PDEs allow proving existence of optimal coefficient controls for supervised learning tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes supervised learning as transporting data through layers by treating neural networks as parabolic or hyperbolic PDEs whose coefficients are the controls to optimize. It introduces a dual system formulation specifically for the parabolic case and uses it to prove that minimizers exist for the associated control problem. This reformulation shifts the focus from discrete layer-by-layer training to continuous coefficient optimization, which the authors argue can support more efficient numerical schemes. For hyperbolic PDEs the work establishes existence only for an approximated version of the control problem. A sympathetic reader would care because the result connects control theory directly to the design of neural architectures and suggests that existence guarantees from PDE analysis might transfer to machine-learning optimization.

Core claim

The authors interpret the forward pass of a neural network as the solution of a parabolic PDE whose coefficients encode the network weights; they then formulate the training objective as a coefficient-control problem and prove that this problem admits at least one minimizer by constructing a dual system whose optimality conditions characterize the solution. The same perspective is applied to hyperbolic operators, where existence is shown for a suitably approximated control problem.

What carries the argument

The dual system formulation that recasts coefficient optimization for parabolic PDEs as a controlled evolution problem and supplies the necessary optimality conditions for proving minimizer existence.
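
To make the phrase "dual system" concrete, here is a minimal sketch of a primal/adjoint pair for a generic bilinear control problem in which the control is a zero-order coefficient. Everything in it is an assumption chosen for illustration: the operator, the cost $J$, the regularization weight $\alpha$, and the symbols $c$, $u_d$ and $p$ are not taken from the paper, whose operators and cost functional may differ.

```latex
% Prototype (assumed) coefficient-control problem on a bounded domain \Omega:
\begin{aligned}
&\partial_t u - \Delta u = c(x,t)\,u \quad \text{in } \Omega\times(0,T), \qquad
 u = 0 \ \text{on } \partial\Omega\times(0,T), \qquad u(\cdot,0) = u_0, \\
&J(c) = \tfrac12 \int_\Omega |u(x,T) - u_d(x)|^2\,dx
      + \tfrac{\alpha}{2}\int_0^T\!\!\int_\Omega c^2\,dx\,dt .
\end{aligned}

% Dual (adjoint) system, solved backward in time from t = T:
-\partial_t p - \Delta p = c(x,t)\,p \quad \text{in } \Omega\times(0,T), \qquad
 p = 0 \ \text{on } \partial\Omega\times(0,T), \qquad p(\cdot,T) = u(\cdot,T) - u_d .

% Formal first-order condition: the derivative of J in a direction \delta c is
J'(c)\,\delta c = \int_0^T\!\!\int_\Omega \bigl(\alpha\,c + u\,p\bigr)\,\delta c\,dx\,dt ,
% so an unconstrained stationary point satisfies \alpha c + u p = 0.
```

In this prototype the control multiplies the state, so the control-to-state map is nonlinear even though the PDE is linear in $u$. That is the feature separating coefficient control from additive source or boundary control.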

If this is right

The parabolic coefficient-control problem arising from neural networks possesses at least one minimizer.
The dual formulation supplies first-order optimality conditions that can be used to construct numerical schemes.
An approximated control problem for the corresponding hyperbolic PDE also admits solutions.
The continuous PDE viewpoint makes it possible to import existence and approximation results from PDE control theory into supervised learning.
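
As a hedged illustration of how first-order optimality conditions can feed a numerical scheme, the sketch below computes a discrete-adjoint gradient for a toy problem: an explicit Euler discretization of u_t = nu * u_xx + c(t) * u on (0, 1) with homogeneous Dirichlet boundary values, where the time-dependent coefficient c is the control. The equation, the terminal cost, and every name (`step`, `cost`, `gradient`, `NU`, `DT`, ...) are assumptions made for this example and are not the paper's scheme.

```python
import math

def laplacian(u, dx):
    """Second-difference Laplacian with zero (Dirichlet) boundary values."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((left - 2.0 * u[i] + right) / dx ** 2)
    return out

def step(u, c_k, nu, dt, dx):
    """One explicit Euler step of u_t = nu * u_xx + c_k * u."""
    lap = laplacian(u, dx)
    return [ui + dt * (nu * li + c_k * ui) for ui, li in zip(u, lap)]

def forward(u0, c, nu, dt, dx):
    """Return the full state trajectory [u^0, ..., u^N]."""
    traj = [u0]
    for c_k in c:
        traj.append(step(traj[-1], c_k, nu, dt, dx))
    return traj

def cost(c, u0, target, nu, dt, dx):
    """Terminal misfit J(c) = 0.5 * dx * sum_i (u^N_i - target_i)^2."""
    u_final = forward(u0, c, nu, dt, dx)[-1]
    return 0.5 * dx * sum((a - b) ** 2 for a, b in zip(u_final, target))

def gradient(c, u0, target, nu, dt, dx):
    """Gradient of J with respect to each c_k via one backward adjoint sweep."""
    traj = forward(u0, c, nu, dt, dx)
    p = [dx * (a - b) for a, b in zip(traj[-1], target)]  # adjoint state p^N
    grad = [0.0] * len(c)
    for k in reversed(range(len(c))):
        # dJ/dc_k = dt * <p^{k+1}, u^k>
        grad[k] = dt * sum(pi * ui for pi, ui in zip(p, traj[k]))
        # The step operator is symmetric here, so the adjoint step reuses it.
        p = step(p, c[k], nu, dt, dx)
    return grad

def finite_difference(c, k, u0, target, nu, dt, dx, eps=1e-6):
    """Central-difference approximation of dJ/dc_k, used only as a check."""
    up, down = list(c), list(c)
    up[k] += eps
    down[k] -= eps
    return (cost(up, u0, target, nu, dt, dx)
            - cost(down, u0, target, nu, dt, dx)) / (2.0 * eps)

# Hypothetical problem data.
N_X, N_T = 19, 40
DX, DT, NU = 1.0 / (N_X + 1), 0.005, 0.1   # NU * DT / DX**2 = 0.2, stable
grid = [(i + 1) * DX for i in range(N_X)]
u0 = [math.sin(math.pi * x) for x in grid]
target = [0.5 * math.sin(math.pi * x) for x in grid]
c0 = [0.1] * N_T

adjoint_grad = gradient(c0, u0, target, NU, DT, DX)
check = finite_difference(c0, 0, u0, target, NU, DT, DX)
print(f"adjoint dJ/dc_0 = {adjoint_grad[0]:.6e}, finite difference = {check:.6e}")
```

The point of the design is cost: one forward solve plus one backward solve yields all N_T gradient entries, whereas finite differences need two forward solves per entry. The finite-difference routine is kept only to verify the adjoint.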

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The dual-system approach may allow gradient-based training algorithms to be replaced or accelerated by PDE-solver techniques that exploit the continuous structure.
If the modeling assumption holds, residual networks and other continuous-depth models could inherit existence guarantees that are unavailable in the discrete setting.
The framework suggests testing whether standard back-propagation converges to the same limits as the PDE-optimal controls when the number of layers grows large.
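
The depth-refinement idea in the last item can be sketched for the forward map alone. The example below treats a scalar residual network as the explicit Euler scheme for an ODE and measures how the output changes as the number of layers grows. It covers only the familiar ODE (continuous-depth) reading, not the paper's PDE formulation, and it does not involve training or back-propagation. The parameter profile `theta`, the `tanh` activation, and the input value 0.3 are hypothetical choices.

```python
import math

def theta(t):
    """Hypothetical continuous-in-depth parameters (weight, bias) at depth t."""
    return math.cos(t), 0.5 * t

def resnet_forward(x, num_layers, depth=1.0):
    """Scalar residual network x_{k+1} = x_k + h * tanh(w_k * x_k + b_k).

    With h = depth / num_layers and (w_k, b_k) = theta(k * h), this is the
    explicit Euler scheme for x'(t) = tanh(w(t) * x(t) + b(t)).
    """
    h = depth / num_layers
    for k in range(num_layers):
        w, b = theta(k * h)
        x = x + h * math.tanh(w * x + b)
    return x

# A very deep network stands in for the continuous-depth limit.
reference = resnet_forward(0.3, 2 ** 16)

# Gap between shallower networks and the reference output.
errors = {n: abs(resnet_forward(0.3, n) - reference) for n in (8, 16, 32, 64)}
for n, err in errors.items():
    print(f"layers={n:3d}  |output - reference| = {err:.2e}")
```

Because explicit Euler is first-order accurate, the gap should shrink roughly in proportion to 1 / num_layers. Whether trained weights, as opposed to weights sampled from a fixed smooth profile, behave the same way is exactly the open question raised above.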

Load-bearing premise

Supervised learning with neural networks can be represented as a coefficient-control problem for parabolic or hyperbolic PDEs without losing essential features of the original discrete-layer model.

What would settle it

A concrete parabolic PDE coefficient-control problem derived from a simple feed-forward network for which no minimizer exists in the function space would falsify the existence claim.

Original abstract

Although there is a substantial body of literature on control and optimization problems for parabolic and hyperbolic systems, the specific problem of controlling and optimizing the coefficients of the associated operators within such systems has not yet been thoroughly explored. In this work, we aim to initiate a line of research in control theory focused on optimizing and controlling the coefficients of these operators, a problem that naturally arises in the context of neural networks and supervised learning. In supervised learning, the primary objective is to transport initial data toward target data through the layers of a neural network. We propose a novel perspective: neural networks can be interpreted as partial differential equations (PDEs). From this viewpoint, the control problem traditionally studied in the context of ordinary differential equations (ODEs) is reformulated as a control problem for PDEs, specifically targeting the optimization and control of coefficients in parabolic and hyperbolic operators. To the best of our knowledge, this specific problem has not yet been systematically addressed in the control theory of PDEs. To this end, we propose a dual system formulation for the control and optimization problem associated with parabolic PDEs, laying the groundwork for the development of efficient numerical schemes in future research. We also provide a theoretical proof showing that the control and optimization problem for parabolic PDEs admits minimizers. Finally, we investigate the control problem associated with hyperbolic PDEs and prove the existence of solutions for a corresponding approximated control problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper establishes existence of minimizers for a coefficient-control problem in parabolic PDEs modeling neural nets, but the link from discrete layers to the PDE is not yet backed by convergence results.

The main takeaway is that this work recasts supervised learning as a coefficient-control problem for parabolic PDEs, proves that minimizers exist, and sets up a dual system that could support later numerics. The authors also prove existence for an approximated hyperbolic version. That is the concrete output: existence theorems in a setting that treats the operator coefficients as the controls, which aligns with weights in a network but shifts the usual ODE control view to PDEs.

The dual formulation follows the standard pattern in parabolic control and looks workable for deriving optimality conditions here. The hyperbolic approximation result is narrower but still gives something explicit to work with. This is new in the PDE control literature, where coefficient optimization has not been the main target before. The setup stays within established existence theory, so the derivations themselves appear to rest on solid ground once the problem is posed.

The soft spot sits in the modeling step. The paper assumes the discrete-layer forward map and loss can be faithfully replaced by the continuous PDE control problem, yet it supplies no error estimates, no consistency theorem for the discretization, and no check that critical points or minima correspond. Without that, the existence result stays somewhat detached from actual network training. No numerical examples appear to test the gap either.

The math and citations look standard for this intersection of control and continuous limits of networks. This is for readers already working on PDE control or on continuous formulations of neural nets. A theorist might pick up the dual system as a tool to think about optimality, but a practitioner or someone needing algorithms will find little immediate use.

I would send it for peer review. The existence claims are definite enough to merit referee time, and the direction is worth checking even if the discrete-to-continuous bridge needs more work.

Referee Report

2 major / 1 minor

Summary. The manuscript interprets supervised learning with neural networks as a coefficient-control problem for parabolic and hyperbolic PDEs. It proposes a dual-system formulation for the parabolic case, proves existence of minimizers for the parabolic control-and-optimization problem, and establishes existence of solutions for an approximated control problem in the hyperbolic setting.

Significance. If the discrete-to-continuous passage is shown to be consistent, the work could open a new interface between PDE control theory and neural-network training, supplying analytical tools and numerical schemes that are currently unavailable. The existence result for the parabolic problem is a concrete contribution, but its relevance to actual supervised learning depends on the fidelity of the modeling step.

major comments (2)

[Introduction and the PDE-reformulation section] The central modeling claim—that the supervised-learning objective for discrete NN layers is faithfully recovered by a coefficient-control problem for a parabolic PDE—is load-bearing yet unsupported by any continuum-limit theorem, consistency estimate, or comparison of critical points. Without such justification the existence of PDE minimizers does not imply useful information for the original discrete optimization landscape.
[Parabolic control problem and its dual system] The abstract states that existence of minimizers for the parabolic problem is proved via a dual system, but the manuscript supplies neither the full derivation, the precise function space setting, nor the treatment of boundary conditions. These omissions prevent verification that the dual formulation is well-posed and that the existence result is non-vacuous.

minor comments (1)

[Notation and preliminaries] Notation for the controlled operators and the admissible control sets should be introduced more explicitly to avoid ambiguity when the same symbols are reused for the discrete and continuous settings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important aspects of the modeling justification and the presentation of the technical results. We address each major comment below and describe the revisions we intend to incorporate.

Referee: [Introduction and the PDE-reformulation section] The central modeling claim—that the supervised-learning objective for discrete NN layers is faithfully recovered by a coefficient-control problem for a parabolic PDE—is load-bearing yet unsupported by any continuum-limit theorem, consistency estimate, or comparison of critical points. Without such justification the existence of PDE minimizers does not imply useful information for the original discrete optimization landscape.

Authors: We agree that a rigorous discrete-to-continuous limit would strengthen the link between the original supervised-learning problem and the PDE formulation. The manuscript introduces the parabolic PDE as a continuous relaxation motivated by the layer-wise structure of residual networks and existing ODE interpretations in the literature, but does not contain a convergence theorem or critical-point comparison. In the revision we will add a new subsection that explicitly states the modeling assumptions, cites related continuum-limit results for neural ODEs, and outlines the technical steps required for a future consistency analysis. This will clarify the scope of the current existence result while indicating how it may inform the discrete setting.

revision: partial

Referee: [Parabolic control problem and its dual system] The abstract states that existence of minimizers for the parabolic problem is proved via a dual system, but the manuscript supplies neither the full derivation, the precise function space setting, nor the treatment of boundary conditions. These omissions prevent verification that the dual formulation is well-posed and that the existence result is non-vacuous.

Authors: We acknowledge that the current presentation of the dual-system argument is too concise. The existence proof proceeds by introducing an adjoint equation, deriving first-order optimality conditions, and applying weak compactness in appropriate spaces. In the revised manuscript we will expand the relevant section to include: (i) the complete derivation of the dual system from the primal control problem, (ii) the precise function-space setting (e.g., state variable in L^2(0,T;H^1) ∩ H^1(0,T;H^{-1}) and control in L^2), and (iii) the boundary conditions employed (homogeneous Dirichlet). These additions will make the well-posedness of the dual formulation and the non-vacuity of the existence result verifiable.

revision: yes

Circularity Check

0 steps flagged

No circularity: existence results rest on standard PDE control theory independent of the NN mapping

full rationale

The paper introduces a modeling perspective that neural networks correspond to coefficient-controlled parabolic or hyperbolic PDEs, then derives a dual system and proves existence of minimizers for the parabolic control problem. These steps rely on classical techniques from optimal control of PDEs rather than any self-definition, fitted-parameter renaming, or load-bearing self-citation chain. The central existence claim does not reduce to quantities defined inside the paper; the NN-to-PDE interpretation is presented as an ansatz that motivates the problem but is not used to derive the existence result itself. No quoted equation or theorem collapses by construction to an input or prior self-result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central contribution rests on the interpretive step that neural networks correspond to PDEs whose coefficients can be treated as controls; no free parameters or new entities are introduced in the abstract.

axioms (1)

Domain assumption: Neural networks in supervised learning can be modeled as parabolic or hyperbolic PDEs with controllable coefficients.
This modeling choice is the starting point for reformulating the control problem and is stated as a novel perspective in the abstract.


