DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators

George Em Karniadakis; Lu Lu; Pengzhan Jin

arxiv: 1910.03193 · v3 · pith:WKMJUTOSnew · submitted 2019-10-08 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-15 03:12 UTC · model grok-4.3

keywords DeepONet · operator learning · universal approximation theorem · neural networks · differential equations · generalization error · branch net · trunk net

The pith

DeepONets learn nonlinear operators from small datasets by splitting input encoding from output evaluation points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes DeepONet, a neural network architecture designed to learn nonlinear operators that map input functions to output functions. It builds on the universal approximation theorem for operators by splitting the network into a branch net that encodes the input function at fixed sensor locations and a trunk net that encodes the output evaluation points. This design allows accurate approximation of operators in dynamic systems and partial differential equations from relatively small datasets, reducing generalization error compared to standard fully connected networks. Computational tests show error convergence rates that are polynomial up to fourth order or even exponential as the training dataset size increases.

Core claim

DeepONets realize the practical application of the operator approximation theorem through a branch-trunk architecture, enabling the learning of nonlinear operators for identifying differential equations with high accuracy and efficiency from limited data, as evidenced by observed high-order convergence in error with respect to training set size.

What carries the argument

The branch-trunk split architecture, where one subnetwork processes input function values at sensors and the other processes output locations to produce the operator output.
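
A minimal sketch of the branch-trunk split described above, in plain NumPy with untrained random weights. The layer sizes, the tanh activation, the helper names (`init_mlp`, `mlp`, `deeponet_forward`), and the choice to merge the two sub-networks with a dot product plus a scalar bias are illustrative assumptions rather than details stated on this page.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Return a list of (W, b) pairs for a fully connected net (hypothetical helper)."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def mlp(params, x, activate_last=False):
    """Apply tanh layers; the last layer stays linear unless activate_last is True."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1 or activate_last:
            x = np.tanh(x)
    return x

def deeponet_forward(branch, trunk, bias, u_sensors, y):
    """u_sensors: (batch, m) input-function values at the m fixed sensors.
    y: (batch, d) output locations. Returns (batch,) estimates of G(u)(y)."""
    b = mlp(branch, u_sensors)               # (batch, p) encoding of the input function
    t = mlp(trunk, y, activate_last=True)    # (batch, p) encoding of the output location
    return np.sum(b * t, axis=1) + bias      # merge the two encodings by a dot product

rng = np.random.default_rng(0)
m, d, p = 100, 1, 40                         # assumed sizes: sensors, output dim, width
branch = init_mlp([m, 40, p], rng)
trunk = init_mlp([d, 40, 40, p], rng)
u_sensors = rng.normal(size=(8, m))          # 8 placeholder input functions
y = rng.uniform(size=(8, d))                 # one query location per input function
out = deeponet_forward(branch, trunk, 0.0, u_sensors, y)
print(out.shape)
```

Keeping the sensor values and the query location in separate sub-networks means one encoded input function can be evaluated at many output locations by re-running only the trunk net.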

Load-bearing premise

That the practical optimization and generalization errors remain small enough with the branch-trunk design and standard training to achieve the high convergence rates promised by the approximation theorem.

What would settle it

A test where increasing the training dataset size for DeepONet on identifying a partial differential equation operator yields only linear or slower error reduction instead of the reported polynomial or exponential rates.
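
One way such a test could be scored is sketched below: fit the measured test error against the training-set size N under a polynomial model (a straight line in log-log axes) and an exponential model (a straight line in semi-log axes), then compare the fits. The function name `fit_rates` and the synthetic error values are hypothetical; real measurements would replace them.

```python
import numpy as np

def fit_rates(n_train, test_err):
    """Least-squares fits of log(err) against log(N) and against N.
    Returns (poly_order, exp_rate, r2_poly, r2_exp)."""
    n = np.asarray(n_train, dtype=float)
    log_e = np.log(np.asarray(test_err, dtype=float))

    def line_fit(x):
        slope, intercept = np.polyfit(x, log_e, 1)
        resid = log_e - (slope * x + intercept)
        r2 = 1.0 - resid.var() / log_e.var()
        return slope, r2

    poly_slope, r2_poly = line_fit(np.log(n))   # err ~ C * N**(-order)
    exp_slope, r2_exp = line_fit(n)             # err ~ C * exp(-rate * N)
    return -poly_slope, -exp_slope, r2_poly, r2_exp

# Synthetic second-order data, used only to exercise the function.
n_train = [100, 200, 400, 800, 1600]
synthetic_err = [5.0 * n ** -2.0 for n in n_train]
order, rate, r2_poly, r2_exp = fit_rates(n_train, synthetic_err)
print(round(order, 3), r2_poly > r2_exp)
```

A fitted polynomial order near 1 or below, with no better exponential fit, would be the outcome that contradicts the reported rates.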

Original abstract

While it is widely known that neural networks are universal approximators of continuous functions, a less known and perhaps more powerful result is that a neural network with a single hidden layer can approximate accurately any nonlinear continuous operator. This universal approximation theorem is suggestive of the potential application of neural networks in learning nonlinear operators from data. However, the theorem guarantees only a small approximation error for a sufficiently large network, and does not consider the important optimization and generalization errors. To realize this theorem in practice, we propose deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset. A DeepONet consists of two sub-networks, one for encoding the input function at a fixed number of sensors $x_i, i=1,\dots,m$ (branch net), and another for encoding the locations for the output functions (trunk net). We perform systematic simulations for identifying two types of operators, i.e., dynamic systems and partial differential equations, and demonstrate that DeepONet significantly reduces the generalization error compared to the fully-connected networks. We also derive theoretically the dependence of the approximation error in terms of the number of sensors (where the input function is defined) as well as the input function type, and we verify the theorem with computational results. More importantly, we observe high-order error convergence in our computational tests, namely polynomial rates (from half order to fourth order) and even exponential convergence with respect to the training dataset size.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DeepONet gives a workable branch-trunk split that turns the operator UAT into something trainable on modest data and shows polynomial-to-exponential convergence on ODE and PDE examples.

The main point is that they split the network into a branch sub-net that encodes the input function at a fixed set of sensors and a trunk sub-net that encodes the output locations. This split lets them approximate nonlinear operators from small datasets and produces visibly lower generalization error than plain fully-connected nets on the same tasks. They also derive how the approximation error scales with sensor count and confirm it numerically, then report convergence rates from half-order up to exponential as the training set grows on standard dynamic-system and PDE benchmarks. That combination is what is new relative to earlier operator-learning attempts.

The architecture is simple to implement, the error bound is stated clearly, and the empirical rates are measured on held-out data rather than just training loss. Those are the parts that hold up.

The weaker parts are the usual ones for this style of work. The universal approximation theorem only bounds the approximation error; the paper acknowledges that optimization and generalization errors still matter but offers no separate analysis of why the branch-trunk split controls them better than other architectures. The reported rates are observed on concrete examples rather than proven in general, and the experiments rely on standard benchmark problems without exhaustive ablation on sensor placement or data-generation details. Those gaps are real but not fatal for an initial demonstration.

The paper is aimed at people who need to learn and query operators repeatedly in scientific computing or control settings. A reader already working on surrogate models or physics-informed networks would find the architecture and the convergence plots directly usable. It is worth sending to peer review because the central construction is reproducible, the empirical evidence is internally consistent, and the idea is simple enough that referees can check the claims without heroic effort.

Referee Report

2 major / 2 minor

Summary. The paper proposes deep operator networks (DeepONets) consisting of a branch network to encode the input function at a fixed number of sensors and a trunk network to encode the output function locations. This architecture is used to learn nonlinear operators for dynamic systems and PDEs from data. The authors derive the dependence of the approximation error on the number of sensors and input function type, verify it computationally, and report high-order convergence rates (polynomial to exponential) with respect to the training dataset size, while showing reduced generalization error compared to fully-connected networks.

Significance. If the empirical observations of high-order convergence hold, this work is significant as it provides a practical method to approximate operators with controllable error based on the operator universal approximation theorem. The separation into branch and trunk networks allows efficient learning from small datasets, which could impact fields like scientific machine learning and surrogate modeling for differential equations. The theoretical derivation combined with numerical verification adds strength to the claims.

major comments (2)

[Section on theoretical derivation] The approximation error bound depending on sensor count m is derived, but the manuscript should explicitly state the assumptions on the input function class (e.g., continuity or Sobolev space) to ensure the bound is rigorous and load-bearing for the convergence claims.
[Numerical results section] The reported polynomial and exponential convergence rates with training dataset size N are observed in computational tests; however, details on the exact error metric (e.g., L2 norm on held-out data), number of independent runs, and confirmation that rates are not due to overfitting need to be provided to support the high-order convergence claim.

minor comments (2)

The abstract mentions 'systematic simulations' but the manuscript could benefit from a table summarizing the benchmark problems, sensor counts m, and observed rates for clarity.
[Introduction] Clarify the distinction between the branch net and trunk net in the notation to avoid ambiguity for readers unfamiliar with the architecture.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive suggestions for minor revision. We have addressed both major comments by clarifying the theoretical assumptions and expanding the numerical details in the revised manuscript.

Referee: [Section on theoretical derivation] The approximation error bound depending on sensor count m is derived, but the manuscript should explicitly state the assumptions on the input function class (e.g., continuity or Sobolev space) to ensure the bound is rigorous and load-bearing for the convergence claims.

Authors: We agree that an explicit statement of the function class is needed for rigor. The derivation relies on the universal approximation theorem for nonlinear operators, which holds for continuous input functions. In the revised manuscript, we have added a dedicated paragraph in the theoretical derivation section stating that the input functions are assumed to lie in C([0,1]^d) (continuous functions on a compact domain) or the appropriate Sobolev space when higher regularity is invoked, thereby making the error bound with respect to sensor count m fully rigorous under these conditions.

Revision: yes

Referee: [Numerical results section] The reported polynomial and exponential convergence rates with training dataset size N are observed in computational tests; however, details on the exact error metric (e.g., L2 norm on held-out data), number of independent runs, and confirmation that rates are not due to overfitting need to be provided to support the high-order convergence claim.

Authors: We have expanded the numerical results section to include these details. The error metric is the relative L2 norm computed on a fixed held-out test set of 2000 samples drawn independently of the training data. We performed 5 independent runs with different random seeds for network initialization and data shuffling, reporting both mean convergence rates and standard deviations. To rule out overfitting, we added a new figure showing that test error continues to decrease monotonically with N while training error saturates early; the reported high-order rates are therefore measured on unseen data. These clarifications have been incorporated.

Revision: yes
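
The error metric named in the response above can be sketched as follows. The test-set size (2000) and run count (5) are taken from the simulated rebuttal and serve only as placeholders; the "predictions" are synthetic noise around a known target, and the function names are hypothetical.

```python
import numpy as np

def relative_l2(pred, truth):
    """Relative L2 error: ||pred - truth||_2 / ||truth||_2."""
    return np.linalg.norm(pred - truth) / np.linalg.norm(truth)

def summarize_runs(preds_per_run, truth):
    """Mean and sample standard deviation of the error over independent runs."""
    errs = np.array([relative_l2(p, truth) for p in preds_per_run])
    return errs.mean(), errs.std(ddof=1)

rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0.0, 2.0 * np.pi, 2000))   # stand-in held-out targets
# Five stand-in "runs": the target plus small independent noise.
preds = [truth + 0.01 * rng.normal(size=truth.shape) for _ in range(5)]
mean_err, std_err = summarize_runs(preds, truth)
print(mean_err, std_err)
```

Reporting the spread across seeds alongside the mean makes it easier to judge whether a fitted convergence rate is distinguishable from run-to-run noise.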

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper grounds its DeepONet proposal in the external universal approximation theorem for nonlinear operators (Chen & Chen 1995), which is not self-cited. The branch-trunk architecture is defined directly from that theorem without reference to fitted quantities. The claimed theoretical dependence of approximation error on sensor count m is derived from the external UAT and then verified numerically on held-out data for standard benchmark ODEs and PDEs; the reported polynomial-to-exponential convergence rates versus training-set size N are empirical observations, not restatements of training loss or self-defined quantities. No self-definitional steps, no load-bearing self-citations, and no renaming of known results appear in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the known universal approximation theorem for operators and on the empirical observation that the branch-trunk split controls generalization error; no new physical entities or ad-hoc constants are introduced.

free parameters (1)

number of sensors m
The input function is evaluated at a fixed number m of sensor locations chosen by the user; this choice affects both accuracy and computational cost but is not fitted to data.
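
To make the role of m concrete, the sketch below builds training triples (input function at the sensors, query location, operator output) for a toy antiderivative operator G(u)(y) = ∫₀ʸ u(x) dx. The operator, the random cosine-series inputs, the uniform sensor grid, and the name `make_dataset` are all assumptions chosen so that the target has a closed form; they are not taken from the paper.

```python
import numpy as np

def make_dataset(n_funcs, m, n_modes=4, seed=0):
    """Return (sensors, u_sensors, y, target) for a toy antiderivative operator."""
    rng = np.random.default_rng(seed)
    sensors = np.linspace(0.0, 1.0, m)              # the same m sensors for every input
    k = np.arange(1, n_modes + 1)                   # cosine frequencies
    coeffs = rng.normal(size=(n_funcs, n_modes))    # u(x) = sum_k a_k cos(k pi x)
    u_sensors = coeffs @ np.cos(np.pi * np.outer(k, sensors))   # (n_funcs, m)
    y = rng.uniform(0.0, 1.0, size=(n_funcs, 1))    # one query location per function
    # Closed form: integral_0^y u(x) dx = sum_k a_k sin(k pi y) / (k pi)
    target = np.sum(coeffs * np.sin(np.pi * y * k) / (np.pi * k), axis=1)
    return sensors, u_sensors, y, target

sensors, u_sensors, y, target = make_dataset(n_funcs=50, m=20)
print(u_sensors.shape, y.shape, target.shape)
```

Raising m gives the branch net a finer view of each input function at the cost of a wider input layer, which is the accuracy-versus-cost trade-off noted above.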

axioms (1)

Universal approximation theorem for nonlinear continuous operators (standard math)
Invoked in the opening paragraph to guarantee that a sufficiently large network can approximate any continuous operator; the paper treats this as background mathematics.
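
For orientation, the theorem is usually stated along the following lines (a paraphrase of the form attributed to Chen and Chen [5]; the exact hypotheses should be checked against that reference). Let $\sigma$ be a continuous non-polynomial function, $X$ a Banach space, $K_1 \subset X$ and $K_2 \subset \mathbb{R}^d$ compact sets, $V$ a compact set in $C(K_1)$, and $G$ a nonlinear continuous operator mapping $V$ into $C(K_2)$. Then for any $\epsilon > 0$ there exist positive integers $n, p, m$, constants $c_i^k, \xi_{ij}^k, \theta_i^k, \zeta_k \in \mathbb{R}$, vectors $w_k \in \mathbb{R}^d$, and points $x_j \in K_1$ such that

```latex
\left| G(u)(y) - \sum_{k=1}^{p} \sum_{i=1}^{n} c_i^k \,
  \sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k \, u(x_j) + \theta_i^k \right)
  \sigma\!\left( w_k \cdot y + \zeta_k \right) \right| < \epsilon
\qquad \text{for all } u \in V,\ y \in K_2 .
```

The inner sum over the sensor values $u(x_j)$ corresponds to the branch net and the factor depending on $y$ corresponds to the trunk net.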

Forward citations

Cited by 43 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Universal Approximation of Nonlinear Operators and Their Derivatives
cs.LG 2026-05 unverdicted novelty 8.0

Proves the first universal approximation theorems for k-times differentiable nonlinear operators between Banach spaces and their derivatives uniformly on compact sets in weighted Sobolev norms via encoder-decoder oper...
Constraint-Aware Flow Matching: Decision Aligned End-to-End Training for Constrained Sampling
cs.LG 2026-05 unverdicted novelty 7.0

Constraint-Aware Flow Matching integrates constraint projections into the flow matching training objective to align model dynamics with constrained sampling and reduce distributional shift.
Approximation of Maximally Monotone Operators : A Graph Convergence Perspective
cs.LG 2026-05 unverdicted novelty 7.0

Any maximally monotone operator can be approximated in local graph convergence by continuous encoder-decoder networks, with structure-preserving versions that retain maximal monotonicity via resolvent parameterizations.
Fixed-Point Neural Optimal Transport without Implicit Differentiation
math.OC 2026-05 unverdicted novelty 7.0

A single-network fixed-point formulation for neural optimal transport eliminates adversarial min-max optimization and implicit differentiation while enforcing dual feasibility exactly.
Stable Long-Horizon PDE Forecasting via Latent Structured Spectral Propagators
cs.LG 2026-05 unverdicted novelty 7.0

A latent Structured Spectral Propagator enables stable autoregressive PDE forecasting by decoupling spatial details from recurrent modal dynamics.
CATO: Charted Attention for Neural PDE Operators
cs.AI 2026-05 unverdicted novelty 7.0

CATO learns a continuous latent chart for efficient axial attention on PDE meshes and adds derivative-aware supervision to improve accuracy and reduce oversmoothing on general geometries.
Physics-Informed Neural PDE Solvers via Spatio-Temporal MeanFlow
cs.LG 2026-05 unverdicted novelty 7.0

Spatio-Temporal MeanFlow adapts MeanFlow to PDEs by replacing the generative velocity field with the physical operator and extending the integral constraint to the spatio-temporal domain, yielding a unified solver for...
Geometry-Aware Neural Optimizer for Shape Optimization and Inversion
cs.LG 2026-05 unverdicted novelty 7.0

GANO unifies shape encoding with auto-decoders, denoising-stabilized latent optimization, and geometry-injected surrogates into an end-to-end differentiable pipeline for PDE-governed shape optimization and inversion.
Geometry-Aware Neural Optimizer for Shape Optimization and Inversion
cs.LG 2026-05 conditional novelty 7.0

GANO is an end-to-end differentiable latent-space optimizer that unifies shape encoding, surrogate prediction, and controllable geometry updates for PDE-governed shape optimization and inversion.
AI models of unstable flow exhibit hallucination
physics.flu-dyn 2026-04 unverdicted novelty 7.0

AI models of viscous fingering exhibit hallucinations from spectral bias; DeepFingers combines FNO and DeepONet with time-contrast conditioning to predict accurate finger dynamics while preserving mixing metrics.
DeepRitzSplit Neural Operator for Phase-Field Models via Energy Splitting
math.AP 2026-04 unverdicted novelty 7.0

A DeepRitzSplit neural operator trained on energy-split variational forms enforces dissipation in phase-field models and outperforms data-driven training in generalization while running faster than Fourier spectral me...
DiLO: Decoupling Generative Priors and Neural Operators via Diffusion Latent Optimization for Inverse Problems
math.NA 2026-04 unverdicted novelty 7.0

DiLO turns diffusion sampling into deterministic latent optimization to satisfy the manifold consistency requirement for neural operators in inverse problem solving.
Is Flow Matching Just Trajectory Replay for Sequential Data?
stat.ML 2026-02 unverdicted novelty 7.0

Flow matching on time series targets a closed-form nonparametric velocity field that is a similarity-weighted mixture of observed transition velocities, making neural models approximations to an ideal memory-augmented...
CompNO: A Novel Foundation Model approach for solving Partial Differential Equations
cs.LG 2026-01 unverdicted novelty 7.0

CompNO composes specialized Fourier neural operator blocks for fundamental differential operators into task-specific solvers that achieve lower L2 error than baselines on linear parametric PDEs and remain competitive ...
Universal Approximation of Operators with Transformers and Neural Integral Operators
cs.LG 2024-09 unverdicted novelty 7.0

Transformers and generalized neural integral operators are shown to universally approximate operators between Hölder and Banach spaces.
Therm-FM: Foundation Model is ALL YOU NEED for 3D-ICs Thermal Simulation
cs.CE 2026-05 unverdicted novelty 6.0

Therm-FM adapts pretrained diffusion PDE foundation models to 3D-IC thermal simulation with multi-fidelity adaptation, reporting up to 10.6x mean error reduction and strong cross-design performance using under 20% of ...
Symplectic Neural Operators for Learning Infinite Dimensional Hamiltonian Systems
math.DS 2026-05 unverdicted novelty 6.0

Symplectic Neural Operators preserve symplectic structure for learning infinite-dimensional Hamiltonian PDEs and deliver improved long-term energy stability in theory and experiments.
Compositional Neural Operators for Multi-Dimensional Fluid Dynamics
cs.LG 2026-05 unverdicted novelty 6.0

Compositional Neural Operators decompose multi-dimensional fluid PDEs into a library of pretrained elementary physics blocks assembled via an aggregator that minimizes data and physics residuals.
Don't Fix the Basis -- Learn It: Spectral Representation with Adaptive Basis Learning for PDEs
cs.LG 2026-05 unverdicted novelty 6.0

ABLE learns a spatially adaptive Parseval frame from data via an ancillary density to replace fixed bases in spectral neural operators for PDEs.
PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting
cs.AI 2026-05 unverdicted novelty 6.0

PnP-Corrector decouples physics simulation from error correction to counter reciprocal error amplification in coupled spatiotemporal forecasting, cutting error by 29% in a 300-day ocean-atmosphere test.
PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting
cs.AI 2026-05 unverdicted novelty 6.0

PnP-Corrector decouples physics simulation from error correction via a plug-and-play agent, cutting error by 29% in 300-day global ocean-atmosphere forecasts.
Continuity Laws for Sequential Models
cs.LG 2026-05 unverdicted novelty 6.0

S4 models exhibit stable time-continuity unlike sensitive S6 models, with task continuity predicting performance and enabling temporal subsampling for better efficiency.
Hierarchical Multi-Fidelity Learning for Predicting Three-Dimensional Flame Wrinkling and Turbulent Burning Velocity
cs.LG 2026-05 unverdicted novelty 6.0

MuFiNNs integrates sparse experimental measurements with structured low-fidelity models via hierarchical construction and nonlinear correction to predict 3D flame wrinkling dynamics and turbulent mass burning velocity...
Geometry-Aware Neural Optimizer for Shape Optimization and Inversion
cs.LG 2026-05 unverdicted novelty 6.0

GANO unifies shape encoding, field prediction, and latent optimization with denoising for stable, controllable updates in PDE shape problems, reporting SOTA accuracy and up to 55.9% lift-to-drag gains on benchmarks.
Late Fusion Neural Operators for Extrapolation Across Parameter Space in Partial Differential Equations
cs.LG 2026-04 unverdicted novelty 6.0

Late Fusion Neural Operators disentangle state and parameter learning to outperform FNO and CAPE-FNO on advection, Burgers, and reaction-diffusion PDEs with 72% average RMSE reduction in and out of domain.
Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs
cs.CE 2026-04 unverdicted novelty 6.0

Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.
Certified and accurate computation of function space norms of deep neural networks
math.NA 2026-03 unverdicted novelty 6.0

A certified adaptive quadrature framework computes guaranteed L^p, W^{1,p}, and W^{2,p} norms of deep neural networks by propagating interval enclosures on axis-aligned boxes.
Generalized Spherical Neural Operators: Green's Function Formulation
cs.LG 2025-12 unverdicted novelty 6.0

GSNO uses position-dependent spherical Green's functions to create flexible neural operators that adapt to non-equivariant systems on spheres while keeping spectral efficiency and grid invariance.
Differentiable Autoencoding Neural Operator for Interpretable and Integrable Latent Space Modeling
cs.LG 2025-09 unverdicted novelty 6.0

DIANO builds coarse-grid latent spaces for fluid dynamics data via neural operator encoding and decoding while integrating a differentiable PDE solver directly in the latent space for end-to-end physics-constrained training.
Deep Learning for Subspace Regression
cs.LG 2025-09 unverdicted novelty 6.0

Neural networks regress oversized subspaces for parametric problems using subspace-specific losses, with theory and experiments showing improved accuracy and smoother mappings.
Latent Space Dynamics Identification for Interface Tracking with Application to Shock-Induced Pore Collapse
physics.comp-ph 2025-07 unverdicted novelty 6.0

LaSDI-IT learns latent linear dynamics for interface tracking via a revised autoencoder and Gaussian process interpolation, achieving under 9% error and 106x speedup on shock-induced pore collapse in high explosives.
Operator Learning for Schrödinger Equation: Unitarity, Error Bounds, and Time Generalization
stat.ML 2025-05 unverdicted novelty 6.0

A linear estimator for the Schrödinger evolution operator is introduced that enforces weak unitarity, supplies uniform prediction error bounds and time-extrapolation bounds, and reports up to 100x lower relative error...
On the definition and importance of interpretability in scientific machine learning
cs.LG 2025-05 conditional novelty 6.0

Interpretability in SciML requires mechanistic understanding rather than sparsity, and prior knowledge is often essential for interpretable scientific discovery.
Teaching Artificial Intelligence to Perform Rapid, Resolution-Invariant Grain Growth Modeling via Fourier Neural Operator
cond-mat.mtrl-sci 2025-03 unverdicted novelty 6.0

FNO surrogate model learns to predict long-term grain growth evolution from phase-field data while remaining accurate on unseen configurations and higher-resolution grids.
Generative diffusion learning for parametric partial differential equations
math.NA 2023-05 unverdicted novelty 6.0

A conditional DDPM framework is introduced to approximate solution operators for parameter-dependent PDEs, achieving accuracy comparable to FNO while recovering noise levels and providing confidence intervals.
Physics-Informed Generative Solver: Bridging Data-Driven Priors and Conservation Laws for Stable Spatiotemporal Field Reconstruction
cs.LG 2026-05 unverdicted novelty 5.0

A generative solver separates data-driven prior learning from inference-time enforcement of conservation laws using martingale-regularized score matching and physics-informed sampling for stable field reconstruction.
fPINN-DeepONet: A Physics-Informed Operator Learning Framework for Multi-term Time-fractional Mixed Diffusion-wave Equations
math.NA 2026-05 unverdicted novelty 5.0

fPINN-DeepONet integrates an L2 approximation for the Caputo derivative with DeepONet to solve multi-term time-fractional PDEs, including cases with space-time varying orders and noisy data.
Accelerated and data-efficient flow prediction in stirred tanks via physics-informed learning
cs.CE 2026-05 conditional novelty 5.0

Physics-informed constraints on implicit neural representations yield more accurate and stable predictions of stirred-tank flows than purely data-driven models when training data is scarce, with diminishing returns at...
ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms
cs.LG 2025-12 unverdicted novelty 5.0

ATHENA introduces an agentic team framework that autonomously manages the end-to-end computational research lifecycle via a knowledge-driven HENA loop to achieve validation errors of 10^{-14} in scientific computing a...
RETO: A Rotary-Enhanced Transformer Operator for High-Fidelity Prediction of Automotive Aerodynamics
eess.IV 2026-04 unverdicted novelty 4.0

RETO achieves relative L2 errors of 0.063 on ShapeNet and 0.089/0.097 on DrivAerML surface pressure/velocity, outperforming Transolver and other baselines.
XRePIT: A deep learning-computational fluid dynamics hybrid framework implemented in OpenFOAM for fast, robust, and scalable unsteady simulations
cs.LG 2025-10 unverdicted novelty 4.0

XRePIT automates residual-guided switching between neural surrogates and OpenFOAM to enable stable, up to 2.91x faster 3D unsteady flow simulations with L2 errors around 1E-03.
A Practitioner's Guide to Kolmogorov-Arnold Networks
cs.LG 2025-10 accept novelty 3.0

A systematic review of Kolmogorov-Arnold Networks that maps their relation to Kolmogorov superposition theory, MLPs, and kernels, examines basis-function design choices, summarizes performance advances, and supplies a...
Toward Artificial Intelligence Enabled Earth System Coupling
physics.ao-ph 2026-03 unverdicted novelty 2.0

AI methods can strengthen cross-domain interactions and support more coherent multi-component representations in Earth system models.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · cited by 40 Pith papers · 3 internal anchors

[1] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems, pages 161–168, 2008.

[2] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016.

[3] T. Chen and H. Chen. Approximations of continuous functionals by neural networks with application to dynamic systems. IEEE Transactions on Neural Networks, 4(6):910–918, 1993.

[4] T. Chen and H. Chen. Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks. IEEE Transactions on Neural Networks, 6(4):904–910, 1995.

[5] T. Chen and H. Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995.

[6] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pages 6571–6583, 2018.

[7] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.

[8] V. Dumoulin, E. Perez, N. Schucher, F. Strub, H. d. Vries, A. Courville, and Y. Bengio. Feature-wise transformations. Distill, 2018. https://distill.pub/2018/feature-wise-transformations

[9] N. B. Erichson, M. Muehlebach, and M. W. Mahoney. Physics-informed autoencoders for Lyapunov-stable fluid flow prediction. arXiv preprint arXiv:1905.10866, 2019.

[10] B. Hanin. Universal function approximation by deep neural nets with bounded width and ReLU activations. arXiv preprint arXiv:1708.02691, 2017.

[11] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

[12] J. Jia and A. R. Benson. Neural jump stochastic differential equations. arXiv preprint arXiv:1905.10403, 2019.

[13] P. Jin, L. Lu, Y. Tang, and G. E. Karniadakis. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. arXiv preprint arXiv:1905.11427, 2019.

[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[15] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis. DeepXDE: A deep learning library for solving differential equations. arXiv preprint arXiv:1907.04502, 2019.

[16] L. Lu, Y. Shin, Y. Su, and G. E. Karniadakis. Dying ReLU and initialization: Theory and numerical examples. arXiv preprint arXiv:1903.06733, 2019.

[17] L. Lu, Y. Su, and G. E. Karniadakis. Collapse of deep and narrow neural nets. arXiv preprint arXiv:1808.04947, 2018.

[18] H. N. Mhaskar and N. Hahm. Neural networks for functional approximation and system identification. Neural Computation, 9(1):143–159, 1997.

[19]

Probabilityandcomputing: randomizationandprobabilistictechniques in algorithms and data analysis

M.MitzenmacherandE.Upfal. Probabilityandcomputing: randomizationandprobabilistictechniques in algorithms and data analysis. Cambridge university press, 2017

work page 2017
[20]

Machine learning with observers predicts complex spatiotemporal behavior

G. Neofotistos, M. Mattheakis, G. D. Barmparis, J. Hizanidis, G. P. Tsironis, and E. Kaxiras. Machine learning with observers predicts complex spatiotemporal behavior.arXiv preprint arXiv:1807.10758, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[21]

G. Pang, L. Lu, and G. E. Karniadakis. fPINNs: Fractional physics-informed neural networks.SIAM Journal on Scientiﬁc Computing, 41(4):A2603–A2626, 2019

work page 2019
[22]

J. C. Patra, R. N. Pal, B. Chatterji, and G. Panda. Identiﬁcation of nonlinear dynamic systems using functional link artiﬁcial neural networks.IEEE transactions on systems, man, and cybernetics, part b (cybernetics), 29(2):254–262, 1999

work page 1999
[23]

T. Qin, K. Wu, and D. Xiu. Data driven governing equations approximation using deep neural networks. Journal of Computational Physics, 2019

[24]

M. Raissi, P. Perdikaris, and G. E. Karniadakis. Multistep neural networks for data-driven discovery of nonlinear dynamical systems. arXiv preprint arXiv:1801.01236, 2018

[25]

F. Rossi and B. Conan-Guez. Functional multi-layer perceptron: A non-linear tool for functional data analysis. Neural Networks, 18(1):45–60, 2005

[26]

S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz. Data-driven discovery of partial differential equations. Science Advances, 3(4):e1602614, 2017

[27]

S. Sabour, N. Frosst, and G. E. Hinton. Dynamic routing between capsules. In Advances in Neural Information Processing Systems, pages 3856–3866, 2017

[28]

N. Trask, R. G. Patel, B. J. Gross, and P. J. Atzberger. GMLS-Nets: A framework for learning from unstructured data. arXiv preprint arXiv:1909.05371, 2019

[29]

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017

[30]

N. Winovich, K. Ramani, and G. Lin. ConvPDE-UQ: Convolutional neural networks with quantified uncertainty for heterogeneous elliptic partial differential equations on varied domains. Journal of Computational Physics, 2019

[31]

D. Zhang, L. Lu, L. Guo, and G. E. Karniadakis. Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems. Journal of Computational Physics, 397:108850, 2019

[32]

Z. Zhang and G. E. Karniadakis. Numerical methods for stochastic partial differential equations with white noise. Springer, 2017

[33]

H. Zhao and J. Zhang. Nonlinear dynamic system identification using pipelined functional link artificial recurrent neural network. Neurocomputing, 72(13-15):3046–3054, 2009

[34]

Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, and P. Perdikaris. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics, 394:56–81, 2019

A Neural networks to approximate nonlinear operators

We list in Table 3 the main symbols and notations that are used thr...

