EML Trees Are Universal Approximators

Elie Abdo; Joe Germany; Joseph Bakarji

arxiv: 2606.23179 · v1 · pith:FFZGDYWYnew · submitted 2026-06-22 · 💻 cs.LG · cs.NA· cs.NE· cs.SC· math.NA

EML Trees Are Universal Approximators

Joe Germany , Elie Abdo , Joseph Bakarji This is my paper

Pith reviewed 2026-06-26 08:45 UTC · model grok-4.3

classification 💻 cs.LG cs.NAcs.NEcs.SCmath.NA

keywords EML treesuniversal approximationSobolev spacesfunction approximationtree structureslearning algorithmsexpressive powermachine learning

0 comments

The pith

Trees composed of EML functions can approximate any function in W^{k,∞} to arbitrary accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that tree-structured compositions of the EML function possess a universal approximation property for functions whose weak derivatives up to order k remain bounded. It adapts classical neural-network approximation theorems by constructing explicit EML trees that replicate the behavior of polynomials. This supplies both a theoretical guarantee and a concrete learning algorithm for fitting parameters within the trees. The result matters because it identifies a new compositional building block, built from a continuous NAND analogue, that inherits strong approximation guarantees while remaining trainable on practical tasks.

Core claim

We show that such trees enjoy a universal approximation property for functions in W^{k, ∞} for k ∈ ℕ, drawing on classical neural network approximation arguments while exploiting the ability to explicitly construct EML trees that mimic polynomial representations. We further propose a learning algorithm for EML-type trees equipped with fitting parameters, and demonstrate its feasibility in practical optimization problems.

What carries the argument

Tree-structured compositions of the EML (exp-minus-log) function, which serves as a continuous analogue of NAND gates and enables explicit mimicry of polynomial representations.

If this is right

EML trees inherit universal approximation from neural-network theory once the polynomial-mimicry construction is available.
A parameter-fitting algorithm exists that can optimize EML trees on concrete optimization problems.
The framework supplies theoretical grounding for using EML trees as function approximators in machine-learning settings.
Approximation holds for every natural number k, covering functions with increasing smoothness requirements.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Because the trees can mimic polynomials, they may achieve exact representation for polynomial targets under suitable parameter choices.
The same construction technique could be tested on other continuous gate analogues to check whether universal approximation transfers similarly.
Practical scaling behavior of the learning algorithm on high-dimensional inputs remains open for direct measurement.

Load-bearing premise

It is possible to explicitly construct EML trees that mimic polynomial representations for any target function in W^{k,∞}.

What would settle it

A concrete function in W^{k,∞} on a compact domain for which no finite-depth EML tree sequence converges in the Sobolev norm would refute the universal approximation claim.

Figures

Figures reproduced from arXiv: 2606.23179 by Elie Abdo, Joe Germany, Joseph Bakarji.

**Figure 1.** Figure 1: The Basic Binary Operations (a) Exponentiation to a constant n (2) (b) Exponentiation to a constant n with prefactor m (2) [PITH_FULL_IMAGE:figures/full_fig_p014_1.png] view at source ↗

**Figure 2.** Figure 2: Exponentiating 14 [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗

**Figure 3.** Figure 3: Hyperbolic Tangent (8) E Constructing Polynomials The following two subsections detail how to construct univariate and multivariate polynomials using EML trees and provide upper bounds for the number of required EML blocks in each case. E.1 Univariate Polynomials A univariate polynomial of degree k, denoted by p(x) = k ∑ i=0 aix i = a0 + a1x + a2x 2 + ⋯ + akx k , (32) with x > 0 and with positive coefficie… view at source ↗

**Figure 4.** Figure 4: Univariate Polynomial (5s − 3) Lemma 4 (Construction of Univariate Polynomials with Positive Coefficients). Given a univariate polynomial p of degree k defined over x > 0 and with positive coefficients, there exists an EML tree pθ (following the explicit construction in [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

**Figure 5.** Figure 5: Construction of ρ N m for 2 ≤ m ≤ N − 1 and for ρ N N , we rewrite it as ρ N N (y) = 1 2 σ (α(y − N − 1 N )) + 1 2 = 1 2 ⎛ ⎝ 1 + e α(y− N−1 N ) e α(y− N−1 N ) + e −α(y− N−1 N ) ⎞ ⎠ ´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸ ¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶ positive − 1 2 e −α(y− N−1 N ) e α(y− N−1 N ) + e −α(y− N−1 N ) ´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸ ¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹… view at source ↗

read the original abstract

The recently introduced EML (Exp-Minus-Log) function acts as continuous analogue of NAND gates, providing a compositional building block capable of representing elementary functions. In this work, we study the expressive power of tree-structured compositions of EML functions. We show that such trees enjoy a universal approximation property for functions in $W^{k, \infty}$ for $k \in \mathbb N$, drawing on classical neural network approximation arguments while exploiting the ability to explicitly construct EML trees that mimic polynomial representations. We further propose a learning algorithm for EML-type trees equipped with fitting parameters, and demonstrate its feasibility in practical optimization problems. Our results establish EML trees as a theoretically grounded framework for function approximation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

EML trees get a universal approximation claim for W^{k,∞} by asserting they can mimic polynomials, but that construction step is asserted rather than derived.

read the letter

The main takeaway is that EML trees are claimed to universally approximate functions in Sobolev spaces W^{k,∞} by building on classical neural network results and an explicit construction that lets the trees represent polynomials. The paper also adds a learning algorithm with fitting parameters and shows it can be run on some optimization tasks.

What is new is the universal approximation result for tree compositions of this particular EML function, which is positioned as a continuous NAND analogue. Prior work on EML is recent, so extending it to trees with this guarantee is a distinct step rather than a routine reuse of existing theorems. The practical learning part gives the theoretical claim some grounding in trainable models.

The paper does a clean job of sketching the high-level route: approximate the target by polynomials, then show the EML tree can stand in for those polynomials. That structure is standard and easy to follow. The demonstration that the learning algorithm works on concrete problems is useful for anyone who might want to try the model.

The soft spot is the polynomial-mimicking construction itself. The argument transfers the classical approximation guarantee only if EML trees can be built to exactly represent arbitrary polynomials of any degree through their composition rules. The abstract invokes this ability but supplies no derivation from the EML definition or the tree structure showing how addition, multiplication, or monomials are obtained. If the mimicking holds only approximately, only for low degrees, or under extra restrictions, the Sobolev-norm bound does not follow. The stress-test note flags exactly this gap, and without the explicit steps or error bounds it is hard to confirm the claim closes.

This is for readers working on new compositional approximators or gate analogues in approximation theory and machine learning. Someone already following universal approximation results for non-standard activations would get the most from it. It deserves a serious referee because the claim is specific, the family is new, and the combination of theory plus algorithm is worth checking even if the proof needs more detail on the construction.

I would send it to peer review. The core idea is worth a closer look.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that tree-structured compositions of the EML (Exp-Minus-Log) function, serving as a continuous analogue of NAND gates, possess a universal approximation property for functions in the Sobolev space W^{k,∞} (k ∈ ℕ). The argument draws on classical neural-network approximation results while relying on an explicit construction of EML trees that mimic polynomial representations; a learning algorithm for EML-type trees equipped with fitting parameters is also proposed and its feasibility shown on practical optimization tasks.

Significance. If the explicit construction is rigorously established with error bounds, the result would supply a new theoretically grounded class of universal approximators grounded in a continuous logic-gate primitive, extending existing NN theory to tree-structured EML compositions and offering a framework that could be directly useful for approximation and optimization.

major comments (2)

[Abstract and main theorem] Abstract and the main existence argument: the transfer of classical NN universal-approximation guarantees to EML trees for the full class W^{k,∞} rests entirely on the asserted ability to explicitly construct EML trees that mimic arbitrary polynomials. No derivation from the EML definition or the tree composition rules is supplied showing how addition, multiplication, or monomials of degree >1 are obtained, nor are approximation error bounds or verification for the Sobolev norm provided; without these steps the central claim cannot be verified.
[Learning algorithm] Learning-algorithm section: the proposed fitting procedure introduces free parameters inside the EML trees, yet it is not shown whether the optimization step preserves the polynomial-mimicking property or the resulting approximation rate for W^{k,∞}; this interaction is load-bearing for any claim that the learned trees inherit the universal-approximation guarantee.

minor comments (1)

[Preliminaries] The notation for tree depth, composition rules, and the precise definition of the EML primitive would benefit from an additional displayed equation block for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We respond to each major comment below, indicating planned revisions where appropriate.

read point-by-point responses

Referee: [Abstract and main theorem] Abstract and the main existence argument: the transfer of classical NN universal-approximation guarantees to EML trees for the full class W^{k,∞} rests entirely on the asserted ability to explicitly construct EML trees that mimic arbitrary polynomials. No derivation from the EML definition or the tree composition rules is supplied showing how addition, multiplication, or monomials of degree >1 are obtained, nor are approximation error bounds or verification for the Sobolev norm provided; without these steps the central claim cannot be verified.

Authors: We agree that the explicit constructions and error bounds are not derived in sufficient detail in the current manuscript. In the revision we will insert a new subsection that starts from the EML definition and tree composition rules, shows how addition and multiplication are realized, extends the construction to monomials of arbitrary degree, and supplies the corresponding approximation error bounds together with verification in the W^{k,∞} norm. This will make the appeal to classical neural-network results fully rigorous and verifiable. revision: yes
Referee: [Learning algorithm] Learning-algorithm section: the proposed fitting procedure introduces free parameters inside the EML trees, yet it is not shown whether the optimization step preserves the polynomial-mimicking property or the resulting approximation rate for W^{k,∞}; this interaction is load-bearing for any claim that the learned trees inherit the universal-approximation guarantee.

Authors: The universal-approximation statement is a property of the entire class of EML trees; any concrete tree, including one whose parameters have been fitted by the proposed algorithm, remains an element of that class. We will revise the manuscript to state this distinction explicitly and to note that while membership in the class guarantees the existence result, the concrete approximation rate achieved by a learned tree is governed by the optimization outcome rather than by the specific polynomial-mimicking constructions used in the existence proof. A short discussion of this point will be added to the learning-algorithm section. revision: yes

Circularity Check

0 steps flagged

No circularity detected; relies on external NN theory

full rationale

The paper's central claim transfers universal approximation from classical neural network results in W^{k,∞} by asserting an explicit construction of EML trees that mimic polynomials. This is an existence argument grounded in prior external theory rather than any self-referential definition, fitted parameter renamed as prediction, or self-citation chain. No equations or steps in the abstract reduce the result to its own inputs by construction. The construction step, while undetailed here, does not constitute circularity under the specified patterns as it is not shown to be tautological or statistically forced within the paper's own framework.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unproven ability to construct EML trees that exactly mimic polynomials (invoked to transfer classical NN results) and on the background validity of those classical NN approximation theorems; the learning algorithm introduces fitting parameters whose number and initialization are not specified in the abstract.

free parameters (1)

fitting parameters inside EML trees
The abstract states that the learning algorithm equips EML-type trees with fitting parameters; these are chosen to match data and therefore count as free parameters whose values are not derived from first principles.

axioms (1)

domain assumption Classical neural network approximation arguments apply once EML trees can mimic polynomials
The abstract explicitly says the proof draws on these arguments after establishing the polynomial-mimicry construction.

pith-pipeline@v0.9.1-grok · 5653 in / 1433 out tokens · 13509 ms · 2026-06-26T08:45:45.174532+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

36 extracted references · 6 linked inside Pith

[1]

Demystifying black-box models with symbolic metamodels

Ahmed M Alaa and Mihaela Van der Schaar. Demystifying black-box models with symbolic metamodels. Advances in neural information processing systems, 32, 2019

2019
[2]

Universal approximation bounds for superpositions of a sigmoidal function.IEEE Transactions on Information theory, 39(3):930–945, 2002

Andrew R Barron. Universal approximation bounds for superpositions of a sigmoidal function.IEEE Transactions on Information theory, 39(3):930–945, 2002

2002
[3]

Discovering governing equations from data by sparse identification of nonlinear dynamical systems.Proceedings of the national academy of sciences, 113(15):3932–3937, 2016

Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems.Proceedings of the national academy of sciences, 113(15):3932–3937, 2016

2016
[4]

Discovering symbolic models from deep learning with inductive biases.Advances in neural information processing systems, 33:17429–17442, 2020

Miles Cranmer, Alvaro Sanchez Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho. Discovering symbolic models from deep learning with inductive biases.Advances in neural information processing systems, 33:17429–17442, 2020

2020
[5]

Approximation by superpositions of a sigmoidal function.Mathematics of control, signals and systems, 2(4):303–314, 1989

George Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of control, signals and systems, 2(4):303–314, 1989

1989
[6]

On the approximation of functions by tanh neural networks.Neural Networks, 143:732–750, 2021

Tim De Ryck, Samuel Lanthaler, and Siddhartha Mishra. On the approximation of functions by tanh neural networks.Neural Networks, 143:732–750, 2021

2021
[7]

Error analysis for physics-informed neural networks (pinns) approximating kolmogorov pdes.Advances in Computational Mathematics, 48(6):79, 2022

Tim De Ryck and Siddhartha Mishra. Error analysis for physics-informed neural networks (pinns) approximating kolmogorov pdes.Advances in Computational Mathematics, 48(6):79, 2022

2022
[8]

Generic bounds on the approximation error for physics-informed (and) operator learning.Advances in Neural Information Processing Systems, 35:10945–10958, 2022

Tim De Ryck and Siddhartha Mishra. Generic bounds on the approximation error for physics-informed (and) operator learning.Advances in Neural Information Processing Systems, 35:10945–10958, 2022

2022
[9]

On polynomial approximation in sobolev spaces.SIAM journal on numerical analysis, 20(5):985–988, 1983

Ricardo G Durán. On polynomial approximation in sobolev spaces.SIAM journal on numerical analysis, 20(5):985–988, 1983

1983
[10]

Sparse identification of nonlinear dynamics and koopman operators with shallow recurrent decoder networks.Proceedings of the National Academy of Sciences, 123(16):e2508144123, 2026

Mars Liyao Gao, Jan P Williams, and J Nathan Kutz. Sparse identification of nonlinear dynamics and koopman operators with shallow recurrent decoder networks.Proceedings of the National Academy of Sciences, 123(16):e2508144123, 2026

2026
[11]

Error bounds for approximations with deep relu neural networks in w s, p norms.Analysis and Applications, 18(05):803–859, 2020

Ingo Gühring, Gitta Kutyniok, and Philipp Petersen. Error bounds for approximations with deep relu neural networks in w s, p norms.Analysis and Applications, 18(05):803–859, 2020

2020
[12]

Harris, K

Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Shepp...

2020
[13]

Multilayer feedforward networks are universal approximators.Neural networks, 2(5):359–366, 1989

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators.Neural networks, 2(5):359–366, 1989

1989
[14]

On universal approximation and error bounds for fourier neural operators.Journal of Machine Learning Research, 22(290):1–76, 2021

Nikola Kovachki, Samuel Lanthaler, and Siddhartha Mishra. On universal approximation and error bounds for fourier neural operators.Journal of Machine Learning Research, 22(290):1–76, 2021

2021
[15]

Neural oscillators are universal.Advances in Neural Information Processing Systems, 36:46786–46806, 2023

Samuel Lanthaler, T Konstantin Rusch, and Siddhartha Mishra. Neural oscillators are universal.Advances in Neural Information Processing Systems, 36:46786–46806, 2023

2023
[16]

Fourier neural operator for parametric partial differential equations.arXiv preprint arXiv:2010.08895, 2020

Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations.arXiv preprint arXiv:2010.08895, 2020

Pith/arXiv arXiv 2010
[17]

Neural operator: Graph kernel network for partial differential equations

Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020

Pith/arXiv arXiv 2003
[18]

Kan: Kolmogorov-arnold networks.arXiv preprint arXiv:2404.19756, 2024

Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljaˇci´c, Thomas Y Hou, and Max Tegmark. Kan: Kolmogorov-arnold networks.arXiv preprint arXiv:2404.19756, 2024

Pith/arXiv arXiv 2024
[19]

Learning nonlinear operators via deeponet based on the universal approximation theorem of operators.Nature machine intelligence, 3(3):218–229, 2021

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators.Nature machine intelligence, 3(3):218–229, 2021. 10

2021
[20]

Extrapolation and learning equations.arXiv preprint arXiv:1610.02995, 2016

Georg Martius and Christoph H Lampert. Extrapolation and learning equations.arXiv preprint arXiv:1610.02995, 2016

Pith/arXiv arXiv 2016
[21]

Estimates on the generalization error of physics-informed neural networks for approximating pdes.IMA Journal of Numerical Analysis, 43(1):1–43, 2023

Siddhartha Mishra and Roberto Molinaro. Estimates on the generalization error of physics-informed neural networks for approximating pdes.IMA Journal of Numerical Analysis, 43(1):1–43, 2023

2023
[22]

All elementary functions from a single binary operator.arXiv preprint arXiv:2603.21852, 2026

Andrzej Odrzywołek. All elementary functions from a single binary operator.arXiv preprint arXiv:2603.21852, 2026

Pith/arXiv arXiv 2026
[23]

Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems, 32, 2019

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems, 32, 2019

2019
[24]

Optimal approximation of piecewise smooth functions using deep relu neural networks.Neural Networks, 108:296–330, 2018

Philipp Petersen and Felix V oigtlaender. Optimal approximation of piecewise smooth functions using deep relu neural networks.Neural Networks, 108:296–330, 2018

2018
[25]

Approximation theory of the mlp model in neural networks.Acta numerica, 8:143–195, 1999

Allan Pinkus. Approximation theory of the mlp model in neural networks.Acta numerica, 8:143–195, 1999

1999
[26]

Data-driven discovery of partial differential equations.Science advances, 3(4):e1602614, 2017

Samuel H Rudy, Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Data-driven discovery of partial differential equations.Science advances, 3(4):e1602614, 2017

2017
[27]

Nonparametric regression using deep neural networks with ReLU activation function.Ann

Johannes Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function.Ann. Stat., 48(4):1875–1897, August 2020

2020
[28]

Ai feynman: A physics-inspired method for symbolic regression

Silviu-Marian Udrescu and Max Tegmark. Ai feynman: A physics-inspired method for symbolic regression. Science advances, 6(16):eaay2631, 2020

2020
[29]

A note on polynomial approximation in sobolev spaces.ESAIM: Mathematical Modelling and Numerical Analysis, 33(4):715–719, 1999

Rüdiger Verfürth. A note on polynomial approximation in sobolev spaces.ESAIM: Mathematical Modelling and Numerical Analysis, 33(4):715–719, 1999

1999
[30]

On the expressiveness and spectral bias of kans.arXiv preprint arXiv:2410.01803, 2024

Yixuan Wang, Jonathan W Siegel, Ziming Liu, and Thomas Y Hou. On the expressiveness and spectral bias of kans.arXiv preprint arXiv:2410.01803, 2024

arXiv 2024
[31]

Common task framework for a critical evaluation of scientific machine learning algorithms.Advances in Neural Information Processing Systems, 38, 2026

Philippe Wyder, Judah Goldfeder, Alexey Yermakov, Yue Zhao, Stefano Riva, Jan Williams, David Zoro, Amy Rude, Matteo Tomasetto, Joe Germany, et al. Common task framework for a critical evaluation of scientific machine learning algorithms.Advances in Neural Information Processing Systems, 38, 2026

2026
[32]

Error bounds for approximations with deep relu networks.Neural networks, 94:103–114, 2017

Dmitry Yarotsky. Error bounds for approximations with deep relu networks.Neural networks, 94:103–114, 2017

2017
[33]

The seismic wavefield common task framework.arXiv preprint arXiv:2512.19927, 2025

Alexey Yermakov, Yue Zhao, Marine Denolle, Yiyu Ni, Philippe M Wyder, Judah Goldfeder, Stefano Riva, Jan Williams, David Zoro, Amy Sara Rude, et al. The seismic wavefield common task framework.arXiv preprint arXiv:2512.19927, 2025. 11 A Notation We briefly recall the multi-index notation. Given a d-tuple of non-negative integers α∈Nd 0 with d∈N, we have ∣...

Pith/arXiv arXiv 2025
[34]

Lemma 2.Let Ω⊂Rd be an open and bounded set of diameter 0<h<e −1/2d−3/2which is star-shaped with respect to every point in an open ball B⊂Ω with diameter ρh

and [29]. Lemma 2.Let Ω⊂Rd be an open and bounded set of diameter 0<h<e −1/2d−3/2which is star-shaped with respect to every point in an open ball B⊂Ω with diameter ρh. Then, for every f∈W s,∞(Ω), there exists a polynomial ˆfof degree at mosts−1such that for anyk∈N 0 withk<s, it holds that ∥f−ˆf∥W k,∞(Ω) ≤π1/4√s(d √ deh)s−k (s−k−1)! ∣f∣W s,∞(Ω) .(29) We re...
[35]

C(d, k, s, f) N s−k , (58) with suitably chosenε. Whereas for1≤k<s, we have ∥˜f N −˜pN∥W k,∞(I N i ) = ⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪ ∑ j∈AN (f−pN j )Φ N j ⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟...
[36]

H Proof of Corollary 1 Proof.Letη>0

C(d,0, s, f) N s , (71) where we have chosenεto satisfy ε= δ2 min{1, √ C(d, k, s, f)} N2s+2d+kkk2(k+1)d dmax{1,∥f∥1/2 W k,∞([0,1]d)} .(72) 22 Additionally, for1≤k<s, (72) implies that ε 1 k+1 ≥ δmin{1, √ C(d, k, s, f)} N s+d+1k2d √ dmax{1,∥f∥1/2 W k,∞([0,1]d)} .(73) Therefore, we obtain the bound ∥f−˜pN θ ∥W k,∞((0,1]d) ≤3d(1+δ)(2(k+1)) 3k max{Rk,ln k (βN...

[1] [1]

Demystifying black-box models with symbolic metamodels

Ahmed M Alaa and Mihaela Van der Schaar. Demystifying black-box models with symbolic metamodels. Advances in neural information processing systems, 32, 2019

2019

[2] [2]

Universal approximation bounds for superpositions of a sigmoidal function.IEEE Transactions on Information theory, 39(3):930–945, 2002

Andrew R Barron. Universal approximation bounds for superpositions of a sigmoidal function.IEEE Transactions on Information theory, 39(3):930–945, 2002

2002

[3] [3]

Discovering governing equations from data by sparse identification of nonlinear dynamical systems.Proceedings of the national academy of sciences, 113(15):3932–3937, 2016

Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems.Proceedings of the national academy of sciences, 113(15):3932–3937, 2016

2016

[4] [4]

Discovering symbolic models from deep learning with inductive biases.Advances in neural information processing systems, 33:17429–17442, 2020

Miles Cranmer, Alvaro Sanchez Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho. Discovering symbolic models from deep learning with inductive biases.Advances in neural information processing systems, 33:17429–17442, 2020

2020

[5] [5]

Approximation by superpositions of a sigmoidal function.Mathematics of control, signals and systems, 2(4):303–314, 1989

George Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of control, signals and systems, 2(4):303–314, 1989

1989

[6] [6]

On the approximation of functions by tanh neural networks.Neural Networks, 143:732–750, 2021

Tim De Ryck, Samuel Lanthaler, and Siddhartha Mishra. On the approximation of functions by tanh neural networks.Neural Networks, 143:732–750, 2021

2021

[7] [7]

Error analysis for physics-informed neural networks (pinns) approximating kolmogorov pdes.Advances in Computational Mathematics, 48(6):79, 2022

Tim De Ryck and Siddhartha Mishra. Error analysis for physics-informed neural networks (pinns) approximating kolmogorov pdes.Advances in Computational Mathematics, 48(6):79, 2022

2022

[8] [8]

Generic bounds on the approximation error for physics-informed (and) operator learning.Advances in Neural Information Processing Systems, 35:10945–10958, 2022

Tim De Ryck and Siddhartha Mishra. Generic bounds on the approximation error for physics-informed (and) operator learning.Advances in Neural Information Processing Systems, 35:10945–10958, 2022

2022

[9] [9]

On polynomial approximation in sobolev spaces.SIAM journal on numerical analysis, 20(5):985–988, 1983

Ricardo G Durán. On polynomial approximation in sobolev spaces.SIAM journal on numerical analysis, 20(5):985–988, 1983

1983

[10] [10]

Sparse identification of nonlinear dynamics and koopman operators with shallow recurrent decoder networks.Proceedings of the National Academy of Sciences, 123(16):e2508144123, 2026

Mars Liyao Gao, Jan P Williams, and J Nathan Kutz. Sparse identification of nonlinear dynamics and koopman operators with shallow recurrent decoder networks.Proceedings of the National Academy of Sciences, 123(16):e2508144123, 2026

2026

[11] [11]

Error bounds for approximations with deep relu neural networks in w s, p norms.Analysis and Applications, 18(05):803–859, 2020

Ingo Gühring, Gitta Kutyniok, and Philipp Petersen. Error bounds for approximations with deep relu neural networks in w s, p norms.Analysis and Applications, 18(05):803–859, 2020

2020

[12] [12]

Harris, K

Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Shepp...

2020

[13] [13]

Multilayer feedforward networks are universal approximators.Neural networks, 2(5):359–366, 1989

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators.Neural networks, 2(5):359–366, 1989

1989

[14] [14]

On universal approximation and error bounds for fourier neural operators.Journal of Machine Learning Research, 22(290):1–76, 2021

Nikola Kovachki, Samuel Lanthaler, and Siddhartha Mishra. On universal approximation and error bounds for fourier neural operators.Journal of Machine Learning Research, 22(290):1–76, 2021

2021

[15] [15]

Neural oscillators are universal.Advances in Neural Information Processing Systems, 36:46786–46806, 2023

Samuel Lanthaler, T Konstantin Rusch, and Siddhartha Mishra. Neural oscillators are universal.Advances in Neural Information Processing Systems, 36:46786–46806, 2023

2023

[16] [16]

Fourier neural operator for parametric partial differential equations.arXiv preprint arXiv:2010.08895, 2020

Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations.arXiv preprint arXiv:2010.08895, 2020

Pith/arXiv arXiv 2010

[17] [17]

Neural operator: Graph kernel network for partial differential equations

Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020

Pith/arXiv arXiv 2003

[18] [18]

Kan: Kolmogorov-arnold networks.arXiv preprint arXiv:2404.19756, 2024

Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljaˇci´c, Thomas Y Hou, and Max Tegmark. Kan: Kolmogorov-arnold networks.arXiv preprint arXiv:2404.19756, 2024

Pith/arXiv arXiv 2024

[19] [19]

Learning nonlinear operators via deeponet based on the universal approximation theorem of operators.Nature machine intelligence, 3(3):218–229, 2021

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators.Nature machine intelligence, 3(3):218–229, 2021. 10

2021

[20] [20]

Extrapolation and learning equations.arXiv preprint arXiv:1610.02995, 2016

Georg Martius and Christoph H Lampert. Extrapolation and learning equations.arXiv preprint arXiv:1610.02995, 2016

Pith/arXiv arXiv 2016

[21] [21]

Estimates on the generalization error of physics-informed neural networks for approximating pdes.IMA Journal of Numerical Analysis, 43(1):1–43, 2023

Siddhartha Mishra and Roberto Molinaro. Estimates on the generalization error of physics-informed neural networks for approximating pdes.IMA Journal of Numerical Analysis, 43(1):1–43, 2023

2023

[22] [22]

All elementary functions from a single binary operator.arXiv preprint arXiv:2603.21852, 2026

Andrzej Odrzywołek. All elementary functions from a single binary operator.arXiv preprint arXiv:2603.21852, 2026

Pith/arXiv arXiv 2026

[23] [23]

Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems, 32, 2019

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems, 32, 2019

2019

[24] [24]

Optimal approximation of piecewise smooth functions using deep relu neural networks.Neural Networks, 108:296–330, 2018

Philipp Petersen and Felix V oigtlaender. Optimal approximation of piecewise smooth functions using deep relu neural networks.Neural Networks, 108:296–330, 2018

2018

[25] [25]

Approximation theory of the mlp model in neural networks.Acta numerica, 8:143–195, 1999

Allan Pinkus. Approximation theory of the mlp model in neural networks.Acta numerica, 8:143–195, 1999

1999

[26] [26]

Data-driven discovery of partial differential equations.Science advances, 3(4):e1602614, 2017

Samuel H Rudy, Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Data-driven discovery of partial differential equations.Science advances, 3(4):e1602614, 2017

2017

[27] [27]

Nonparametric regression using deep neural networks with ReLU activation function.Ann

Johannes Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function.Ann. Stat., 48(4):1875–1897, August 2020

2020

[28] [28]

Ai feynman: A physics-inspired method for symbolic regression

Silviu-Marian Udrescu and Max Tegmark. Ai feynman: A physics-inspired method for symbolic regression. Science advances, 6(16):eaay2631, 2020

2020

[29] [29]

A note on polynomial approximation in sobolev spaces.ESAIM: Mathematical Modelling and Numerical Analysis, 33(4):715–719, 1999

Rüdiger Verfürth. A note on polynomial approximation in sobolev spaces.ESAIM: Mathematical Modelling and Numerical Analysis, 33(4):715–719, 1999

1999

[30] [30]

On the expressiveness and spectral bias of kans.arXiv preprint arXiv:2410.01803, 2024

Yixuan Wang, Jonathan W Siegel, Ziming Liu, and Thomas Y Hou. On the expressiveness and spectral bias of kans.arXiv preprint arXiv:2410.01803, 2024

arXiv 2024

[31] [31]

Common task framework for a critical evaluation of scientific machine learning algorithms.Advances in Neural Information Processing Systems, 38, 2026

Philippe Wyder, Judah Goldfeder, Alexey Yermakov, Yue Zhao, Stefano Riva, Jan Williams, David Zoro, Amy Rude, Matteo Tomasetto, Joe Germany, et al. Common task framework for a critical evaluation of scientific machine learning algorithms.Advances in Neural Information Processing Systems, 38, 2026

2026

[32] [32]

Error bounds for approximations with deep relu networks.Neural networks, 94:103–114, 2017

Dmitry Yarotsky. Error bounds for approximations with deep relu networks.Neural networks, 94:103–114, 2017

2017

[33] [33]

The seismic wavefield common task framework.arXiv preprint arXiv:2512.19927, 2025

Alexey Yermakov, Yue Zhao, Marine Denolle, Yiyu Ni, Philippe M Wyder, Judah Goldfeder, Stefano Riva, Jan Williams, David Zoro, Amy Sara Rude, et al. The seismic wavefield common task framework.arXiv preprint arXiv:2512.19927, 2025. 11 A Notation We briefly recall the multi-index notation. Given a d-tuple of non-negative integers α∈Nd 0 with d∈N, we have ∣...

Pith/arXiv arXiv 2025

[34] [34]

Lemma 2.Let Ω⊂Rd be an open and bounded set of diameter 0<h<e −1/2d−3/2which is star-shaped with respect to every point in an open ball B⊂Ω with diameter ρh

and [29]. Lemma 2.Let Ω⊂Rd be an open and bounded set of diameter 0<h<e −1/2d−3/2which is star-shaped with respect to every point in an open ball B⊂Ω with diameter ρh. Then, for every f∈W s,∞(Ω), there exists a polynomial ˆfof degree at mosts−1such that for anyk∈N 0 withk<s, it holds that ∥f−ˆf∥W k,∞(Ω) ≤π1/4√s(d √ deh)s−k (s−k−1)! ∣f∣W s,∞(Ω) .(29) We re...

[35] [35]

C(d, k, s, f) N s−k , (58) with suitably chosenε. Whereas for1≤k<s, we have ∥˜f N −˜pN∥W k,∞(I N i ) = ⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪ ∑ j∈AN (f−pN j )Φ N j ⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟⎥rro⟪⟪⟩r⟪⌟...

[36] [36]

H Proof of Corollary 1 Proof.Letη>0

C(d,0, s, f) N s , (71) where we have chosenεto satisfy ε= δ2 min{1, √ C(d, k, s, f)} N2s+2d+kkk2(k+1)d dmax{1,∥f∥1/2 W k,∞([0,1]d)} .(72) 22 Additionally, for1≤k<s, (72) implies that ε 1 k+1 ≥ δmin{1, √ C(d, k, s, f)} N s+d+1k2d √ dmax{1,∥f∥1/2 W k,∞([0,1]d)} .(73) Therefore, we obtain the bound ∥f−˜pN θ ∥W k,∞((0,1]d) ≤3d(1+δ)(2(k+1)) 3k max{Rk,ln k (βN...