Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs

Emil Partow; Hanno Gottschalk; Tobias J. Riedlinger

arXiv: 2507.01533 · v2 · pith:2MUYDPCP · submitted 2025-07-02 · 🧮 math.NA · cs.LG · cs.NA · math.PR


Pith reviewed 2026-05-21 23:40 UTC · model grok-4.3

classification 🧮 math.NA · cs.LG · cs.NA · math.PR

keywords: sparse grid quadrature · neural ODE · transport map · PAC consistency · numerical integration · mixed regularity · Clenshaw-Curtis · Knothe-Rosenblatt map


The pith

Learned transport maps with sparse-grid quadrature produce PAC-consistent integral estimators as sample size and quadrature budget both grow.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that expected values can be computed by pushing a tractable product measure through a learned map and then applying Clenshaw-Curtis sparse grids on the source. For arbitrary targets the map is realized as the time-one flow of a ReLU^{k+1} neural ODE; this flow is isotropically C^k, so only the slower rate m^{-k/d}(log m)^{(d-1)(k/d+1)} is available, although it still improves with the smoothness k. For product targets the Knothe-Rosenblatt map is diagonal, so a simple empirical quantile estimator recovers the full mixed-regularity rate m^{-k}(log m)^{(d-1)(k+1)}. In both regimes the resulting estimator converges to the true integral with high probability as both the sample size n and the number of quadrature nodes m tend to infinity.
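A minimal sketch of what "time-one flow of a ReLU^{k+1} neural ODE" means computationally. It assumes an autonomous one-hidden-layer vector field with random, untrained weights and a fixed-step classical Runge-Kutta (RK4) integrator; the helper names `relu_power`, `make_field` and `flow` are hypothetical, and the maximum-likelihood training the paper performs is omitted. The activation max(z, 0)^{k+1} is C^k, which is the source of the isotropic C^k regularity of the flow mentioned above.

```python
import numpy as np


def relu_power(z, p):
    """ReLU^p activation max(z, 0)**p; it is C^(p-1) but not C^p at z = 0."""
    return np.maximum(z, 0.0) ** p


def make_field(d, width, k, seed=0, scale=0.05):
    """Hypothetical vector field v(x) = A relu(W x + b)^(k+1) with random, untrained weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(width, d)) / np.sqrt(d)
    b = 0.5 * rng.normal(size=width)
    A = scale * rng.normal(size=(d, width)) / width
    return lambda x: relu_power(x @ W.T + b, k + 1) @ A.T


def flow(v, x, t=1.0, steps=200):
    """Classical RK4 approximation of the time-t flow of dx/dt = v(x); t < 0 runs backwards."""
    h = t / steps
    for _ in range(steps):
        k1 = v(x)
        k2 = v(x + 0.5 * h * k1)
        k3 = v(x + 0.5 * h * k2)
        k4 = v(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x


# usage: a C^2 field (k = 2) in d = 2, applied to points of the source cube
v = make_field(d=2, width=16, k=2)
pts = np.random.default_rng(1).uniform(-1.0, 1.0, size=(100, 2))
mapped = flow(v, pts, 1.0)          # T(x)
recovered = flow(v, mapped, -1.0)   # T^{-1}(T(x)), up to RK4 error
```

Because the field is autonomous, the inverse transport is simply the same flow run backwards in time, which is what makes flow-based maps convenient diffeomorphisms.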

Core claim

The LtI estimator is PAC consistent: with high probability the numerical integral approximates the true value to arbitrary accuracy as both the sample size n and the quadrature budget m tend to infinity. The analysis splits into a general regime where a neural-ODE flow supplies an isotropic C^k map and the rate m^{-k/d}(log m)^{(d-1)(k/d+1)}, and a diagonal regime where empirical quantile transport recovers the optimal mixed rate m^{-k}(log m)^{(d-1)(k+1)}.
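The two rate expressions can be compared numerically. This sketch drops all constants, so only the shape of the bounds is meaningful; `isotropic_rate` and `mixed_rate` are hypothetical helper names.

```python
import math


def isotropic_rate(m, d, k):
    """General-regime bound shape m^(-k/d) (log m)^((d-1)(k/d+1)), constants dropped."""
    return m ** (-k / d) * math.log(m) ** ((d - 1) * (k / d + 1))


def mixed_rate(m, d, k):
    """Diagonal-regime bound shape m^(-k) (log m)^((d-1)(k+1)), constants dropped."""
    return m ** (-k) * math.log(m) ** ((d - 1) * (k + 1))


for m in (10**3, 10**6, 10**9):
    print(m, isotropic_rate(m, d=4, k=2), mixed_rate(m, d=4, k=2))
```

In d = 1 the two expressions coincide. For d = 4, k = 2 and m = 10^6 the isotropic expression is still about 1.4 × 10^2, because its logarithmic factor has not yet been overcome, while the mixed expression is about 1.8 × 10^-2; neither number is an error prediction, since the constants are missing.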

What carries the argument

The structural fact that composing a C^k_mix-regular function with a C^1-diffeomorphism is guaranteed to preserve C^k_mix regularity only when the diffeomorphism is diagonal up to a permutation of coordinates; this fact forces the split into general and product-target regimes and determines which quadrature rate is available.
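A two-dimensional computation, our own illustration rather than an argument quoted from the paper, shows why diagonality matters. Take k = 1, so that membership in C^1_mix requires ∂_1 f, ∂_2 f and ∂_1∂_2 f to be continuous but says nothing about ∂_1^2 f. For a diagonal map only the mixed derivative of f reappears; for the linear shear T(x_1, x_2) = (x_1 + x_2, x_2) the pure second derivative ∂_1^2 f enters.

```latex
% diagonal map T(x) = (T_1(x_1), T_2(x_2)), each T_i continuously differentiable
\partial_{x_1}\partial_{x_2}(f\circ T)(x)
  = (\partial_1\partial_2 f)\bigl(T(x)\bigr)\,T_1'(x_1)\,T_2'(x_2)

% shear T(x) = (x_1 + x_2,\; x_2)
\partial_{x_2}(f\circ T)(x) = (\partial_1 f + \partial_2 f)\bigl(T(x)\bigr),
\qquad
\partial_{x_1}\partial_{x_2}(f\circ T)(x) = (\partial_1^2 f + \partial_1\partial_2 f)\bigl(T(x)\bigr)
```

For example, f(y) = |y_1|^{3/2} lies in C^1_mix because ∂_2 f = ∂_1∂_2 f = 0 and ∂_1 f is continuous, yet under the shear ∂_{x_1}∂_{x_2}(f∘T) = (3/4)|x_1 + x_2|^{-1/2}, which is unbounded near the line x_1 + x_2 = 0, so f∘T is not in C^1_mix.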

If this is right

As n and m both tend to infinity the numerical value converges to the true expectation with high probability in either regime.
Increasing the smoothness index k together with the matching activation order k+1 reduces the dimension dependence of the error in the general regime, at the cost of harder optimization.
When the target is a product measure the lightweight empirical-quantile estimator already achieves the full mixed-derivative convergence rate without neural-ODE training.
The method therefore supplies a consistent quadrature procedure for both arbitrary and product-structured targets.
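A minimal sketch of the diagonal regime under stated assumptions: the source is the uniform probability measure on [-1, 1]^d, each coordinate is transported by its empirical quantile function (computed with `numpy.quantile`, default linear interpolation), and a full tensor-product Clenshaw-Curtis rule stands in for the sparse grid to keep the code short. Because the Knothe-Rosenblatt map of a product target is diagonal, coordinatewise quantile maps are all that is needed. The names `cc_rule` and `quantile_lti` are hypothetical, and this is not claimed to be the paper's exact estimator.

```python
import numpy as np


def cc_rule(level):
    """Nested Clenshaw-Curtis rule on [-1, 1]: 1 node at level 1, 2**(level-1) + 1 nodes after."""
    if level == 1:
        return np.array([0.0]), np.array([2.0])
    n = 2 ** (level - 1)
    theta = np.pi * np.arange(n + 1) / n
    k = np.arange(1, n // 2 + 1)
    b = np.where(2 * k == n, 1.0, 2.0)
    s = (b / (4.0 * k**2 - 1.0) * np.cos(np.outer(theta, 2 * k))).sum(axis=1)
    c = np.full(n + 1, 2.0)
    c[[0, -1]] = 1.0
    return np.cos(theta), c / n * (1.0 - s)


def quantile_lti(f, samples, level):
    """Hypothetical LtI-style estimate of E[f(X)] for a product target.

    samples: (n, d) array of i.i.d. target draws; f: vectorized in d coordinate arrays.
    """
    d = samples.shape[1]
    x, w = cc_rule(level)
    u = (x + 1.0) / 2.0  # source nodes on [-1, 1] rescaled to [0, 1]
    w = w / 2.0          # weights for the uniform probability measure on [-1, 1]
    # diagonal transport: coordinate i goes through its own empirical quantile function
    nodes = [np.quantile(samples[:, i], u) for i in range(d)]
    grids = np.meshgrid(*nodes, indexing="ij")
    weights = w
    for _ in range(d - 1):
        weights = np.multiply.outer(weights, w)  # tensor-product weights
    return float(np.sum(weights * f(*grids)))


rng = np.random.default_rng(0)
n = 20_000
samples = np.column_stack([rng.uniform(0.0, 1.0, n), rng.beta(2.0, 5.0, n)])
estimate = quantile_lti(lambda a, b: a * b, samples, level=6)
```

With draws from Uniform(0, 1) × Beta(2, 5) and f(a, b) = ab, the estimate should land near E[X_1 X_2] = (1/2)(2/7) = 1/7, with no network training involved.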

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

For problems whose target density factors, one could skip neural-ODE training entirely and still obtain the optimal sparse-grid rate.
The same structural preservation argument might be applied to other quadrature families once a suitable regularity class is identified.
Detecting or enforcing near-diagonal structure in the learned map could serve as a practical switch between the two regimes inside a single code base.
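One way the regime switch suggested above might be prototyped: estimate the Jacobian of a candidate map by central finite differences and measure how much of its squared Frobenius norm lies outside the largest entry of each row. The score is zero exactly when every row has a single nonzero entry, which for an invertible Jacobian means a permuted diagonal matrix. The function `off_permutation_mass` and the score itself are our hypothetical construction; being pointwise, it also does not check that the permutation is the same at every point.

```python
import numpy as np


def off_permutation_mass(T, points, h=1e-5):
    """Hypothetical diagnostic for 'diagonal up to a permutation of coordinates'.

    T maps an (n, d) array to an (n, d) array. For each point the Jacobian J is
    estimated by central differences; the score is the share of ||J||_F^2 outside
    the largest-magnitude entry of each row. Returns the mean score over points.
    """
    n, d = points.shape
    J = np.empty((n, d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = h
        J[:, :, j] = (T(points + e) - T(points - e)) / (2.0 * h)  # column j = dT/dx_j
    sq = J ** 2
    kept = sq.max(axis=2).sum(axis=1)  # largest squared entry of each row, summed
    total = sq.sum(axis=(1, 2))
    return float(np.mean(1.0 - kept / total))


rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 2))
shear = lambda x: np.stack([x[:, 0] + x[:, 1], x[:, 1]], axis=1)
score = off_permutation_mass(shear, pts)  # J = [[1, 1], [0, 1]] gives 1/3
```

A score near zero at all sample points would hint that the cheaper quantile route applies; any threshold would have to be chosen empirically.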

Load-bearing premise

Composition of a mixed-regularity function with a diffeomorphism is guaranteed to preserve mixed regularity only when the diffeomorphism is diagonal up to a permutation of coordinates.

What would settle it

A non-diagonal C^1 diffeomorphism whose composition with every C^k_mix function remains C^k_mix would remove the regime distinction and collapse the claimed rate separation between the general and diagonal cases; a single composition that happens to stay C^k_mix would not suffice.

Figures

Figures reproduced from arXiv: 2507.01533 by Emil Partow, Hanno Gottschalk, Tobias J. Riedlinger.

**Figure 1.** Comparison of Clenshaw-Curtis nodes on [−1, 1]^2: full tensor grid I^2_{(6,6)} (left) versus sparse grid S^2_{6+2} (right) using closed non-linear growth. Due to their nested nodes and relatively straightforward construction, Clenshaw–Curtis rules are a practical default for high-dimensional integration, particularly in sparse grid settings. Nevertheless, any (sparse) quadrature rule can, in principle, be in…
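The nested Clenshaw-Curtis family behind the figure can be written down in a few lines. This sketch assumes the doubling growth (one node at level 1, 2^{l-1}+1 nodes at level l ≥ 2), which we take to be what the caption calls closed non-linear growth, and uses the standard cosine-series weight formula; `cc_rule` and `is_nested` are hypothetical names.

```python
import numpy as np


def cc_rule(level):
    """Nested Clenshaw-Curtis rule on [-1, 1] with weight function 1.

    Level 1 is the midpoint rule; level l >= 2 has n = 2**(l-1) and the
    n + 1 nodes x_j = cos(j*pi/n), so consecutive levels are nested.
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    if level == 1:
        return np.array([0.0]), np.array([2.0])
    n = 2 ** (level - 1)
    theta = np.pi * np.arange(n + 1) / n
    k = np.arange(1, n // 2 + 1)
    b = np.where(2 * k == n, 1.0, 2.0)  # the k = n/2 cosine term is halved
    s = (b / (4.0 * k**2 - 1.0) * np.cos(np.outer(theta, 2 * k))).sum(axis=1)
    c = np.full(n + 1, 2.0)
    c[[0, -1]] = 1.0  # endpoint weights are halved
    return np.cos(theta), c / n * (1.0 - s)


def is_nested(level, tol=1e-12):
    """True if every node of `level` reappears among the nodes of `level + 1`."""
    coarse, _ = cc_rule(level)
    fine, _ = cc_rule(level + 1)
    return all(np.min(np.abs(fine - x)) < tol for x in coarse)


x, w = cc_rule(5)             # 17 nodes
approx = np.dot(w, np.exp(x))  # compare with e - 1/e
```

Level 2 reproduces Simpson's rule (weights 1/3, 4/3, 1/3), and because each level reuses all previous nodes, refining a sparse grid never discards function evaluations.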

Original abstract

We prove consistency of a recently proposed scheme that evaluates expected values by composing a learned transport map with Clenshaw--Curtis sparse-grid quadrature on a tractable product source. Our analysis hinges on the structural fact that the composition of a $C^k_{\mathrm{mix}}$-regular function, which carries the fast quadrature rate $m^{-k}(\log m)^{(d-1)(k+1)}$, with a $C^1$-diffeomorphism can be guaranteed to be $C^k_{\mathrm{mix}}$ itself only if the diffeomorphism is diagonal up to a permutation of coordinates. The fast rate is therefore available exclusively for product targets, and the analysis splits into two regimes. In the general regime of arbitrary targets, we learn the transport as the time-one flow of a $\mathrm{ReLU}^{k+1}$-neural ODE trained by maximum likelihood. The resulting flow lies in the isotropic space $C^k$ and yields the rate $m^{-k/d}(\log m)^{(d-1)(k/d+1)}$; raising the density smoothness $k$ together with the matched activation order $k+1$ mitigates the curse of dimensionality at the cost of harder optimization. In the diagonal regime of product targets, the Knothe--Rosenblatt map is itself diagonal, and we estimate it pointwise via empirical quantile transport, a lightweight alternative that recovers the full mixed-regularity rate. In both regimes, the resulting LtI estimator is PAC (probably approximately correct) consistent: with high probability, the numerical integral approximates the true value to arbitrary accuracy as both the sample size $n$ and the quadrature budget $m$ tend to infinity.
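How one-dimensional Clenshaw-Curtis rules are assembled into a sparse grid on the product source can be sketched with Smolyak's combination technique. Assumptions: levels start at 1 with doubling growth, the sparse-grid parameter satisfies q ≥ d, coinciding nodes are merged by rounding their coordinates, and the names `cc_rule` and `smolyak_cc` are hypothetical. This is a textbook Smolyak construction, not code from the paper.

```python
import itertools
from math import comb

import numpy as np


def cc_rule(level):
    """Nested Clenshaw-Curtis rule on [-1, 1]: 1 node at level 1, 2**(level-1) + 1 nodes after."""
    if level == 1:
        return np.array([0.0]), np.array([2.0])
    n = 2 ** (level - 1)
    theta = np.pi * np.arange(n + 1) / n
    k = np.arange(1, n // 2 + 1)
    b = np.where(2 * k == n, 1.0, 2.0)
    s = (b / (4.0 * k**2 - 1.0) * np.cos(np.outer(theta, 2 * k))).sum(axis=1)
    c = np.full(n + 1, 2.0)
    c[[0, -1]] = 1.0
    return np.cos(theta), c / n * (1.0 - s)


def smolyak_cc(f, d, q):
    """Smolyak rule on [-1, 1]^d by the combination technique (levels start at 1, q >= d).

    Adds the tensor rules Q_{l_1} x ... x Q_{l_d} with q - d + 1 <= |l|_1 <= q, each
    scaled by (-1)**(q - |l|) * binom(d - 1, q - |l|), and merges shared nodes.
    Returns the estimate and the number of distinct nodes.
    """
    if q < d:
        raise ValueError("need q >= d")
    merged = {}  # rounded node tuple -> accumulated weight
    for lvl in itertools.product(range(1, q - d + 2), repeat=d):
        s = sum(lvl)
        if not (q - d + 1 <= s <= q):
            continue
        coeff = (-1) ** (q - s) * comb(d - 1, q - s)
        rules = [cc_rule(lev) for lev in lvl]
        for idx in itertools.product(*(range(len(x)) for x, _ in rules)):
            node = tuple(round(float(x[i]), 12) for (x, _), i in zip(rules, idx))
            w = coeff * float(np.prod([w1[i] for (_, w1), i in zip(rules, idx)]))
            merged[node] = merged.get(node, 0.0) + w
    estimate = sum(w * f(np.array(node)) for node, w in merged.items())
    return estimate, len(merged)


est, count = smolyak_cc(lambda x: np.exp(x.sum()), d=2, q=8)
```

For d = 2 and q = 8 the sparse grid has 321 distinct nodes, while a full tensor grid built from the finest one-dimensional level that occurs (65 nodes per axis) would need 65^2 = 4225; for the smooth product integrand exp(x_1 + x_2) the estimate still matches (e − 1/e)^2 to better than ten digits.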

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proves PAC consistency of the LtI estimator for approximating expectations E[f(X)] by composing a learned transport map T with Clenshaw-Curtis sparse-grid quadrature on a product source measure. The analysis splits into a general regime, where T is realized as the time-one flow of a ReLU^{k+1} neural ODE trained by MLE and yields the isotropic rate m^{-k/d}(log m)^{(d-1)(k/d+1)}, and a diagonal/product-target regime, where the Knothe-Rosenblatt map is estimated by empirical quantiles and recovers the faster mixed-regularity rate m^{-k}(log m)^{(d-1)(k+1)}. The key structural fact invoked is that C^k_mix regularity is guaranteed to survive composition with a C^1-diffeomorphism only when the map is diagonal (up to coordinate permutation). In both regimes, the double limit n,m→∞ yields convergence of the numerical integral to the true value with high probability.

Significance. If the central claims hold, the work supplies the first rigorous PAC-consistency analysis for learned-transport-plus-sparse-grid quadrature, with explicit rates that quantify the trade-off between smoothness, dimension, and optimization difficulty. The observation that mixed regularity survives composition only for diagonal maps cleanly explains why the fast rate is available exclusively for product targets; the neural-ODE construction for the general case is a natural and technically sound way to obtain an isotropic C^k map. These results are directly relevant to high-dimensional integration and uncertainty quantification.

major comments (2)

[§3.2] General regime: the translation from MLE convergence of the neural-ODE parameters to a C^k-norm bound on the learned flow map is only sketched. The hidden constants, which depend on the activation order k+1 and on the Lipschitz constants of the vector field, must be tracked explicitly to confirm that the quadrature error term indeed decays as m^{-k/d}(log m)^{(d-1)(k/d+1)}.
[Theorem 4.3] PAC statement: the probability 1-δ appears only in the final display. It is not shown how δ interacts with the sample size n when the transport map is estimated from n i.i.d. draws, nor whether the double limit is taken in a specific order (n first, then m) or jointly.

minor comments (2)

[§2] The definition of the mixed-regularity space C^k_mix and the precise statement of the structural preservation lemma should be moved from the appendix to §2, so that the regime split is self-contained.
[Abstract] Notation: the acronym LtI is used in the abstract but never expanded; a parenthetical expansion on first use (presumably “Learning to Integrate”, after the scheme of Ernst et al. [12]) would help readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. The comments identify opportunities to strengthen the exposition of the proofs. We respond to each major comment below and will incorporate the suggested clarifications in the revised manuscript.


Referee [§3.2, general regime]: The translation from MLE convergence of the neural-ODE parameters to a C^k-norm bound on the learned flow map is only sketched; the hidden constants that depend on the activation order k+1 and the Lipschitz constants of the vector field must be tracked explicitly to confirm that the quadrature error term indeed decays as m^{-k/d}.

Authors: We agree that the argument in Section 3.2 would benefit from a more explicit accounting of constants. In the revision we will expand the derivation to track the dependence of the C^k-norm bound on the ReLU^{k+1} activation order and on the Lipschitz constants of the neural-ODE vector field. This will make the passage from MLE parameter convergence to the isotropic quadrature rate m^{-k/d}(log m)^{(d-1)(k/d+1)} fully rigorous and confirm that the hidden factors remain independent of m.

Revision: yes

Referee [Theorem 4.3, PAC statement]: The probability 1-δ appears only in the final display; it is not shown how δ interacts with the sample size n when the transport map is estimated from n i.i.d. draws, nor whether the double limit is taken in a specific order (n first, then m, or jointly).

Authors: We thank the referee for highlighting this point. The proof of Theorem 4.3 first lets n→∞ (for fixed m) to obtain a high-probability bound 1-δ on the transport-map error via concentration of the neural-ODE MLE, after which m→∞ controls the quadrature error. The dependence of δ on n is inherited from the sample-complexity bounds for the MLE estimator. In the revised manuscript we will state this ordering explicitly in the theorem and proof, and we will also indicate the joint-limit regime in which n grows sufficiently rapidly with m to keep the overall failure probability below any prescribed δ.

Revision: yes
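To see in the simplest setting how a failure probability δ trades off against n, consider uniform control of one-dimensional empirical CDFs by the Dvoretzky–Kiefer–Wolfowitz inequality with Massart's constant, P(sup_x |F_n(x) − F(x)| > ε) ≤ 2 exp(−2nε²), combined with a union bound over d coordinates. This is our illustration, not the bound used in the paper (the rebuttal points to MLE sample-complexity bounds for the neural ODE), and turning CDF error into quantile-map error needs further assumptions, such as densities bounded away from zero. The helper names are hypothetical.

```python
import math


def dkw_sample_size(eps, delta, d=1):
    """Smallest n with 2*d*exp(-2*n*eps**2) <= delta, i.e. with probability >= 1 - delta
    all d marginal empirical CDFs are uniformly within eps (DKW + union bound)."""
    return math.ceil(math.log(2.0 * d / delta) / (2.0 * eps ** 2))


def dkw_epsilon(n, delta, d=1):
    """Uniform CDF error radius guaranteed with probability >= 1 - delta from n samples."""
    return math.sqrt(math.log(2.0 * d / delta) / (2.0 * n))


n_needed = dkw_sample_size(eps=0.01, delta=0.05)
```

For ε = 0.01 and δ = 0.05 in one dimension this gives n = 18,445; asking for the same guarantee on d marginals only adds a log d term, which is one reason the diagonal regime is statistically cheap.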

Circularity Check

0 steps flagged

No significant circularity; derivation rests on external convergence results


The paper establishes PAC consistency of the LtI estimator via a decomposition into (i) convergence of the learned transport (Neural ODE MLE in the general case or empirical quantile transport in the diagonal case) and (ii) standard consistency of Clenshaw-Curtis sparse-grid quadrature applied to the composed integrand. The structural fact on C^k_mix preservation under diagonal diffeomorphisms is invoked solely to recover the fast sparse-grid rate in the product-target regime; it is presented as an independent observation supporting the rate split rather than a self-referential definition. No equation or claim reduces the double-limit consistency result to a fitted parameter renamed as a prediction, nor does any load-bearing step collapse to a self-citation chain. The argument therefore rests on external results from approximation theory and numerical quadrature.
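The two-part decomposition described above can be written schematically as follows, in our own notation: ν is the product source, T the exact transport pushing ν to the target, \hat T_n the map learned from n samples, and Q_m the m-node sparse-grid rule on the source. The paper's precise norms and constants may differ.

```latex
% I(f) = E_{X \sim \mu}[f(X)] = \int f \circ T \, d\nu
\bigl| I(f) - Q_m(f\circ\hat T_n) \bigr|
  \le \underbrace{\Bigl| \int f\circ T\,d\nu - \int f\circ\hat T_n\,d\nu \Bigr|}_{\text{transport (statistical) error}}
  + \underbrace{\Bigl| \int f\circ\hat T_n\,d\nu - Q_m(f\circ\hat T_n) \Bigr|}_{\text{quadrature error}}
```

The first term is what the PAC argument controls with probability 1 − δ as n grows; the second is bounded by a constant depending on f∘\hat T_n times the regime's rate in m, which is why the constant-tracking question in the referee report matters.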

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis rests on standard background results on mixed-regularity function spaces and neural-ODE approximation theory; no new free parameters or invented entities are introduced.

axioms (1)

Domain assumption: composition of a C^k_mix-regular function with a C^1-diffeomorphism is guaranteed to preserve C^k_mix regularity only when the diffeomorphism is diagonal up to coordinate permutation.
This structural fact is explicitly identified in the abstract as the hinge of the entire rate analysis.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · relation: unclear

Paper passage: “The fast rate is therefore available exclusively for product targets... In the general regime... yields the rate m^{-k/d}(log m)^{(d-1)(k/d+1)}”

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction · relation: unclear

Paper passage: “Theorem 5.12 (PAC-Learnability of Sparse Grid Integration)”

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

41 references

[1] Hayk Asatryan, Hanno Gottschalk, Marieke Lippert, and Matthias Rottmann. “A convenient infinite dimensional framework for generative adversarial learning”. In: Electronic Journal of Statistics 17.1 (2023), pp. 391–428. doi: 10.1214/23-EJS2104

[2] Denis Belomestny, Alexey Naumov, Nikita Puchkin, and Sergey Samsonov. “Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations”. In: Neural Networks 161 (2023), pp. 242–253. issn: 0893-6080. doi: 10.1016/j.neunet.2023.01.035

[3] Hans-Joachim Bungartz and Michael Griebel. “Sparse grids”. In: Acta Numerica 13 (2004), pp. 147–269

[4] Robin Chan, Sarina Penquitt, and Hanno Gottschalk. “Lu-net: Invertible neural networks based on matrix factorization”. In: 2023 International Joint Conference on Neural Networks (IJCNN). IEEE. 2023, pp. 1–10

[5] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. “Neural Ordinary Differential Equations”. In: Advances in Neural Information Processing Systems. Ed. by S. Bengio et al. Vol. 31. Curran Associates, Inc., 2018. url: https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf

[6] C. W. Clenshaw and A. R. Curtis. “A method for numerical integration on an automatic computer”. In: Numerische Mathematik 2.1 (Jan. 1960), pp. 197–205. issn: 0945-3245. doi: 10.1007/BF01386223

[7] Keith R. Dalbey et al. Dakota, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Theory Manual (V.6.15). Tech. rep. Chapter 3: Stochastic Expansion Methods. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States), Nov. 2021. doi: 10.2172/1832293

[8] Anthony Christopher Davison and David Victor Hinkley. Bootstrap methods and their application. 1. Cambridge University Press, 1997

[9] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. “Density estimation using Real NVP”. In: International Conference on Learning Representations. 2017

[10] Claudia Drygala, Hanno Gottschalk, Thomas Kruse, Ségolène Martin, and Annika Mütze. Learning Brenier Potentials with Convex Generative Adversarial Neural Networks. 2025. arXiv: 2504.19779 [cs.LG]

[11] Emily C. Ehrhardt, Hanno Gottschalk, and Tobias J. Riedlinger. Numerical and statistical analysis of NeuralODE with Runge-Kutta time integration. 2025. arXiv: 2503.10729 [cs.LG]

[12] Oliver G. Ernst, Hanno Gottschalk, Toni Kowalewitz, and Patrick Krüger. Learning to Integrate. 2025. arXiv: 2506.11801 [math.NA]

[13] Dani Gamerman and Hedibert F Lopes. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. Chapman and Hall/CRC, 2006

[14] Thomas Gerstner and Michael Griebel. “Numerical integration using sparse grids”. In: Numerical Algorithms 18.3 (Jan. 1998), pp. 209–232. issn: 1572-9265. doi: 10.1023/A:1019129717644

[15] Ramon van Handel. Probability in High Dimension. https://web.math.princeton.edu/~rvan/APC550.pdf. Lecture notes for APC 550, Princeton University. 2016

[16] Philip Hartman. Ordinary Differential Equations. 2nd ed. Society for Industrial and Applied Mathematics, 2002. doi: 10.1137/1.9780898719222

[17] Wassily Hoeffding. “Probability Inequalities for Sums of Bounded Random Variables”. In: Journal of the American Statistical Association 58.301 (1963), pp. 13–30. doi: 10.1080/01621459.1963.10500830

[18] Ilse C. F. Ipsen and Rizwana Rehman. “Perturbation Bounds for Determinants and Characteristic Polynomials”. In: SIAM Journal on Matrix Analysis and Applications 30.2 (2008), pp. 762–776. doi: 10.1137/070704770

[19] J. H. B. Kemperman. “On the optimum rate of transmitting information”. In: Probability and Information Theory. Ed. by M. Behara, K. Krickeberg, and J. Wolfowitz. Berlin, Heidelberg: Springer Berlin Heidelberg, 1969, pp. 126–169. isbn: 978-3-540-36098-8

[20] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. “Flow Matching for Generative Modeling”. In: 11th International Conference on Learning Representations, ICLR 2023. 2023

[21] Tsoy-Wo Ma. “Higher Chain Formula Proved by Combinatorics”. In: Electronic Journal of Combinatorics 16.1 (June 2009), p. 21

[22] Youssef Marzouk, Zhi Ren, and Jakob Zech. Distribution learning via neural differential equations: minimal energy regularization and approximation theory. 2025. arXiv: 2502.03795 [cs.LG]

[23] Youssef Marzouk, Zhi (Robert) Ren, Sven Wang, and Jakob Zech. “Distribution Learning via Neural Differential Equations: A Nonparametric Statistical Perspective”. In: Journal of Machine Learning Research 25.232 (2024), pp. 1–61

[24] (Near-)optimality of quasi-Monte Carlo methods and sub-optimality of Gauss–Hermite sparse-grid quadrature in Gaussian Sobolev spaces. Oberwolfach Seminar Uncertainty Quantification (oral presentation). Apr. 2025

[25] Erich Novak and Klaus Ritter. “High dimensional integration of smooth functions over cubes”. In: Numerische Mathematik 75.1 (Nov. 1996), pp. 79–97. issn: 0945-3245. doi: 10.1007/s002110050231

[26] Erich Novak and Klaus Ritter. “Simple Cubature Formulas with High Polynomial Exactness”. In: Constructive Approximation 15.4 (1999), pp. 499–522. doi: 10.1007/s003659900119

[27] Erich Novak and Klaus Ritter. “The Curse of Dimension and a Universal Method For Numerical Integration”. In: Multivariate Approximation and Splines. Ed. by Günther Nürnberger, Jochen W. Schmidt, and Guido Walz. Basel: Birkhäuser Basel, 1997, pp. 177–187. isbn: 978-3-0348-8871-4

[28] George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. “Normalizing flows for probabilistic modeling and inference”. In: Journal of Machine Learning Research 22.57 (2021), pp. 1–64

[29] Mark S Pinsker. “Information and information stability of random variables and processes”. In: Holden-Day (1964)

[30] Yury Polyanskiy and Yihong Wu. Information Theory: From Coding to Learning. Cambridge University Press, 2025

[31] Danilo Rezende and Shakir Mohamed. “Variational inference with normalizing flows”. In: International Conference on Machine Learning. PMLR. 2015, pp. 1530–1538

[32] Dennis Rochau, Robin Chan, and Hanno Gottschalk. New advances in universal approximation with neural networks of minimal width. 2024. arXiv: 2411.08735 [cs.NE]

[33] Filippo Santambrogio. Optimal transport for applied mathematicians. 1st ed. Progress in nonlinear differential equations and their applications. Basel, Switzerland: Birkhäuser, Oct. 2015

[34] Larry Schumaker. “Polynomial Splines”. In: Spline Functions: Basic Theory. Cambridge Mathematical Library. Cambridge University Press, 2007, pp. 108–188

[35] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014

[36] Ian H. Sloan and W. E. Smith. “Product-integration with the Clenshaw-Curtis and related points”. In: Numerische Mathematik 30.4 (Dec. 1978), pp. 415–428. issn: 0945-3245. doi: 10.1007/BF01398509

[37] Sergei Abramovich Smolyak. “Quadrature and Interpolation Formulas for Tensor Products of Certain Classes of Functions”. In: Doklady Akademii Nauk. Vol. 148. 5. Russian Academy of Sciences. 1963, pp. 1042–1045

[38] Alvise Sommariva. “Fast construction of Fejér and Clenshaw–Curtis rules for general weight functions”. In: Computers & Mathematics with Applications 65.4 (2013), pp. 682–693. issn: 0898-1221. doi: 10.1016/j.camwa.2012.12.004

[39] Timothy John Sullivan. Introduction to uncertainty quantification. Vol. 63. Springer, 2015

[40] Jörg Waldvogel. “Fast Construction of the Fejér and Clenshaw–Curtis Quadrature Rules”. In: BIT Numerical Mathematics 46.1 (Mar. 2006), pp. 195–202. doi: 10.1007/s10543-006-0045-4

[41] G. W. Wasilkowski and H. Wozniakowski. “Explicit Cost Bounds of Algorithms for Multivariate Tensor Product Problems”. In: Journal of Complexity 11.1 (1995), pp. 1–56. issn: 0885-064X. doi: 10.1006/jcom.1995.1001

[1] [1]

A convenient infinite dimensional framework for generative adversarial learning

Hayk Asatryan, Hanno Gottschalk, Marieke Lippert, and Matthias Rottmann. “A convenient infinite dimensional framework for generative adversarial learning”. In: Electronic Journal of Statistics 17.1 (2023), pp. 391–428. doi: 10.1214/23-EJS2104

work page doi:10.1214/23-ejs2104 2023

[2] [2]

Si- multaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations

Denis Belomestny, Alexey Naumov, Nikita Puchkin, and Sergey Samsonov. “Si- multaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations”. In: Neural Networks 161 (2023), pp. 242–253. issn: 0893-6080. doi: https://doi.org/10.1016/j.neunet.2023.01. 035

work page doi:10.1016/j.neunet.2023.01 2023

[3] [3]

Sparse grids

Hans-Joachim Bungartz and Michael Griebel. “Sparse grids”. In: Acta numerica 13 (2004), pp. 147–269

work page 2004

[4] [4]

Lu-net: Invertible neural networks based on matrix factorization

Robin Chan, Sarina Penquitt, and Hanno Gottschalk. “Lu-net: Invertible neural networks based on matrix factorization”. In: 2023 International Joint Conference on Neural Networks (IJCNN) . IEEE. 2023, pp. 1–10

work page 2023

[5] [5]

Neural Ordinary Differential Equations

Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. “Neural Ordinary Differential Equations”. In: Advances in Neural Information Pro- cessing Systems . Ed. by S. Bengio et al. Vol. 31. Curran Associates, Inc., 2018. url: https : / / proceedings . neurips . cc / paper _ files / paper / 2018 / file / 69386f6bb1dfed68692a24c8686939b9-Paper.pdf

work page 2018

[6] [6]

A method for numerical integration on an automatic computer

C. W. Clenshaw and A. R. Curtis. “A method for numerical integration on an automatic computer”. In: Numerische Mathematik 2.1 (Jan. 1960), pp. 197–205. issn: 0945-3245. doi: 10.1007/BF01386223

work page doi:10.1007/bf01386223 1960

[7] [7]

Dalbey et al

Keith R. Dalbey et al. Dakota, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sen- sitivity Analysis: Theory Manual (V.6.15) . Tech. rep. Chapter 3: Stochastic Expan- sion Methods. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States), Nov. 2021. doi: 10.2172/1832293

work page doi:10.2172/1832293 2021

[8] [8]

Bootstrap methods and their application

Anthony Christopher Davison and David Victor Hinkley. Bootstrap methods and their application. 1. Cambridge university press, 1997

work page 1997

[9] [9]

Density estimation using Real NVP

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. “Density estimation using Real NVP”. In: International Conference on Learning Representations. 2017

work page 2017

[10] [10]

Claudia Drygala, Hanno Gottschalk, Thomas Kruse, S´ egol` ene Martin, and Annika M¨ utze.Learning Brenier Potentials with Convex Generative Adversarial Neural Net- works. 2025. arXiv: 2504.19779 [cs.LG]

work page arXiv 2025

[11] [11]

Ehrhardt, Hanno Gottschalk, and Tobias J

Emily C. Ehrhardt, Hanno Gottschalk, and Tobias J. Riedlinger. Numerical and statistical analysis of NeuralODE with Runge-Kutta time integration . 2025. arXiv: 2503.10729 [cs.LG]

work page arXiv 2025

[12] [12]

Ernst, Hanno Gottschalk, Toni Kowalewitz, and Patrick Kr¨ uger.Learning to Integrate

Oliver G. Ernst, Hanno Gottschalk, Toni Kowalewitz, and Patrick Kr¨ uger.Learning to Integrate. 2025. arXiv: 2506.11801 [math.NA]

work page arXiv 2025

[13] [13]

Markov chain Monte Carlo: stochastic simulation for Bayesian inference

Dani Gamerman and Hedibert F Lopes. Markov chain Monte Carlo: stochastic simulation for Bayesian inference . Chapman and Hall/CRC, 2006

work page 2006

[14] [14]

Numerical integration using sparse grids

Thomas Gerstner and Michael Griebel. “Numerical integration using sparse grids”. In: Numerical Algorithms 18.3 (Jan. 1998), pp. 209–232. issn: 1572-9265. doi: 10. 1023/A:1019129717644

work page 1998

[15] [15]

Probability in High Dimension

Ramon van Handel. Probability in High Dimension. https://web.math.princeton. edu/~rvan/APC550.pdf. Lecture notes for APC 550, Princeton University. 2016

work page 2016

[16] [16]

Ordinary Differential Equations

Philip Hartman. Ordinary Differential Equations. Second. Society for Industrial and Applied Mathematics, 2002. doi: 10.1137/1.9780898719222

work page doi:10.1137/1.9780898719222 2002

[17] [17]

Probability Inequalities for Sums of Bounded Random Vari- ables

Wassily Hoeffding. “Probability Inequalities for Sums of Bounded Random Vari- ables”. In: Journal of the American Statistical Association 58.301 (1963), pp. 13–

work page 1963

[18] [18]

Probability Inequalities for Sums of Bounded Ra n- dom Variables

doi: 10.1080/01621459.1963.10500830. REFERENCES 29

work page doi:10.1080/01621459.1963.10500830 1963

[19] [19]

Perturbation Bounds for Determinants and Characteristic Polynomials

Ilse C. F. Ipsen and Rizwana Rehman. “Perturbation Bounds for Determinants and Characteristic Polynomials”. In:SIAM Journal on Matrix Analysis and Applications 30.2 (2008), pp. 762–776. doi: 10.1137/070704770

work page doi:10.1137/070704770 2008

[20] [20]

On the optimum rate of transmitting information

J. H. B. Kemperman. “On the optimum rate of transmitting information”. In: Prob- ability and Information Theory. Ed. by M. Behara, K. Krickeberg, and J. Wolfowitz. Berlin, Heidelberg: Springer Berlin Heidelberg, 1969, pp. 126–169. isbn: 978-3-540- 36098-8

work page 1969

[21] [21]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. “Flow Matching for Generative Modeling”. In: 11th International Conference on Learning Representations, ICLR 2023. 2023

work page 2023

[22] Tsoy-Wo Ma. “Higher Chain Formula Proved by Combinatorics”. In: Electronic Journal of Combinatorics 16.1 (June 2009), p. 21

[23] Youssef Marzouk, Zhi Ren, and Jakob Zech. Distribution learning via neural differential equations: minimal energy regularization and approximation theory. 2025. arXiv: 2502.03795 [cs.LG]

[24] Youssef Marzouk, Zhi (Robert) Ren, Sven Wang, and Jakob Zech. “Distribution Learning via Neural Differential Equations: A Nonparametric Statistical Perspective”. In: Journal of Machine Learning Research 25.232 (2024), pp. 1–61

[25] (Near-)optimality of quasi-Monte Carlo methods and sub-optimality of Gauss–Hermite sparse-grid quadrature in Gaussian Sobolev spaces. Oberwolfach Seminar Uncertainty Quantification (oral presentation). Apr. 2025

[26] Erich Novak and Klaus Ritter. “High dimensional integration of smooth functions over cubes”. In: Numerische Mathematik 75.1 (Nov. 1996), pp. 79–97. issn: 0945-3245. doi: 10.1007/s002110050231

[28] Erich Novak and Klaus Ritter. “Simple Cubature Formulas with High Polynomial Exactness”. In: Constructive Approximation 15.4 (1999), pp. 499–522. issn: 1432-[…]. doi: 10.1007/s003659900119

[30] Erich Novak and Klaus Ritter. “The Curse of Dimension and a Universal Method For Numerical Integration”. In: Multivariate Approximation and Splines. Ed. by Günther Nürnberger, Jochen W. Schmidt, and Guido Walz. Basel: Birkhäuser Basel, 1997, pp. 177–187. isbn: 978-3-0348-8871-4

[31] George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. “Normalizing flows for probabilistic modeling and inference”. In: Journal of Machine Learning Research 22.57 (2021), pp. 1–64

[32] Mark S. Pinsker. “Information and information stability of random variables and processes”. In: Holden-Day (1964)

[33] Yury Polyanskiy and Yihong Wu. Information Theory: From Coding to Learning. Cambridge University Press, 2025

[34] Danilo Rezende and Shakir Mohamed. “Variational inference with normalizing flows”. In: International Conference on Machine Learning. PMLR. 2015, pp. 1530–1538

[35] Dennis Rochau, Robin Chan, and Hanno Gottschalk. New advances in universal approximation with neural networks of minimal width. 2024. arXiv: 2411.08735 [cs.NE]

[36] Filippo Santambrogio. Optimal transport for applied mathematicians. 1st ed. Progress in nonlinear differential equations and their applications. Basel, Switzerland: Birkhäuser, Oct. 2015

[37] Larry Schumaker. “Polynomial Splines”. In: Spline Functions: Basic Theory. Cambridge Mathematical Library. Cambridge University Press, 2007, pp. 108–188

[38] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014

[39] Ian H. Sloan and W. E. Smith. “Product-integration with the Clenshaw-Curtis and related points”. In: Numerische Mathematik 30.4 (Dec. 1978), pp. 415–428. issn: 0945-3245. doi: 10.1007/BF01398509

[40] Sergei Abramovich Smolyak. “Quadrature and Interpolation Formulas for Tensor Products of Certain Classes of Functions”. In: Doklady Akademii Nauk. Vol. 148. 5. Russian Academy of Sciences. 1963, pp. 1042–1045

[41] Alvise Sommariva. “Fast construction of Fejér and Clenshaw–Curtis rules for general weight functions”. In: Computers & Mathematics with Applications 65.4 (2013), pp. 682–693. issn: 0898-1221. doi: 10.1016/j.camwa.2012.12.004

[42] Timothy John Sullivan. Introduction to uncertainty quantification. Vol. 63. Springer, 2015

[43] Jörg Waldvogel. “Fast Construction of the Fejér and Clenshaw–Curtis Quadrature Rules”. In: BIT Numerical Mathematics 46.1 (Mar. 2006), pp. 195–202. issn: 1572-[…]. doi: 10.1007/s10543-006-0045-4

[45] G. W. Wasilkowski and H. Wozniakowski. “Explicit Cost Bounds of Algorithms for Multivariate Tensor Product Problems”. In: Journal of Complexity 11.1 (1995), pp. 1–56. issn: 0885-064X. doi: 10.1006/jcom.1995.1001