
arXiv: 2507.01533 · v2 · pith:2MUYDPCPnew · submitted 2025-07-02 · 🧮 math.NA · cs.LG · cs.NA · math.PR

Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs

Pith reviewed 2026-05-21 23:40 UTC · model grok-4.3

classification 🧮 math.NA · cs.LG · cs.NA · math.PR
keywords sparse grid quadrature · neural ODE · transport map · PAC consistency · numerical integration · mixed regularity · Clenshaw-Curtis · Knothe-Rosenblatt map

The pith

Learned transport maps with sparse-grid quadrature produce PAC-consistent integral estimators as sample size and quadrature budget both grow.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that expected values can be computed by pushing a tractable product measure through a learned map and then applying Clenshaw-Curtis sparse grids on the source. For arbitrary targets the map is realized as the time-one flow of a ReLU^{k+1} neural ODE, which yields isotropic C^k regularity and a correspondingly slower rate that still improves with the smoothness k. For product targets the Knothe-Rosenblatt map is diagonal, so a simple empirical quantile estimator recovers the full mixed-derivative rate. In both regimes the resulting estimator converges to the true integral with high probability as the number of samples n and the number of quadrature nodes m both increase without bound.
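As a rough illustration of the general-regime map, the sketch below integrates a small ReLU^{k+1} vector field from t = 0 to t = 1 with classical Runge-Kutta steps. The one-hidden-layer architecture, the parameter tuple, the names relu_power, vector_field and time_one_flow, and the choice of RK4 are all assumptions made for illustration. Maximum-likelihood training is omitted, and nothing here is taken from the paper's implementation.

```python
import numpy as np


def relu_power(z, k):
    """ReLU^{k+1} activation max(z, 0)**(k+1); it has k continuous derivatives."""
    return np.maximum(z, 0.0) ** (k + 1)


def vector_field(x, params, k):
    """Hypothetical autonomous field v(x) = W2 @ relu^{k+1}(W1 @ x + b1) + b2."""
    W1, b1, W2, b2 = params
    return W2 @ relu_power(W1 @ x + b1, k) + b2


def time_one_flow(x0, params, k, steps=50):
    """Approximate the time-one flow of dx/dt = v(x) with classical RK4 steps."""
    h = 1.0 / steps
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        s1 = vector_field(x, params, k)
        s2 = vector_field(x + 0.5 * h * s1, params, k)
        s3 = vector_field(x + 0.5 * h * s2, params, k)
        s4 = vector_field(x + h * s3, params, k)
        x = x + (h / 6.0) * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
    return x
```

With the output weights W2 set to zero the field is the constant b2, so the flow reduces to the translation x0 + b2, which gives a quick sanity check of the integrator.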

Core claim

The LtI estimator is PAC consistent: with high probability the numerical integral approximates the true value to arbitrary accuracy as both the sample size n and the quadrature budget m tend to infinity. The analysis splits into a general regime where a neural-ODE flow supplies an isotropic C^k map and the rate m^{-k/d}(log m)^{(d-1)(k/d+1)}, and a diagonal regime where empirical quantile transport recovers the optimal mixed rate m^{-k}(log m)^{(d-1)(k+1)}.
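To get a feel for the gap between the two rates, the following sketch evaluates both expressions for a given quadrature budget m, smoothness k and dimension d. The function names are hypothetical, and the values are rate expressions only: the constants hidden in the paper's bounds are not known here and are set to 1.

```python
import math


def general_rate(m, k, d):
    """Isotropic rate m^{-k/d} (log m)^{(d-1)(k/d+1)}, constant set to 1."""
    return m ** (-k / d) * math.log(m) ** ((d - 1) * (k / d + 1))


def mixed_rate(m, k, d):
    """Mixed-regularity rate m^{-k} (log m)^{(d-1)(k+1)}, constant set to 1."""
    return m ** (-k) * math.log(m) ** ((d - 1) * (k + 1))


if __name__ == "__main__":
    for m in (10**3, 10**4, 10**5, 10**6):
        print(m, general_rate(m, k=2, d=4), mixed_rate(m, k=2, d=4))
```

For d = 1 the two expressions coincide, which matches the fact that the regime split only matters in more than one dimension.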

What carries the argument

The structural fact that composition of a C^k_mix-regular function with a C^1-diffeomorphism is guaranteed to preserve C^k_mix regularity only when the diffeomorphism is diagonal up to a permutation of coordinates. This fact forces the split into general and product-target regimes and determines which quadrature rate is available.
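A two-dimensional example may help to see why non-diagonal maps can destroy mixed regularity. It is an illustration under stated assumptions, not the paper's proof: the functions g and f, the shear S and the diagonal map D are hypothetical choices, k is at least 1, and C^k_mix is taken to mean that all partial derivatives of order at most k in each coordinate are continuous.

```latex
% Illustration only; g, f, S and D are hypothetical choices, k >= 1.
\[
  g(t) = \max(t, 0)^{k+1} \in C^k(\mathbb{R}) \setminus C^{k+1}(\mathbb{R}),
  \qquad
  f(y_1, y_2) = g(y_1) \in C^k_{\mathrm{mix}} .
\]
% Non-diagonal (shear) diffeomorphism: the mixed derivative needs g^{(k+1)}.
\[
  S(x_1, x_2) = (x_1 + x_2,\; x_2),
  \qquad
  \partial_{x_1}^{k} \partial_{x_2} (f \circ S)(x_1, x_2) = g^{(k+1)}(x_1 + x_2),
\]
% which jumps from 0 to (k+1)! across the line x_1 + x_2 = 0,
% so f \circ S is not in C^k_mix.
% Diagonal diffeomorphism with C^k components: no cross terms appear.
\[
  D(x_1, x_2) = (\varphi_1(x_1),\; \varphi_2(x_2)),
  \qquad
  (f \circ D)(x_1, x_2) = g(\varphi_1(x_1)) \in C^k_{\mathrm{mix}} .
\]
```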

If this is right

  • As n and m both tend to infinity the numerical value converges to the true expectation with high probability in either regime.
  • Increasing the smoothness index k together with the matching ReLU order reduces the dimension dependence of the error in the general regime.
  • When the target is a product measure the lightweight empirical-quantile estimator already achieves the full mixed-derivative convergence rate without neural-ODE training.
  • The method therefore supplies a consistent quadrature procedure for both arbitrary and product-structured targets.
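The diagonal regime can be sketched in a few lines: estimate each coordinate of the map by an empirical quantile function and apply a Clenshaw-Curtis rule on the uniform source. The sketch below is a simplification under stated assumptions: it works in two dimensions, uses a full tensor grid instead of a sparse grid, takes the uniform measure on [0, 1]^2 as the source, and supports only even N. The names clenshaw_curtis and quantile_transport_estimate are hypothetical.

```python
import numpy as np


def clenshaw_curtis(N):
    """Nodes and weights of the (N+1)-point Clenshaw-Curtis rule on [-1, 1]."""
    if N < 2 or N % 2:
        raise ValueError("this sketch supports even N >= 2 only")
    theta = np.pi * np.arange(N + 1) / N
    x = np.cos(theta)
    w = np.zeros(N + 1)
    w[0] = w[N] = 1.0 / (N**2 - 1)
    inner = np.arange(1, N)
    v = np.ones(N - 1)
    for j in range(1, N // 2):
        v -= 2.0 * np.cos(2 * j * theta[inner]) / (4 * j**2 - 1)
    v -= np.cos(N * theta[inner]) / (N**2 - 1)
    w[inner] = 2.0 * v / N
    return x, w


def quantile_transport_estimate(f, samples, N=32):
    """Estimate E[f(X)] for a 2-D product target from samples of shape (n, 2)."""
    x, w = clenshaw_curtis(N)
    u, wu = (x + 1.0) / 2.0, w / 2.0  # rule rescaled to the uniform source on [0, 1]
    # Diagonal map: coordinate j of the transport depends on u_j only.
    t1 = np.quantile(samples[:, 0], u)
    t2 = np.quantile(samples[:, 1], u)
    X1, X2 = np.meshgrid(t1, t2, indexing="ij")
    W = np.outer(wu, wu)  # full tensor weights (not a sparse grid)
    return float(np.sum(W * f(X1, X2)))
```

A possible usage is to draw samples from a product of two Beta distributions and compare the estimate for f(x1, x2) = x1 * x2 with the product of the two means.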

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • For problems whose target density factors, one could skip neural-ODE training entirely and still obtain the optimal sparse-grid rate.
  • The same structural preservation argument might be applied to other quadrature families once a suitable regularity class is identified.
  • Detecting or enforcing near-diagonal structure in the learned map could serve as a practical switch between the two regimes inside a single code base.
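One way such a switch could look is a numerical check of how far the Jacobian of a map is from a permuted diagonal matrix. The sketch below is purely hypothetical: the central finite-difference Jacobian, the score itself and the names jacobian_fd and off_diagonal_ratio are assumptions, and the paper proposes no such diagnostic. The score is 0 when every row of the Jacobian has a single nonzero entry and grows as other entries gain weight.

```python
import numpy as np


def jacobian_fd(T, x, eps=1e-6):
    """Central finite-difference Jacobian of a map T: R^d -> R^d at the point x."""
    x = np.asarray(x, dtype=float)
    d = x.size
    J = np.empty((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (np.asarray(T(x + e)) - np.asarray(T(x - e))) / (2.0 * eps)
    return J


def off_diagonal_ratio(T, x, eps=1e-6):
    """Share of squared Jacobian mass outside the dominant entry of each row."""
    J = jacobian_fd(T, x, eps)
    dominant = np.max(np.abs(J), axis=1)
    return 1.0 - np.sum(dominant**2) / np.sum(J**2)
```

Taking the dominant entry per row rather than the diagonal entry makes the score insensitive to coordinate permutations. A single evaluation point says nothing about global structure, so in practice the score would have to be averaged over many points.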

Load-bearing premise

Composition of a mixed-regularity function with a diffeomorphism is guaranteed to preserve the mixed regularity only when the diffeomorphism is diagonal up to a permutation of coordinates.

What would settle it

A non-diagonal C^1 diffeomorphism T for which f∘T remains C^k_mix for every C^k_mix function f would remove the regime distinction and collapse the claimed rate separation between the general and diagonal cases. A single pair (f, T) is not enough, since a constant f stays C^k_mix under every T.

Figures

Figures reproduced from arXiv: 2507.01533 by Emil Partow, Hanno Gottschalk, Tobias J. Riedlinger.

Figure 1: Comparison of Clenshaw–Curtis nodes on $[-1, 1]^2$: full tensor grid $I^2_{(6,6)}$ (left) versus sparse grid $S^2_{6+2}$ (right) using closed non-linear growth. Due to their nested nodes and relatively straightforward construction, Clenshaw–Curtis rules are a practical default for high-dimensional integration, particularly in sparse grid settings. Nevertheless, any (sparse) quadrature rule can, in principle, be in…
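The saving that a sparse grid offers over a full tensor grid can be made concrete by counting nodes. The sketch below assumes the nested Clenshaw-Curtis family with m_1 = 1 and m_l = 2^(l-1) + 1 nodes at level l, and a Smolyak index set of all level vectors l >= 1 whose entries sum to at most q. The function names and the indexing convention are assumptions, and the convention may differ from the one behind the figure's labels.

```python
from itertools import product


def new_points(level):
    """Nodes added at a level of the nested family m_1 = 1, m_l = 2**(l-1) + 1."""
    if level == 1:
        return 1
    if level == 2:
        return 2
    return 2 ** (level - 2)


def sparse_grid_size(d, q):
    """Node count of the Smolyak grid over level vectors l >= 1 with sum(l) <= q."""
    total = 0
    for levels in product(range(1, q - d + 2), repeat=d):
        if sum(levels) <= q:
            term = 1
            for level in levels:
                term *= new_points(level)
            total += term
    return total


def full_grid_size(d, level):
    """Node count of the full tensor grid with the same 1-D level in every axis."""
    m = 1 if level == 1 else 2 ** (level - 1) + 1
    return m ** d
```

Because the one-dimensional node sets are nested, each node is counted once by multiplying the numbers of newly added points per coordinate.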

We prove consistency of a recently proposed scheme that evaluates expected values by composing a learned transport map with Clenshaw--Curtis sparse-grid quadrature on a tractable product source. Our analysis hinges on the structural fact that composition of a $C^k_{\mathrm{mix}}$-regular function -- which carries the fast quadrature rate $m^{-k}(\log m)^{(d-1)(k+1)}$ -- with a $C^1$-diffeomorphism can only be guaranteed to be $C^k_{\mathrm{mix}}$ itself, if the diffeomorphism is diagonal up to a permutation of coordinates. The fast rate is therefore available exclusively for product targets, and the analysis splits into two regimes. In the general regime of arbitrary targets, we learn the transport as the time-one flow of a $\mathrm{ReLU}^{k+1}$-neural ODE trained by maximum likelihood. The resulting flow lies in the isotropic space $C^k$ and yields the rate $m^{-k/d}(\log m)^{(d-1)(k/d+1)}$, with raising the density smoothness $k$ and the matched activation order $k+1$ mitigating the curse of dimensionality at the cost of harder optimization. In the diagonal regime of product targets, the Knothe--Rosenblatt map is itself diagonal and we estimate it pointwise via empirical quantile transport, a lightweight alternative that recovers the full mixed-regularity rate. In both regimes, the resulting LtI estimator is PAC (probably approximately correct) consistent. With high probability the numerical integral approximates the true value to arbitrary accuracy as both the sample size $n$ and the quadrature budget $m$ tend to infinity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proves PAC consistency of the LtI estimator for approximating expectations E[f(X)] by composing a learned transport map T with Clenshaw-Curtis sparse-grid quadrature on a product source measure. The analysis splits into a general regime, where T is realized as the time-one flow of a ReLU^{k+1} neural ODE trained by MLE and yields the isotropic rate m^{-k/d}(log m)^{(d-1)(k/d+1)}, and a diagonal/product-target regime, where the Knothe-Rosenblatt map is estimated by empirical quantiles and recovers the faster mixed-regularity rate m^{-k}(log m)^{(d-1)(k+1)}. The key structural fact invoked is that C^k_mix regularity is preserved under composition with a C^1-diffeomorphism only when the map is diagonal (up to coordinate permutation). In both regimes the double limit n,m→∞ implies that the numerical integral converges to the true value with high probability.

Significance. If the central claims hold, the work supplies the first rigorous PAC-consistency analysis for learned-transport-plus-sparse-grid quadrature, with explicit rates that quantify the trade-off between smoothness, dimension, and optimization difficulty. The observation that mixed regularity survives composition only for diagonal maps cleanly explains why the fast rate is available exclusively for product targets; the neural-ODE construction for the general case is a natural and technically sound way to obtain an isotropic C^k map. These results are directly relevant to high-dimensional integration and uncertainty quantification.

major comments (2)
  1. [§3.2] §3.2 (general regime): the translation from MLE convergence of the neural-ODE parameters to a C^k-norm bound on the learned flow map is only sketched; the hidden constants that depend on the activation order k+1 and the Lipschitz constants of the vector field must be tracked explicitly to confirm that the quadrature error term indeed decays as m^{-k/d}.
  2. [Theorem 4.3] Theorem 4.3 (PAC statement): the probability 1-δ appears only in the final display; it is not shown how δ interacts with the sample size n when the transport map is estimated from n i.i.d. draws, nor whether the double limit is taken in a specific order (n first, then m, or jointly).
minor comments (2)
  1. [§2] The definition of the mixed-regularity space C^k_mix and the precise statement of the structural preservation lemma should be moved from the appendix to §2 so that the regime split is self-contained.
  2. [Abstract] Notation: the symbol LtI is introduced in the abstract but never expanded; a parenthetical “Learning to Integrate” on first use would help readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. The comments identify opportunities to strengthen the exposition of the proofs. We respond to each major comment below and will incorporate the suggested clarifications in the revised manuscript.

  1. Referee: [§3.2] §3.2 (general regime): the translation from MLE convergence of the neural-ODE parameters to a C^k-norm bound on the learned flow map is only sketched; the hidden constants that depend on the activation order k+1 and the Lipschitz constants of the vector field must be tracked explicitly to confirm that the quadrature error term indeed decays as m^{-k/d}.

    Authors: We agree that the argument in Section 3.2 would benefit from a more explicit accounting of constants. In the revision we will expand the derivation to track the dependence of the C^k-norm bound on the ReLU^{k+1} activation order and on the Lipschitz constants of the neural-ODE vector field. This will make the passage from MLE parameter convergence to the isotropic quadrature rate m^{-k/d}(log m)^{(d-1)(k/d+1)} fully rigorous and confirm that the hidden factors remain independent of m. revision: yes

  2. Referee: [Theorem 4.3] Theorem 4.3 (PAC statement): the probability 1-δ appears only in the final display; it is not shown how δ interacts with the sample size n when the transport map is estimated from n i.i.d. draws, nor whether the double limit is taken in a specific order (n first, then m, or jointly).

    Authors: We thank the referee for highlighting this point. The proof of Theorem 4.3 first lets n→∞ (for fixed m) to obtain a high-probability bound 1-δ on the transport-map error via concentration of the neural-ODE MLE, after which m→∞ controls the quadrature error. The dependence of δ on n is inherited from the sample-complexity bounds for the MLE estimator. In the revised manuscript we will state this ordering explicitly in the theorem and proof, and we will also indicate the joint-limit regime in which n grows sufficiently rapidly with m to keep the overall failure probability below any prescribed δ. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation rests on external convergence results


The paper establishes PAC consistency of the LtI estimator via a decomposition into (i) convergence of the learned transport (Neural ODE MLE in the general case or empirical quantile transport in the diagonal case) and (ii) standard consistency of Clenshaw-Curtis sparse-grid quadrature applied to the composed integrand. The structural fact on C^k_mix preservation under diagonal diffeomorphisms is invoked solely to recover the fast sparse-grid rate in the product-target regime; it is presented as an independent observation supporting the rate split rather than a self-referential definition. No equation or claim reduces the double-limit consistency result to a fitted parameter renamed as a prediction, nor does any load-bearing step collapse to a self-citation chain. The argument therefore remains self-contained against external benchmarks from approximation theory and numerical quadrature.
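The two-part decomposition described above can be written schematically with the triangle inequality. The notation is an assumption for illustration and is not copied from the paper: \mu is the product source measure, \nu the target, T a transport map with T_\# \mu = \nu, \hat{T}_n the map learned from n samples, and Q_m the sparse-grid rule with m nodes.

```latex
% Schematic error split; all symbols are illustrative assumptions.
\[
  \Bigl| \int f \, d\nu - Q_m\bigl(f \circ \hat{T}_n\bigr) \Bigr|
  \;\le\;
  \underbrace{\Bigl| \int f \circ T \, d\mu - \int f \circ \hat{T}_n \, d\mu \Bigr|}_{\text{transport (statistical) error, controlled via } n}
  \;+\;
  \underbrace{\Bigl| \int f \circ \hat{T}_n \, d\mu - Q_m\bigl(f \circ \hat{T}_n\bigr) \Bigr|}_{\text{quadrature error, controlled via } m} ,
\]
% using \int f \, d\nu = \int f \circ T \, d\mu, which holds because T_\# \mu = \nu.
```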

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on standard background results from mixed Sobolev spaces and neural ODE approximation theory; no new free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Composition of a C^k_mix-regular function with a C^1-diffeomorphism is guaranteed to preserve C^k_mix regularity only when the diffeomorphism is diagonal up to coordinate permutation.
    This structural fact is explicitly identified in the abstract as the hinge of the entire rate analysis.

pith-pipeline@v0.9.0 · 5845 in / 1309 out tokens · 64279 ms · 2026-05-21T23:40:56.853609+00:00 · methodology


