Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs
Pith reviewed 2026-05-21 23:40 UTC · model grok-4.3
The pith
Learned transport maps with sparse-grid quadrature produce PAC-consistent integral estimators as sample size and quadrature budget both grow.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The LtI estimator is PAC consistent: with high probability the numerical integral approximates the true value to arbitrary accuracy as both the sample size n and the quadrature budget m tend to infinity. The analysis splits into a general regime where a neural-ODE flow supplies an isotropic C^k map and the rate m^{-k/d}(log m)^{(d-1)(k/d+1)}, and a diagonal regime where empirical quantile transport recovers the optimal mixed rate m^{-k}(log m)^{(d-1)(k+1)}.
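To get a feel for the gap between the two rates quoted above, the following sketch evaluates both expressions with all constants dropped. The function names are illustrative, and the values of k, d and m are arbitrary choices, not taken from the paper. The expressions are asymptotic bounds, so for small m the logarithmic factor can dominate.

```python
import math

def general_rate(m, k, d):
    """Isotropic C^k expression m^(-k/d) * (log m)^((d-1)(k/d+1)), constants dropped."""
    return m ** (-k / d) * math.log(m) ** ((d - 1) * (k / d + 1))

def diagonal_rate(m, k, d):
    """Mixed-regularity expression m^(-k) * (log m)^((d-1)(k+1)), constants dropped."""
    return float(m) ** (-k) * math.log(m) ** ((d - 1) * (k + 1))

if __name__ == "__main__":
    k, d = 4, 4  # illustrative smoothness and dimension
    for m in (10**3, 10**5, 10**7):
        print(f"m={m:>9d}  general={general_rate(m, k, d):.3e}  "
              f"diagonal={diagonal_rate(m, k, d):.3e}")
```

In dimension d = 1 the two expressions coincide, which is consistent with the regime split only mattering for d > 1.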
What carries the argument
The structural fact that composing a C^k_mix-regular function with a C^1-diffeomorphism is guaranteed to preserve C^k_mix regularity only when the diffeomorphism is diagonal up to a permutation of coordinates. This fact forces the split into the general and product-target regimes and determines which quadrature rate is available in each.
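A small worked example, not taken from the paper, illustrates how a non-diagonal map can destroy mixed regularity. It assumes d = 2 and k = 1, and reads C^1_mix as the class of functions whose derivatives ∂^α f with α ∈ {0,1}^2 exist and are continuous. The functions g and f and the maps T and S are chosen here purely for illustration.

```latex
Let
\[
g(t) = t\,|t|, \qquad g'(t) = 2|t|, \qquad g''(0) \text{ does not exist},
\]
and set $f(y_1, y_2) = g(y_1)$. Then
\[
\partial_1 f = g'(y_1), \qquad \partial_2 f = 0, \qquad \partial_1 \partial_2 f = 0
\]
are all continuous, so $f \in C^1_{\mathrm{mix}}$.
For the non-diagonal linear diffeomorphism $T(x_1, x_2) = (x_1 + x_2,\; x_1 - x_2)$,
\[
(f \circ T)(x) = g(x_1 + x_2), \qquad
\partial_2 (f \circ T)(x) = 2\,|x_1 + x_2|,
\]
so $\partial_1 \partial_2 (f \circ T)$ does not exist on the line $x_1 + x_2 = 0$
and $f \circ T \notin C^1_{\mathrm{mix}}$.
For a diagonal map $S(x) = (S_1(x_1), S_2(x_2))$ with $S_1, S_2 \in C^1$, the chain rule gives
\[
\partial_1 \partial_2 (f \circ S)(x)
  = (\partial_1 \partial_2 f)(S(x))\, S_1'(x_1)\, S_2'(x_2),
\]
which is continuous for every $f \in C^1_{\mathrm{mix}}$.
```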
If this is right
- As n and m both tend to infinity the numerical value converges to the true expectation with high probability in either regime.
- Increasing the smoothness index k, together with the matched activation order k+1 of the ReLU^{k+1} network, weakens the dimension dependence of the error in the general regime, at the cost of harder optimization.
- When the target is a product measure the lightweight empirical-quantile estimator already achieves the full mixed-derivative convergence rate without neural-ODE training.
- The method therefore supplies a consistent quadrature procedure for both arbitrary and product-structured targets.
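The product-target case in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the source is taken to be uniform on [0,1]^d, the diagonal map is estimated coordinate-wise with `np.quantile`, and a tensor-product Gauss-Legendre rule stands in for the Clenshaw-Curtis sparse grid to keep the code short. The function names, the test integrand and the target (independent Exp(1) and standard normal coordinates) are all hypothetical choices.

```python
import numpy as np

def empirical_quantile_map(samples):
    """Diagonal map T_hat: [0,1]^d -> R^d built from per-coordinate empirical quantiles."""
    samples = np.asarray(samples, dtype=float)

    def t_hat(u):
        u = np.atleast_2d(u)
        cols = [np.quantile(samples[:, j], u[:, j]) for j in range(samples.shape[1])]
        return np.stack(cols, axis=1)

    return t_hat

def tensor_gauss_legendre(q, d):
    """Tensor-product Gauss-Legendre nodes and weights on [0,1]^d (stand-in for a sparse grid)."""
    x, w = np.polynomial.legendre.leggauss(q)
    x, w = 0.5 * (x + 1.0), 0.5 * w  # rescale from [-1, 1] to [0, 1]
    nodes = np.stack([g.ravel() for g in np.meshgrid(*([x] * d), indexing="ij")], axis=1)
    weights = np.prod(np.stack(np.meshgrid(*([w] * d), indexing="ij")), axis=0).ravel()
    return nodes, weights

def lti_estimate(f, samples, q=64):
    """Quadrature on the uniform source, composed with the empirical quantile map."""
    samples = np.asarray(samples, dtype=float)
    t_hat = empirical_quantile_map(samples)
    nodes, weights = tensor_gauss_legendre(q, samples.shape[1])
    return float(np.sum(weights * f(t_hat(nodes))))

# Illustrative product target: X1 ~ Exp(1), X2 ~ N(0, 1), independent.
rng = np.random.default_rng(0)
n = 20_000
samples = np.column_stack([rng.exponential(1.0, n), rng.standard_normal(n)])
f = lambda x: np.exp(-x[:, 0]) * np.cos(x[:, 1])
estimate = lti_estimate(f, samples)
exact = 0.5 * np.exp(-0.5)  # E[exp(-X1)] * E[cos(X2)]
print(estimate, exact)
```

No neural network is trained here; the only learned object is the sorted sample in each coordinate.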
Where Pith is reading between the lines
- For problems whose target density factors, one could skip neural-ODE training entirely and still obtain the optimal sparse-grid rate.
- The same structural preservation argument might be applied to other quadrature families once a suitable regularity class is identified.
- Detecting or enforcing near-diagonal structure in the learned map could serve as a practical switch between the two regimes inside a single code base.
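The regime switch suggested in the last bullet could be prototyped with a finite-difference Jacobian check. The sketch below is a hypothetical diagnostic, not something described in the paper: it inspects the Jacobian at a single point, reports the fraction of absolute Jacobian mass lying outside the dominant entry of each row, and checks whether those dominant entries form a permutation pattern. The name `diagonality_score` and the two test maps are assumptions made for illustration.

```python
import numpy as np

def diagonality_score(T, x, eps=1e-5):
    """Return (off_ratio, is_perm) for the Jacobian of T at the point x.

    off_ratio is the share of sum |J_ij| outside the largest entry of each row;
    is_perm says whether the largest entries sit in distinct columns.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    J = np.empty((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (T(x + e) - T(x - e)) / (2.0 * eps)  # central difference
    A = np.abs(J)
    cols = A.argmax(axis=1)
    is_perm = len(set(cols.tolist())) == d
    off_ratio = 1.0 - A[np.arange(d), cols].sum() / A.sum()
    return off_ratio, is_perm

# Diagonal up to a permutation: each output depends on exactly one input.
permuted_diagonal = lambda x: np.array([np.exp(x[1]), x[0] ** 3 + x[0]])
# Non-diagonal: every output depends on both inputs.
mixing = lambda x: np.array([x[0] + x[1], x[0] - x[1]])

x0 = np.array([0.3, -0.7])
print(diagonality_score(permuted_diagonal, x0))
print(diagonality_score(mixing, x0))
```

A practical switch would need to evaluate the score at many points and pick a threshold; both choices are left open here.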
Load-bearing premise
Composition of a mixed-regularity function with a diffeomorphism is guaranteed to preserve the mixed regularity only when the diffeomorphism is diagonal up to a permutation of coordinates.
What would settle it
A non-diagonal C^1-diffeomorphism T for which f∘T remains C^k_mix for every C^k_mix function f would remove the regime distinction and collapse the claimed rate separation between the general and diagonal cases. A single pair (f, T) is not enough, since trivial choices such as a constant f stay C^k_mix under any map.
Original abstract
We prove consistency of a recently proposed scheme that evaluates expected values by composing a learned transport map with Clenshaw--Curtis sparse-grid quadrature on a tractable product source. Our analysis hinges on the structural fact that composition of a $C^k_{\mathrm{mix}}$-regular function -- which carries the fast quadrature rate $m^{-k}(\log m)^{(d-1)(k+1)}$ -- with a $C^1$-diffeomorphism can only be guaranteed to be $C^k_{\mathrm{mix}}$ itself, if the diffeomorphism is diagonal up to a permutation of coordinates. The fast rate is therefore available exclusively for product targets, and the analysis splits into two regimes. In the general regime of arbitrary targets, we learn the transport as the time-one flow of a $\mathrm{ReLU}^{k+1}$-neural ODE trained by maximum likelihood. The resulting flow lies in the isotropic space $C^k$ and yields the rate $m^{-k/d}(\log m)^{(d-1)(k/d+1)}$, with raising the density smoothness $k$ and the matched activation order $k+1$ mitigating the curse of dimensionality at the cost of harder optimization. In the diagonal regime of product targets, the Knothe--Rosenblatt map is itself diagonal and we estimate it pointwise via empirical quantile transport, a lightweight alternative that recovers the full mixed-regularity rate. In both regimes, the resulting LtI estimator is PAC (probably approximately correct) consistent. With high probability the numerical integral approximates the true value to arbitrary accuracy as both the sample size $n$ and the quadrature budget $m$ tend to infinity.
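For readers unfamiliar with the quadrature family named in the abstract, the sketch below builds the one-dimensional Clenshaw-Curtis rule on [-1, 1] using the classical cosine-sum formula for the weights. It covers only the 1D building block; the Smolyak combination that turns such rules into a sparse grid is not shown, and the function name `clenshaw_curtis` is an illustrative choice.

```python
import numpy as np

def clenshaw_curtis(N):
    """Nodes and weights of the (N+1)-point Clenshaw-Curtis rule on [-1, 1]."""
    if N < 1:
        raise ValueError("N must be at least 1")
    theta = np.pi * np.arange(N + 1) / N
    x = np.cos(theta)            # Chebyshev extreme points, from 1 down to -1
    w = np.zeros(N + 1)
    inner = np.arange(1, N)      # indices of the interior nodes
    v = np.ones(N - 1)
    if N % 2 == 0:
        w[0] = w[N] = 1.0 / (N**2 - 1)
        for j in range(1, N // 2):
            v -= 2.0 * np.cos(2 * j * theta[inner]) / (4 * j**2 - 1)
        v -= np.cos(N * theta[inner]) / (N**2 - 1)
    else:
        w[0] = w[N] = 1.0 / N**2
        for j in range(1, (N - 1) // 2 + 1):
            v -= 2.0 * np.cos(2 * j * theta[inner]) / (4 * j**2 - 1)
    w[inner] = 2.0 * v / N
    return x, w

# Example: integrate exp over [-1, 1] with 17 nodes.
x, w = clenshaw_curtis(16)
approx = float(np.sum(w * np.exp(x)))
print(approx, np.e - 1.0 / np.e)
```

For N = 2 the rule reduces to Simpson's rule with weights 1/3, 4/3, 1/3, which is a convenient sanity check.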
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proves PAC consistency of the LtI estimator for approximating expectations E[f(X)] by composing a learned transport map T with Clenshaw-Curtis sparse-grid quadrature on a product source measure. The analysis splits into a general regime, where T is realized as the time-one flow of a ReLU^{k+1} neural ODE trained by MLE and yields the isotropic rate m^{-k/d}(log m)^{(d-1)(k/d+1)}, and a diagonal/product-target regime, where the Knothe-Rosenblatt map is estimated by empirical quantiles and recovers the faster mixed-regularity rate m^{-k}(log m)^{(d-1)(k+1)}. The key structural fact invoked is that C^k_mix regularity is preserved under composition with a C^1-diffeomorphism only when the map is diagonal (up to coordinate permutation). In both regimes the double limit n,m→∞ implies that the numerical integral converges to the true value with high probability.
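The "time-one flow of a ReLU^{k+1} neural ODE" in the summary can be made concrete with a small forward pass. The sketch below is an illustration under assumptions that are not confirmed by the source: the vector field is an autonomous one-hidden-layer network, the weights are random rather than trained by maximum likelihood, and a fixed-step classical Runge-Kutta scheme approximates the flow. All names (`repu`, `vector_field`, `time_one_flow`) are hypothetical.

```python
import numpy as np

def repu(z, order):
    """ReLU^order activation: max(z, 0) raised to the given power."""
    return np.maximum(z, 0.0) ** order

def vector_field(x, params, k):
    """One-hidden-layer network with ReLU^(k+1) activation (assumed architecture)."""
    W1, b1, W2, b2 = params
    return repu(x @ W1 + b1, k + 1) @ W2 + b2

def time_one_flow(x0, params, k, steps=200):
    """Approximate the flow of dx/dt = v(x) from t = 0 to t = 1 with classical RK4."""
    x = np.array(x0, dtype=float)
    h = 1.0 / steps
    for _ in range(steps):
        s1 = vector_field(x, params, k)
        s2 = vector_field(x + 0.5 * h * s1, params, k)
        s3 = vector_field(x + 0.5 * h * s2, params, k)
        s4 = vector_field(x + h * s3, params, k)
        x = x + (h / 6.0) * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
    return x

# Untrained example: push a few source points through the flow.
rng = np.random.default_rng(0)
d, width, k = 2, 16, 2
params = (0.3 * rng.standard_normal((d, width)), 0.1 * rng.standard_normal(width),
          0.3 * rng.standard_normal((width, d)), np.zeros(d))
u = rng.uniform(-1.0, 1.0, size=(5, d))
x = time_one_flow(u, params, k)
print(x)
```

The activation max(z, 0)^(k+1) is k times continuously differentiable, which is the property the matched activation order is meant to supply.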
Significance. If the central claims hold, the work supplies the first rigorous PAC-consistency analysis for learned-transport-plus-sparse-grid quadrature, with explicit rates that quantify the trade-off between smoothness, dimension, and optimization difficulty. The observation that mixed regularity survives composition only for diagonal maps cleanly explains why the fast rate is available exclusively for product targets; the neural-ODE construction for the general case is a natural and technically sound way to obtain an isotropic C^k map. These results are directly relevant to high-dimensional integration and uncertainty quantification.
major comments (2)
- [§3.2] §3.2 (general regime): the translation from MLE convergence of the neural-ODE parameters to a C^k-norm bound on the learned flow map is only sketched; the hidden constants that depend on the activation order k+1 and the Lipschitz constants of the vector field must be tracked explicitly to confirm that the quadrature error term indeed decays as m^{-k/d}.
- [Theorem 4.3] Theorem 4.3 (PAC statement): the probability 1-δ appears only in the final display; it is not shown how δ interacts with the sample size n when the transport map is estimated from n i.i.d. draws, nor whether the double limit is taken in a specific order (n first, then m, or jointly).
minor comments (2)
- [§2] The definition of the mixed-regularity space C^k_mix and the precise statement of the structural preservation lemma should be moved from the appendix to §2 so that the regime split is self-contained.
- [Abstract] Notation: the symbol LtI is introduced in the abstract but never expanded; a parenthetical “Learning to Integrate” on first use would help readers.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. The comments identify opportunities to strengthen the exposition of the proofs. We respond to each major comment below and will incorporate the suggested clarifications in the revised manuscript.
-
Referee: [§3.2] §3.2 (general regime): the translation from MLE convergence of the neural-ODE parameters to a C^k-norm bound on the learned flow map is only sketched; the hidden constants that depend on the activation order k+1 and the Lipschitz constants of the vector field must be tracked explicitly to confirm that the quadrature error term indeed decays as m^{-k/d}.
Authors: We agree that the argument in Section 3.2 would benefit from a more explicit accounting of constants. In the revision we will expand the derivation to track the dependence of the C^k-norm bound on the ReLU^{k+1} activation order and on the Lipschitz constants of the neural-ODE vector field. This will make the passage from MLE parameter convergence to the isotropic quadrature rate m^{-k/d}(log m)^{(d-1)(k/d+1)} fully rigorous and confirm that the hidden factors remain independent of m.
Revision: yes
-
Referee: [Theorem 4.3] Theorem 4.3 (PAC statement): the probability 1-δ appears only in the final display; it is not shown how δ interacts with the sample size n when the transport map is estimated from n i.i.d. draws, nor whether the double limit is taken in a specific order (n first, then m, or jointly).
Authors: We thank the referee for highlighting this point. The proof of Theorem 4.3 first lets n→∞ (for fixed m) to obtain a high-probability bound 1-δ on the transport-map error via concentration of the neural-ODE MLE, after which m→∞ controls the quadrature error. The dependence of δ on n is inherited from the sample-complexity bounds for the MLE estimator. In the revised manuscript we will state this ordering explicitly in the theorem and proof, and we will also indicate the joint-limit regime in which n grows sufficiently rapidly with m to keep the overall failure probability below any prescribed δ.
Revision: yes
Circularity Check
No significant circularity; derivation rests on external convergence results
The paper establishes PAC consistency of the LtI estimator via a decomposition into (i) convergence of the learned transport (Neural ODE MLE in the general case or empirical quantile transport in the diagonal case) and (ii) standard consistency of Clenshaw-Curtis sparse-grid quadrature applied to the composed integrand. The structural fact on C^k_mix preservation under diagonal diffeomorphisms is invoked solely to recover the fast sparse-grid rate in the product-target regime; it is presented as an independent observation supporting the rate split rather than a self-referential definition. No equation or claim reduces the double-limit consistency result to a fitted parameter renamed as a prediction, nor does any load-bearing step collapse to a self-citation chain. The argument therefore remains self-contained against external benchmarks from approximation theory and numerical quadrature.
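The two-part decomposition described above can be written as a single triangle inequality. The notation is assumed here for illustration and may differ from the paper's: μ denotes the product source measure, ν the target, \hat T_n the transport map learned from n samples, and Q_m the sparse-grid rule with budget m.

```latex
\[
\Bigl| \mathbb{E}_{X \sim \nu}[f(X)] - Q_m\bigl(f \circ \hat T_n\bigr) \Bigr|
\;\le\;
\underbrace{\Bigl| \int f \,\mathrm{d}\nu - \int f \circ \hat T_n \,\mathrm{d}\mu \Bigr|}_{\text{(i) transport error, random in the $n$ samples}}
\;+\;
\underbrace{\Bigl| \int f \circ \hat T_n \,\mathrm{d}\mu - Q_m\bigl(f \circ \hat T_n\bigr) \Bigr|}_{\text{(ii) quadrature error at budget $m$}}
\]
```

Term (i) is where the probability 1-δ enters, while term (ii) is controlled by the regularity class of f ∘ \hat T_n, which is the point at which the diagonal and general regimes part ways.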
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Composition of a C^k_mix-regular function with a C^1-diffeomorphism is guaranteed to preserve C^k_mix regularity only when the diffeomorphism is diagonal up to coordinate permutation.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking [unclear]
unclear: Relation between the paper passage and the cited Recognition theorem.
The fast rate is therefore available exclusively for product targets... In the general regime... yields the rate m^{-k/d}(log m)^{(d-1)(k/d+1)}
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat induction [unclear]
unclear: Relation between the paper passage and the cited Recognition theorem.
Theorem 5.12 (PAC-Learnability of Sparse Grid Integration)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.