Certified and accurate computation of function space norms of deep neural networks
Pith reviewed 2026-05-15 15:03 UTC · model grok-4.3
The pith
Interval arithmetic on refined boxes yields certified Lebesgue and Sobolev norms for neural networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By propagating local interval enclosures of function values, Jacobians, and Hessians through adaptive marking and quadrature aggregation, the framework computes certified global bounds on L^p, W^{1,p}, and W^{2,p} norms of deep neural networks, with the enclosures becoming arbitrarily tight under refinement for standard activations.
What carries the argument
Interval arithmetic enclosures of network activations and derivatives on axis-aligned boxes, combined with adaptive refinement and quadrature-based aggregation to obtain global integral bounds.
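The layer-by-layer enclosure idea can be sketched in a few lines. This is a minimal illustration of interval bound propagation through a small ReLU network, not the paper's implementation; the toy 2-4-1 architecture and all variable names are invented for the example.

```python
import numpy as np

def affine_bounds(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> W @ x + b.  Splitting W
    into its positive and negative parts gives the exact image interval
    of a single affine map under interval arithmetic."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def relu_bounds(lo, hi):
    """ReLU is monotone, so applying it to the endpoints is exact."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def network_bounds(lo, hi, layers):
    """Guaranteed output enclosure of a ReLU MLP over the input box.
    Overestimation accumulates across layers (the dependency problem),
    which is what the adaptive box refinement is meant to shrink."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = relu_bounds(lo, hi)
    return lo, hi

# Toy 2-4-1 network evaluated over the input box [-1, 1]^2.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 2)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal(1)
LAYERS = [(W1, b1), (W2, b2)]
OUT_LO, OUT_HI = network_bounds(np.array([-1.0, -1.0]),
                                np.array([1.0, 1.0]), LAYERS)
```

Each layer's enclosure is exact in isolation; the looseness comes from treating correlated coordinates independently, which is why subdividing the input box tightens the global bounds.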
If this is right
- Provides certified bounds on PINN interior residuals.
- The adaptive procedure converges to the true norm value.
- Extends to computation of other integral quantities beyond norms.
- Enables rigorous error control for neural network approximations of PDE solutions.
Where Pith is reading between the lines
- This method could extend to certifying other properties like stability or robustness measures of neural networks.
- Practical implementation might require careful choice of initial box partitions to ensure efficiency.
- Connections to verified computing in scientific computing could lead to hybrid symbolic-numeric approaches for NN analysis.
Load-bearing premise
The activations and their derivatives must admit computable interval enclosures on axis-aligned boxes that can be made tight through refinement.
What would settle it
A specific neural network where the computed bounds on the L2 norm fail to approach the true value even after extensive adaptive refinement.
original abstract
Neural network methods for PDEs require reliable error control in function space norms. However, trained neural networks can typically only be probed at a finite number of point values. Without strong assumptions, point evaluations alone do not provide enough information to derive tight deterministic and guaranteed bounds on function space norms. In this work, we move beyond a purely black-box setting and exploit the neural network structure directly. We present a framework for the certified and accurate computation of integral quantities of neural networks, including Lebesgue and Sobolev norms, by combining interval arithmetic enclosures on axis-aligned boxes with adaptive marking/refinement and quadrature-based aggregation. On each box, we compute guaranteed lower and upper bounds for function values and derivatives, and propagate these local certificates to global lower and upper bounds for the target integrals. Our analysis provides a general convergence theorem for such certified adaptive quadrature procedures and instantiates it for function values, Jacobians, and Hessians, yielding certified computation of $L^p$, $W^{1,p}$, and $W^{2,p}$ norms. We further show how these ingredients lead to practical certified bounds for PINN interior residuals. Numerical experiments illustrate the accuracy and practical behavior of the proposed methods.
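The enclose-mark-refine-aggregate pipeline in the abstract can be sketched in one dimension. This is a greedy toy version under invented names, not the paper's algorithm: `enclose(u, v)` stands in for any guaranteed interval enclosure of the integrand, and cells are bisected in order of their contribution to the lower/upper gap.

```python
import heapq

def certified_integral(enclose, a, b, tol=1e-3, max_steps=100000):
    """Certified lower/upper bounds on the integral of g over [a, b].
    `enclose(u, v)` must return a guaranteed interval enclosure of g on
    [u, v].  The cell contributing most to the gap between the lower and
    upper quadrature sums is bisected first (a greedy marking strategy)."""
    l0, h0 = enclose(a, b)
    lower, upper = l0 * (b - a), h0 * (b - a)
    # Max-heap on gap contribution, stored with negated keys.
    cells = [(-(h0 - l0) * (b - a), a, b, l0, h0)]
    for _ in range(max_steps):
        if upper - lower < tol:
            break
        _, u, v, l, h = heapq.heappop(cells)
        lower -= l * (v - u)
        upper -= h * (v - u)
        m = 0.5 * (u + v)
        for p, q in ((u, m), (m, v)):
            lq, hq = enclose(p, q)
            heapq.heappush(cells, (-(hq - lq) * (q - p), p, q, lq, hq))
            lower += lq * (q - p)
            upper += hq * (q - p)
    return lower, upper

# Certified bounds on the squared L^2 norm of f(x) = x on [0, 1]:
# f(x)^2 = x^2 is increasing there, so [u^2, v^2] is an exact enclosure.
LOW, UP = certified_integral(lambda u, v: (u * u, v * v), 0.0, 1.0)
```

The returned pair brackets the true value 1/3 by construction, and the bracket tightens as cells shrink, which is the shape of the paper's convergence theorem in its simplest instance.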
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims a framework for certified computation of integral quantities of neural networks, including L^p, W^{1,p}, and W^{2,p} norms, by using interval arithmetic enclosures on axis-aligned boxes combined with adaptive marking/refinement and quadrature aggregation. It states a general convergence theorem for such certified adaptive quadrature procedures and instantiates it for function values, Jacobians, and Hessians, with further application to certified bounds on PINN interior residuals. Numerical experiments are used to illustrate accuracy and practical behavior.
Significance. If the convergence theorem holds with enclosures that tighten sufficiently for standard activations including ReLU, the work would offer a practical method for obtaining guaranteed bounds on function-space norms of neural networks without relying solely on pointwise probes. This addresses a key need in neural PDE solvers for rigorous error control beyond black-box settings, and the use of interval methods with adaptive quadrature provides a concrete, implementable approach with potential for reproducibility via the described enclosures and quadrature rules.
major comments (2)
- [abstract and convergence theorem statement] The general convergence theorem (instantiated in the abstract for Jacobians and Hessians) assumes that local interval enclosures for the integrand and its derivatives can be driven to arbitrary accuracy by box refinement. However, for ReLU networks the first and second derivatives are piecewise constant with jumps across hyperplanes; standard interval arithmetic on axis-aligned boxes incurs wrapping and dependency errors that do not necessarily vanish uniformly when kink surfaces intersect the boxes. No separate argument is supplied showing that enclosure widths still contract at a rate sufficient for global bounds to converge to the true integral.
- [framework description] The weakest assumption stated—that network activations and derivatives admit computable interval enclosures on axis-aligned boxes—is not accompanied by a precise characterization of the network class (e.g., depth, width, or activation restrictions) for which the enclosures remain tight enough to guarantee practical convergence of the adaptive procedure.
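The ReLU objection in the first major comment is concrete: on any cell straddling the kink, the enclosure of the derivative is the full interval [0, 1] and never contracts, yet the resulting quadrature gap still vanishes because only the straddling cells contribute. A small self-contained demonstration (illustrative names and domain, not from the paper):

```python
def drelu_enclosure(a, b):
    """Exact image of the a.e. derivative of ReLU on [a, b]: 0 on the
    negative axis, 1 on the positive axis, so any cell straddling the
    kink at 0 maps onto the full range [0, 1] no matter how small it is."""
    if b <= 0.0:
        return 0.0, 0.0
    if a >= 0.0:
        return 1.0, 1.0
    return 0.0, 1.0

def quadrature_gap(n, a=-0.95, b=1.05):
    """Upper-minus-lower gap for the interval quadrature of the integral
    of relu'(x) over [a, b] on n uniform cells.  Cells away from the kink
    contribute zero; the straddling cell contributes its full measure, so
    the gap decays like (b - a) / n even though the local enclosure width
    at the kink never contracts."""
    h = (b - a) / n
    gap = 0.0
    for i in range(n):
        lo, hi = drelu_enclosure(a + i * h, a + (i + 1) * h)
        gap += (hi - lo) * h
    return gap
```

This is exactly the tension the referee flags: pointwise enclosure widths need not vanish at the kink, so convergence of the global bounds must come from the vanishing measure of kink-touching cells rather than from local contraction.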
minor comments (2)
- [quadrature-based aggregation] Clarify in the introduction or methods section whether the quadrature rules are chosen to be compatible with the interval enclosures (e.g., exact integration of enclosure bounds) or if additional overestimation is introduced.
- [numerical experiments] The numerical experiments section would benefit from explicit reporting of enclosure widths versus refinement level for at least one ReLU example to allow readers to assess the practical contraction rate.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. We address the two major comments point by point below and will revise the manuscript to incorporate clarifications where needed.
point-by-point responses
Referee: [abstract and convergence theorem statement] The general convergence theorem (instantiated in the abstract for Jacobians and Hessians) assumes that local interval enclosures for the integrand and its derivatives can be driven to arbitrary accuracy by box refinement. However, for ReLU networks the first and second derivatives are piecewise constant with jumps across hyperplanes; standard interval arithmetic on axis-aligned boxes incurs wrapping and dependency errors that do not necessarily vanish uniformly when kink surfaces intersect the boxes. No separate argument is supplied showing that enclosure widths still contract at a rate sufficient for global bounds to converge to the true integral.
Authors: We thank the referee for highlighting this subtlety. The general convergence theorem (Theorem 3.1) is stated under the assumption that enclosure diameters tend to zero with box diameters, which holds pointwise away from discontinuities. For ReLU networks the derivatives are discontinuous across hyperplanes of measure zero. The adaptive procedure refines boxes intersecting these surfaces, and the total volume of such boxes can be driven to zero while the integrand remains bounded (by construction of the network). The contribution of these boxes to the integral error is therefore controlled by their vanishing measure times a uniform bound on the integrand, ensuring the global certified bounds converge to the true value. We will add a dedicated remark after the theorem statement clarifying this argument for the non-smooth case without changing the theorem itself. revision: yes
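The measure-zero argument in the response amounts to a single bound. With illustrative notation not taken from the paper: let $F(Q)$ be the interval enclosure of the integrand over box $Q$, $M$ a uniform bound on the integrand, and split the mesh into boxes avoiding the kink set and boxes touching it. Then

```latex
\overline{I} - \underline{I}
  \;\le\; \sum_{Q \in \mathcal{Q}_{\mathrm{smooth}}}
            \operatorname{diam}\!\bigl(F(Q)\bigr)\,\lvert Q\rvert
  \;+\; 2M \sum_{Q \in \mathcal{Q}_{\mathrm{kink}}} \lvert Q\rvert .
```

The first sum vanishes because enclosure diameters contract on smooth boxes; the second vanishes because the total measure of kink-touching boxes tends to zero under refinement, even though each such box carries the worst-case enclosure width $2M$.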
Referee: [framework description] The weakest assumption stated—that network activations and derivatives admit computable interval enclosures on axis-aligned boxes—is not accompanied by a precise characterization of the network class (e.g., depth, width, or activation restrictions) for which the enclosures remain tight enough to guarantee practical convergence of the adaptive procedure.
Authors: We agree that a sharper characterization improves clarity. The framework applies to finite-depth, finite-width feedforward networks whose activations are continuous and admit computable interval enclosures together with their derivatives on compact intervals (ReLU, sigmoid, tanh, softplus, etc.). Enclosures are obtained recursively layer by layer via standard interval arithmetic; the resulting overestimation is reduced by the adaptive subdivision. We will revise the assumptions paragraph in Section 2 to state this network class explicitly and note that the method requires only that min/max of each activation and its derivative are computable on intervals, which covers all activations used in the numerical experiments. revision: yes
Circularity Check
No circularity: derivation uses standard interval arithmetic and adaptive quadrature
full rationale
The paper derives certified L^p, W^{1,p}, and W^{2,p} norm bounds by combining interval enclosures on axis-aligned boxes, adaptive marking/refinement, and quadrature aggregation. The general convergence theorem for certified adaptive quadrature is instantiated directly from the properties of interval arithmetic (tightening under refinement) and standard quadrature rules, without reducing any target quantity to a fitted parameter, self-defined input, or load-bearing self-citation. The method relies only on standard interval-arithmetic routines, does not rename known results, and does not smuggle in ansatzes via the authors' prior work. No equation or step in the abstract or the described framework reduces by construction to its own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Interval arithmetic produces rigorous enclosures for the network and its derivatives on axis-aligned boxes
- standard math: Adaptive marking and refinement combined with quadrature yields convergent global bounds
discussion (0)