pith. sign in

arxiv: 2501.11275 · v2 · submitted 2025-01-20 · 💻 cs.LG · cs.NA· math.NA

Higher Order Approximation Rates for ReLU CNNs in Korobov Spaces

Pith reviewed 2026-05-23 04:51 UTC · model grok-4.3

classification 💻 cs.LG cs.NAmath.NA
keywords ReLU CNNKorobov spaceapproximation ratesparse gridmixed derivativeconvolutional neural networkhigher order approximation
0
0 comments X

The pith

ReLU CNNs can approximate Korobov functions with mixed derivatives of order m+1 at rates scaling with the (m+1)th power of depth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates improved approximation rates for ReLU convolutional neural networks when approximating functions from higher-order Korobov spaces. For functions possessing mixed derivatives up to order m+1, the error bound improves from the classical second-order dependence on network depth to an (m+1)-order rate, aside from a logarithmic factor. This is achieved through the approximate representation of high-order sparse grid basis functions within the CNN architecture. A sympathetic reader would care because it shows that the expressivity of CNNs can exploit higher smoothness without being crippled by the curse of dimensionality.

Core claim

For target functions having a mixed derivative of order m+1 in each direction, ReLU CNNs achieve an L_p approximation rate of order (m+1) in terms of the network depth, modulo a logarithmic factor, by approximately representing the high-order sparse grid basis functions.

What carries the argument

approximate representation of high-order sparse grid basis functions by CNNs, which allows the depth to control the approximation order directly.

Load-bearing premise

The high-order sparse grid basis functions admit approximate representations by CNNs at depths and widths that scale to deliver the improved rate.

What would settle it

Finding a specific function with mixed derivative of order m+1 whose L_p approximation error by ReLU CNNs of depth d stays no better than O(1/d^2) for large d would disprove the improved rate.

Figures

Figures reproduced from arXiv: 2501.11275 by Guozhi Zhang, Yuwen Li.

Figure 1
Figure 1. Figure 1: Hierarchical ancestors of xl,i = 0.1875 (d = 1, l = 4, i = 3, α = 3) including its two neighbor points x3,1 = 0.125, x2,1 = 0.25 and another ancestor x1,1 = 0.5. In Section 4, we shall derive CNN approximation rates for f ∈ Km+1 p (Ω) from the higher order sparse grid interpolation (2.8) Inf(x) = X |l|1≤n+d−1 X i∈Il vl,iφ α l,i (x) where each vl,i is given in the next lemma. Lemma 2.1. [7, Lemma 4.5] Let w… view at source ↗
Figure 2
Figure 2. Figure 2: (a) Graphs of T1, T2, T3; (b) interpolants R1, R2 of x 2 . The approximate product ×eM,U (x, y) is then built upon RU . The next lemma provides an error bound of |×eM,U (x, y) − xy| (see [23]) as well as verifies non￾negativity of ×eM,U (x, y) when (x, y) ∈ [0, 1]2 . The non-negativity of ×eM,U (x, y) serves as a crucial property in our analysis and seems missing in the classical liter￾ature. The proof of … view at source ↗
Figure 3
Figure 3. Figure 3: An illustration of the proof of Lemma 3.3, l = k = n = 2. The above lemma demonstrates that some CNN can eliminate zeros in the input vector. Lemma 3.3 is useful for deriving the next lemma about approximately multiplying components in a partitioned vector by CNNs. Lemma 3.4. Let U, k, l ∈ N+ and M > 0. For any {y1, . . . , yl} ⊂ [0, M] k , there exists hJ of the form (1.2) with depth J ≤ (128+14U)lk s−1 +… view at source ↗
Figure 4
Figure 4. Figure 4: (a) d = 1, l = 4, i = 3, the basis function φ 2 l,i at grid point xl,i = 0.1875; (b) factors ρl,i,1, ρl,i,2 of φ 2 l,i. In general, when m ≥ 3 and |l|1 ≤ n + d − 1, if αj = m for some j, we set (4.1) ρlj ,ij ,k(x) := σ  x − xlj ,ij ,k xlj ,ij − xlj ,ij ,k  , k = 1, . . . , m [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: (a) d = 1, l = 4, i = 3, the basis function φ 3 l,i at grid point xl,i = 0.1875; (b) factors ρl,i,1, ρl,i,2, ρl,i,3 of φ 3 l,i. Consider the set of all piecewise linear factors:  ρβi,k(xj ) : i = 1, . . . , N, j = 1, . . . , d, k = 1, . . . , m , where each ρβi,k(xj ) can be written as the form σ(aβi,j,kxj+bβi,j,k) for aβi,j,k, bβi,j,k ∈ R and 0 ≤ ρβi,k(xj ) ≤ 2 n+d−1 . Let w := [w1,1; . . . ; w1,m; . . .… view at source ↗
read the original abstract

This paper investigates the $L_p$ approximation error for higher order Korobov functions using deep convolutional neural networks (CNNs) with ReLU activation. For target functions having a mixed derivative of order m+1 in each direction, we improve classical approximation rate of second order to (m+1)-th order (modulo a logarithmic factor) in terms of the depth of CNNs. The key ingredient in our analysis is approximate representation of high-order sparse grid basis functions by CNNs. The results suggest that higher order expressivity of CNNs does not severely suffer from the curse of dimensionality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that ReLU CNNs achieve L_p approximation rates of order (m+1) (modulo logarithmic factors) in depth for target functions in higher-order Korobov spaces possessing mixed derivatives of order m+1 in each coordinate. This improves upon the classical second-order rate for ReLU networks. The key technical step is an approximate representation of the associated high-order sparse-grid basis functions by CNNs; the results are presented as evidence that higher-order expressivity of CNNs does not suffer severely from the curse of dimensionality.

Significance. If the representation of the sparse-grid basis functions is established with depth scaling that yields the claimed rate, the work would strengthen the theoretical foundation for CNN approximation in high-dimensional mixed-smoothness settings and demonstrate that CNNs can exploit smoothness beyond the standard piecewise-linear regime without incurring prohibitive depth costs.

major comments (2)
  1. [Section containing the representation lemma / theorem for sparse-grid bases] The headline rate improvement rests entirely on the approximate representation of high-order sparse-grid basis functions (products of 1-D splines of order m+1) by ReLU CNNs. The manuscript must supply the explicit construction, depth/width bounds, and error estimates for this representation; without them the (m+1) rate cannot be verified and the result reduces to the classical O(depth^{-2}) bound.
  2. [Proof of the main approximation theorem] It is necessary to confirm that the depth required for the CNN representation of each basis function scales at most linearly with m (or better) and is independent of the number of active multi-indices; any linear dependence on m or on the cardinality of the sparse grid would cancel the claimed improvement over the second-order rate.
minor comments (2)
  1. Clarify the precise dependence of the logarithmic factor on dimension d and smoothness m in the final error bound.
  2. Add a short remark comparing the obtained rate with known results for fully connected ReLU networks on the same Korobov spaces.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. The two major points both concern the clarity and explicitness of the CNN construction for the sparse-grid basis functions. We address them point-by-point below and will revise the manuscript to make the relevant sections and proofs more prominent.

read point-by-point responses
  1. Referee: [Section containing the representation lemma / theorem for sparse-grid bases] The headline rate improvement rests entirely on the approximate representation of high-order sparse-grid basis functions (products of 1-D splines of order m+1) by ReLU CNNs. The manuscript must supply the explicit construction, depth/width bounds, and error estimates for this representation; without them the (m+1) rate cannot be verified and the result reduces to the classical O(depth^{-2}) bound.

    Authors: We agree that the headline claim depends on this construction and will ensure it is stated with full explicitness. The construction appears in Section 3 (Lemma 3.2 and the surrounding discussion): each 1-D spline of order m+1 is realized by a shallow ReLU network of depth O(m) and width O(1), after which the multivariate product is obtained by a convolutional layer that performs the necessary multiplications via the identity xy = ((x+y)^2 - (x-y)^2)/4 realized with two additional ReLU layers. The approximation error for each basis function is bounded by O(2^{-k}) with depth O(m+k). We will add a self-contained subsection titled “Explicit CNN realization of high-order sparse-grid basis functions” that collects the depth/width/error statements and moves the full inductive proof to the appendix. revision: yes

  2. Referee: [Proof of the main approximation theorem] It is necessary to confirm that the depth required for the CNN representation of each basis function scales at most linearly with m (or better) and is independent of the number of active multi-indices; any linear dependence on m or on the cardinality of the sparse grid would cancel the claimed improvement over the second-order rate.

    Authors: The depth bound is indeed linear in m and independent of both dimension d and the cardinality of the sparse grid. In the proof of the main theorem (Theorem 2.1), each individual basis function is approximated by its own CNN copy whose depth is O(m + log(1/ε)) regardless of how many other multi-indices are present; the final network is obtained by a single linear combination layer whose width equals the (finite) number of active basis functions but whose depth is unaffected. Consequently the overall depth remains O(m + log factors) and the (m+1)-order rate is preserved. We will insert an explicit remark after the statement of Theorem 2.1 that records this independence and cross-references the per-basis-function depth bound from Section 3. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation rests on independent representation construction

full rationale

The paper's central claim improves approximation rates to order m+1 by establishing approximate CNN representations of high-order sparse-grid basis functions. This representation step is presented as the key analytical ingredient and is not shown to reduce by definition, by fitting, or by self-citation chain to the target rate itself. No self-definitional loops, fitted inputs renamed as predictions, or ansatzes imported via overlapping-author citations appear in the abstract or described derivation. The result is therefore self-contained against external benchmarks once the representation property is accepted as independently verified.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of Korobov spaces and the novel claim that CNNs can approximate the associated sparse-grid bases at the required order; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption Standard definition and properties of higher-order Korobov spaces with mixed derivatives of order m+1
    The target function class is defined via these spaces; the rate improvement is stated relative to this class.
  • ad hoc to paper CNNs admit approximate representations of high-order sparse grid basis functions with depth scaling that yields the (m+1) rate
    This is explicitly called the key ingredient of the analysis.

pith-pipeline@v0.9.0 · 5627 in / 1470 out tokens · 28233 ms · 2026-05-23T04:51:15.600571+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 2 internal anchors

  1. [1]

    Alpert, G

    B. Alpert, G. Beylkin, D. Gines, and L. Vozovoi, Adaptive solution of partial differential equations in multiwavelet bases , J. Comput. Phys. 182 (2002), no. 1, 149–190

  2. [2]

    3, 524–549

    Chenglong Bao, Qianxiao Li, Zuowei Shen, Cheng Tai, Lei W u , and Xueshuang Xiang, Ap- proximation analysis of convolutional neural networks , East Asian Journal on Applied Math- ematics 13 (2023), no. 3, 524–549

  3. [3]

    7970, 533–538

    Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian, Accurate medium-range global weather forecasting with 3d neural net works, Nature 619 (2023), no. 7970, 533–538

  4. [4]

    Moise Blanchard and Mohammed Amine Bennouna, Shallow and deep networks are near- optimal approximators of Korobov functions , International Conference on Learning Represen- tations, 2022

  5. [5]

    Hans-Joachim Bungartz, A multigrid algorithm for higher order finite elements on spa rse grids, Electronic Transactions on Numerical Analysis 6 (1997), 63–77

  6. [6]

    thesis, Technische Universit¨ at M¨ unchen, 1998

    , Finite elements of higher order on sparse grids , Ph.D. thesis, Technische Universit¨ at M¨ unchen, 1998

  7. [7]

    Hans-Joachim Bungartz and Michael Griebel, Sparse grids , Acta Numerica 13 (2004), 147–269

  8. [8]

    Alexey Dosovitskiy, An image is worth 16 × 16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020)

  9. [9]

    Dennis Elbr¨ achter, Philipp Grohs, Arnulf Jentzen, and C hristoph Schwab, DNN expression rate analysis of high-dimensional PDEs: application to opt ion pricing , Constructive Approx- imation 55 (2022), no. 1, 3–71

  10. [10]

    Zhiying Fang, Han Feng, Shuo Huang, and Ding-Xuan Zhou, Theory of deep convolutional neural networks ii: Spherical analysis , Neural Networks 131 (2020), 154–162

  11. [11]

    Vasile Gradinaru and Ralf Hiptmair, Mixed finite elements on sparse grids , Numer. Math. 93 (2003), 471–495

  12. [12]

    3, Paper No

    Juncai He, Lin Li, and Jinchao Xu, Approximation properties of deep ReLU CNNs , Research in the Mathematical Sciences 9 (2022), no. 3, Paper No. 38, 24. MR 4447410

  13. [13]

    Juncai He, Xinliang Liu, and Jinchao Xu, MgNO: Efficient parameterization of linear opera- tors via multigrid , The Twelfth International Conference on Learning Represe ntations, 2024

  14. [14]

    Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, eds.), 2022

    Jian Huang, Guohao Shen, Yuling Jiao, and Yuanyuan Lin, Approximation with CNNs in Sobolev space: with applications to classification , Advances in Neural Information Processing Systems (Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, eds.), 2022

  15. [15]

    7873, 583–589

    John Jumper, Richard Evans, Alexander Pritzel, Tim Gree n, Michael Figurnov, et al., Highly accurate protein structure prediction with alphafold , Nature 596 (2021), no. 7873, 583–589

  16. [16]

    Hinton, Imagenet classification with deep convolutional neural networks , Commun

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, Imagenet classification with deep convolutional neural networks , Commun. ACM 60 (2017), no. 6, 84–90

  17. [17]

    388, 1–26

    Zongyi Li, Daniel Zhengyu Huang, Burigede Liu, and Anima Anandkumar, Fourier neu- ral operator with learned deformations for PDEs on general g eometries, Journal of Machine Learning Research 24 (2023), no. 388, 1–26

  18. [18]

    Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzade nesheli, Burigede liu, Kaushik Bhat- tacharya, Andrew Stuart, and Anima Anandkumar, Fourier neural operator for parametric partial differential equations , International Conference on Learning Representations, 2 021

  19. [19]

    Peilin Liu, Yuqing Liu, Xiang Zhou, and Ding-Xuan Zhou, Approximation of functionals on Korobov spaces with Fourier functional networks , Neural Networks 182 (2025), 106922

  20. [20]

    Yuqing Liu, Tong Mao, and Ding-Xuan Zhou, Approximation of functions from Korobov spaces by shallow neural networks , Information Sciences 670 (2024), 120573

  21. [21]

    5, 5465–5506

    Jianfeng Lu, Zuowei Shen, Haizhao Yang, and Shijun Zhang , Deep network approximation for smooth functions , SIAM Journal on Mathematical Analysis 53 (2021), no. 5, 5465–5506

  22. [22]

    3, 218–229

    Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and G eorge Em Karniadakis, Learning nonlinear operators via deeponet based on the universal app roximation theorem of operators , Nature Machine Intelligence 3 (2021), no. 3, 218–229. 18 YUWEN LI, GUOZHI ZHANG

  23. [23]

    6, Paper No

    Tong Mao and Ding-Xuan Zhou, Approximation of functions from Korobov spaces by deep convolutional neural networks , Advances in Computational Mathematics 48 (2022), no. 6, Paper No. 84, 26. MR 4519646

  24. [24]

    Tong Mao and Ding-Xuan Zhou, Rates of approximation by ReLU shallow neural networks , Journal of Complexity 79 (2023), 101784

  25. [25]

    1, 78–92

    Hadrien Montanelli and Qiang Du, New Error Bounds for Deep ReLU Networks Using Sparse Grids, SIAM Journal on Mathematics of Data Science 1 (2019), no. 1, 78–92

  26. [26]

    Philipp Petersen and Felix Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks , Neural Networks 108 (2018), 296–330. 27. , Equivalence of approximation by convolutional neural netw orks and fully-connected networks, Proceedings of the American Mathematical Society 148 (2020), no. 4, 1567–1581

  27. [27]

    Maziar Raissi, Paris Perdikaris, and George E Karniadak is, Physics-informed neural networks: A deep learning framework for solving forward and inverse pr oblems involving nonlinear par- tial differential equations , Journal of Computational physics 378 (2019), 686–707

  28. [28]

    Zuowei Shen, Haizhao Yang, and Shijun Zhang, Nonlinear approximation via compositions , Neural Networks 119 (2019), 74–84

  29. [29]

    5, 1768–1811

    Zuowei Shen, Haizhao Yang, and Shijun Zhang, Deep network approximation characterized by number of neurons , Communications in Computational Physics 28 (2020), no. 5, 1768–1811

  30. [30]

    , Optimal approximation rate of ReLU networks in terms of widt h and depth , Journal de Math´ ematiques Pures et Appliqu´ ees157 (2022), 101–135

  31. [31]

    357, 1–52

    Jonathan W Siegel, Optimal approximation rates for deep ReLU neural networks o n Sobolev and Besov spaces , Journal of Machine Learning Research 24 (2023), no. 357, 1–52

  32. [32]

    Karen Simonyan and Andrew Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014)

  33. [33]

    Taiji Suzuki, Adaptivity of deep ReLU network for learning in Besov and mix ed smooth Besov spaces: optimal rate and curse of dimensionality , International Conference on Learning Representations, 2019

  34. [34]

    Zixuan W ang, Qi Tang, W ei Guo, and Yingda Cheng, Sparse grid discontinuous Galerkin methods for high-dimensional elliptic equations , J. Comput. Phys. 314 (2016), 244–263

  35. [35]

    Yahong Yang and Yulong Lu, Near-optimal deep neural network approximation for Korobo v functions with respect to Lp and H1 norms, Neural Networks 180 (2024), 106702

  36. [36]

    Yunfei Yang, On the optimal approximation of sobolev and besov functions using deep relu neural networks , arXiv preprint arXiv:2409.00901 (2024)

  37. [37]

    Dmitry Yarotsky, Error bounds for approximations with deep ReLU networks , Neural Net- works 94 (2017), 103–114

  38. [38]

    Ding-Xuan Zhou, Theory of deep convolutional neural networks: Downsamplin g, Neural Net- works 124 (2020), 319–327

  39. [39]

    2, 787–794

    Ding-Xuan Zhou, Universality of deep convolutional neural networks , Applied and Computa- tional Harmonic Analysis 48 (2020), no. 2, 787–794. MR 4047545 School of Mathematical Sciences, Zhejiang University, Hangz hou, Zhejiang 310058, China Email address : liyuwen@zju.edu.cn Email address : gzzh@zju.edu.cn