Higher Order Approximation Rates for ReLU CNNs in Korobov Spaces
Pith reviewed 2026-05-23 04:51 UTC · model grok-4.3
The pith
ReLU CNNs can approximate Korobov functions with mixed derivatives of order m+1 at rates scaling with the (m+1)th power of depth.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For target functions having a mixed derivative of order m+1 in each direction, ReLU CNNs achieve an L_p approximation rate of order (m+1) in terms of the network depth, modulo a logarithmic factor, by approximately representing the high-order sparse grid basis functions.
What carries the argument
approximate representation of high-order sparse grid basis functions by CNNs, which allows the depth to control the approximation order directly.
Load-bearing premise
The high-order sparse grid basis functions admit approximate representations by CNNs at depths and widths that scale to deliver the improved rate.
What would settle it
Finding a specific function with mixed derivative of order m+1 whose L_p approximation error by ReLU CNNs of depth d stays no better than O(1/d^2) for large d would disprove the improved rate.
Figures
read the original abstract
This paper investigates the $L_p$ approximation error for higher order Korobov functions using deep convolutional neural networks (CNNs) with ReLU activation. For target functions having a mixed derivative of order m+1 in each direction, we improve classical approximation rate of second order to (m+1)-th order (modulo a logarithmic factor) in terms of the depth of CNNs. The key ingredient in our analysis is approximate representation of high-order sparse grid basis functions by CNNs. The results suggest that higher order expressivity of CNNs does not severely suffer from the curse of dimensionality.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that ReLU CNNs achieve L_p approximation rates of order (m+1) (modulo logarithmic factors) in depth for target functions in higher-order Korobov spaces possessing mixed derivatives of order m+1 in each coordinate. This improves upon the classical second-order rate for ReLU networks. The key technical step is an approximate representation of the associated high-order sparse-grid basis functions by CNNs; the results are presented as evidence that higher-order expressivity of CNNs does not suffer severely from the curse of dimensionality.
Significance. If the representation of the sparse-grid basis functions is established with depth scaling that yields the claimed rate, the work would strengthen the theoretical foundation for CNN approximation in high-dimensional mixed-smoothness settings and demonstrate that CNNs can exploit smoothness beyond the standard piecewise-linear regime without incurring prohibitive depth costs.
major comments (2)
- [Section containing the representation lemma / theorem for sparse-grid bases] The headline rate improvement rests entirely on the approximate representation of high-order sparse-grid basis functions (products of 1-D splines of order m+1) by ReLU CNNs. The manuscript must supply the explicit construction, depth/width bounds, and error estimates for this representation; without them the (m+1) rate cannot be verified and the result reduces to the classical O(depth^{-2}) bound.
- [Proof of the main approximation theorem] It is necessary to confirm that the depth required for the CNN representation of each basis function scales at most linearly with m (or better) and is independent of the number of active multi-indices; any linear dependence on m or on the cardinality of the sparse grid would cancel the claimed improvement over the second-order rate.
minor comments (2)
- Clarify the precise dependence of the logarithmic factor on dimension d and smoothness m in the final error bound.
- Add a short remark comparing the obtained rate with known results for fully connected ReLU networks on the same Korobov spaces.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. The two major points both concern the clarity and explicitness of the CNN construction for the sparse-grid basis functions. We address them point-by-point below and will revise the manuscript to make the relevant sections and proofs more prominent.
read point-by-point responses
-
Referee: [Section containing the representation lemma / theorem for sparse-grid bases] The headline rate improvement rests entirely on the approximate representation of high-order sparse-grid basis functions (products of 1-D splines of order m+1) by ReLU CNNs. The manuscript must supply the explicit construction, depth/width bounds, and error estimates for this representation; without them the (m+1) rate cannot be verified and the result reduces to the classical O(depth^{-2}) bound.
Authors: We agree that the headline claim depends on this construction and will ensure it is stated with full explicitness. The construction appears in Section 3 (Lemma 3.2 and the surrounding discussion): each 1-D spline of order m+1 is realized by a shallow ReLU network of depth O(m) and width O(1), after which the multivariate product is obtained by a convolutional layer that performs the necessary multiplications via the identity xy = ((x+y)^2 - (x-y)^2)/4 realized with two additional ReLU layers. The approximation error for each basis function is bounded by O(2^{-k}) with depth O(m+k). We will add a self-contained subsection titled “Explicit CNN realization of high-order sparse-grid basis functions” that collects the depth/width/error statements and moves the full inductive proof to the appendix. revision: yes
-
Referee: [Proof of the main approximation theorem] It is necessary to confirm that the depth required for the CNN representation of each basis function scales at most linearly with m (or better) and is independent of the number of active multi-indices; any linear dependence on m or on the cardinality of the sparse grid would cancel the claimed improvement over the second-order rate.
Authors: The depth bound is indeed linear in m and independent of both dimension d and the cardinality of the sparse grid. In the proof of the main theorem (Theorem 2.1), each individual basis function is approximated by its own CNN copy whose depth is O(m + log(1/ε)) regardless of how many other multi-indices are present; the final network is obtained by a single linear combination layer whose width equals the (finite) number of active basis functions but whose depth is unaffected. Consequently the overall depth remains O(m + log factors) and the (m+1)-order rate is preserved. We will insert an explicit remark after the statement of Theorem 2.1 that records this independence and cross-references the per-basis-function depth bound from Section 3. revision: yes
Circularity Check
No circularity; derivation rests on independent representation construction
full rationale
The paper's central claim improves approximation rates to order m+1 by establishing approximate CNN representations of high-order sparse-grid basis functions. This representation step is presented as the key analytical ingredient and is not shown to reduce by definition, by fitting, or by self-citation chain to the target rate itself. No self-definitional loops, fitted inputs renamed as predictions, or ansatzes imported via overlapping-author citations appear in the abstract or described derivation. The result is therefore self-contained against external benchmarks once the representation property is accepted as independently verified.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Standard definition and properties of higher-order Korobov spaces with mixed derivatives of order m+1
- ad hoc to paper CNNs admit approximate representations of high-order sparse grid basis functions with depth scaling that yields the (m+1) rate
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The key ingredient in our analysis is approximate representation of high-order sparse grid basis functions by CNNs... improve classical approximation rate of second order to (m+1)-th order
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 1.3... L ≤ C s d^4 m^3 N (log2 N)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
-
[2]
Chenglong Bao, Qianxiao Li, Zuowei Shen, Cheng Tai, Lei W u , and Xueshuang Xiang, Ap- proximation analysis of convolutional neural networks , East Asian Journal on Applied Math- ematics 13 (2023), no. 3, 524–549
work page 2023
-
[3]
Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian, Accurate medium-range global weather forecasting with 3d neural net works, Nature 619 (2023), no. 7970, 533–538
work page 2023
-
[4]
Moise Blanchard and Mohammed Amine Bennouna, Shallow and deep networks are near- optimal approximators of Korobov functions , International Conference on Learning Represen- tations, 2022
work page 2022
-
[5]
Hans-Joachim Bungartz, A multigrid algorithm for higher order finite elements on spa rse grids, Electronic Transactions on Numerical Analysis 6 (1997), 63–77
work page 1997
-
[6]
thesis, Technische Universit¨ at M¨ unchen, 1998
, Finite elements of higher order on sparse grids , Ph.D. thesis, Technische Universit¨ at M¨ unchen, 1998
work page 1998
-
[7]
Hans-Joachim Bungartz and Michael Griebel, Sparse grids , Acta Numerica 13 (2004), 147–269
work page 2004
-
[8]
Alexey Dosovitskiy, An image is worth 16 × 16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[9]
Dennis Elbr¨ achter, Philipp Grohs, Arnulf Jentzen, and C hristoph Schwab, DNN expression rate analysis of high-dimensional PDEs: application to opt ion pricing , Constructive Approx- imation 55 (2022), no. 1, 3–71
work page 2022
-
[10]
Zhiying Fang, Han Feng, Shuo Huang, and Ding-Xuan Zhou, Theory of deep convolutional neural networks ii: Spherical analysis , Neural Networks 131 (2020), 154–162
work page 2020
-
[11]
Vasile Gradinaru and Ralf Hiptmair, Mixed finite elements on sparse grids , Numer. Math. 93 (2003), 471–495
work page 2003
-
[12]
Juncai He, Lin Li, and Jinchao Xu, Approximation properties of deep ReLU CNNs , Research in the Mathematical Sciences 9 (2022), no. 3, Paper No. 38, 24. MR 4447410
work page 2022
-
[13]
Juncai He, Xinliang Liu, and Jinchao Xu, MgNO: Efficient parameterization of linear opera- tors via multigrid , The Twelfth International Conference on Learning Represe ntations, 2024
work page 2024
-
[14]
Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, eds.), 2022
Jian Huang, Guohao Shen, Yuling Jiao, and Yuanyuan Lin, Approximation with CNNs in Sobolev space: with applications to classification , Advances in Neural Information Processing Systems (Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, eds.), 2022
work page 2022
-
[15]
John Jumper, Richard Evans, Alexander Pritzel, Tim Gree n, Michael Figurnov, et al., Highly accurate protein structure prediction with alphafold , Nature 596 (2021), no. 7873, 583–589
work page 2021
-
[16]
Hinton, Imagenet classification with deep convolutional neural networks , Commun
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, Imagenet classification with deep convolutional neural networks , Commun. ACM 60 (2017), no. 6, 84–90
work page 2017
- [17]
-
[18]
Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzade nesheli, Burigede liu, Kaushik Bhat- tacharya, Andrew Stuart, and Anima Anandkumar, Fourier neural operator for parametric partial differential equations , International Conference on Learning Representations, 2 021
-
[19]
Peilin Liu, Yuqing Liu, Xiang Zhou, and Ding-Xuan Zhou, Approximation of functionals on Korobov spaces with Fourier functional networks , Neural Networks 182 (2025), 106922
work page 2025
-
[20]
Yuqing Liu, Tong Mao, and Ding-Xuan Zhou, Approximation of functions from Korobov spaces by shallow neural networks , Information Sciences 670 (2024), 120573
work page 2024
-
[21]
Jianfeng Lu, Zuowei Shen, Haizhao Yang, and Shijun Zhang , Deep network approximation for smooth functions , SIAM Journal on Mathematical Analysis 53 (2021), no. 5, 5465–5506
work page 2021
-
[22]
Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and G eorge Em Karniadakis, Learning nonlinear operators via deeponet based on the universal app roximation theorem of operators , Nature Machine Intelligence 3 (2021), no. 3, 218–229. 18 YUWEN LI, GUOZHI ZHANG
work page 2021
-
[23]
Tong Mao and Ding-Xuan Zhou, Approximation of functions from Korobov spaces by deep convolutional neural networks , Advances in Computational Mathematics 48 (2022), no. 6, Paper No. 84, 26. MR 4519646
work page 2022
-
[24]
Tong Mao and Ding-Xuan Zhou, Rates of approximation by ReLU shallow neural networks , Journal of Complexity 79 (2023), 101784
work page 2023
- [25]
-
[26]
Philipp Petersen and Felix Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks , Neural Networks 108 (2018), 296–330. 27. , Equivalence of approximation by convolutional neural netw orks and fully-connected networks, Proceedings of the American Mathematical Society 148 (2020), no. 4, 1567–1581
work page 2018
-
[27]
Maziar Raissi, Paris Perdikaris, and George E Karniadak is, Physics-informed neural networks: A deep learning framework for solving forward and inverse pr oblems involving nonlinear par- tial differential equations , Journal of Computational physics 378 (2019), 686–707
work page 2019
-
[28]
Zuowei Shen, Haizhao Yang, and Shijun Zhang, Nonlinear approximation via compositions , Neural Networks 119 (2019), 74–84
work page 2019
-
[29]
Zuowei Shen, Haizhao Yang, and Shijun Zhang, Deep network approximation characterized by number of neurons , Communications in Computational Physics 28 (2020), no. 5, 1768–1811
work page 2020
-
[30]
, Optimal approximation rate of ReLU networks in terms of widt h and depth , Journal de Math´ ematiques Pures et Appliqu´ ees157 (2022), 101–135
work page 2022
- [31]
-
[32]
Karen Simonyan and Andrew Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014)
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[33]
Taiji Suzuki, Adaptivity of deep ReLU network for learning in Besov and mix ed smooth Besov spaces: optimal rate and curse of dimensionality , International Conference on Learning Representations, 2019
work page 2019
-
[34]
Zixuan W ang, Qi Tang, W ei Guo, and Yingda Cheng, Sparse grid discontinuous Galerkin methods for high-dimensional elliptic equations , J. Comput. Phys. 314 (2016), 244–263
work page 2016
-
[35]
Yahong Yang and Yulong Lu, Near-optimal deep neural network approximation for Korobo v functions with respect to Lp and H1 norms, Neural Networks 180 (2024), 106702
work page 2024
- [36]
-
[37]
Dmitry Yarotsky, Error bounds for approximations with deep ReLU networks , Neural Net- works 94 (2017), 103–114
work page 2017
-
[38]
Ding-Xuan Zhou, Theory of deep convolutional neural networks: Downsamplin g, Neural Net- works 124 (2020), 319–327
work page 2020
-
[39]
Ding-Xuan Zhou, Universality of deep convolutional neural networks , Applied and Computa- tional Harmonic Analysis 48 (2020), no. 2, 787–794. MR 4047545 School of Mathematical Sciences, Zhejiang University, Hangz hou, Zhejiang 310058, China Email address : liyuwen@zju.edu.cn Email address : gzzh@zju.edu.cn
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.