Higher Order Approximation Rates for ReLU CNNs in Korobov Spaces

Guozhi Zhang; Yuwen Li

arxiv: 2501.11275 · v2 · submitted 2025-01-20 · 💻 cs.LG · cs.NA· math.NA

Higher Order Approximation Rates for ReLU CNNs in Korobov Spaces

Yuwen Li , Guozhi Zhang This is my paper

Pith reviewed 2026-05-23 04:51 UTC · model grok-4.3

classification 💻 cs.LG cs.NAmath.NA

keywords ReLU CNNKorobov spaceapproximation ratesparse gridmixed derivativeconvolutional neural networkhigher order approximation

0 comments

The pith

ReLU CNNs can approximate Korobov functions with mixed derivatives of order m+1 at rates scaling with the (m+1)th power of depth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates improved approximation rates for ReLU convolutional neural networks when approximating functions from higher-order Korobov spaces. For functions possessing mixed derivatives up to order m+1, the error bound improves from the classical second-order dependence on network depth to an (m+1)-order rate, aside from a logarithmic factor. This is achieved through the approximate representation of high-order sparse grid basis functions within the CNN architecture. A sympathetic reader would care because it shows that the expressivity of CNNs can exploit higher smoothness without being crippled by the curse of dimensionality.

Core claim

For target functions having a mixed derivative of order m+1 in each direction, ReLU CNNs achieve an L_p approximation rate of order (m+1) in terms of the network depth, modulo a logarithmic factor, by approximately representing the high-order sparse grid basis functions.

What carries the argument

approximate representation of high-order sparse grid basis functions by CNNs, which allows the depth to control the approximation order directly.

Load-bearing premise

The high-order sparse grid basis functions admit approximate representations by CNNs at depths and widths that scale to deliver the improved rate.

What would settle it

Finding a specific function with mixed derivative of order m+1 whose L_p approximation error by ReLU CNNs of depth d stays no better than O(1/d^2) for large d would disprove the improved rate.

Figures

Figures reproduced from arXiv: 2501.11275 by Guozhi Zhang, Yuwen Li.

**Figure 1.** Figure 1: Hierarchical ancestors of xl,i = 0.1875 (d = 1, l = 4, i = 3, α = 3) including its two neighbor points x3,1 = 0.125, x2,1 = 0.25 and another ancestor x1,1 = 0.5. In Section 4, we shall derive CNN approximation rates for f ∈ Km+1 p (Ω) from the higher order sparse grid interpolation (2.8) Inf(x) = X |l|1≤n+d−1 X i∈Il vl,iφ α l,i (x) where each vl,i is given in the next lemma. Lemma 2.1. [7, Lemma 4.5] Let w… view at source ↗

**Figure 2.** Figure 2: (a) Graphs of T1, T2, T3; (b) interpolants R1, R2 of x 2 . The approximate product ×eM,U (x, y) is then built upon RU . The next lemma provides an error bound of |×eM,U (x, y) − xy| (see [23]) as well as verifies nonnegativity of ×eM,U (x, y) when (x, y) ∈ [0, 1]2 . The non-negativity of ×eM,U (x, y) serves as a crucial property in our analysis and seems missing in the classical literature. The proof of … view at source ↗

**Figure 3.** Figure 3: An illustration of the proof of Lemma 3.3, l = k = n = 2. The above lemma demonstrates that some CNN can eliminate zeros in the input vector. Lemma 3.3 is useful for deriving the next lemma about approximately multiplying components in a partitioned vector by CNNs. Lemma 3.4. Let U, k, l ∈ N+ and M > 0. For any {y1, . . . , yl} ⊂ [0, M] k , there exists hJ of the form (1.2) with depth J ≤ (128+14U)lk s−1 +… view at source ↗

**Figure 4.** Figure 4: (a) d = 1, l = 4, i = 3, the basis function φ 2 l,i at grid point xl,i = 0.1875; (b) factors ρl,i,1, ρl,i,2 of φ 2 l,i. In general, when m ≥ 3 and |l|1 ≤ n + d − 1, if αj = m for some j, we set (4.1) ρlj ,ij ,k(x) := σ x − xlj ,ij ,k xlj ,ij − xlj ,ij ,k , k = 1, . . . , m [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: (a) d = 1, l = 4, i = 3, the basis function φ 3 l,i at grid point xl,i = 0.1875; (b) factors ρl,i,1, ρl,i,2, ρl,i,3 of φ 3 l,i. Consider the set of all piecewise linear factors: ρβi,k(xj ) : i = 1, . . . , N, j = 1, . . . , d, k = 1, . . . , m , where each ρβi,k(xj ) can be written as the form σ(aβi,j,kxj+bβi,j,k) for aβi,j,k, bβi,j,k ∈ R and 0 ≤ ρβi,k(xj ) ≤ 2 n+d−1 . Let w := [w1,1; . . . ; w1,m; . . .… view at source ↗

read the original abstract

This paper investigates the $L_p$ approximation error for higher order Korobov functions using deep convolutional neural networks (CNNs) with ReLU activation. For target functions having a mixed derivative of order m+1 in each direction, we improve classical approximation rate of second order to (m+1)-th order (modulo a logarithmic factor) in terms of the depth of CNNs. The key ingredient in our analysis is approximate representation of high-order sparse grid basis functions by CNNs. The results suggest that higher order expressivity of CNNs does not severely suffer from the curse of dimensionality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper claims an (m+1) approximation rate for ReLU CNNs on higher-order Korobov functions by representing sparse-grid bases, but the depth scaling of that representation step determines whether the improvement actually holds.

read the letter

The main result here is lifting the approximation rate from the classical second order up to order m+1 (modulo logs) for ReLU CNNs on functions with mixed derivatives of order m+1 in Korobov spaces. They do this by showing the high-order sparse-grid basis functions can be approximated by the networks at controlled depth and width. That is the concrete advance over prior second-order bounds for ReLU networks. The paper does a clean job of framing the problem around sparse grids to manage the high-dimensional setting while targeting higher smoothness, and the motivation about higher-order expressivity not suffering badly from the curse of dimensionality is stated plainly. The approach builds directly on known sparse-grid techniques and standard NN representation arguments, which keeps the logic straightforward. The soft spot is exactly the representation step for the basis functions. If the CNN depth needed for each product spline basis grows linearly with m or with the number of active multi-indices, or if the error per basis stays stuck at order 2, the claimed rate improvement does not go through. The abstract flags this as the key ingredient, so the full paper must deliver explicit depth and error bounds that avoid those hidden costs. No circularity or fitting issues appear in the setup. This paper is for readers already working on approximation rates for neural networks in high dimensions, especially those tracking CNN expressivity and sparse-grid methods. A specialist in that corner would get value from the rate calculation once the construction is verified. I would send it to peer review so the depth bounds can be checked in detail.

Referee Report

2 major / 2 minor

Summary. The paper claims that ReLU CNNs achieve L_p approximation rates of order (m+1) (modulo logarithmic factors) in depth for target functions in higher-order Korobov spaces possessing mixed derivatives of order m+1 in each coordinate. This improves upon the classical second-order rate for ReLU networks. The key technical step is an approximate representation of the associated high-order sparse-grid basis functions by CNNs; the results are presented as evidence that higher-order expressivity of CNNs does not suffer severely from the curse of dimensionality.

Significance. If the representation of the sparse-grid basis functions is established with depth scaling that yields the claimed rate, the work would strengthen the theoretical foundation for CNN approximation in high-dimensional mixed-smoothness settings and demonstrate that CNNs can exploit smoothness beyond the standard piecewise-linear regime without incurring prohibitive depth costs.

major comments (2)

[Section containing the representation lemma / theorem for sparse-grid bases] The headline rate improvement rests entirely on the approximate representation of high-order sparse-grid basis functions (products of 1-D splines of order m+1) by ReLU CNNs. The manuscript must supply the explicit construction, depth/width bounds, and error estimates for this representation; without them the (m+1) rate cannot be verified and the result reduces to the classical O(depth^{-2}) bound.
[Proof of the main approximation theorem] It is necessary to confirm that the depth required for the CNN representation of each basis function scales at most linearly with m (or better) and is independent of the number of active multi-indices; any linear dependence on m or on the cardinality of the sparse grid would cancel the claimed improvement over the second-order rate.

minor comments (2)

Clarify the precise dependence of the logarithmic factor on dimension d and smoothness m in the final error bound.
Add a short remark comparing the obtained rate with known results for fully connected ReLU networks on the same Korobov spaces.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. The two major points both concern the clarity and explicitness of the CNN construction for the sparse-grid basis functions. We address them point-by-point below and will revise the manuscript to make the relevant sections and proofs more prominent.

read point-by-point responses

Referee: [Section containing the representation lemma / theorem for sparse-grid bases] The headline rate improvement rests entirely on the approximate representation of high-order sparse-grid basis functions (products of 1-D splines of order m+1) by ReLU CNNs. The manuscript must supply the explicit construction, depth/width bounds, and error estimates for this representation; without them the (m+1) rate cannot be verified and the result reduces to the classical O(depth^{-2}) bound.

Authors: We agree that the headline claim depends on this construction and will ensure it is stated with full explicitness. The construction appears in Section 3 (Lemma 3.2 and the surrounding discussion): each 1-D spline of order m+1 is realized by a shallow ReLU network of depth O(m) and width O(1), after which the multivariate product is obtained by a convolutional layer that performs the necessary multiplications via the identity xy = ((x+y)^2 - (x-y)^2)/4 realized with two additional ReLU layers. The approximation error for each basis function is bounded by O(2^{-k}) with depth O(m+k). We will add a self-contained subsection titled “Explicit CNN realization of high-order sparse-grid basis functions” that collects the depth/width/error statements and moves the full inductive proof to the appendix. revision: yes
Referee: [Proof of the main approximation theorem] It is necessary to confirm that the depth required for the CNN representation of each basis function scales at most linearly with m (or better) and is independent of the number of active multi-indices; any linear dependence on m or on the cardinality of the sparse grid would cancel the claimed improvement over the second-order rate.

Authors: The depth bound is indeed linear in m and independent of both dimension d and the cardinality of the sparse grid. In the proof of the main theorem (Theorem 2.1), each individual basis function is approximated by its own CNN copy whose depth is O(m + log(1/ε)) regardless of how many other multi-indices are present; the final network is obtained by a single linear combination layer whose width equals the (finite) number of active basis functions but whose depth is unaffected. Consequently the overall depth remains O(m + log factors) and the (m+1)-order rate is preserved. We will insert an explicit remark after the statement of Theorem 2.1 that records this independence and cross-references the per-basis-function depth bound from Section 3. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation rests on independent representation construction

full rationale

The paper's central claim improves approximation rates to order m+1 by establishing approximate CNN representations of high-order sparse-grid basis functions. This representation step is presented as the key analytical ingredient and is not shown to reduce by definition, by fitting, or by self-citation chain to the target rate itself. No self-definitional loops, fitted inputs renamed as predictions, or ansatzes imported via overlapping-author citations appear in the abstract or described derivation. The result is therefore self-contained against external benchmarks once the representation property is accepted as independently verified.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of Korobov spaces and the novel claim that CNNs can approximate the associated sparse-grid bases at the required order; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Standard definition and properties of higher-order Korobov spaces with mixed derivatives of order m+1
The target function class is defined via these spaces; the rate improvement is stated relative to this class.
ad hoc to paper CNNs admit approximate representations of high-order sparse grid basis functions with depth scaling that yields the (m+1) rate
This is explicitly called the key ingredient of the analysis.

pith-pipeline@v0.9.0 · 5627 in / 1470 out tokens · 28233 ms · 2026-05-23T04:51:15.600571+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The key ingredient in our analysis is approximate representation of high-order sparse grid basis functions by CNNs... improve classical approximation rate of second order to (m+1)-th order
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 1.3... L ≤ C s d^4 m^3 N (log2 N)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 2 internal anchors

[1]

Alpert, G

B. Alpert, G. Beylkin, D. Gines, and L. Vozovoi, Adaptive solution of partial diﬀerential equations in multiwavelet bases , J. Comput. Phys. 182 (2002), no. 1, 149–190

work page 2002
[2]

3, 524–549

Chenglong Bao, Qianxiao Li, Zuowei Shen, Cheng Tai, Lei W u , and Xueshuang Xiang, Ap- proximation analysis of convolutional neural networks , East Asian Journal on Applied Math- ematics 13 (2023), no. 3, 524–549

work page 2023
[3]

7970, 533–538

Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian, Accurate medium-range global weather forecasting with 3d neural net works, Nature 619 (2023), no. 7970, 533–538

work page 2023
[4]

Moise Blanchard and Mohammed Amine Bennouna, Shallow and deep networks are near- optimal approximators of Korobov functions , International Conference on Learning Represen- tations, 2022

work page 2022
[5]

Hans-Joachim Bungartz, A multigrid algorithm for higher order ﬁnite elements on spa rse grids, Electronic Transactions on Numerical Analysis 6 (1997), 63–77

work page 1997
[6]

thesis, Technische Universit¨ at M¨ unchen, 1998

, Finite elements of higher order on sparse grids , Ph.D. thesis, Technische Universit¨ at M¨ unchen, 1998

work page 1998
[7]

Hans-Joachim Bungartz and Michael Griebel, Sparse grids , Acta Numerica 13 (2004), 147–269

work page 2004
[8]

Alexey Dosovitskiy, An image is worth 16 × 16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010
[9]

Dennis Elbr¨ achter, Philipp Grohs, Arnulf Jentzen, and C hristoph Schwab, DNN expression rate analysis of high-dimensional PDEs: application to opt ion pricing , Constructive Approx- imation 55 (2022), no. 1, 3–71

work page 2022
[10]

Zhiying Fang, Han Feng, Shuo Huang, and Ding-Xuan Zhou, Theory of deep convolutional neural networks ii: Spherical analysis , Neural Networks 131 (2020), 154–162

work page 2020
[11]

Vasile Gradinaru and Ralf Hiptmair, Mixed ﬁnite elements on sparse grids , Numer. Math. 93 (2003), 471–495

work page 2003
[12]

3, Paper No

Juncai He, Lin Li, and Jinchao Xu, Approximation properties of deep ReLU CNNs , Research in the Mathematical Sciences 9 (2022), no. 3, Paper No. 38, 24. MR 4447410

work page 2022
[13]

Juncai He, Xinliang Liu, and Jinchao Xu, MgNO: Eﬃcient parameterization of linear opera- tors via multigrid , The Twelfth International Conference on Learning Represe ntations, 2024

work page 2024
[14]

Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, eds.), 2022

Jian Huang, Guohao Shen, Yuling Jiao, and Yuanyuan Lin, Approximation with CNNs in Sobolev space: with applications to classiﬁcation , Advances in Neural Information Processing Systems (Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, eds.), 2022

work page 2022
[15]

7873, 583–589

John Jumper, Richard Evans, Alexander Pritzel, Tim Gree n, Michael Figurnov, et al., Highly accurate protein structure prediction with alphafold , Nature 596 (2021), no. 7873, 583–589

work page 2021
[16]

Hinton, Imagenet classiﬁcation with deep convolutional neural networks , Commun

Alex Krizhevsky, Ilya Sutskever, and Geoﬀrey E. Hinton, Imagenet classiﬁcation with deep convolutional neural networks , Commun. ACM 60 (2017), no. 6, 84–90

work page 2017
[17]

388, 1–26

Zongyi Li, Daniel Zhengyu Huang, Burigede Liu, and Anima Anandkumar, Fourier neu- ral operator with learned deformations for PDEs on general g eometries, Journal of Machine Learning Research 24 (2023), no. 388, 1–26

work page 2023
[18]

Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzade nesheli, Burigede liu, Kaushik Bhat- tacharya, Andrew Stuart, and Anima Anandkumar, Fourier neural operator for parametric partial diﬀerential equations , International Conference on Learning Representations, 2 021

work page
[19]

Peilin Liu, Yuqing Liu, Xiang Zhou, and Ding-Xuan Zhou, Approximation of functionals on Korobov spaces with Fourier functional networks , Neural Networks 182 (2025), 106922

work page 2025
[20]

Yuqing Liu, Tong Mao, and Ding-Xuan Zhou, Approximation of functions from Korobov spaces by shallow neural networks , Information Sciences 670 (2024), 120573

work page 2024
[21]

5, 5465–5506

Jianfeng Lu, Zuowei Shen, Haizhao Yang, and Shijun Zhang , Deep network approximation for smooth functions , SIAM Journal on Mathematical Analysis 53 (2021), no. 5, 5465–5506

work page 2021
[22]

3, 218–229

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and G eorge Em Karniadakis, Learning nonlinear operators via deeponet based on the universal app roximation theorem of operators , Nature Machine Intelligence 3 (2021), no. 3, 218–229. 18 YUWEN LI, GUOZHI ZHANG

work page 2021
[23]

6, Paper No

Tong Mao and Ding-Xuan Zhou, Approximation of functions from Korobov spaces by deep convolutional neural networks , Advances in Computational Mathematics 48 (2022), no. 6, Paper No. 84, 26. MR 4519646

work page 2022
[24]

Tong Mao and Ding-Xuan Zhou, Rates of approximation by ReLU shallow neural networks , Journal of Complexity 79 (2023), 101784

work page 2023
[25]

1, 78–92

Hadrien Montanelli and Qiang Du, New Error Bounds for Deep ReLU Networks Using Sparse Grids, SIAM Journal on Mathematics of Data Science 1 (2019), no. 1, 78–92

work page 2019
[26]

Philipp Petersen and Felix Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks , Neural Networks 108 (2018), 296–330. 27. , Equivalence of approximation by convolutional neural netw orks and fully-connected networks, Proceedings of the American Mathematical Society 148 (2020), no. 4, 1567–1581

work page 2018
[27]

Maziar Raissi, Paris Perdikaris, and George E Karniadak is, Physics-informed neural networks: A deep learning framework for solving forward and inverse pr oblems involving nonlinear par- tial diﬀerential equations , Journal of Computational physics 378 (2019), 686–707

work page 2019
[28]

Zuowei Shen, Haizhao Yang, and Shijun Zhang, Nonlinear approximation via compositions , Neural Networks 119 (2019), 74–84

work page 2019
[29]

5, 1768–1811

Zuowei Shen, Haizhao Yang, and Shijun Zhang, Deep network approximation characterized by number of neurons , Communications in Computational Physics 28 (2020), no. 5, 1768–1811

work page 2020
[30]

, Optimal approximation rate of ReLU networks in terms of widt h and depth , Journal de Math´ ematiques Pures et Appliqu´ ees157 (2022), 101–135

work page 2022
[31]

357, 1–52

Jonathan W Siegel, Optimal approximation rates for deep ReLU neural networks o n Sobolev and Besov spaces , Journal of Machine Learning Research 24 (2023), no. 357, 1–52

work page 2023
[32]

Karen Simonyan and Andrew Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[33]

Taiji Suzuki, Adaptivity of deep ReLU network for learning in Besov and mix ed smooth Besov spaces: optimal rate and curse of dimensionality , International Conference on Learning Representations, 2019

work page 2019
[34]

Zixuan W ang, Qi Tang, W ei Guo, and Yingda Cheng, Sparse grid discontinuous Galerkin methods for high-dimensional elliptic equations , J. Comput. Phys. 314 (2016), 244–263

work page 2016
[35]

Yahong Yang and Yulong Lu, Near-optimal deep neural network approximation for Korobo v functions with respect to Lp and H1 norms, Neural Networks 180 (2024), 106702

work page 2024
[36]

Yunfei Yang, On the optimal approximation of sobolev and besov functions using deep relu neural networks , arXiv preprint arXiv:2409.00901 (2024)

work page arXiv 2024
[37]

Dmitry Yarotsky, Error bounds for approximations with deep ReLU networks , Neural Net- works 94 (2017), 103–114

work page 2017
[38]

Ding-Xuan Zhou, Theory of deep convolutional neural networks: Downsamplin g, Neural Net- works 124 (2020), 319–327

work page 2020
[39]

2, 787–794

Ding-Xuan Zhou, Universality of deep convolutional neural networks , Applied and Computa- tional Harmonic Analysis 48 (2020), no. 2, 787–794. MR 4047545 School of Mathematical Sciences, Zhejiang University, Hangz hou, Zhejiang 310058, China Email address : liyuwen@zju.edu.cn Email address : gzzh@zju.edu.cn

work page 2020

[1] [1]

Alpert, G

B. Alpert, G. Beylkin, D. Gines, and L. Vozovoi, Adaptive solution of partial diﬀerential equations in multiwavelet bases , J. Comput. Phys. 182 (2002), no. 1, 149–190

work page 2002

[2] [2]

3, 524–549

Chenglong Bao, Qianxiao Li, Zuowei Shen, Cheng Tai, Lei W u , and Xueshuang Xiang, Ap- proximation analysis of convolutional neural networks , East Asian Journal on Applied Math- ematics 13 (2023), no. 3, 524–549

work page 2023

[3] [3]

7970, 533–538

Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian, Accurate medium-range global weather forecasting with 3d neural net works, Nature 619 (2023), no. 7970, 533–538

work page 2023

[4] [4]

Moise Blanchard and Mohammed Amine Bennouna, Shallow and deep networks are near- optimal approximators of Korobov functions , International Conference on Learning Represen- tations, 2022

work page 2022

[5] [5]

Hans-Joachim Bungartz, A multigrid algorithm for higher order ﬁnite elements on spa rse grids, Electronic Transactions on Numerical Analysis 6 (1997), 63–77

work page 1997

[6] [6]

thesis, Technische Universit¨ at M¨ unchen, 1998

, Finite elements of higher order on sparse grids , Ph.D. thesis, Technische Universit¨ at M¨ unchen, 1998

work page 1998

[7] [7]

Hans-Joachim Bungartz and Michael Griebel, Sparse grids , Acta Numerica 13 (2004), 147–269

work page 2004

[8] [8]

Alexey Dosovitskiy, An image is worth 16 × 16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010

[9] [9]

Dennis Elbr¨ achter, Philipp Grohs, Arnulf Jentzen, and C hristoph Schwab, DNN expression rate analysis of high-dimensional PDEs: application to opt ion pricing , Constructive Approx- imation 55 (2022), no. 1, 3–71

work page 2022

[10] [10]

Zhiying Fang, Han Feng, Shuo Huang, and Ding-Xuan Zhou, Theory of deep convolutional neural networks ii: Spherical analysis , Neural Networks 131 (2020), 154–162

work page 2020

[11] [11]

Vasile Gradinaru and Ralf Hiptmair, Mixed ﬁnite elements on sparse grids , Numer. Math. 93 (2003), 471–495

work page 2003

[12] [12]

3, Paper No

Juncai He, Lin Li, and Jinchao Xu, Approximation properties of deep ReLU CNNs , Research in the Mathematical Sciences 9 (2022), no. 3, Paper No. 38, 24. MR 4447410

work page 2022

[13] [13]

Juncai He, Xinliang Liu, and Jinchao Xu, MgNO: Eﬃcient parameterization of linear opera- tors via multigrid , The Twelfth International Conference on Learning Represe ntations, 2024

work page 2024

[14] [14]

Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, eds.), 2022

Jian Huang, Guohao Shen, Yuling Jiao, and Yuanyuan Lin, Approximation with CNNs in Sobolev space: with applications to classiﬁcation , Advances in Neural Information Processing Systems (Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, eds.), 2022

work page 2022

[15] [15]

7873, 583–589

John Jumper, Richard Evans, Alexander Pritzel, Tim Gree n, Michael Figurnov, et al., Highly accurate protein structure prediction with alphafold , Nature 596 (2021), no. 7873, 583–589

work page 2021

[16] [16]

Hinton, Imagenet classiﬁcation with deep convolutional neural networks , Commun

Alex Krizhevsky, Ilya Sutskever, and Geoﬀrey E. Hinton, Imagenet classiﬁcation with deep convolutional neural networks , Commun. ACM 60 (2017), no. 6, 84–90

work page 2017

[17] [17]

388, 1–26

Zongyi Li, Daniel Zhengyu Huang, Burigede Liu, and Anima Anandkumar, Fourier neu- ral operator with learned deformations for PDEs on general g eometries, Journal of Machine Learning Research 24 (2023), no. 388, 1–26

work page 2023

[18] [18]

Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzade nesheli, Burigede liu, Kaushik Bhat- tacharya, Andrew Stuart, and Anima Anandkumar, Fourier neural operator for parametric partial diﬀerential equations , International Conference on Learning Representations, 2 021

work page

[19] [19]

Peilin Liu, Yuqing Liu, Xiang Zhou, and Ding-Xuan Zhou, Approximation of functionals on Korobov spaces with Fourier functional networks , Neural Networks 182 (2025), 106922

work page 2025

[20] [20]

Yuqing Liu, Tong Mao, and Ding-Xuan Zhou, Approximation of functions from Korobov spaces by shallow neural networks , Information Sciences 670 (2024), 120573

work page 2024

[21] [21]

5, 5465–5506

Jianfeng Lu, Zuowei Shen, Haizhao Yang, and Shijun Zhang , Deep network approximation for smooth functions , SIAM Journal on Mathematical Analysis 53 (2021), no. 5, 5465–5506

work page 2021

[22] [22]

3, 218–229

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and G eorge Em Karniadakis, Learning nonlinear operators via deeponet based on the universal app roximation theorem of operators , Nature Machine Intelligence 3 (2021), no. 3, 218–229. 18 YUWEN LI, GUOZHI ZHANG

work page 2021

[23] [23]

6, Paper No

Tong Mao and Ding-Xuan Zhou, Approximation of functions from Korobov spaces by deep convolutional neural networks , Advances in Computational Mathematics 48 (2022), no. 6, Paper No. 84, 26. MR 4519646

work page 2022

[24] [24]

Tong Mao and Ding-Xuan Zhou, Rates of approximation by ReLU shallow neural networks , Journal of Complexity 79 (2023), 101784

work page 2023

[25] [25]

1, 78–92

Hadrien Montanelli and Qiang Du, New Error Bounds for Deep ReLU Networks Using Sparse Grids, SIAM Journal on Mathematics of Data Science 1 (2019), no. 1, 78–92

work page 2019

[26] [26]

Philipp Petersen and Felix Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks , Neural Networks 108 (2018), 296–330. 27. , Equivalence of approximation by convolutional neural netw orks and fully-connected networks, Proceedings of the American Mathematical Society 148 (2020), no. 4, 1567–1581

work page 2018

[27] [27]

Maziar Raissi, Paris Perdikaris, and George E Karniadak is, Physics-informed neural networks: A deep learning framework for solving forward and inverse pr oblems involving nonlinear par- tial diﬀerential equations , Journal of Computational physics 378 (2019), 686–707

work page 2019

[28] [28]

Zuowei Shen, Haizhao Yang, and Shijun Zhang, Nonlinear approximation via compositions , Neural Networks 119 (2019), 74–84

work page 2019

[29] [29]

5, 1768–1811

Zuowei Shen, Haizhao Yang, and Shijun Zhang, Deep network approximation characterized by number of neurons , Communications in Computational Physics 28 (2020), no. 5, 1768–1811

work page 2020

[30] [30]

, Optimal approximation rate of ReLU networks in terms of widt h and depth , Journal de Math´ ematiques Pures et Appliqu´ ees157 (2022), 101–135

work page 2022

[31] [31]

357, 1–52

Jonathan W Siegel, Optimal approximation rates for deep ReLU neural networks o n Sobolev and Besov spaces , Journal of Machine Learning Research 24 (2023), no. 357, 1–52

work page 2023

[32] [32]

Karen Simonyan and Andrew Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[33] [33]

Taiji Suzuki, Adaptivity of deep ReLU network for learning in Besov and mix ed smooth Besov spaces: optimal rate and curse of dimensionality , International Conference on Learning Representations, 2019

work page 2019

[34] [34]

Zixuan W ang, Qi Tang, W ei Guo, and Yingda Cheng, Sparse grid discontinuous Galerkin methods for high-dimensional elliptic equations , J. Comput. Phys. 314 (2016), 244–263

work page 2016

[35] [35]

Yahong Yang and Yulong Lu, Near-optimal deep neural network approximation for Korobo v functions with respect to Lp and H1 norms, Neural Networks 180 (2024), 106702

work page 2024

[36] [36]

Yunfei Yang, On the optimal approximation of sobolev and besov functions using deep relu neural networks , arXiv preprint arXiv:2409.00901 (2024)

work page arXiv 2024

[37] [37]

Dmitry Yarotsky, Error bounds for approximations with deep ReLU networks , Neural Net- works 94 (2017), 103–114

work page 2017

[38] [38]

Ding-Xuan Zhou, Theory of deep convolutional neural networks: Downsamplin g, Neural Net- works 124 (2020), 319–327

work page 2020

[39] [39]

2, 787–794

Ding-Xuan Zhou, Universality of deep convolutional neural networks , Applied and Computa- tional Harmonic Analysis 48 (2020), no. 2, 787–794. MR 4047545 School of Mathematical Sciences, Zhejiang University, Hangz hou, Zhejiang 310058, China Email address : liyuwen@zju.edu.cn Email address : gzzh@zju.edu.cn

work page 2020