Two-Dimensional Deep ReLU CNN Approximation for Korobov Functions: A Constructive Approach
Pith reviewed 2026-05-23 01:12 UTC · model grok-4.3
The pith
Two-dimensional deep ReLU CNNs approximate Korobov functions at near-optimal rates via an explicit construction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a fully constructive approach for building 2D CNNs to approximate Korobov functions and provide a rigorous analysis of the complexity of the constructed networks. Our results demonstrate that 2D CNNs achieve near-optimal approximation rates under the continuous weight selection model, significantly alleviating the curse of dimensionality.
What carries the argument
An explicit, constructive procedure that assembles multi-channel 2D convolutional layers with zero-padding and ReLU activations followed by a fully connected layer to realize the approximation.
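The architecture named above can be sketched as a plain forward pass. This is a minimal NumPy illustration, not the paper's construction: the function names, the odd square kernels, and the "same"-size zero-padding convention are all assumptions made here for concreteness, and the weights are left as free inputs rather than the explicitly selected values the paper derives.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d_zero_pad(x, kernels, bias):
    """Multi-channel 2D cross-correlation with zero-padding (output keeps H x W).

    x:       (C_in, H, W) input feature maps
    kernels: (C_out, C_in, k, k) filters, k odd (assumed convention)
    bias:    (C_out,)
    """
    c_out, c_in, k, _ = kernels.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zeros on the spatial axes only
    _, h, w = x.shape
    out = np.empty((c_out, h, w))
    for i in range(h):
        for j in range(w):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            # contract over (C_in, k, k), leaving one value per output channel
            out[:, i, j] = np.tensordot(kernels, patch, axes=3) + bias
    return out

def cnn_forward(x, conv_params, fc_w, fc_b):
    """Conv + ReLU layers followed by one fully connected layer."""
    h = x
    for kernels, bias in conv_params:
        h = relu(conv2d_zero_pad(h, kernels, bias))
    return fc_w @ h.ravel() + fc_b
```

With a single filter whose only nonzero entry is a 1 at the center, `conv2d_zero_pad` returns its input unchanged, which is a quick sanity check on the padding and indexing.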
If this is right
- Korobov functions in two dimensions can be approximated by 2D CNNs with rates that improve on generic high-dimensional bounds.
- The number of parameters and layers required remains controlled independently of certain smoothness parameters.
- The same constructive pattern supplies a template for analyzing other periodic or mixed-derivative function classes.
- Theoretical support is given for preferring 2D CNN architectures over fully connected networks for these targets.
Where Pith is reading between the lines
- If the continuous-weight rates survive discretization of the weights, the same networks could be trained by gradient descent with only modest loss of efficiency.
- The construction may generalize to inputs of dimension greater than two once the padding and channel-count rules are adjusted accordingly.
- Similar constructive arguments could be attempted for other activation functions or for approximation in different norms.
Load-bearing premise
The continuous weight selection model accurately reflects practical CNN training and the explicit construction extends without hidden constants that grow with the smoothness parameter of the Korobov class.
What would settle it
Compute the actual L2 approximation error achieved by the constructed 2D CNNs on a concrete Korobov function as depth and channel count increase and check whether the observed rate stays within a fixed multiple of the theoretical near-optimal bound.
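A check of this kind could be organized roughly as follows. The sketch assumes a hypothetical test function `korobov_example` (it vanishes on the boundary of the unit square and has a constant mixed derivative, so it fits the common definition of a Korobov function), a midpoint-rule estimate of the L2 error, and a least-squares slope on a log-log scale. The approximant `g` is whatever callable evaluates the constructed network; none is provided here.

```python
import numpy as np

def korobov_example(x, y):
    """Hypothetical target: zero on the boundary of [0,1]^2, smooth mixed derivatives."""
    return x * (1 - x) * y * (1 - y)

def l2_error(f, g, n=256):
    """Estimate ||f - g|| in L2([0,1]^2) with a midpoint rule on an n x n grid."""
    t = (np.arange(n) + 0.5) / n
    x, y = np.meshgrid(t, t, indexing="ij")
    return np.sqrt(np.mean((f(x, y) - g(x, y)) ** 2))

def empirical_rate(sizes, errors):
    """Least-squares slope of log(error) against log(size).

    A return value r means the errors behave roughly like size**(-r).
    """
    slope, _ = np.polyfit(np.log(sizes), np.log(errors), 1)
    return -slope
```

Near-optimal bounds for Korobov classes typically carry logarithmic factors, so a plain log-log slope is only a rough indicator; comparing the ratio of observed error to the stated bound across sizes is the more direct test of "within a fixed multiple".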
Original abstract
This paper investigates approximation capabilities of two-dimensional (2D) deep convolutional neural networks (CNNs), with Korobov functions serving as a benchmark. We focus on 2D CNNs, comprising multi-channel convolutional layers with zero-padding and ReLU activations, followed by a fully connected layer. We propose a fully constructive approach for building 2D CNNs to approximate Korobov functions and provide a rigorous analysis of the complexity of the constructed networks. Our results demonstrate that 2D CNNs achieve near-optimal approximation rates under the continuous weight selection model, significantly alleviating the curse of dimensionality. This work provides a solid theoretical foundation for 2D CNNs and illustrates their potential for broader applications in function approximation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a fully constructive method for approximating Korobov functions on the unit square using two-dimensional deep ReLU CNNs consisting of multi-channel convolutional layers with zero-padding, ReLU activations, and a terminal fully connected layer. Under the continuous weight selection model it claims to prove that the constructed networks attain near-optimal approximation rates (matching Kolmogorov widths of the Korobov class) with network size and depth scaling polynomially in 1/ε, thereby alleviating the curse of dimensionality.
Significance. A verified constructive result establishing s-independent polynomial rates for 2D CNN approximation of Korobov functions would supply a concrete theoretical foundation for CNN expressivity in anisotropic smoothness settings and would be a useful reference point for subsequent work on CNN approximation theory.
major comments (2)
- [§4, Theorem 4.2] Main approximation result: the claimed near-optimality requires that the network size N(ε,s) and the magnitudes of the selected weights be free of exponential factors in the smoothness parameter s. The explicit construction (via 1D lifting or tensorization) must be inspected to confirm that no exp(c s) blow-up occurs in the channel count or weight bounds; such a blow-up would reduce the result to standard Sobolev-type rates rather than an alleviation of dimensionality effects.
- [§3.2] Network architecture definition: the continuous-weight model is invoked without a quantitative comparison to the discrete-weight case. If the construction relies on weight values whose bit-complexity or dynamic range grows with s, the practical relevance of the rate must be clarified.
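For readers unfamiliar with the tensorization mentioned in the first comment: tensor-product constructions with ReLU networks usually rest on approximating the product of two numbers, which in turn rests on approximating the square. The sketch below shows that scalar building block as it is known from the ReLU approximation literature. It is an assumption that the manuscript uses this particular device, and the sketch does not show how it is embedded in convolutional layers; the function names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    """Tent map on [0, 1] built from three ReLUs: 0 -> 0, 1/2 -> 1, 1 -> 0."""
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def relu_square(x, m):
    """Approximate x**2 on [0, 1] by x - sum_{k=1..m} hat^(k)(x) / 4**k.

    The result is the piecewise-linear interpolant of x**2 on 2**m equal
    intervals, so the uniform error is at most 2**(-2*m - 2).
    """
    out = np.array(x, dtype=float)
    g = np.array(x, dtype=float)
    for k in range(1, m + 1):
        g = hat(g)  # k-fold composition of the tent map
        out = out - g / 4.0 ** k
    return out

def relu_product(x, y, m):
    """Approximate x*y for x, y in [0, 1] via xy = 2*((x+y)/2)**2 - x**2/2 - y**2/2."""
    return (2 * relu_square((x + y) / 2, m)
            - relu_square(x, m) / 2
            - relu_square(y, m) / 2)
```

Each extra composition of `hat` costs a fixed number of ReLU units and shrinks the error by a factor of four, which is why depth, rather than weight magnitude, pays for accuracy in this building block. The weights involved are the fixed constants 2, 4, and powers of 1/4.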
minor comments (2)
- [Abstract] The abstract states 'rigorous analysis' but does not indicate the precise scaling exponents or the dependence on s; adding one sentence summarizing the network-size bound would improve readability.
- [§2] Notation for the Korobov class K^s_2 and the mixed-derivative seminorm should be introduced once in §2 and used consistently thereafter.
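The notation the second minor comment refers to is not reproduced on this page. For orientation only, one common definition from the sparse-grid literature is sketched below. The symbols $X^{2,p}$ and $\Omega$, and the fixed smoothness order 2, are assumptions of this sketch and may differ from the manuscript's $K^s_2$.

```latex
X^{2,p}(\Omega) = \left\{ f \in L^p(\Omega) \;:\; f|_{\partial\Omega} = 0,\;
  D^{\mathbf{k}} f \in L^p(\Omega) \ \text{for all } |\mathbf{k}|_\infty \le 2 \right\},
\qquad \Omega = [0,1]^d,

|f|_{2,p} = \left\| \frac{\partial^{2d} f}{\partial x_1^2 \cdots \partial x_d^2} \right\|_{L^p(\Omega)} .
```

The key feature is that smoothness is measured by mixed derivatives, bounded in each coordinate separately, rather than by the total derivative order used for isotropic Sobolev spaces.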
Simulated Author's Rebuttal
We thank the referee for the careful reading and the detailed comments on our manuscript. We address each major comment below with point-by-point responses. Our construction is fully explicit, and we are prepared to add clarifications to make the dependence on the smoothness parameter s fully transparent.
- Referee: [§4, Theorem 4.2] (main approximation result): the claimed near-optimality requires that the network size N(ε,s) and the magnitudes of the selected weights remain free of exponential factors in the smoothness parameter s; the explicit construction (via 1D lifting or tensorization) must be inspected to confirm that no such exp(c s) blow-up occurs in channel count or weight bounds, as this would reduce the result to standard Sobolev-type rates rather than an alleviation of dimensionality effects.
Authors: The proof of Theorem 4.2 gives a fully explicit construction that first builds 1D ReLU networks for the marginal functions and then lifts them to 2D via a tensor-product structure realized by multi-channel convolutions with zero-padding. Direct inspection of the channel counts, depths, and weight selections in this lifting shows that both the number of channels and the weight magnitudes scale polynomially in 1/ε, with the exponent depending on s only through the approximation rate itself; no exponential factor exp(c s) appears in the network size or in the weight bounds. This is what permits the claimed near-optimal rates that alleviate the dimensionality effect for the Korobov class. We will add a short lemma (or a dedicated remark immediately after the proof) that explicitly records the s-dependence of N(ε,s) and the weight bounds, so that the absence of exp(c s) factors is immediate.
Revision: partial
- Referee: [§3.2] (network architecture definition): the continuous-weight model is invoked without a quantitative comparison to the discrete-weight case; if the construction relies on weight values whose bit-complexity or dynamic range grows with s, the practical relevance of the rate must be clarified.
Authors: Section 3.2 introduces the continuous-weight model because the goal is to establish expressivity upper bounds; this is standard in the approximation-theory literature. In the explicit construction of Theorem 4.2 the selected weights are bounded by constants independent of s (in fact, the convolutional filters use only weights in {0,1} after normalization). Consequently the dynamic range does not grow with s, and any subsequent discretization to finite precision incurs only a logarithmic bit-cost independent of s. We will insert a brief paragraph at the end of §3.2 that states this bound and notes that the approximation rates therefore remain meaningful under quantization with precision polynomial in ε.
Revision: partial
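The "logarithmic bit-cost" claim in the second response can be made concrete for the simplest case. The sketch below assumes weights bounded by a hypothetical constant `w_max` and a uniform quantizer; it only bounds the per-weight rounding error and says nothing about how that error propagates through the layers of the network, which the promised paragraph in §3.2 would still have to address.

```python
import math
import numpy as np

def quantize(weights, bits, w_max=1.0):
    """Round weights in [-w_max, w_max] to a uniform grid with 2**bits levels."""
    levels = 2 ** bits - 1          # number of grid intervals
    step = 2 * w_max / levels       # grid spacing
    w = np.clip(weights, -w_max, w_max)
    return np.round((w + w_max) / step) * step - w_max

def bits_for_tolerance(delta, w_max=1.0):
    """Smallest bit width whose worst-case rounding error (step / 2) is <= delta."""
    return math.ceil(math.log2(w_max / delta + 1))
```

For example, a per-weight tolerance of `1e-3` with `w_max = 1` needs 10 bits, and halving the tolerance adds roughly one bit. If the tolerance is chosen polynomial in ε, the bit width grows like log(1/ε), and it does not depend on s as long as `w_max` does not.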
Circularity Check
No circularity; derivation is a self-contained constructive proof
The paper describes an explicit constructive method to build 2D CNNs (multi-channel conv + ReLU + FC) that approximate Korobov functions, followed by a complexity analysis showing near-optimal rates under the continuous-weight model. No equations, fitted parameters, or self-citations are visible that would make any claimed rate or network size reduce by definition to the target error or to prior results by the same authors. The central claim rests on a mathematical construction whose validity can be checked independently of the final rate statement, satisfying the criteria for a non-circular derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Standard mathematical properties of ReLU activations and zero-padded convolutions hold in the continuous-weight model.
Forward citations
Cited by 1 Pith paper
- Neural Flow Operators can Approximate any Operator: Abstract Frameworks and Universal Approcimations
Neural flow operators with composition and separation structures are proven to universally approximate any operator in finite and infinite dimensions, recovering ResNet-type and plain architectures via time discretizations.