pith. sign in

arxiv: 2503.07976 · v2 · submitted 2025-03-11 · 📊 stat.ML · cs.LG

Two-Dimensional Deep ReLU CNN Approximation for Korobov Functions: A Constructive Approach

Pith reviewed 2026-05-23 01:12 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords CNN approximationKorobov functionsReLU networksconstructive approximationcurse of dimensionalityfunction approximationdeep learning theorytwo-dimensional CNNs
0
0 comments X

The pith

Two-dimensional deep ReLU CNNs approximate Korobov functions at near-optimal rates via an explicit construction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a fully explicit construction of two-dimensional deep CNNs consisting of multi-channel convolutional layers with zero-padding and ReLU activations followed by a fully connected layer. It proves that these networks achieve approximation rates for Korobov functions that are close to the best possible under the continuous weight selection model. A sympathetic reader would care because the result indicates that the curse of dimensionality can be substantially reduced for this class of smooth functions when using 2D CNNs. The work supplies both the network architecture and a rigorous complexity analysis of the constructed networks.

Core claim

We propose a fully constructive approach for building 2D CNNs to approximate Korobov functions and provide a rigorous analysis of the complexity of the constructed networks. Our results demonstrate that 2D CNNs achieve near-optimal approximation rates under the continuous weight selection model, significantly alleviating the curse of dimensionality.

What carries the argument

An explicit, constructive procedure that assembles multi-channel 2D convolutional layers with zero-padding and ReLU activations followed by a fully connected layer to realize the approximation.

If this is right

  • Korobov functions in two dimensions can be approximated by 2D CNNs with rates that improve on generic high-dimensional bounds.
  • The number of parameters and layers required remains controlled independently of certain smoothness parameters.
  • The same constructive pattern supplies a template for analyzing other periodic or mixed-derivative function classes.
  • Theoretical support is given for preferring 2D CNN architectures over fully connected networks for these targets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the continuous-weight rates survive discretization of the weights, the same networks could be trained by gradient descent with only modest loss of efficiency.
  • The construction may generalize to inputs of dimension greater than two once the padding and channel-count rules are adjusted accordingly.
  • Similar constructive arguments could be attempted for other activation functions or for approximation in different norms.

Load-bearing premise

The continuous weight selection model accurately reflects practical CNN training and the explicit construction extends without hidden constants that grow with the smoothness parameter of the Korobov class.

What would settle it

Compute the actual L2 approximation error achieved by the constructed 2D CNNs on a concrete Korobov function as depth and channel count increase and check whether the observed rate stays within a fixed multiple of the theoretical near-optimal bound.

read the original abstract

This paper investigates approximation capabilities of two-dimensional (2D) deep convolutional neural networks (CNNs), with Korobov functions serving as a benchmark. We focus on 2D CNNs, comprising multi-channel convolutional layers with zero-padding and ReLU activations, followed by a fully connected layer. We propose a fully constructive approach for building 2D CNNs to approximate Korobov functions and provide a rigorous analysis of the complexity of the constructed networks. Our results demonstrate that 2D CNNs achieve near-optimal approximation rates under the continuous weight selection model, significantly alleviating the curse of dimensionality. This work provides a solid theoretical foundation for 2D CNNs and illustrates their potential for broader applications in function approximation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a fully constructive method for approximating Korobov functions on the unit square using two-dimensional deep ReLU CNNs consisting of multi-channel convolutional layers with zero-padding, ReLU activations, and a terminal fully connected layer. Under the continuous weight selection model it claims to prove that the constructed networks attain near-optimal approximation rates (matching Kolmogorov widths of the Korobov class) with network size and depth scaling polynomially in 1/ε, thereby alleviating the curse of dimensionality.

Significance. A verified constructive result establishing s-independent polynomial rates for 2D CNN approximation of Korobov functions would supply a concrete theoretical foundation for CNN expressivity in anisotropic smoothness settings and would be a useful reference point for subsequent work on CNN approximation theory.

major comments (2)
  1. [§4, Theorem 4.2] §4, Theorem 4.2 (main approximation result): the claimed near-optimality requires that the network size N(ε,s) and the magnitudes of the selected weights remain free of exponential factors in the smoothness parameter s; the explicit construction (via 1D lifting or tensorization) must be inspected to confirm that no such exp(c s) blow-up occurs in channel count or weight bounds, as this would reduce the result to standard Sobolev-type rates rather than an alleviation of dimensionality effects.
  2. [§3.2] §3.2 (network architecture definition): the continuous-weight model is invoked without a quantitative comparison to the discrete-weight case; if the construction relies on weight values whose bit-complexity or dynamic range grows with s, the practical relevance of the rate must be clarified.
minor comments (2)
  1. [Abstract] The abstract states 'rigorous analysis' but does not indicate the precise scaling exponents or the dependence on s; adding one sentence summarizing the network-size bound would improve readability.
  2. [§2] Notation for the Korobov class K^s_2 and the mixed-derivative seminorm should be introduced once in §2 and used consistently thereafter.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the detailed comments on our manuscript. We address each major comment below with point-by-point responses. Our construction is fully explicit, and we are prepared to add clarifications to make the dependence on the smoothness parameter s fully transparent.

read point-by-point responses
  1. Referee: [§4, Theorem 4.2] §4, Theorem 4.2 (main approximation result): the claimed near-optimality requires that the network size N(ε,s) and the magnitudes of the selected weights remain free of exponential factors in the smoothness parameter s; the explicit construction (via 1D lifting or tensorization) must be inspected to confirm that no such exp(c s) blow-up occurs in channel count or weight bounds, as this would reduce the result to standard Sobolev-type rates rather than an alleviation of dimensionality effects.

    Authors: The proof of Theorem 4.2 gives a fully explicit construction that first builds 1D ReLU networks for the marginal functions and then lifts them to 2D via a tensor-product structure realized by multi-channel convolutions with zero-padding. Direct inspection of the channel counts, depths, and weight selections in this lifting shows that both the number of channels and the weight magnitudes scale polynomially in 1/ε with the exponent depending on s only through the approximation rate itself; no exponential factor exp(c s) appears in the network size or in the weight bounds. This is what permits the claimed near-optimal rates that alleviate the dimensionality effect for the Korobov class. We will add a short lemma (or a dedicated remark immediately after the proof) that explicitly records the s-dependence of N(ε,s) and the weight bounds to make the absence of exp(c s) factors immediate. revision: partial

  2. Referee: [§3.2] §3.2 (network architecture definition): the continuous-weight model is invoked without a quantitative comparison to the discrete-weight case; if the construction relies on weight values whose bit-complexity or dynamic range grows with s, the practical relevance of the rate must be clarified.

    Authors: Section 3.2 introduces the continuous-weight model because the goal is to establish expressivity upper bounds; this is standard in the approximation-theory literature. In the explicit construction of Theorem 4.2 the selected weights are bounded by constants independent of s (in fact, the convolutional filters use only weights in {0,1} after normalization). Consequently the dynamic range does not grow with s, and any subsequent discretization to finite precision incurs only a logarithmic bit-cost independent of s. We will insert a brief paragraph at the end of §3.2 that states this bound and notes that the approximation rates therefore remain meaningful under quantization with precision polynomial in ε. revision: partial

Circularity Check

0 steps flagged

No circularity; derivation is a self-contained constructive proof

full rationale

The paper describes an explicit constructive method to build 2D CNNs (multi-channel conv + ReLU + FC) that approximate Korobov functions, followed by a complexity analysis showing near-optimal rates under the continuous-weight model. No equations, fitted parameters, or self-citations are visible that would make any claimed rate or network size reduce by definition to the target error or to prior results by the same authors. The central claim rests on a mathematical construction whose validity can be checked independently of the final rate statement, satisfying the criteria for a non-circular derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard results from approximation theory and properties of ReLU and convolution; no free parameters, ad-hoc axioms, or new entities are introduced in the abstract.

axioms (1)
  • standard math Standard mathematical properties of ReLU activations and zero-padded convolutions hold in the continuous-weight model.
    Invoked to support the network construction and rate analysis.

pith-pipeline@v0.9.0 · 5654 in / 1190 out tokens · 40237 ms · 2026-05-23T01:12:01.292028+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Neural Flow Operators can Approximate any Operator: Abstract Frameworks and Universal Approcimations

    cs.LG 2026-05 unverdicted novelty 7.0

    Neural flow operators with composition and separation structures are proven to universally approximate any operator in finite and infinite dimensions, recovering ResNet-type and plain architectures via time discretizations.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · cited by 1 Pith paper

  1. [1]

    Shallow an d deep networks are near- optimal approximators of Korobov functions

    Moise Blanchard and Mohammed Amine Bennouna. Shallow an d deep networks are near- optimal approximators of Korobov functions. In International Conference on Learning Rep- resentations, 2021

  2. [2]

    Sparse grid s

    Hans-Joachim Bungartz and Michael Griebel. Sparse grid s. Acta Numerica, 13:147–269, 2004

  3. [3]

    Temlyakov, and Tino Ullrich

    Dinh Dung, Vladimir N. Temlyakov, and Tino Ullrich. Hyperbolic Cross Approximation . Springer International Publishing, 2018

  4. [4]

    Learning Korobov fun ctions by correntropy and convo- lutional neural networks

    Zhiying Fang, Tong Mao, and Jun Fan. Learning Korobov fun ctions by correntropy and convo- lutional neural networks. Neural Computation, 36(4):718–743, 2024

  5. [5]

    Deep Learning

    Ian Goodfellow, Y oshua Bengio, and Aaron Courville. Deep Learning. MIT press, 2016

  6. [6]

    De ep convolutional neural networks with zero-padding: Feature extraction and learning

    Zhi Han, Baichen Liu, Shao-Bo Lin, and Ding-Xuan Zhou. De ep convolutional neural networks with zero-padding: Feature extraction and learning. arXiv preprint arXiv:2307.16203, 2023

  7. [7]

    Approximation propert ies of deep ReLU CNNs

    Juncai He, Lin Li, and Jinchao Xu. Approximation propert ies of deep ReLU CNNs. Research in the Mathematical Sciences , 9(3):38, 2022

  8. [8]

    MgNet: A unified framework of mul tigrid and convolutional neural network

    Juncai He and Jinchao Xu. MgNet: A unified framework of mul tigrid and convolutional neural network. Science China Mathematics, 62:1331–1354, 2019

  9. [9]

    De ep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. De ep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pa ttern Recogni- tion, pages 770–778, 2016. 14

  10. [10]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens V an Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017

  11. [11]

    Physics-informed machine learning

    George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Par is Perdikaris, Sifan Wang, and Liu Y ang. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021

  12. [12]

    Statistical theory f or image classification using deep convo- lutional neural networks with cross-entropy loss

    Michael Kohler and Sophie Langer. Statistical theory f or image classification using deep convo- lutional neural networks with cross-entropy loss. arXiv preprint arXiv:2011.13602, 2020

  13. [13]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton . Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems , 25, 2012

  14. [14]

    Deep learning

    Y ann LeCun, Y oshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015

  15. [15]

    Medical image classification with convolutional neural network

    Qing Li, Weidong Cai, Xiaogang Wang, Y un Zhou, David Dag an Feng, and Mei Chen. Medical image classification with convolutional neural network. In 13th International Conference on Control Automation Robotics & Vision (ICARCV) , pages 844–848, 2014

  16. [16]

    Approximat ing functions with multi-features by deep convolutional neural networks

    Tong Mao, Zhongjie Shi, and Ding-Xuan Zhou. Approximat ing functions with multi-features by deep convolutional neural networks. Analysis and Applications, 21(01):93–125, 2023

  17. [17]

    Approximation of function s from Korobov spaces by deep convolutional neural networks

    Tong Mao and Ding-Xuan Zhou. Approximation of function s from Korobov spaces by deep convolutional neural networks. Advances in Computational Mathematics , 48(6):84, 2022

  18. [18]

    New error bounds for de ep ReLU networks using sparse grids

    Hadrien Montanelli and Qiang Du. New error bounds for de ep ReLU networks using sparse grids. SIAM Journal on Mathematics of Data Science , 1(1):78–92, 2019

  19. [19]

    Tractability of Multivariate Problems: V olume I: Lin- ear Information

    Erich Novak and Henryk Wo´ zniakowski. Tractability of Multivariate Problems: V olume I: Lin- ear Information. European Mathematical Society, 2008

  20. [20]

    Expo nential ReLU DNN expression of holomorphic maps in high dimension

    Joost Opschoor, Christoph Schwab, and Jakob Zech. Expo nential ReLU DNN expression of holomorphic maps in high dimension. Constructive Approximation, 55(1):537–582, 2022

  21. [21]

    Equivalence of approximation by convolutional neural networks and fully-connected networks

    Philipp Petersen and Felix V oigtlaender. Equivalence of approximation by convolutional neural networks and fully-connected networks. Proceedings of the American Mathematical Society , 148(4):1567–1581, 2020

  22. [22]

    U- net: Convolutional networks for biomedical image segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015

  23. [23]

    Optimal approximation rates for dee p ReLU neural networks on Sobolev and Besov spaces

    Jonathan W Siegel. Optimal approximation rates for dee p ReLU neural networks on Sobolev and Besov spaces. Journal of Machine Learning Research, 24(357):1–52, 2023

  24. [24]

    V ery deep convolutional net works for large-scale image recogni- tion

    K Simonyan and A Zisserman. V ery deep convolutional net works for large-scale image recogni- tion. In 3rd International Conference on Learning Representations (ICLR 2015). Computational and Biological Learning Society, 2015

  25. [25]

    Error bounds for approximations with deep ReLU networks

    Dmitry Y arotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017. 15

  26. [26]

    Object detection with deep learning: A review

    Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, and Xindong Wu . Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30(11):3212– 3232, 2019

  27. [27]

    Theory of deep convolutional neural ne tworks: Downsampling

    Ding-Xuan Zhou. Theory of deep convolutional neural ne tworks: Downsampling. Neural Net- works, 124:319–327, 2020

  28. [28]

    Universality of deep convolutional ne ural networks

    Ding-Xuan Zhou. Universality of deep convolutional ne ural networks. Applied and Computa- tional Harmonic Analysis, 48(2):787–794, 2020. Appendices A Basic CNN Constructions In this section, we collect some important 2D deep ReLU CNN co nstructions which will be used repeatedly to construct more complex networks. Lemma 1 (Widening CNNs). Let k , d,W1,W2, ...