arXiv: 1911.05467 · v3 · pith:T6STN56Fnew · submitted 2019-11-07 · 💻 cs.LG · cs.NA · math.NA

ChebNet: Efficient and Stable Constructions of Deep Neural Networks with Rectified Power Units via Chebyshev Approximations

Pith reviewed 2026-05-24 16:02 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.NA
keywords ChebNet · rectified power units · Chebyshev approximations · deep neural networks · function approximation · spectral accuracy · numerical stability

The pith

ChebNets construct deep RePU networks from hierarchical Chebyshev approximations that match power-series accuracy for smooth functions while gaining much greater numerical stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to convert hierarchical Chebyshev polynomial approximations, performed in the frequency domain, into deep neural networks that use rectified power units as activations. This produces networks whose approximation error for smooth functions is no larger than that achieved by the earlier power-series constructions, yet the new networks remain stable under numerical evaluation. A reader would care because power-series routes, although theoretically optimal in complexity and error, become unusable in practice due to instability, blocking access to spectral accuracy in neural approximations of smooth targets.

Core claim

In a previous study it is shown that deep neural networks built with rectified power units can give better approximation for sufficient smooth functions than those built with rectified linear units, by converting polynomial approximations using power series into deep neural networks with optimal complexity and no approximation error. However, in practice, power series approximations are not easy to obtain due to the associated stability issue. In this paper, we propose a new and more stable way to construct RePU deep neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in frequency domain, we obtain efficient and stable deep neural network construction, which we call ChebNet.

What carries the argument

Hierarchical Chebyshev polynomial approximation in the frequency domain, converted into a deep RePU network.
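
As a rough illustration of why a Chebyshev hierarchy can be expressed with rectified power units at all, the sketch below checks two standard identities numerically. A RePU of order 2 (here called `repu2`) reproduces squares and products exactly, and the product identity T_{m+n} = 2 T_m T_n - T_{|m-n|} builds higher-degree Chebyshev polynomials from lower-degree ones using only products. The function names and the doubling scheme are assumptions made for illustration; this is not the paper's construction, whose details are not given on this page.

```python
import numpy as np

def repu2(x):
    """Rectified power unit of order 2: max(0, x)**2."""
    return np.maximum(0.0, x) ** 2

def square(x):
    # x^2 = repu2(x) + repu2(-x) holds exactly for every real x.
    return repu2(x) + repu2(-x)

def product(x, y):
    # Polarization identity: x*y = ((x + y)^2 - (x - y)^2) / 4.
    return (square(x + y) - square(x - y)) / 4.0

def cheb_by_doubling(n, x):
    """Evaluate T_n(x) with T_{2k} = 2*T_k^2 - 1 and T_{2k+1} = 2*T_k*T_{k+1} - x."""
    x = np.asarray(x, dtype=float)
    if n == 0:
        return np.ones_like(x)
    if n == 1:
        return x
    k = n // 2
    t_k = cheb_by_doubling(k, x)
    if n % 2 == 0:
        return 2.0 * square(t_k) - 1.0
    return 2.0 * product(t_k, cheb_by_doubling(k + 1, x)) - x

x = np.linspace(-1.0, 1.0, 201)
reference = np.cos(9 * np.arccos(x))  # closed form of T_9 on [-1, 1]
print(np.max(np.abs(cheb_by_doubling(9, x) - reference)))
```

Each halving of the degree corresponds to one more level of nested `square` or `product` calls, so the nesting depth grows roughly like log2(n); how the paper arranges these levels into network layers is not specified here.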

If this is right

  • Approximation rates for smooth functions remain at least as good as those from power-series RePU nets.
  • Numerical stability improves substantially compared with power-series constructions.
  • Fine-tuning of the resulting ChebNets produces better practical accuracy than fine-tuning of power-series versions.
  • Spectral accuracy becomes attainable in deep RePU networks through this stable initialization route.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same frequency-domain hierarchy might be tested with other orthogonal polynomial families to obtain stability tailored to particular function classes.
  • ChebNet initializations could be inserted into existing optimizers to check whether they improve convergence rates on high-precision scientific-computing tasks.
  • The construction supplies a concrete way to embed known polynomial approximation theory inside neural-network training loops without losing the theory's guarantees.

Load-bearing premise

A hierarchical Chebyshev approximation performed in the frequency domain can be converted into a deep RePU network that achieves the same optimal complexity and zero approximation error previously obtained only from power-series polynomials.

What would settle it

Direct numerical comparison of floating-point error growth or condition numbers between a high-degree ChebNet and its power-series RePU counterpart when both approximate the same smooth test function, such as exp(-x^2) on an interval.
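
A much simpler proxy for that comparison can be sketched at the level of the polynomial bases rather than the networks. The code below, a minimal sketch and not the paper's experiment, compares the 2-norm condition numbers of the monomial and Chebyshev collocation matrices at Chebyshev nodes on [-1, 1]. The helper name `basis_condition_numbers` and the choice of nodes are assumptions; the matrices here are not claimed to be the B_N and H_N mentioned in the Figure 3 caption.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

def basis_condition_numbers(degree):
    """Return (monomial, Chebyshev) condition numbers of the collocation matrices."""
    k = np.arange(degree + 1)
    # Chebyshev nodes of the first kind on [-1, 1] (an illustrative choice).
    nodes = np.cos(np.pi * (2 * k + 1) / (2 * (degree + 1)))
    kappa_mono = np.linalg.cond(P.polyvander(nodes, degree))
    kappa_cheb = np.linalg.cond(C.chebvander(nodes, degree))
    return kappa_mono, kappa_cheb

for degree in (5, 10, 15, 20):
    kappa_mono, kappa_cheb = basis_condition_numbers(degree)
    print(f"N={degree:2d}  monomial: {kappa_mono:.2e}  Chebyshev: {kappa_cheb:.2e}")
```

At these nodes the Chebyshev columns are orthogonal under the discrete inner product, so the Chebyshev condition number stays at sqrt(2) for every degree, while the monomial one grows rapidly with N. A full test of the claim would still need the same measurement on the constructed networks themselves.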

Figures

Figures reproduced from arXiv: 1911.05467 by Bo Li, Haijun Yu, Shanshan Tang.

Figure 1. Results of PowerNet and ChebNet approximating Gauss function with …
Figure 2. Results of PowerNet and ChebNet approximating the function …
Figure 3. The coefficients of Legendre expansion: c_j, j = 0, …, N (left) and power series expansion: c̃_j, j = 0, …, N (right) for Gauss function with N = 15. To explain why big coefficients happens, we calculate the condition numbers of B_N and H_N, denoted by κ(B_N) and κ(H_N), and the results are showed in …
Figure 4. The coefficients of Chebyshev expansion: …
Figure 5. The coefficients of Legendre expansion: c_j, j = 0, …, N (left) and coefficients of power series expansion: c̃_j, j = 0, …, N (right) for function f2 with N = 30
Figure 6. The coefficients of Chebyshev expansion: …
Original abstract

In a previous study [B. Li, S. Tang and H. Yu, Commun. Comput. Phy. 27(2):379-411, 2020], it is shown that deep neural networks built with rectified power units (RePU) as activation functions can give better approximation for sufficient smooth functions than those built with rectified linear units, by converting polynomial approximations using power series into deep neural networks with optimal complexity and no approximation error. However, in practice, power series approximations are not easy to obtain due to the associated stability issue. In this paper, we propose a new and more stable way to construct RePU deep neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in frequency domain, we obtain efficient and stable deep neural network construction, which we call ChebNet. The approximation of smooth functions by ChebNets is no worse than the approximation by deep RePU nets using power series. On the same time, ChebNets are much more stable. Numerical results show that the constructed ChebNets can be further fine-tuned to obtain much better results than those obtained by tuning deep RePU nets constructed by power series approach. As spectral accuracy is hard to obtain by direct training of deep neural networks, ChebNets provide a practical way to obtain spectral accuracy, it is expected to be useful in real applications that require efficient approximations of smooth functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes ChebNet, a construction of deep RePU networks via hierarchical Chebyshev polynomial approximations performed in the frequency domain. It claims this yields efficient, stable networks whose approximation quality for smooth functions is no worse than that of power-series-based RePU nets from prior work, while offering substantially better stability; numerical experiments are said to show that ChebNets fine-tune to superior results and thereby provide a practical route to spectral accuracy.

Significance. If the claimed exact conversion to RePU networks preserves optimal depth/width and zero approximation error while improving stability, the work would supply a concrete, usable alternative to power-series constructions that suffer from numerical instability. The reported fine-tuning gains constitute positive empirical evidence of practical advantage in settings where direct DNN training fails to reach spectral accuracy.

major comments (2)
  1. [ChebNet construction (abstract and §3)] The central claim that the hierarchical frequency-domain Chebyshev construction converts to a deep RePU network with the same optimal complexity and zero approximation error as the power-series route (Li et al., 2020) is asserted in the abstract and introduction but is not accompanied by an explicit mapping, basis-change analysis, or error-bound derivation. This equivalence is load-bearing for the statement that approximation quality is 'no worse.'
  2. [Abstract and numerical-results section] The repeated assertion that 'ChebNets are much more stable' lacks any quantitative stability metric (condition numbers, perturbation sensitivity, or floating-point error growth) comparing the Chebyshev hierarchy to the power-series construction; without such evidence the stability advantage remains unverified.
minor comments (1)
  1. [Abstract] 'On the same time' should read 'At the same time.'

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and recommendation for major revision. We address each major comment below and will revise the manuscript to strengthen the presentation of the construction and stability claims.

Point-by-point responses
  1. Referee: [ChebNet construction (abstract and §3)] The central claim that the hierarchical frequency-domain Chebyshev construction converts to a deep RePU network with the same optimal complexity and zero approximation error as the power-series route (Li et al., 2020) is asserted in the abstract and introduction but is not accompanied by an explicit mapping, basis-change analysis, or error-bound derivation. This equivalence is load-bearing for the statement that approximation quality is 'no worse.'

    Authors: We agree that an explicit mapping, basis-change analysis, and error-bound derivation are needed to fully support the claim. The hierarchical frequency-domain construction is intended to permit direct conversion to RePU networks via the three-term recurrence of Chebyshev polynomials (which can be realized layer-wise with RePU activations) while preserving the same depth/width as the power-series route from Li et al. (2020). To address the gap, we will add a dedicated subsection in §3 that (i) gives the explicit change-of-basis from Chebyshev to monomial coefficients, (ii) shows the resulting RePU network has identical complexity, and (iii) derives the error bound confirming the approximation quality is no worse (zero additional error beyond the underlying polynomial approximation). revision: yes

  2. Referee: [Abstract and numerical-results section] The repeated assertion that 'ChebNets are much more stable' lacks any quantitative stability metric (condition numbers, perturbation sensitivity, or floating-point error growth) comparing the Chebyshev hierarchy to the power-series construction; without such evidence the stability advantage remains unverified.

    Authors: We acknowledge that the stability claim requires quantitative support. The frequency-domain Chebyshev hierarchy is expected to be more stable because Chebyshev polynomials are bounded on [-1,1] and the recurrence avoids the rapid growth of monomial coefficients that occurs in power-series expansions. We will add direct quantitative comparisons in the numerical-results section, including condition numbers of the weight matrices, sensitivity to small perturbations in the input coefficients, and observed floating-point error growth for both constructions on the same test functions. These metrics will be reported alongside the existing fine-tuning experiments. revision: yes
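
The boundedness argument in the second response can be made concrete with a small numerical check. The sketch below, which assumes nothing about the paper's code, expands a single Chebyshev polynomial T_n into the monomial basis with NumPy and reports both its maximum on [-1, 1] and its largest monomial coefficient. The helper name `monomial_growth` is hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def monomial_growth(n, samples=2001):
    """Return (max |T_n| on [-1, 1], largest |monomial coefficient| of T_n)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # Chebyshev-basis coefficients of T_n itself
    x = np.linspace(-1.0, 1.0, samples)
    sup_norm = np.max(np.abs(C.chebval(x, coeffs)))
    largest = np.max(np.abs(C.cheb2poly(coeffs)))
    return sup_norm, largest

for n in (5, 10, 20, 30):
    sup_norm, largest = monomial_growth(n)
    print(f"n={n:2d}  max|T_n| = {sup_norm:.3f}  largest monomial coefficient = {largest:.3e}")
```

The leading monomial coefficient of T_n is 2^(n-1), so a function of size one on the interval is represented by coefficients that grow exponentially with the degree. Evaluating such a power series relies on cancellation between large terms, which is one plausible source of the instability the referee asks the authors to quantify.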

Circularity Check

0 steps flagged

No significant circularity; Chebyshev construction is independent of power-series prior result

Full rationale

The paper cites its own prior work only for the base fact that power-series polynomials convert to RePU nets with optimal depth/width and zero error. The new contribution is a separate hierarchical Chebyshev approximation performed in the frequency domain, which is a standard, externally verifiable technique not derived from the power-series case. The claim that ChebNets achieve approximation quality no worse than the power-series route follows directly from the known minimax properties of Chebyshev polynomials plus the shared conversion method; it does not reduce the new construction to the old one by definition. No equation or step equates the Chebyshev hierarchy to its own input or to a self-citation that itself lacks independent support. The stability advantage is presented as an empirical and theoretical consequence of the frequency-domain approach, not a renaming or fitted prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on standard approximation-theory facts about Chebyshev polynomials and on the conversion lemma established in the authors' prior RePU paper; no new free parameters or invented physical entities are introduced.

axioms (2)
  • standard math: Chebyshev polynomials admit stable hierarchical approximations in the frequency domain that can be realized by rectified power units
    Invoked when the abstract states that the hierarchical structure yields efficient and stable networks.
  • domain assumption: The conversion from polynomial approximation to deep RePU network preserves optimal complexity and zero approximation error
    Carried over from the cited prior work on power series; required for the 'no worse' claim.
invented entities (1)
  • ChebNet (no independent evidence)
    purpose: Label for the hierarchical Chebyshev-based RePU network construction
    New name introduced for the proposed method; no independent evidence supplied.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Distributional Off-Policy Evaluation with Deep Quantile Process Regression

    stat.ML · 2026-04 · unverdicted · novelty 6.0

    DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Solving the Schroedinger equation using Smolyak interpolants

    Avila, G., Carrington, T., 2013. Solving the Schroedinger equation using Smolyak interpolants. J. Chem. Phys. 139 (13), 134114

  2. [2]

    High dimensional polynomial interpolation on sparse grids

    Barthelmann, V., Novak, E., Ritter, K., 2000. High dimensional polynomial interpolation on sparse grids. Adv. Comput. Math. 12 (4), 273–288

  3. [3]

    Greedy layer-wise training of deep networks

    Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H., 2007. Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems. pp. 153–160

  4. [4]

    Chebyshev and Fourier Spectral Methods

    Boyd, J. P., 2000. Chebyshev and Fourier Spectral Methods. Dover Publications, INC

  5. [5]

    An adaptive Poisson solver using hierarchical bases and sparse grids

    Bungartz, H. J., 1992. An adaptive Poisson solver using hierarchical bases and sparse grids. In: Iterative Methods in Linear Algebra. Brussels, Belgium, pp. 293–310

  6. [6]

    Sparse grids

    Bungartz, H. J., Griebel, M., 2004. Sparse grids. Acta Numer. 13, 1–123

  7. [7]

    Approximation by superpositions of a sigmoidal function

    Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Math. Control Signal Systems 2 (4), 303–314

  8. [8]

    Exponential convergence of the deep neural network approximation for analytic functions

    E, W., Wang, Q., 2018. Exponential convergence of the deep neural network approximation for analytic functions. Sci. China Math. 61 (10), 1733–1740. URL https://link.springer.com/article/10.1007/s11425-018-9387-x

  9. [9]

    The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems

    E, W., Yu, B., 2018. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6 (1), 1–12

  10. [10]

    The power of depth for feedforward neural networks

    Eldan, R., Shamir, O., 2016. The power of depth for feedforward neural networks. JMLR Workshop Conf. Proc. 49, 1–34

  11. [11]

    Optimally scaled and optimally conditioned Vandermonde and Vandermonde-like matrices

    Gautschi, W., 2011. Optimally scaled and optimally conditioned Vandermonde and Vandermonde-like matrices. BIT Numerical Mathematics 51 (1), 103–125. URL http://link.springer.com/10.1007/s10543-010-0293-1

  12. [12]

    Sparse grids for the Schrödinger equation

    Griebel, M., Hamaekers, J., 2007. Sparse grids for the Schrödinger equation. Math. Model. Numer. Anal. 41 (2), 215–247

  13. [13]

    A sparse grid discontinuous Galerkin method for high-dimensional transport equations and its application to kinetic simulations

    Guo, W., Cheng, Y., 2016. A sparse grid discontinuous Galerkin method for high-dimensional transport equations and its application to kinetic simulations. SIAM J. Sci. Comput. 38 (6), A3381–A3409

  14. [14]

    Solving high-dimensional partial differential equations using deep learning

    Han, J., Jentzen, A., E, W., 2018. Solving high-dimensional partial differential equations using deep learning. PNAS 115 (34), 8505–8510

  15. [15]

    Deep residual learning for image recognition

    He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778

  16. [16]

    Deep neural networks for acoustic modeling in speech recognition

    Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Kingsbury, B., Sainath, T., 2012. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29

  17. [17]

    A fast learning algorithm for deep belief nets

    Hinton, G., Osindero, S., Teh, Y.-W., 2006. A fast learning algorithm for deep belief nets. Neural Computation 18 (7), 1527–1554

  18. [18]

    Multilayer feedforward networks are universal approximators

    Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), 359–366

  19. [19]

    ImageNet classification with deep convolutional neural networks

    Krizhevsky, A., Sutskever, I., Hinton, G. E., 2012. ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems 141 (5), 1097–1105

  20. [20]

    A theoretical analysis of deep neural networks and parametric PDEs

    Kutyniok, G., Petersen, P., Raslan, M., Schneider, R., 2019. A theoretical analysis of deep neural networks and parametric PDEs. arXiv:1904.00377

  21. [21]

    Deep learning

    LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444

  22. [22]

    Better approximations of high dimensional smooth functions by deep neural networks with rectified power units

    Li, B., Tang, S., Yu, H., 2019. Better approximations of high dimensional smooth functions by deep neural networks with rectified power units. arXiv:1903.05858, to appear on Commun. Comput. Phys. URL http://admin.global-sci.org/uploads/online_news/CiCP/201911050902-12788.pdf

  23. [23]

    PowerNet: Efficient representations of polynomials and smooth functions by deep neural networks with rectified power units

    Li, B., Tang, S., Yu, H., 2019. PowerNet: Efficient representations of polynomials and smooth functions by deep neural networks with rectified power units. arXiv:1909.05136. URL https://arxiv.org/abs/1909.05136

  24. [24]

    Why Deep Neural Networks for Function Approximation?

    Liang, S., Srikant, R., 2016. Why deep neural networks for function approximation? arXiv:1610.04161. URL https://arxiv.org/abs/1610.04161

  25. [25]

    A sparse finite element method with high accuracy: Part I

    Lin, Q., Yan, N., Zhou, A., 2001. A sparse finite element method with high accuracy: Part I. Numer. Math. 88 (4), 731–742

  26. [26]

    Approximation properties of a multilayered feedforward artificial neural network

    Mhaskar, H. N., 1993. Approximation properties of a multilayered feedforward artificial neural network. Adv. Comput. Math. 1 (1), 61–80. URL http://link.springer.com/10.1007/BF02070821

  27. [27]

    Neural networks for optimal approximation of smooth and analytic functions

    Mhaskar, H. N., 1996. Neural networks for optimal approximation of smooth and analytic functions. Neural Computation 8 (1), 164–177

  28. [28]

    New error bounds for deep ReLU networks using sparse grids

    Montanelli, H., Du, Q., 2019. New error bounds for deep ReLU networks using sparse grids. SIAM J. Math. Data Sci. 1 (1), 78–92.

  29. [29]

    A sparse grid stochastic collocation method for partial differential equations with random input data

    Nobile, F., Tempone, R., Webster, C., 2008. A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46 (5), 2309–2345

  30. [30]

    Exponential ReLU DNN expression of holomorphic maps in high dimension

    Opschoor, J. A. A., Schwab, C., Zech, J., 2019. Exponential ReLU DNN expression of holomorphic maps in high dimension. Tech. Rep. 35, SAM ETH Zürich. URL https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2019/2019-35.pdf

  31. [31]

    Optimal approximation of piecewise smooth functions using deep ReLU neural networks

    Petersen, P., Voigtlaender, F., 2018. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks 108, 296–330

  32. [32]

    Chebfun: A New Kind of Numerical Computing

    Platte, R. B., Trefethen, L. N., 2010. Chebfun: A New Kind of Numerical Computing. In: Fitt, A. D., Norbury, J., Ockendon, H., Wilson, E. (Eds.), Progress in Industrial Mathematics at ECMI 2008. Mathematics in Industry. Springer, Berlin, Heidelberg, pp. 69–87

  33. [33]

    Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review

    Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., Liao, Q., 2017. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. Int. J. Autom. Comput. 14 (5), 503–519

  34. [34]

    A nodal sparse grid spectral element method for multi-dimensional elliptic partial differential equations

    Rong, Z., Shen, J., Yu, H., 2017. A nodal sparse grid spectral element method for multi-dimensional elliptic partial differential equations. Int. J. Numer. Anal. Model. 14 (4-5), 762–783

  35. [35]

    Sparse finite elements for elliptic problems with stochastic loading

    Schwab, C., Todor, R. A., 2003. Sparse finite elements for elliptic problems with stochastic loading. Numerische Mathematik 95 (4), 707–734

  36. [36]

    Sparse spectral approximations of high-dimensional problems based on hyperbolic cross

    Shen, J., Wang, L., 2010. Sparse spectral approximations of high-dimensional problems based on hyperbolic cross. SIAM J. Numer. Anal. 48 (4), 1087–1109

  37. [37]

    Approximations by orthonormal mapped Chebyshev functions for higher-dimensional problems in unbounded domains

    Shen, J., Wang, L., Yu, H., 2014. Approximations by orthonormal mapped Chebyshev functions for higher-dimensional problems in unbounded domains. J. Comput. Appl. Math. 265, 264–275

  38. [38]

    Efficient spectral-element methods for the electronic Schrödinger equation

    Shen, J., Wang, Y., Yu, H., 2016. Efficient spectral-element methods for the electronic Schrödinger equation. In: Garcke, J., Pflüger, D. (Eds.), Sparse Grids and Applications - Stuttgart 2014. Lecture Notes in Computational Science and Engineering. Springer International Publishing, pp. 265–289

  39. [39]

    Efficient spectral sparse grid methods and applications to high-dimensional elliptic problems

    Shen, J., Yu, H., 2010. Efficient spectral sparse grid methods and applications to high-dimensional elliptic problems. SIAM J. Sci. Comput. 32 (6), 3228–3250

  40. [40]

    Efficient spectral sparse grid methods and applications to high-dimensional elliptic equations II: Unbounded domains

    Shen, J., Yu, H., 2012. Efficient spectral sparse grid methods and applications to high-dimensional elliptic equations II: Unbounded domains. SIAM J. Sci. Comput. 34 (2), 1141–1164

  41. [41]

    Quadrature and interpolation formulas for tensor products of certain classes of functions

    Smolyak, S. A., 1963. Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl. Akad. Nauk SSSR 148 (5), 1042–1045

  42. [42]

    Representation benefits of deep feedforward networks

    Telgarsky, M., 2015. Representation benefits of deep feedforward networks. arXiv:1509.08101

  43. [43]

    Benefits of depth in neural networks

    Telgarsky, M., 2016. Benefits of depth in neural networks. In: JMLR: Workshop and Conference Proceedings. Vol. 49. pp. 1–23

  44. [44]

    Why are high-dimensional finance problems often of low effective dimension?

    Wang, X., Sloan, I., 2005. Why are high-dimensional finance problems often of low effective dimension? SIAM J. Sci. Comput. 27 (1), 159–183

  45. [45]

    Error bounds for approximations with deep ReLU networks

    Yarotsky, D., 2017. Error bounds for approximations with deep ReLU networks. Neural Networks 94, 103–114

  46. [46]

    On the regularity of the electronic Schrödinger equation in Hilbert spaces of mixed derivatives

    Yserentant, H., 2004. On the regularity of the electronic Schrödinger equation in Hilbert spaces of mixed derivatives. Numer. Math. 98 (4), 731–759

  47. [47]

    Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics

    Zhang, L., Han, J., Wang, H., Car, R., E, W., 2018. Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120 (14), 143001.