ChebNet: Efficient and Stable Constructions of Deep Neural Networks with Rectified Power Units via Chebyshev Approximations
Pith reviewed 2026-05-24 16:02 UTC · model grok-4.3
The pith
ChebNets construct deep RePU networks from hierarchical Chebyshev approximations that match power-series accuracy for smooth functions while gaining much greater numerical stability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a previous study it is shown that deep neural networks built with rectified power units can give better approximation for sufficiently smooth functions than those built with rectified linear units, by converting polynomial approximations using power series into deep neural networks with optimal complexity and no approximation error. However, in practice, power series approximations are not easy to obtain due to the associated stability issue. In this paper, we propose a new and more stable way to construct RePU deep neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in the frequency domain, we obtain an efficient and stable deep neural network construction, which we call ChebNet.
What carries the argument
Hierarchical Chebyshev polynomial approximation in the frequency domain, converted into a deep RePU network.
If this is right
- Approximation rates for smooth functions remain at least as good as those from power-series RePU nets.
- Numerical stability improves substantially compared with power-series constructions.
- Fine-tuning of the resulting ChebNets produces better practical accuracy than fine-tuning of power-series versions.
- Spectral accuracy becomes attainable in deep RePU networks through this stable initialization route.
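To make "spectral accuracy" concrete, the following minimal Python sketch interpolates a smooth test function by Chebyshev polynomials of increasing degree and reports the maximum error on [-1, 1]. It uses only NumPy's Chebyshev module and is not the paper's ChebNet construction; the names `target` and `max_interp_error`, the test function exp(-x^2), and the chosen degrees are assumptions made for this illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def target(x):
    # Smooth (analytic) test function on [-1, 1]; chosen for illustration only.
    return np.exp(-x**2)

def max_interp_error(deg, n_test=2001):
    # Chebyshev-series coefficients of the degree-`deg` interpolant
    # at Chebyshev points of the first kind.
    coef = C.chebinterpolate(target, deg)
    x = np.linspace(-1.0, 1.0, n_test)
    return float(np.max(np.abs(C.chebval(x, coef) - target(x))))

# The error should fall faster than any fixed power of 1/deg
# until it reaches the level of floating-point round-off.
errors = {deg: max_interp_error(deg) for deg in (4, 8, 16, 32)}
for deg, err in errors.items():
    print(f"degree {deg:2d}: max error {err:.2e}")
```

The rapid decay with degree is the behaviour that is hard to reach by direct training and that a polynomial-based initialization is meant to supply.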
Where Pith is reading between the lines
- The same frequency-domain hierarchy might be tested with other orthogonal polynomial families to obtain stability tailored to particular function classes.
- ChebNet initializations could be inserted into existing optimizers to check whether they improve convergence rates on high-precision scientific-computing tasks.
- The construction supplies a concrete way to embed known polynomial approximation theory inside neural-network training loops without losing the theory's guarantees.
Load-bearing premise
A hierarchical Chebyshev approximation performed in the frequency domain can be converted into a deep RePU network that achieves the same optimal complexity and zero approximation error previously obtained only from power-series polynomials.
What would settle it
Direct numerical comparison of floating-point error growth or condition numbers between a high-degree ChebNet and its power-series RePU counterpart when both approximate the same smooth test function, such as exp(-x^2) on an interval.
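One inexpensive proxy for such a comparison is the conditioning of the two polynomial bases themselves. The sketch below, which assumes Chebyshev points of the first kind as sample points, compares the 2-norm condition number of the monomial Vandermonde matrix with that of the Chebyshev Vandermonde matrix. It measures the bases only, not the RePU networks built from them, and the helper name `basis_condition_numbers` is hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

def basis_condition_numbers(deg):
    # Sample at deg + 1 Chebyshev points of the first kind on [-1, 1].
    x = C.chebpts1(deg + 1)
    # Columns 1, x, ..., x^deg (monomial basis).
    cond_monomial = float(np.linalg.cond(P.polyvander(x, deg)))
    # Columns T_0(x), ..., T_deg(x) (Chebyshev basis).
    cond_chebyshev = float(np.linalg.cond(C.chebvander(x, deg)))
    return cond_monomial, cond_chebyshev

for deg in (5, 10, 20, 40):
    cm, cc = basis_condition_numbers(deg)
    print(f"degree {deg:2d}: monomial {cm:.2e}, Chebyshev {cc:.2e}")
```

At these nodes the Chebyshev columns are mutually orthogonal, so their condition number stays near sqrt(2), while the monomial condition number grows quickly with the degree. A full test of the paper's claim would still need the same measurement on the constructed network weights.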
Original abstract
In a previous study [B. Li, S. Tang and H. Yu, Commun. Comput. Phy. 27(2):379-411, 2020], it is shown that deep neural networks built with rectified power units (RePU) as activation functions can give better approximation for sufficient smooth functions than those built with rectified linear units, by converting polynomial approximations using power series into deep neural networks with optimal complexity and no approximation error. However, in practice, power series approximations are not easy to obtain due to the associated stability issue. In this paper, we propose a new and more stable way to construct RePU deep neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in frequency domain, we obtain efficient and stable deep neural network construction, which we call ChebNet. The approximation of smooth functions by ChebNets is no worse than the approximation by deep RePU nets using power series. On the same time, ChebNets are much more stable. Numerical results show that the constructed ChebNets can be further fine-tuned to obtain much better results than those obtained by tuning deep RePU nets constructed by power series approach. As spectral accuracy is hard to obtain by direct training of deep neural networks, ChebNets provide a practical way to obtain spectral accuracy, it is expected to be useful in real applications that require efficient approximations of smooth functions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ChebNet, a construction of deep RePU networks via hierarchical Chebyshev polynomial approximations performed in the frequency domain. It claims this yields efficient, stable networks whose approximation quality for smooth functions is no worse than that of power-series-based RePU nets from prior work, while offering substantially better stability; numerical experiments are said to show that ChebNets fine-tune to superior results and thereby provide a practical route to spectral accuracy.
Significance. If the claimed exact conversion to RePU networks preserves optimal depth/width and zero approximation error while improving stability, the work would supply a concrete, usable alternative to power-series constructions that suffer from numerical instability. The reported fine-tuning gains constitute positive empirical evidence of practical advantage in settings where direct DNN training fails to reach spectral accuracy.
major comments (2)
- [ChebNet construction (abstract and §3)] The central claim that the hierarchical frequency-domain Chebyshev construction converts to a deep RePU network with the same optimal complexity and zero approximation error as the power-series route (Li et al., 2020) is asserted in the abstract and introduction but is not accompanied by an explicit mapping, basis-change analysis, or error-bound derivation. This equivalence is load-bearing for the statement that approximation quality is 'no worse.'
- [Abstract and numerical-results section] The repeated assertion that 'ChebNets are much more stable' lacks any quantitative stability metric (condition numbers, perturbation sensitivity, or floating-point error growth) comparing the Chebyshev hierarchy to the power-series construction; without such evidence the stability advantage remains unverified.
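As a hedged illustration of the kind of metric the second comment asks for, the sketch below writes the Chebyshev polynomial T_n in both bases and compares the absolute coefficient sums. Because |T_n(x)| <= 1 on [-1, 1], that sum bounds how strongly a relative perturbation of the coefficients can be amplified in the evaluated value at x = 1. The function name `coefficient_amplification` is hypothetical, and the example concerns polynomial coefficients only, not trained or constructed network weights.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def coefficient_amplification(n):
    # Chebyshev-basis coefficients of T_n: a single 1 in position n.
    cheb_coef = np.zeros(n + 1)
    cheb_coef[n] = 1.0
    # The same polynomial expressed in the monomial basis 1, x, ..., x^n.
    mono_coef = C.cheb2poly(cheb_coef)
    # Absolute coefficient sums: a worst-case amplification factor for
    # relative coefficient perturbations, since max |T_n| on [-1, 1] is 1.
    return float(np.sum(np.abs(mono_coef))), float(np.sum(np.abs(cheb_coef)))

for n in (5, 10, 20, 30):
    mono, cheb = coefficient_amplification(n)
    print(f"n = {n:2d}: monomial sum {mono:.3e}, Chebyshev sum {cheb:.1f}")
```

The monomial sum grows geometrically with n while the Chebyshev sum stays at 1, which is one way to quantify the cancellation that a power-series representation has to survive.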
minor comments (1)
- [Abstract] 'On the same time' should read 'At the same time.'
Simulated Author's Rebuttal
We thank the referee for the constructive comments and recommendation for major revision. We address each major comment below and will revise the manuscript to strengthen the presentation of the construction and stability claims.
Point-by-point responses
- Referee: [ChebNet construction (abstract and §3)] The central claim that the hierarchical frequency-domain Chebyshev construction converts to a deep RePU network with the same optimal complexity and zero approximation error as the power-series route (Li et al., 2020) is asserted in the abstract and introduction but is not accompanied by an explicit mapping, basis-change analysis, or error-bound derivation. This equivalence is load-bearing for the statement that approximation quality is 'no worse.'
Authors: We agree that an explicit mapping, basis-change analysis, and error-bound derivation are needed to fully support the claim. The hierarchical frequency-domain construction is intended to permit direct conversion to RePU networks via the three-term recurrence of Chebyshev polynomials (which can be realized layer-wise with RePU activations) while preserving the same depth/width as the power-series route from Li et al. (2020). To address the gap, we will add a dedicated subsection in §3 that (i) gives the explicit change-of-basis from Chebyshev to monomial coefficients, (ii) shows the resulting RePU network has identical complexity, and (iii) derives the error bound confirming the approximation quality is no worse (zero additional error beyond the underlying polynomial approximation). revision: yes
- Referee: [Abstract and numerical-results section] The repeated assertion that 'ChebNets are much more stable' lacks any quantitative stability metric (condition numbers, perturbation sensitivity, or floating-point error growth) comparing the Chebyshev hierarchy to the power-series construction; without such evidence the stability advantage remains unverified.
Authors: We acknowledge that the stability claim requires quantitative support. The frequency-domain Chebyshev hierarchy is expected to be more stable because Chebyshev polynomials are bounded on [-1,1] and the recurrence avoids the rapid growth of monomial coefficients that occurs in power-series expansions. We will add direct quantitative comparisons in the numerical-results section, including condition numbers of the weight matrices, sensitivity to small perturbations in the input coefficients, and observed floating-point error growth for both constructions on the same test functions. These metrics will be reported alongside the existing fine-tuning experiments. revision: yes
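The first response appeals to realizing Chebyshev recurrences with RePU activations. The sketch below shows one way this can work in principle, under the assumption that the order-2 unit sigma_2(x) = max(0, x)^2 is used: squares and products are reproduced exactly by sigma_2, and the product identity T_{a+b} = 2 T_a T_b - T_{|a-b|} then builds T_0, ..., T_n in a number of doubling levels that grows like log2(n). The functions `repu2`, `square`, `product`, and `cheb_hierarchy` are hypothetical names, and this is not claimed to be the authors' network layout.

```python
import numpy as np

def repu2(x):
    # Rectified power unit of order 2: sigma_2(x) = max(0, x)^2.
    return np.maximum(0.0, x) ** 2

def square(x):
    # x^2 = sigma_2(x) + sigma_2(-x), exact for every real x.
    return repu2(x) + repu2(-x)

def product(x, y):
    # Polarization identity: x * y = ((x + y)^2 - (x - y)^2) / 4.
    return (square(x + y) - square(x - y)) / 4.0

def cheb_hierarchy(x, n):
    # Returns ([T_0(x), ..., T_n(x)], number of doubling levels used).
    x = np.asarray(x, dtype=float)
    T = [np.ones_like(x), x]
    levels, m = 0, 1  # m is the highest degree available so far
    while m < n:
        top = min(2 * m, n)
        for j in range(m + 1, top + 1):
            a, b = (j + 1) // 2, j // 2  # a + b = j, a - b is 0 or 1
            # T_{a+b} = 2 T_a T_b - T_{a-b}; only degrees <= m are reused.
            T.append(2.0 * product(T[a], T[b]) - T[a - b])
        m = top
        levels += 1
    return T[: n + 1], levels

x = np.linspace(-1.0, 1.0, 101)
T, levels = cheb_hierarchy(x, 16)
print("doubling levels for n = 16:", levels)
print("max deviation from cos(16 arccos x):",
      float(np.max(np.abs(T[16] - np.cos(16 * np.arccos(x))))))
```

Every intermediate quantity here stays bounded by 1 in magnitude on [-1, 1], which is the property the second response relies on when it contrasts the recurrence with the growth of monomial coefficients.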
Circularity Check
No significant circularity; Chebyshev construction is independent of power-series prior result
Full rationale
The paper cites its own prior work only for the base fact that power-series polynomials convert to RePU nets with optimal depth/width and zero error. The new contribution is a separate hierarchical Chebyshev approximation performed in the frequency domain, which is a standard, externally verifiable technique not derived from the power-series case. The claim that ChebNets achieve approximation quality no worse than the power-series route follows directly from the known minimax properties of Chebyshev polynomials plus the shared conversion method; it does not reduce the new construction to the old one by definition. No equation or step equates the Chebyshev hierarchy to its own input or to a self-citation that itself lacks independent support. The stability advantage is presented as an empirical and theoretical consequence of the frequency-domain approach, not a renaming or fitted prediction.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Chebyshev polynomials admit stable hierarchical approximations in the frequency domain that can be realized by rectified power units
- domain assumption: The conversion from polynomial approximation to deep RePU network preserves optimal complexity and zero approximation error
invented entities (1)
- ChebNet: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  "By using a hierarchical structure of Chebyshev polynomial approximation in frequency domain, we obtain efficient and stable deep neural network construction... The approximation of smooth functions by ChebNets is no worse than the approximation by deep RePU nets using power series."
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat induction / embed_injective
  Relation between the paper passage and the cited Recognition theorem: unclear
  "Theorem 1. For $n \ge 1$, assume $p(x) = \sum c_j T_j(x)$ ... there exists a $\sigma_2$ neural network with at most $\lfloor \log_2 n \rfloor + 1$ hidden layers ... $O(n)$ neurons and total non-zero weights."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Distributional Off-Policy Evaluation with Deep Quantile Process Regression: DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.
Reference graph
Works this paper leans on
- [1] Avila, G., Carrington, T., 2013. Solving the Schroedinger equation using Smolyak interpolants. J. Chem. Phys. 139 (13), 134114
- [2] Barthelmann, V., Novak, E., Ritter, K., 2000. High dimensional polynomial interpolation on sparse grids. Adv. Comput. Math. 12 (4), 273–288
- [3] Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H., 2007. Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems. pp. 153–160
- [4]
- [5]
- [6] Bungartz, H. J., Griebel, M., 2004. Sparse grids. Acta Numer. 13, 1–123
- [7] Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Math. Control Signal Systems 2 (4), 303–314
- [8] E, W., Wang, Q., 2018. Exponential convergence of the deep neural network approximation for analytic functions. Sci. China Math. 61 (10), 1733–1740. URL https://link.springer.com/article/10.1007/s11425-018-9387-x
- [9] E, W., Yu, B., 2018. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6 (1), 1–12
- [10] Eldan, R., Shamir, O., 2016. The power of depth for feedforward neural networks. JMLR Workshop Conf. Proc. 49, 1–34
- [11] Gautschi, W., 2011. Optimally scaled and optimally conditioned Vandermonde and Vandermonde-like matrices. BIT Numerical Mathematics 51 (1), 103–125. URL http://link.springer.com/10.1007/s10543-010-0293-1
- [12] Griebel, M., Hamaekers, J., 2007. Sparse grids for the Schrödinger equation. Math. Model. Numer. Anal. 41 (2), 215–247
- [13] Guo, W., Cheng, Y., 2016. A sparse grid discontinuous Galerkin method for high-dimensional transport equations and its application to kinetic simulations. SIAM J. Sci. Comput. 38 (6), A3381–A3409
- [14] Han, J., Jentzen, A., E, W., 2018. Solving high-dimensional partial differential equations using deep learning. PNAS 115 (34), 8505–8510
- [15] He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778
- [16] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Kingsbury, B., Sainath, T., 2012. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29
- [17] Hinton, G., Osindero, S., Teh, Y.-W., 2006. A fast learning algorithm for deep belief nets. Neural Computation 18 (7), 1527–1554
- [18] Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), 359–366
- [19]
- [20] Kutyniok, G., Petersen, P., Raslan, M., Schneider, R., 2019. A theoretical analysis of deep neural networks and parametric PDEs. arXiv:1904.00377
- [21] LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444
- [22] Li, B., Tang, S., Yu, H., 2019. Better approximations of high dimensional smooth functions by deep neural networks with rectified power units. arXiv:1903.05858, to appear on Commun. Comput. Phys. URL http://admin.global-sci.org/uploads/online_news/CiCP/201911050902-12788.pdf
- [23] Li, B., Tang, S., Yu, H., 2019. PowerNet: Efficient representations of polynomials and smooth functions by deep neural networks with rectified power units. arXiv:1909.05136. URL https://arxiv.org/abs/1909.05136
- [24] Liang, S., Srikant, R., 2016. Why deep neural networks for function approximation? arXiv:1610.04161. URL https://arxiv.org/abs/1610.04161
- [25] Lin, Q., Yan, N., Zhou, A., 2001. A sparse finite element method with high accuracy: Part I. Numer. Math. 88 (4), 731–742
- [26] Mhaskar, H. N., 1993. Approximation properties of a multilayered feedforward artificial neural network. Adv. Comput. Math. 1 (1), 61–80. URL http://link.springer.com/10.1007/BF02070821
- [27]
- [28] Montanelli, H., Du, Q., 2019. New error bounds for deep ReLU networks using sparse grids. SIAM J. Math. Data Sci. 1 (1), 78–92
- [29] Nobile, F., Tempone, R., Webster, C., 2008. A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46 (5), 2309–2345
- [30] Opschoor, J. A. A., Schwab, C., Zech, J., 2019. Exponential ReLU DNN expression of holomorphic maps in high dimension. Tech. Rep. 35, SAM ETH Zürich. URL https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2019/2019-35.pdf
- [31] Petersen, P., Voigtlaender, F., 2018. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks 108, 296–330
- [32] Platte, R. B., Trefethen, L. N., 2010. Chebfun: A New Kind of Numerical Computing. In: Fitt, A. D., Norbury, J., Ockendon, H., Wilson, E. (Eds.), Progress in Industrial Mathematics at ECMI 2008. Mathematics in Industry. Springer, Berlin, Heidelberg, pp. 69–87
- [33] Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., Liao, Q., 2017. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. Int. J. Autom. Comput. 14 (5), 503–519
- [34] Rong, Z., Shen, J., Yu, H., 2017. A nodal sparse grid spectral element method for multi-dimensional elliptic partial differential equations. Int. J. Numer. Anal. Model. 14 (4-5), 762–783
- [35]
- [36] Shen, J., Wang, L., 2010. Sparse spectral approximations of high-dimensional problems based on hyperbolic cross. SIAM J. Numer. Anal. 48 (4), 1087–1109
- [37] Shen, J., Wang, L., Yu, H., 2014. Approximations by orthonormal mapped Chebyshev functions for higher-dimensional problems in unbounded domains. J. Comput. Appl. Math. 265, 264–275
- [38] Shen, J., Wang, Y., Yu, H., 2016. Efficient spectral-element methods for the electronic Schrödinger equation. In: Garcke, J., Pflüger, D. (Eds.), Sparse Grids and Applications - Stuttgart 2014. Lecture Notes in Computational Science and Engineering. Springer International Publishing, pp. 265–289
- [39] Shen, J., Yu, H., 2010. Efficient spectral sparse grid methods and applications to high-dimensional elliptic problems. SIAM J. Sci. Comput. 32 (6), 3228–3250
- [40] Shen, J., Yu, H., 2012. Efficient spectral sparse grid methods and applications to high-dimensional elliptic equations II: Unbounded domains. SIAM J. Sci. Comput. 34 (2), 1141–1164
- [41]
- [42] Telgarsky, M., 2015. Representation benefits of deep feedforward networks. arXiv:1509.08101
- [43] Telgarsky, M., 2016. Benefits of depth in neural networks. In: JMLR: Workshop and Conference Proceedings. Vol. 49. pp. 1–23
- [44] Wang, X., Sloan, I., 2005. Why are high-dimensional finance problems often of low effective dimension? SIAM J. Sci. Comput. 27 (1), 159–183
- [45] Yarotsky, D., 2017. Error bounds for approximations with deep ReLU networks. Neural Networks 94, 103–114
- [46] Yserentant, H., 2004. On the regularity of the electronic Schrödinger equation in Hilbert spaces of mixed derivatives. Numer. Math. 98 (4), 731–759
- [47] Zhang, L., Han, J., Wang, H., Car, R., E, W., 2018. Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120 (14), 143001