ChebNet: Efficient and Stable Constructions of Deep Neural Networks with Rectified Power Units via Chebyshev Approximations

Bo Li; Haijun Yu; Shanshan Tang

arXiv: 1911.05467 · v3 · pith:T6STN56Fnew · submitted 2019-11-07 · 💻 cs.LG · cs.NA · math.NA

Pith reviewed 2026-05-24 16:02 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.NA

keywords: ChebNet, rectified power units, Chebyshev approximations, deep neural networks, function approximation, spectral accuracy, numerical stability

The pith

ChebNets construct deep RePU networks from hierarchical Chebyshev approximations that match power-series accuracy for smooth functions while gaining much greater numerical stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to convert hierarchical Chebyshev polynomial approximations, performed in the frequency domain, into deep neural networks that use rectified power units as activations. This produces networks whose approximation error for smooth functions is no larger than that achieved by the earlier power-series constructions, yet the new networks remain stable under numerical evaluation. A reader would care because power-series routes, although theoretically optimal in complexity and error, become unusable in practice due to instability, blocking access to spectral accuracy in neural approximations of smooth targets.

Core claim

In a previous study it is shown that deep neural networks built with rectified power units can give better approximation for sufficient smooth functions than those built with rectified linear units, by converting polynomial approximations using power series into deep neural networks with optimal complexity and no approximation error. However, in practice, power series approximations are not easy to obtain due to the associated stability issue. In this paper, we propose a new and more stable way to construct RePU deep neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in frequency domain, we obtain efficient and stable deep neural network construction, which we call ChebNet.

What carries the argument

Hierarchical Chebyshev polynomial approximation in the frequency domain, converted into a deep RePU network.
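
As a minimal sketch of why RePU activations suit polynomial constructions, the Python snippet below uses the order-2 unit σ2(x) = max(0, x)^2 to reproduce squares and products exactly (up to rounding), then evaluates T_n through the three-term recurrence. The function names (`repu2`, `square_via_repu`, `product_via_repu`, `cheb_T`) are illustrative assumptions and do not come from the paper. The sequential recurrence used here has depth growing linearly in n, so it is not the paper's hierarchical frequency-domain construction; it only shows that the building blocks are exact.

```python
import numpy as np

def repu2(x):
    """Order-2 rectified power unit: max(0, x)**2."""
    return np.maximum(0.0, x) ** 2

def square_via_repu(x):
    # x^2 = repu2(x) + repu2(-x): one of the two terms is always zero.
    return repu2(x) + repu2(-x)

def product_via_repu(x, y):
    # Polarization identity: x*y = ((x + y)^2 - (x - y)^2) / 4.
    return 0.25 * (square_via_repu(x + y) - square_via_repu(x - y))

def cheb_T(n, x):
    """Evaluate T_n(x) with T_{k+1} = 2*x*T_k - T_{k-1}, using RePU products."""
    x = np.asarray(x, dtype=float)
    t_prev, t_curr = np.ones_like(x), x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * product_via_repu(x, t_curr) - t_prev
    return t_curr

# Hypothetical usage: compare against the closed form cos(n * arccos(x)).
x = np.linspace(-1.0, 1.0, 201)
err = np.max(np.abs(cheb_T(8, x) - np.cos(8 * np.arccos(x))))
print(f"max |T_8 via RePU - cos(8 arccos x)| = {err:.2e}")
```

Because the square and product identities hold exactly, any error in this sketch comes from floating-point rounding only, not from the activation.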

If this is right

Approximation rates for smooth functions remain at least as good as those from power-series RePU nets.
Numerical stability improves substantially compared with power-series constructions.
Fine-tuning of the resulting ChebNets produces better practical accuracy than fine-tuning of power-series versions.
Spectral accuracy becomes attainable in deep RePU networks through this stable initialization route.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same frequency-domain hierarchy might be tested with other orthogonal polynomial families to obtain stability tailored to particular function classes.
ChebNet initializations could be inserted into existing optimizers to check whether they improve convergence rates on high-precision scientific-computing tasks.
The construction supplies a concrete way to embed known polynomial approximation theory inside neural-network training loops without losing the theory's guarantees.

Load-bearing premise

A hierarchical Chebyshev approximation performed in the frequency domain can be converted into a deep RePU network that achieves the same optimal complexity and zero approximation error previously obtained only from power-series polynomials.

What would settle it

Direct numerical comparison of floating-point error growth or condition numbers between a high-degree ChebNet and its power-series RePU counterpart when both approximate the same smooth test function, such as exp(-x^2) on an interval.
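
One way such a comparison might be set up is sketched below in Python with NumPy. It compares the 2-norm condition numbers of the monomial and Chebyshev collocation matrices at Chebyshev points, and the interpolation error for exp(-x^2) on [-1, 1] when the coefficients are obtained by solving the corresponding linear system. The choice of interval, nodes, and matrices is an assumption made for illustration; these matrices are stand-ins and are not claimed to be the B_N and H_N matrices of the paper, and the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

def basis_condition_numbers(degree):
    """2-norm condition numbers of the monomial and Chebyshev collocation matrices."""
    x = C.chebpts1(degree + 1)          # Chebyshev points of the first kind
    v_mono = P.polyvander(x, degree)    # columns 1, x, ..., x^degree
    v_cheb = C.chebvander(x, degree)    # columns T_0(x), ..., T_degree(x)
    return np.linalg.cond(v_mono), np.linalg.cond(v_cheb)

def interpolation_error(f, degree, basis="cheb"):
    """Max error on [-1, 1] of the interpolant obtained by solving V c = f(x)."""
    vander = C.chebvander if basis == "cheb" else P.polyvander
    x = C.chebpts1(degree + 1)
    coef = np.linalg.solve(vander(x, degree), f(x))
    xt = np.linspace(-1.0, 1.0, 1001)
    return np.max(np.abs(vander(xt, degree) @ coef - f(xt)))

def gauss(x):
    return np.exp(-x ** 2)

# Hypothetical usage: print degree, both condition numbers, both errors.
for deg in (10, 20, 30):
    k_mono, k_cheb = basis_condition_numbers(deg)
    print(deg, f"{k_mono:.2e}", f"{k_cheb:.2e}",
          f"{interpolation_error(gauss, deg, 'mono'):.2e}",
          f"{interpolation_error(gauss, deg, 'cheb'):.2e}")
```

By discrete orthogonality of the Chebyshev polynomials at these nodes, the Chebyshev collocation matrix has condition number √2 for every degree, whereas the monomial matrix becomes rapidly ill-conditioned as the degree grows.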

Figures

Figures reproduced from arXiv: 1911.05467 by Bo Li, Haijun Yu, Shanshan Tang.

**Figure 2.** Results of PowerNet and ChebNet approximating the function …

**Figure 3.** The coefficients of the Legendre expansion, c_j, j = 0, …, N (left), and of the power series expansion, c̃_j, j = 0, …, N (right), for the Gauss function with N = 15. To explain why big coefficients happen, we calculate the condition numbers of B_N and H_N, denoted by κ(B_N) and κ(H_N), and the results are shown in …

**Figure 4.** The coefficients of Chebyshev expansion: …

**Figure 5.** The coefficients of the Legendre expansion, c_j, j = 0, …, N (left), and of the power series expansion, c̃_j, j = 0, …, N (right), for function f_2 with N = 30.

**Figure 6.** The coefficients of Chebyshev expansion: …

Original abstract

In a previous study [B. Li, S. Tang and H. Yu, Commun. Comput. Phy. 27(2):379-411, 2020], it is shown that deep neural networks built with rectified power units (RePU) as activation functions can give better approximation for sufficient smooth functions than those built with rectified linear units, by converting polynomial approximations using power series into deep neural networks with optimal complexity and no approximation error. However, in practice, power series approximations are not easy to obtain due to the associated stability issue. In this paper, we propose a new and more stable way to construct RePU deep neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in frequency domain, we obtain efficient and stable deep neural network construction, which we call ChebNet. The approximation of smooth functions by ChebNets is no worse than the approximation by deep RePU nets using power series. On the same time, ChebNets are much more stable. Numerical results show that the constructed ChebNets can be further fine-tuned to obtain much better results than those obtained by tuning deep RePU nets constructed by power series approach. As spectral accuracy is hard to obtain by direct training of deep neural networks, ChebNets provide a practical way to obtain spectral accuracy, it is expected to be useful in real applications that require efficient approximations of smooth functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ChebNet gives a plausible, stable alternative via a hierarchical Chebyshev construction, but the zero-error, optimal-complexity mapping from the frequency domain still needs explicit confirmation.

The key point is that this work replaces the power-series route to RePU networks with a hierarchical Chebyshev construction in the frequency domain, claiming the same approximation power for smooth functions plus markedly better numerical stability. That is the actual new piece relative to the cited 2020 paper. The authors correctly flag that power series are unstable in practice and position Chebyshev as a fix that still yields deep RePU nets. Numerical experiments are said to show that the resulting ChebNets fine-tune to better accuracy than the power-series versions, which is a concrete practical observation worth checking. The construction itself appears to rest on standard Chebyshev properties plus the earlier RePU conversion result, so there is no obvious circularity.

The soft spot is exactly the one flagged in the stress-test note: the abstract asserts that the hierarchical frequency-domain scheme converts to a RePU net with identical depth/width and zero approximation error, yet supplies no derivation or explicit basis-change accounting. If extra RePU layers or operations are needed to move between bases, the complexity or error claims could shift. Until the full mapping is laid out with bounds, the central equivalence remains an assertion rather than a demonstrated fact.

The paper is aimed at people building surrogate models or spectral-accuracy networks for smooth functions in scientific computing. A reader already working on RePU or polynomial-based architectures would find the stability angle useful to test. It is coherent enough on its own terms to merit referee time; the idea is distinct and the stability motivation is real, even if the proofs need tightening.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes ChebNet, a construction of deep RePU networks via hierarchical Chebyshev polynomial approximations performed in the frequency domain. It claims this yields efficient, stable networks whose approximation quality for smooth functions is no worse than that of power-series-based RePU nets from prior work, while offering substantially better stability; numerical experiments are said to show that ChebNets fine-tune to superior results and thereby provide a practical route to spectral accuracy.

Significance. If the claimed exact conversion to RePU networks preserves optimal depth/width and zero approximation error while improving stability, the work would supply a concrete, usable alternative to power-series constructions that suffer from numerical instability. The reported fine-tuning gains constitute positive empirical evidence of practical advantage in settings where direct DNN training fails to reach spectral accuracy.

major comments (2)

[ChebNet construction (abstract and §3)] The central claim that the hierarchical frequency-domain Chebyshev construction converts to a deep RePU network with the same optimal complexity and zero approximation error as the power-series route (Li et al., 2020) is asserted in the abstract and introduction but is not accompanied by an explicit mapping, basis-change analysis, or error-bound derivation. This equivalence is load-bearing for the statement that approximation quality is 'no worse.'
[Abstract and numerical-results section] The repeated assertion that 'ChebNets are much more stable' lacks any quantitative stability metric (condition numbers, perturbation sensitivity, or floating-point error growth) comparing the Chebyshev hierarchy to the power-series construction; without such evidence the stability advantage remains unverified.

minor comments (1)

[Abstract] 'On the same time' should read 'At the same time.'

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and recommendation for major revision. We address each major comment below and will revise the manuscript to strengthen the presentation of the construction and stability claims.

Referee: [ChebNet construction (abstract and §3)] The central claim that the hierarchical frequency-domain Chebyshev construction converts to a deep RePU network with the same optimal complexity and zero approximation error as the power-series route (Li et al., 2020) is asserted in the abstract and introduction but is not accompanied by an explicit mapping, basis-change analysis, or error-bound derivation. This equivalence is load-bearing for the statement that approximation quality is 'no worse.'

Authors: We agree that an explicit mapping, basis-change analysis, and error-bound derivation are needed to fully support the claim. The hierarchical frequency-domain construction is intended to permit direct conversion to RePU networks via the three-term recurrence of Chebyshev polynomials (which can be realized layer-wise with RePU activations) while preserving the same depth/width as the power-series route from Li et al. (2020). To address the gap, we will add a dedicated subsection in §3 that (i) gives the explicit change-of-basis from Chebyshev to monomial coefficients, (ii) shows the resulting RePU network has identical complexity, and (iii) derives the error bound confirming the approximation quality is no worse (zero additional error beyond the underlying polynomial approximation).
Revision: yes

Referee: [Abstract and numerical-results section] The repeated assertion that 'ChebNets are much more stable' lacks any quantitative stability metric (condition numbers, perturbation sensitivity, or floating-point error growth) comparing the Chebyshev hierarchy to the power-series construction; without such evidence the stability advantage remains unverified.

Authors: We acknowledge that the stability claim requires quantitative support. The frequency-domain Chebyshev hierarchy is expected to be more stable because Chebyshev polynomials are bounded on [-1,1] and the recurrence avoids the rapid growth of monomial coefficients that occurs in power-series expansions. We will add direct quantitative comparisons in the numerical-results section, including condition numbers of the weight matrices, sensitivity to small perturbations in the input coefficients, and observed floating-point error growth for both constructions on the same test functions. These metrics will be reported alongside the existing fine-tuning experiments.
Revision: yes
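
The boundedness argument in this response can be made concrete with a small Python sketch. It represents the single polynomial T_n in both bases, applies a relative coefficient perturbation of size `rel_eps`, and measures the resulting error on [-1, 1]. The perturbation signs are chosen deliberately so that the monomial contributions add up at x = 1, which makes this a worst-case style illustration rather than a typical one; the function name and parameters are hypothetical, and this is not one of the experiments promised above.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

def worst_case_sensitivity(n, rel_eps=1e-8):
    """Perturb the coefficients of T_n by a relative rel_eps in both bases.

    Returns (largest monomial coefficient, error in monomial basis,
    error in Chebyshev basis), with errors measured on [-1, 1].
    """
    cheb_coef = np.zeros(n + 1)
    cheb_coef[n] = 1.0                    # T_n in the Chebyshev basis
    mono_coef = C.cheb2poly(cheb_coef)    # same polynomial, monomial basis

    # Relative perturbations; in the monomial basis every coefficient is
    # shifted upward so that the shifts accumulate at x = 1.
    cheb_pert = cheb_coef * (1.0 + rel_eps)
    mono_pert = mono_coef + rel_eps * np.abs(mono_coef)

    x = np.linspace(-1.0, 1.0, 2001)
    exact = np.cos(n * np.arccos(x))
    err_mono = np.max(np.abs(P.polyval(x, mono_pert) - exact))
    err_cheb = np.max(np.abs(C.chebval(x, cheb_pert) - exact))
    return np.max(np.abs(mono_coef)), err_mono, err_cheb

# Hypothetical usage: show how the gap widens with the degree.
for n in (5, 10, 20):
    c_max, e_mono, e_cheb = worst_case_sensitivity(n)
    print(n, f"{c_max:.2e}", f"{e_mono:.2e}", f"{e_cheb:.2e}")
```

Since |T_n(x)| ≤ 1 on [-1, 1], the Chebyshev-basis error stays at the size of the perturbation, while the monomial-basis error is amplified by the sum of the absolute monomial coefficients.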

Circularity Check

0 steps flagged

No significant circularity; Chebyshev construction is independent of power-series prior result

The paper cites its own prior work only for the base fact that power-series polynomials convert to RePU nets with optimal depth/width and zero error. The new contribution is a separate hierarchical Chebyshev approximation performed in the frequency domain, which is a standard, externally verifiable technique not derived from the power-series case. The claim that ChebNets achieve approximation quality no worse than the power-series route follows directly from the known minimax properties of Chebyshev polynomials plus the shared conversion method; it does not reduce the new construction to the old one by definition. No equation or step equates the Chebyshev hierarchy to its own input or to a self-citation that itself lacks independent support. The stability advantage is presented as an empirical and theoretical consequence of the frequency-domain approach, not a renaming or fitted prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on standard approximation-theory facts about Chebyshev polynomials and on the conversion lemma established in the authors' prior RePU paper; no new free parameters or invented physical entities are introduced.

axioms (2)

standard math: Chebyshev polynomials admit stable hierarchical approximations in the frequency domain that can be realized by rectified power units.
Invoked when the abstract states that the hierarchical structure yields efficient and stable networks.
domain assumption: The conversion from polynomial approximation to deep RePU network preserves optimal complexity and zero approximation error.
Carried over from the cited prior work on power series; required for the 'no worse' claim.

invented entities (1)

ChebNet (no independent evidence)
Purpose: label for the hierarchical Chebyshev-based RePU network construction.
New name introduced for the proposed method; no independent evidence supplied.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: By using a hierarchical structure of Chebyshev polynomial approximation in frequency domain, we obtain efficient and stable deep neural network construction... The approximation of smooth functions by ChebNets is no worse than the approximation by deep RePU nets using power series.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction / embed_injective · unclear
Paper passage: Theorem 1. For n ≥ 1, assume p(x) = ∑ c_j T_j(x) ... there exists a σ_2 neural network with at most ⌊log_2 n⌋ + 1 hidden layers ... O(n) neurons and total non-zero weights.
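
The logarithmic depth in the Theorem 1 excerpt is consistent with the product identity T_{m+k} = 2 T_m T_k − T_{m−k} (for k ≤ m), which lets the highest available index double at every level. The Python sketch below is an illustration under that assumption and is not taken from the paper's proof: it builds T_0, …, T_n level by level and counts the levels. Each new T_j costs one product, and a product can be realized exactly with σ_2 units through xy = ((x + y)^2 − (x − y)^2)/4 and z^2 = σ_2(z) + σ_2(−z). The function name `hierarchical_chebyshev` is hypothetical.

```python
import math
import numpy as np

def hierarchical_chebyshev(n, x):
    """Return ([T_0(x), ..., T_n(x)], number_of_levels)."""
    x = np.asarray(x, dtype=float)
    T = [np.ones_like(x), x]   # T_0 and T_1 need no nonlinear layer
    levels = 0
    m = 1                      # highest index available so far
    while m < n:
        # One level: each T_{m+k}, 1 <= k <= m, needs a single product of
        # quantities from earlier levels, so all can be formed in parallel.
        last = min(m, n - m)
        T.extend(2.0 * T[m] * T[k] - T[m - k] for k in range(1, last + 1))
        m = len(T) - 1
        levels += 1
    return T[: n + 1], levels

# Hypothetical usage: compare the level count with the quoted bound.
x = np.linspace(-1.0, 1.0, 101)
for n in (1, 2, 5, 16, 31):
    T, levels = hierarchical_chebyshev(n, x)
    print(n, levels, math.floor(math.log2(n)) + 1)
```

In this sketch the number of levels equals ⌈log_2 n⌉, which never exceeds ⌊log_2 n⌋ + 1, and the total number of products is n − 1, in line with the O(n) size stated in the excerpt.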

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Distributional Off-Policy Evaluation with Deep Quantile Process Regression
stat.ML 2026-04 unverdicted novelty 6.0

DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Avila, G., Carrington, T., 2013. Solving the Schroedinger equation using Smolyak interpolants. J. Chem. Phys. 139 (13), 134114.
[2] Barthelmann, V., Novak, E., Ritter, K., 2000. High dimensional polynomial interpolation on sparse grids. Adv. Comput. Math. 12 (4), 273–288.
[3] Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H., 2007. Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems. pp. 153–160.
[4] Boyd, J. P., 2000. Chebyshev and Fourier Spectral Methods. Dover Publications, Inc.
[5] Bungartz, H. J., 1992. An adaptive Poisson solver using hierarchical bases and sparse grids. In: Iterative Methods in Linear Algebra. Brussels, Belgium, pp. 293–310.
[6] Bungartz, H. J., Griebel, M., 2004. Sparse grids. Acta Numer. 13, 1–123.
[7] Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Math. Control Signal Systems 2 (4), 303–314.
[8] E, W., Wang, Q., 2018. Exponential convergence of the deep neural network approximation for analytic functions. Sci. China Math. 61 (10), 1733–1740. URL https://link.springer.com/article/10.1007/s11425-018-9387-x
[9] E, W., Yu, B., 2018. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6 (1), 1–12.
[10] Eldan, R., Shamir, O., 2016. The power of depth for feedforward neural networks. JMLR Workshop Conf. Proc. 49, 1–34.
[11] Gautschi, W., 2011. Optimally scaled and optimally conditioned Vandermonde and Vandermonde-like matrices. BIT Numerical Mathematics 51 (1), 103–125. URL http://link.springer.com/10.1007/s10543-010-0293-1
[12] Griebel, M., Hamaekers, J., 2007. Sparse grids for the Schrödinger equation. Math. Model. Numer. Anal. 41 (2), 215–247.
[13] Guo, W., Cheng, Y., 2016. A sparse grid discontinuous Galerkin method for high-dimensional transport equations and its application to kinetic simulations. SIAM J. Sci. Comput. 38 (6), A3381–A3409.
[14] Han, J., Jentzen, A., E, W., 2018. Solving high-dimensional partial differential equations using deep learning. PNAS 115 (34), 8505–8510.
[15] He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778.
[16] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Kingsbury, B., Sainath, T., 2012. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29.
[17] Hinton, G., Osindero, S., Teh, Y.-W., 2006. A fast learning algorithm for deep belief nets. Neural Computation 18 (7), 1527–1554.
[18] Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), 359–366.
[19] Krizhevsky, A., Sutskever, I., Hinton, G. E., 2012. ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems 141 (5), 1097–1105.
[20] Kutyniok, G., Petersen, P., Raslan, M., Schneider, R., 2019. A theoretical analysis of deep neural networks and parametric PDEs. arXiv:1904.00377.
[21] LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
[22] Li, B., Tang, S., Yu, H., 2019. Better approximations of high dimensional smooth functions by deep neural networks with rectified power units. arXiv:1903.05858, to appear in Commun. Comput. Phys. URL http://admin.global-sci.org/uploads/online_news/CiCP/201911050902-12788.pdf
[23] Li, B., Tang, S., Yu, H., 2019. PowerNet: Efficient representations of polynomials and smooth functions by deep neural networks with rectified power units. arXiv:1909.05136. URL https://arxiv.org/abs/1909.05136
[24] Liang, S., Srikant, R., 2016. Why deep neural networks for function approximation? arXiv:1610.04161. URL https://arxiv.org/abs/1610.04161
[25] Lin, Q., Yan, N., Zhou, A., 2001. A sparse finite element method with high accuracy: Part I. Numer. Math. 88 (4), 731–742.
[26] Mhaskar, H. N., 1993. Approximation properties of a multilayered feedforward artificial neural network. Adv. Comput. Math. 1 (1), 61–80. URL http://link.springer.com/10.1007/BF02070821
[27] Mhaskar, H. N., 1996. Neural networks for optimal approximation of smooth and analytic functions. Neural Computation 8 (1), 164–177.
[28] Montanelli, H., Du, Q., 2019. New error bounds for deep ReLU networks using sparse grids. SIAM J. Math. Data Sci. 1 (1), 78–92.
[29] Nobile, F., Tempone, R., Webster, C., 2008. A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46 (5), 2309–2345.
[30] Opschoor, J. A. A., Schwab, C., Zech, J., 2019. Exponential ReLU DNN expression of holomorphic maps in high dimension. Tech. Rep. 35, SAM ETH Zürich. URL https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2019/2019-35.pdf
[31] Petersen, P., Voigtlaender, F., 2018. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks 108, 296–330.
[32] Platte, R. B., Trefethen, L. N., 2010. Chebfun: A New Kind of Numerical Computing. In: Fitt, A. D., Norbury, J., Ockendon, H., Wilson, E. (Eds.), Progress in Industrial Mathematics at ECMI 2008. Mathematics in Industry. Springer, Berlin, Heidelberg, pp. 69–87.
[33] Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., Liao, Q., 2017. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. Int. J. Autom. Comput. 14 (5), 503–519.
[34] Rong, Z., Shen, J., Yu, H., 2017. A nodal sparse grid spectral element method for multi-dimensional elliptic partial differential equations. Int. J. Numer. Anal. Model. 14 (4-5), 762–783.
[35] Schwab, C., Todor, R. A., 2003. Sparse finite elements for elliptic problems with stochastic loading. Numerische Mathematik 95 (4), 707–734.
[36] Shen, J., Wang, L., 2010. Sparse spectral approximations of high-dimensional problems based on hyperbolic cross. SIAM J. Numer. Anal. 48 (4), 1087–1109.
[37] Shen, J., Wang, L., Yu, H., 2014. Approximations by orthonormal mapped Chebyshev functions for higher-dimensional problems in unbounded domains. J. Comput. Appl. Math. 265, 264–275.
[38] Shen, J., Wang, Y., Yu, H., 2016. Efficient spectral-element methods for the electronic Schrödinger equation. In: Garcke, J., Pflüger, D. (Eds.), Sparse Grids and Applications - Stuttgart 2014. Lecture Notes in Computational Science and Engineering. Springer International Publishing, pp. 265–289.
[39] Shen, J., Yu, H., 2010. Efficient spectral sparse grid methods and applications to high-dimensional elliptic problems. SIAM J. Sci. Comput. 32 (6), 3228–3250.
[40] Shen, J., Yu, H., 2012. Efficient spectral sparse grid methods and applications to high-dimensional elliptic equations II: Unbounded domains. SIAM J. Sci. Comput. 34 (2), 1141–1164.
[41] Smolyak, S. A., 1963. Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl. Akad. Nauk SSSR 148 (5), 1042–1045.
[42] Telgarsky, M., 2015. Representation benefits of deep feedforward networks. arXiv:1509.08101.
[43] Telgarsky, M., 2016. Benefits of depth in neural networks. In: JMLR: Workshop and Conference Proceedings. Vol. 49. pp. 1–23.
[44] Wang, X., Sloan, I., 2005. Why are high-dimensional finance problems often of low effective dimension? SIAM J. Sci. Comput. 27 (1), 159–183.
[45] Yarotsky, D., 2017. Error bounds for approximations with deep ReLU networks. Neural Networks 94, 103–114.
[46] Yserentant, H., 2004. On the regularity of the electronic Schrödinger equation in Hilbert spaces of mixed derivatives. Numer. Math. 98 (4), 731–759.
[47] Zhang, L., Han, J., Wang, H., Car, R., E, W., 2018. Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120 (14), 143001.

[1] [1]

Solving the Schroedinger equation using Smolyak interpolants

Avila, G., Carrington, T., 2013. Solving the Schroedinger equation using Smolyak interpolants. J. Chem. Phys. 139 (13), 134114

work page 2013

[2] [2]

High dimensional polynomial interpolation on sparse grids

Barthelmann, V., Novak, E., Ritter, K., 2000. High dimensional polynomial interpolation on sparse grids. Adv. Comput. Math. 12 (4), 273–288

work page 2000

[3] [3]

Greedy layer-wise training of deep networks

Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H., 2007. Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems. pp. 153–160

work page 2007

[4] [4]

P., 2000

Boyd, J. P., 2000. Chebyshev and Fourier Spectral Methods. Dover Publications, INC

work page 2000

[5] [5]

J., 1992

Bungartz, H. J., 1992. An adaptive Poisson solver using hierarchical bases and sparse grids. In: Iterative Methods in Linear Algebra. Brussels, Belgium, pp. 293–310

work page 1992

[6] [6]

J., Griebel, M., 2004

Bungartz, H. J., Griebel, M., 2004. Sparse grids. Acta Numer. 13, 1–123

work page 2004

[7] [7]

Approximation by superpositions of a sigmoidal function

Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Math. Control Signal Systems 2 (4), 303–314

work page 1989

[8] [8]

Exponential convergence of the deep neural network approximation for analytic functions

E, W., Wang, Q., 2018. Exponential convergence of the deep neural network approximation for analytic functions. Sci. China Math. 61 (10), 1733–1740. URL https://link.springer.com/article/10.1007/s11425-018-9387-x

work page doi:10.1007/s11425-018-9387-x 2018

[9] [9]

The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems

E, W., Yu, B., 2018. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6 (1), 1–12

work page 2018

[10] [10]

The power of depth for feedforward neural networks

Eldan, R., Shamir, O., 2016. The power of depth for feedforward neural networks. JMLR Workshop Conf. Proc. 49, 1–34

work page 2016

[11] [11]

Optimally scaled and optimally conditioned vandermonde and vandermonde-like matrices

Gautschi, W., 2011. Optimally scaled and optimally conditioned vandermonde and vandermonde-like matrices. BIT Nu- merical Mathematics 51 (1), 103–125. URL http://link.springer.com/10.1007/s10543-010-0293-1

[12]

Sparse grids for the Schrödinger equation

Griebel, M., Hamaekers, J., 2007. Sparse grids for the Schrödinger equation. Math. Model. Numer. Anal. 41 (2), 215–247

[13]

A sparse grid discontinuous Galerkin method for high-dimensional transport equations and its application to kinetic simulations

Guo, W., Cheng, Y., 2016. A sparse grid discontinuous Galerkin method for high-dimensional transport equations and its application to kinetic simulations. SIAM J. Sci. Comput. 38 (6), A3381–A3409

[14]

Solving high-dimensional partial differential equations using deep learning

Han, J., Jentzen, A., E, W., 2018. Solving high-dimensional partial differential equations using deep learning. PNAS 115 (34), 8505–8510

[15]

Deep residual learning for image recognition

He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778

[16]

Deep neural networks for acoustic modeling in speech recognition

Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Kingsbury, B., Sainath, T., 2012. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29

[17]

A fast learning algorithm for deep belief nets

Hinton, G., Osindero, S., Teh, Y.-W., 2006. A fast learning algorithm for deep belief nets. Neural Computation 18 (7), 1527–1554

[18]

Multilayer feedforward networks are universal approximators

Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), 359–366

[19]

ImageNet classification with deep convolutional neural networks

Krizhevsky, A., Sutskever, I., Hinton, G. E., 2012. ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems 141 (5), 1097–1105

[20]

A theoretical analysis of deep neural networks and parametric PDEs

Kutyniok, G., Petersen, P., Raslan, M., Schneider, R., 2019. A theoretical analysis of deep neural networks and parametric PDEs. arXiv:1904.00377

[21]

Deep learning

LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444

[22]

Better approximations of high dimensional smooth functions by deep neural networks with rectified power units

Li, B., Tang, S., Yu, H., 2019. Better approximations of high dimensional smooth functions by deep neural networks with rectified power units. arXiv:1903.05858, to appear in Commun. Comput. Phys. URL http://admin.global-sci.org/uploads/online_news/CiCP/201911050902-12788.pdf

[23]

PowerNet: Efficient representations of polynomials and smooth functions by deep neural networks with rectified power units

Li, B., Tang, S., Yu, H., 2019. PowerNet: Efficient representations of polynomials and smooth functions by deep neural networks with rectified power units. arXiv:1909.05136. URL https://arxiv.org/abs/1909.05136

[24]

Why Deep Neural Networks for Function Approximation?

Liang, S., Srikant, R., 2016. Why deep neural networks for function approximation? arXiv:1610.04161. URL https://arxiv.org/abs/1610.04161

[25]

A sparse finite element method with high accuracy: Part I

Lin, Q., Yan, N., Zhou, A., 2001. A sparse finite element method with high accuracy: Part I. Numer. Math. 88 (4), 731–742

[26]

Approximation properties of a multilayered feedforward artificial neural network

Mhaskar, H. N., 1993. Approximation properties of a multilayered feedforward artificial neural network. Adv. Comput. Math. 1 (1), 61–80. URL http://link.springer.com/10.1007/BF02070821

[27]

Neural networks for optimal approximation of smooth and analytic functions

Mhaskar, H. N., 1996. Neural networks for optimal approximation of smooth and analytic functions. Neural Computation 8 (1), 164–177

[28]

New error bounds for deep ReLU networks using sparse grids

Montanelli, H., Du, Q., 2019. New error bounds for deep ReLU networks using sparse grids. SIAM J. Math. Data Sci. 1 (1), 78–92

[29]

A sparse grid stochastic collocation method for partial differential equations with random input data

Nobile, F., Tempone, R., Webster, C., 2008. A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46 (5), 2309–2345

[30]

Opschoor, J. A. A., Schwab, C., Zech, J., 2019. Exponential ReLU DNN expression of holomorphic maps in high dimension. Tech. Rep. 35, SAM ETH Zürich. URL https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2019/2019-35.pdf

[31]

Optimal approximation of piecewise smooth functions using deep ReLU neural networks

Petersen, P., Voigtlaender, F., 2018. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks 108, 296–330

[32]

Chebfun: A New Kind of Numerical Computing

Platte, R. B., Trefethen, L. N., 2010. Chebfun: A New Kind of Numerical Computing. In: Fitt, A. D., Norbury, J., Ockendon, H., Wilson, E. (Eds.), Progress in Industrial Mathematics at ECMI 2008. Mathematics in Industry. Springer, Berlin, Heidelberg, pp. 69–87

[33]

Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review

Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., Liao, Q., 2017. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. Int. J. Autom. Comput. 14 (5), 503–519

[34]

A nodal sparse grid spectral element method for multi-dimensional elliptic partial differential equations

Rong, Z., Shen, J., Yu, H., 2017. A nodal sparse grid spectral element method for multi-dimensional elliptic partial differential equations. Int. J. Numer. Anal. Model. 14 (4-5), 762–783

[35]

Sparse finite elements for elliptic problems with stochastic loading

Schwab, C., Todor, R. A., 2003. Sparse finite elements for elliptic problems with stochastic loading. Numerische Mathematik 95 (4), 707–734

[36]

Sparse spectral approximations of high-dimensional problems based on hyperbolic cross

Shen, J., Wang, L., 2010. Sparse spectral approximations of high-dimensional problems based on hyperbolic cross. SIAM J. Numer. Anal. 48 (4), 1087–1109

[37]

Approximations by orthonormal mapped Chebyshev functions for higher-dimensional problems in unbounded domains

Shen, J., Wang, L., Yu, H., 2014. Approximations by orthonormal mapped Chebyshev functions for higher-dimensional problems in unbounded domains. J. Comput. Appl. Math. 265, 264–275

[38]

Efficient spectral-element methods for the electronic Schrödinger equation

Shen, J., Wang, Y., Yu, H., 2016. Efficient spectral-element methods for the electronic Schrödinger equation. In: Garcke, J., Pflüger, D. (Eds.), Sparse Grids and Applications - Stuttgart 2014. Lecture Notes in Computational Science and Engineering. Springer International Publishing, pp. 265–289

[39]

Efficient spectral sparse grid methods and applications to high-dimensional elliptic problems

Shen, J., Yu, H., 2010. Efficient spectral sparse grid methods and applications to high-dimensional elliptic problems. SIAM J. Sci. Comput. 32 (6), 3228–3250

[40]

Efficient spectral sparse grid methods and applications to high-dimensional elliptic equations II: Unbounded domains

Shen, J., Yu, H., 2012. Efficient spectral sparse grid methods and applications to high-dimensional elliptic equations II: Unbounded domains. SIAM J. Sci. Comput. 34 (2), 1141–1164

[41]

Quadrature and interpolation formulas for tensor products of certain classes of functions

Smolyak, S. A., 1963. Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl. Akad. Nauk SSSR 148 (5), 1042–1045

[42]

Representation benefits of deep feedforward networks

Telgarsky, M., 2015. Representation benefits of deep feedforward networks. arXiv:1509.08101

[43]

Benefits of depth in neural networks

Telgarsky, M., 2016. Benefits of depth in neural networks. In: JMLR: Workshop and Conference Proceedings. Vol. 49. pp. 1–23

[44]

Why are high-dimensional finance problems often of low effective dimension?

Wang, X., Sloan, I., 2005. Why are high-dimensional finance problems often of low effective dimension? SIAM J. Sci. Comput. 27 (1), 159–183

[45]

Error bounds for approximations with deep ReLU networks

Yarotsky, D., 2017. Error bounds for approximations with deep ReLU networks. Neural Networks 94, 103–114

[46]

On the regularity of the electronic Schrödinger equation in Hilbert spaces of mixed derivatives

Yserentant, H., 2004. On the regularity of the electronic Schrödinger equation in Hilbert spaces of mixed derivatives. Numer. Math. 98 (4), 731–759

[47]

Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics

Zhang, L., Han, J., Wang, H., Car, R., E, W., 2018. Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120 (14), 143001
