Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks
Pith reviewed 2026-06-27 04:21 UTC · model grok-4.3
The pith
Deep ReLU networks of depth L induce p-normable quasi-Banach native spaces with p equal to 2/L.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the representation cost of depth-L feedforward ReLU networks induces native spaces that are p-normable quasi-Banach spaces with p = 2/L. Consequently, the inductive bias expressed by this cost cannot be captured by any norm when the depth exceeds 2.
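A scaling argument suggests where the exponent 2/L could come from. The sketch below rests on our own assumptions: a bias-free network, weight decay defined as the sum of squared Frobenius norms, and hypothetical helper names. Multiplying every layer by t^(1/L) with t > 0 multiplies the output by t, because ReLU is positively homogeneous, and it multiplies the regularizer by t^(2/L). This rescaling maps the parameters realizing f one-to-one onto those realizing t·f, so the infimum inherits the same factor: R(t·f) = t^(2/L) R(f). This is a heuristic consistency check, not the paper's proof of p-normability.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def forward(weights, x):
    """Bias-free depth-L ReLU network: x -> W_L relu(W_{L-1} ... relu(W_1 x))."""
    h = x
    for W in weights[:-1]:
        h = relu(W @ h)
    return weights[-1] @ h


def weight_decay(weights):
    """Assumed regularizer: sum of squared Frobenius norms of all weight matrices."""
    return sum(float(np.sum(W ** 2)) for W in weights)


def rescale(weights, t):
    """Multiply every layer by t**(1/L); for t > 0 the output is multiplied by t."""
    L = len(weights)
    return [t ** (1.0 / L) * W for W in weights]


rng = np.random.default_rng(0)
L, d, width = 4, 3, 5
dims = [d] + [width] * (L - 1) + [1]
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(L)]
x = rng.standard_normal((d, 10))  # ten sample inputs stored as columns

t = 7.0
scaled = rescale(weights, t)
same_function_scaled = np.allclose(forward(scaled, x), t * forward(weights, x))
cost_ratio = weight_decay(scaled) / weight_decay(weights)
print(same_function_scaled, cost_ratio, t ** (2.0 / L))
```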
What carries the argument
The representation cost: the infimum of a parameter-space regularizer over all parameter vectors that realize a given function. This cost induces the native function space on which the representer theorems are proved.
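A toy example makes the infimum concrete. It is our own illustration, not taken from the paper. Take a depth-L scalar linear "network" x -> (w_1 w_2 ... w_L) x with regularizer w_1^2 + ... + w_L^2. By the AM-GM inequality, the cheapest factorization of a target slope c is the balanced one, so the representation cost is L·|c|^(2/L). Applying this coordinatewise to a diagonal linear network gives L·Σ_j |β_j|^(2/L), which has the same exponent 2/L as the claim. The sketch compares the closed form against randomly drawn factorizations. All function names are hypothetical.

```python
import numpy as np


def cost(factors):
    """Regularizer: sum of squares of the L scalar factors."""
    return float(np.sum(np.asarray(factors) ** 2))


def closed_form_cost(c, L):
    """Infimum of cost over all factorizations w_1 * ... * w_L = c (AM-GM)."""
    return L * abs(c) ** (2.0 / L)


def balanced_factorization(c, L):
    """All factors have magnitude |c|**(1/L); the sign of c goes on the first factor."""
    factors = np.full(L, abs(c) ** (1.0 / L))
    if c < 0:
        factors[0] = -factors[0]
    return factors


def random_factorization(c, L, rng):
    """Draw L-1 nonzero factors at random, then solve for the last so the product is c."""
    head = rng.uniform(0.2, 3.0, size=L - 1) * rng.choice([-1.0, 1.0], size=L - 1)
    return np.append(head, c / np.prod(head))


rng = np.random.default_rng(1)
c, L = -2.5, 3
best = balanced_factorization(c, L)
samples = [cost(random_factorization(c, L, rng)) for _ in range(2000)]
print(np.prod(best), cost(best), closed_form_cost(c, L), min(samples))
```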
If this is right
- Representer theorems hold for arbitrary parametric methods on their native spaces.
- Sufficiently overparameterized parametric models become equivalent to their nonparametric counterparts on the native space.
- Kernel methods, wavelets, and shallow networks appear as special cases inside the same abstract framework.
- The native spaces of deep ReLU networks are quasi-Banach but not Banach when L > 2.
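The kernel special case can be illustrated with a finite random-feature model. The feature map here is hypothetical and was chosen only for this sketch. Consider a linear-in-parameters model f(x) = <theta, phi(x)> with ridge penalty lambda·||theta||^2. Solving in parameter space and solving through the representer theorem give the same predictor. In the representer form, f = Σ_i alpha_i k(x_i, ·) with k(x, x') = <phi(x), phi(x')>. The squared parameter norm of the solution also equals alpha^T K alpha, which is the squared RKHS norm of the fitted function. This reproduces only the classical instance, not the paper's abstract representer theorem.

```python
import numpy as np

rng = np.random.default_rng(2)
n, D, lam = 20, 200, 1e-2  # n samples, D features (D > n: overparameterized), ridge weight

X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)

# Hypothetical random-Fourier-style feature map phi: R -> R^D.
omega = 3.0 * rng.standard_normal((1, D))
phase = rng.uniform(0.0, 2.0 * np.pi, size=D)


def features(Z):
    return np.sqrt(2.0 / D) * np.cos(Z @ omega + phase)


Phi = features(X)  # shape (n, D)

# Parameter space: minimize ||Phi theta - y||^2 + lam ||theta||^2 over theta in R^D.
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)

# Function space: representer theorem, f = sum_i alpha_i k(x_i, .), k(x, x') = <phi(x), phi(x')>.
K = Phi @ Phi.T
alpha = np.linalg.solve(K + lam * np.eye(n), y)

X_test = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
pred_parametric = features(X_test) @ theta
pred_kernel = (features(X_test) @ Phi.T) @ alpha

# Squared parameter norm of the ridge solution vs. squared RKHS norm alpha^T K alpha.
print(np.max(np.abs(pred_parametric - pred_kernel)), theta @ theta, alpha @ K @ alpha)
```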
Where Pith is reading between the lines
- Optimization algorithms for deep networks may need to be redesigned around quasi-norms rather than norms once depth exceeds two.
- Generalization bounds derived from norm-based complexity measures may need replacement by quasi-norm analogues for deep architectures.
- The same abstract construction could be applied to other activation functions or architectures to classify their native spaces.
Load-bearing premise
Defining representation cost as the infimum of a parameter regularizer is assumed to produce a well-behaved native function space in which representer theorems and nonparametric equivalences hold.
What would settle it
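An explicit construction, for depth-3 ReLU networks, of functions f_1, ..., f_n whose sum has a native-space quasi-norm exceeding the sum of their individual quasi-norms by a factor that grows without bound in n would show directly that the space is not normable. A single pair f, g cannot exhibit an arbitrary factor, because the quasi-triangle inequality caps that ratio by a fixed constant.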
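The normability criterion behind this test can be seen in the sequence space ℓ^p. Here ℓ^p serves only as a stand-in for a native space and is not a ReLU-network construction. A quasi-Banach space is normable exactly when some constant C satisfies ||f_1 + ... + f_n|| ≤ C (||f_1|| + ... + ||f_n||) for all finite families. In ℓ^p with p = 2/L < 1, the first n unit vectors violate this bound. The quasi-norm of their sum is n^(1/p), while the sum of their quasi-norms is n, so the ratio n^(1/p - 1) grows without bound. The sketch computes this ratio for L = 3, where it equals n^(1/2).

```python
import numpy as np


def lp_quasinorm(v, p):
    """(sum_i |v_i|**p)**(1/p): a norm for p >= 1, only a quasi-norm for 0 < p < 1."""
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))


L = 3
p = 2.0 / L  # p = 2/3 for depth 3
sizes = [1, 10, 100, 1000]
ratios = []
for n in sizes:
    family = np.eye(n)  # rows are the unit vectors e_1, ..., e_n
    norm_of_sum = lp_quasinorm(family.sum(axis=0), p)  # equals n**(1/p)
    sum_of_norms = sum(lp_quasinorm(e, p) for e in family)  # equals n
    ratios.append(norm_of_sum / sum_of_norms)

print(ratios)  # grows like n**(1/p - 1) = sqrt(n), so no constant C can bound it
```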
Original abstract
We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.
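Rebalancing is one standard step in the shallow-network case. This sketch rests on our own assumptions: no biases, weight decay a^2 + ||w||^2 per unit, and hypothetical helper names. A unit a·relu(w^T x) is unchanged under (a, w) -> (a/s, s·w) for any s > 0. Minimizing a^2/s^2 + s^2·||w||^2 over s gives 2|a|·||w||, attained at s^2 = |a|/||w||. Summed over units, weight decay after rebalancing becomes the path-norm-type quantity 2 Σ_k |a_k|·||w_k||. This is an ℓ^1-type penalty of the kind associated with variation spaces. The sketch checks the closed form against a grid search over s.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def unit(a, w, X):
    """One bias-free ReLU unit evaluated on the rows of X: a * relu(X w)."""
    return a * relu(X @ w)


def rebalanced_cost(a, w):
    """Closed-form minimum of (a/s)**2 + ||s w||**2 over s > 0 (AM-GM)."""
    return 2.0 * abs(a) * np.linalg.norm(w)


rng = np.random.default_rng(3)
a, w = 4.0, np.array([0.6, -0.8, 0.0])  # ||w|| = 1, so the optimal scale is s = 2
X = rng.standard_normal((8, 3))

s_star = np.sqrt(abs(a) / np.linalg.norm(w))
s_grid = np.linspace(0.05, 10.0, 20001)
grid_costs = (a / s_grid) ** 2 + s_grid ** 2 * np.dot(w, w)

print(rebalanced_cost(a, w), grid_costs.min(), s_star)
```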
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops an abstract framework for representation costs of parametric data-fitting methods defined via infima of parameter-space regularizers. From this it defines induced native function spaces, proves representer theorems in the abstract setting, and establishes equivalences to nonparametric formulations under sufficient overparameterization. Classical cases (RKHS for kernels, Besov spaces for wavelets, variation spaces for shallow networks) are recovered as special cases. A central new result is that depth-L feedforward ReLU networks induce p-normable quasi-Banach native spaces with p = 2/L, implying that their representation-cost inductive bias cannot be captured by a norm when L > 2.
Significance. If the derivations hold, the framework supplies a unified language for representation costs across parametric models and yields a concrete, non-norm characterization of the function space bias of deep ReLU networks. Recovery of known spaces plus the explicit p = 2/L quasi-Banach claim for DNNs would constitute a substantive theoretical contribution to the function-space analysis of deep learning.
major comments (2)
- [Abstract / DNN native-space section] Abstract and the section introducing the DNN result: the claim that the native space is p-normable with p = 2/L is asserted as an immediate byproduct of the axiomatization, yet the manuscript must exhibit the explicit verification that the representation-cost functional satisfies the p-triangle inequality with this precise exponent; without that step the central claim for L > 2 rests on an uninspected derivation.
- [Framework definition] Definition of representation cost (the infimum over parameters realizing a given function): it is not yet shown that this functional is lower semi-continuous or satisfies the requisite quasi-norm axioms on the function space it induces; this property is load-bearing for both the representer theorem and the quasi-Banach conclusion.
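The p-triangle inequality at issue can be illustrated in the model space ℓ^p with p = 2/L. This space is a stand-in chosen for the sketch, and the check says nothing about whether the DNN native space satisfies the inequality. For 0 < p ≤ 1, ||x + y||_p^p ≤ ||x||_p^p + ||y||_p^p. By concavity of t -> t^p, this implies the quasi-triangle inequality ||x + y||_p ≤ 2^(1/p - 1)(||x||_p + ||y||_p). The sketch samples random vectors for L = 3, 4, 6 and checks both inequalities numerically.

```python
import numpy as np


def p_power(v, p):
    """sum_i |v_i|**p, the p-th power of the l^p quasi-norm."""
    return float(np.sum(np.abs(v) ** p))


rng = np.random.default_rng(4)
violations = 0
worst_ratio = 0.0
for L in [3, 4, 6]:
    p = 2.0 / L
    constant = 2.0 ** (1.0 / p - 1.0)  # quasi-triangle constant implied by the p-triangle inequality
    for _ in range(2000):
        x, y = rng.standard_normal(5), rng.standard_normal(5)
        # p-triangle inequality: ||x + y||_p^p <= ||x||_p^p + ||y||_p^p
        if p_power(x + y, p) > (p_power(x, p) + p_power(y, p)) * (1.0 + 1e-12):
            violations += 1
        nx, ny, nxy = (p_power(v, p) ** (1.0 / p) for v in (x, y, x + y))
        worst_ratio = max(worst_ratio, nxy / (constant * (nx + ny)))

print(violations, worst_ratio)  # both inequalities predict 0 violations and a ratio of at most 1
```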
minor comments (2)
- Notation for the parameter-space regularizer and the induced native-space quasi-norm should be introduced with a single consistent symbol and clearly distinguished from the classical norm case.
- The statement that 'many natural results hold in this abstract setting' would benefit from an enumerated list of the theorems proved, with pointers to their statements.
Simulated Author's Rebuttal
We thank the referee for the constructive report and positive assessment of the framework's potential. We address the two major comments below, agreeing that explicit verifications strengthen the presentation and will be incorporated in revision.
Point-by-point responses
- Referee: [Abstract / DNN native-space section] Abstract and the section introducing the DNN result: the claim that the native space is p-normable with p = 2/L is asserted as an immediate byproduct of the axiomatization, yet the manuscript must exhibit the explicit verification that the representation-cost functional satisfies the p-triangle inequality with this precise exponent; without that step the central claim for L > 2 rests on an uninspected derivation.
Authors: We agree that the p-triangle inequality requires explicit verification for the DNN case rather than relying solely on the abstract axiomatization. In the revised manuscript, we will insert a new lemma (in the DNN native-space section) that directly computes the representation cost for depth-L ReLU networks and verifies that the p-triangle inequality holds with exponent p = 2/L. This step-by-step derivation will be self-contained and independent of the general framework. revision: yes
- Referee: [Framework definition] Definition of representation cost (the infimum over parameters realizing a given function): it is not yet shown that this functional is lower semi-continuous or satisfies the requisite quasi-norm axioms on the function space it induces; this property is load-bearing for both the representer theorem and the quasi-Banach conclusion.
Authors: The abstract framework assumes the parameter regularizer satisfies standard conditions that imply the induced representation cost is a quasi-norm; however, the referee is correct that lower semi-continuity with respect to the induced function-space topology is not stated explicitly. We will add a short lemma immediately after the definition of the representation cost that proves lower semi-continuity under the maintained hypotheses on the parameter map. This will also confirm the quasi-norm axioms hold on the native space. revision: yes
Circularity Check
No significant circularity; framework is definitional and self-contained
Full rationale
The paper defines representation cost as the infimum of a parameter regularizer over realizations of a function, then derives the induced native space and its properties (including the p=2/L quasi-Banach structure for depth-L ReLU nets) directly from that definition applied to specific models. This recovers known spaces (RKHS, Besov, variation) as special cases without any reduction of the central claim to a fitted parameter, self-citation chain, or input-by-construction. The derivation is axiomatic rather than predictive, with no load-bearing step that collapses to its own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Representation costs of parametric models are defined through infima of parameter-space regularizers.
- Domain assumption: The induced native spaces admit representer theorems and nonparametric equivalences under sufficient overparameterization.
invented entities (1)
- Native quasi-Banach spaces for depth-L ReLU networks with p = 2/L (no independent evidence)