arxiv: 2606.14954 · v3 · pith:JA3NE6CJnew · submitted 2026-06-12 · 🧮 math.FA · cs.LG · math.OC · stat.ML

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

Pith reviewed 2026-06-27 04:21 UTC · model grok-4.3

classification 🧮 math.FA · cs.LG · math.OC · stat.ML
keywords representation costs · native function spaces · quasi-Banach spaces · deep neural networks · ReLU networks · representer theorems · parametric models · inductive bias

The pith

Deep ReLU networks of depth L induce p-normable quasi-Banach native spaces with p equal to 2/L.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an abstract framework that defines the representation cost of any parametric model as the infimum of a parameter-space regularizer over all parameters realizing a given function. From this definition the framework extracts the model's induced native function space and proves that representer theorems and equivalences to nonparametric descriptions hold in the abstract setting. Classical cases recover reproducing kernel Hilbert spaces from kernel methods, Besov spaces from wavelets, and variation spaces from shallow networks. For feedforward ReLU networks of depth L the same construction yields p-normable quasi-Banach spaces with p = 2/L, showing that the representation-cost bias cannot be expressed by a norm once L exceeds 2.

Core claim

The central claim is that the representation cost of depth-L feedforward ReLU networks induces native spaces that are p-normable quasi-Banach spaces with p = 2/L. Consequently the inductive bias expressed by this cost cannot be captured by any norm when the depth is greater than 2.
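
For reference, the standard definitions behind this claim can be written out as below. The symbols (‖·‖ for a quasi-norm on the native space, f and g for its elements) are notation assumed here, not taken from the paper. The constant in the second inequality is the generic one implied by the p-triangle inequality, not a bound the paper is known to state.

```latex
% p-norm (0 < p <= 1): positive definite, absolutely homogeneous, and
\|f + g\|^{p} \;\le\; \|f\|^{p} + \|g\|^{p} .
% By the power-mean inequality this yields a quasi-triangle inequality:
\|f + g\| \;\le\; 2^{\,1/p - 1}\,\bigl(\|f\| + \|g\|\bigr) .
% With p = 2/L the constant is 2^{L/2 - 1}: equal to 1 at L = 2 and \sqrt{2} at L = 3.
```

At L = 2 the exponent is p = 1 and the first inequality is the ordinary triangle inequality, which is consistent with the non-normable regime starting only at L > 2.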

What carries the argument

The representation cost, defined as the infimum of a parameter-space regularizer over all parameter vectors that realize a given function; this cost induces the native function space on which representer theorems are proved.
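
In symbols, the definition might be written as follows. All notation here (parameter space Θ, realized function f_θ, regularizer Ω, cost R, native space F) is assumed for illustration. Identifying the native space with the set of finite-cost functions is one natural reading of the abstract, not a confirmed definition from the paper.

```latex
% Hypothetical notation: \Theta = parameter space, f_\theta = function realized by \theta,
% \Omega = parameter-space regularizer, R = representation cost, \mathcal{F} = native space.
R(f) \;=\; \inf\bigl\{\, \Omega(\theta) \;:\; \theta \in \Theta,\ f_\theta = f \,\bigr\},
\qquad \inf \emptyset \;=\; +\infty ,
\qquad
\mathcal{F} \;=\; \bigl\{\, f \;:\; R(f) < \infty \,\bigr\} .
```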

If this is right

  • Representer theorems hold for arbitrary parametric methods on their native spaces.
  • Sufficiently overparameterized parametric models become equivalent to their nonparametric counterparts on the native space.
  • Kernel methods, wavelets, and shallow networks appear as special cases inside the same abstract framework.
  • The native spaces of deep ReLU networks are quasi-Banach but not Banach when L > 2.
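
A toy calculation hints at where an exponent of 2/L can come from. The sketch below assumes a squared-ℓ2 (weight-decay) penalty and a scalar "network" that is just a product of L weights. Neither assumption is confirmed by the abstract, and all function names are hypothetical. By the AM-GM inequality, the cheapest way to realize a scalar a as w_1 · … · w_L costs L · |a|^(2/L), so the cost scales with exponent 2/L instead of 1.

```python
import math


def weight_decay_cost(weights):
    """Assumed regularizer: sum of squared weights."""
    return sum(w * w for w in weights)


def min_cost_closed_form(a, depth):
    """Minimum of sum(w_i**2) subject to prod(w_i) == a, obtained from AM-GM."""
    return depth * abs(a) ** (2.0 / depth)


def balanced_factorization(a, depth):
    """One minimizer: all weights share magnitude |a|**(1/depth); the first carries the sign."""
    magnitude = abs(a) ** (1.0 / depth)
    weights = [magnitude] * depth
    if a < 0:
        weights[0] = -magnitude
    return weights


if __name__ == "__main__":
    a = 8.0
    for depth in (2, 3, 4):
        balanced = balanced_factorization(a, depth)
        unbalanced = [a] + [1.0] * (depth - 1)  # same product, unequal magnitudes
        print(
            depth,
            min_cost_closed_form(a, depth),
            weight_decay_cost(balanced),
            weight_decay_cost(unbalanced),
        )
```

In this scalar analogue, doubling the target multiplies the minimal cost by 2^(2/L), not by 2. That is the kind of sub-linear homogeneity associated with p = 2/L, although the toy says nothing about the paper's actual proof.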

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Optimization algorithms for deep networks may need to be redesigned around quasi-norms rather than norms once depth exceeds two.
  • Generalization bounds derived from norm-based complexity measures may need replacement by quasi-norm analogues for deep architectures.
  • The same abstract construction could be applied to other activation functions or architectures to classify their native spaces.

Load-bearing premise

Defining representation cost as the infimum of a parameter regularizer is assumed to produce a well-behaved native function space in which representer theorems and nonparametric equivalences hold.

What would settle it

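An explicit construction, for depth-3 ReLU networks, of finite families of functions f_1, ..., f_n whose sum has representation cost exceeding the sum of the individual costs by a factor that grows without bound as n increases would show that the space fails to be normable. A single pair f, g violating the triangle inequality is not enough: it shows only that the cost itself is not a norm, and in a quasi-Banach space the excess for any pair is bounded by a fixed constant.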
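
The kind of blow-up that rules out normability can be seen in a classical space that is known to be non-normable: the sequence space ℓ^p with p = 2/3, the exponent matching depth 3. The sketch below is a toy analogue only. It does not construct ReLU networks or evaluate the paper's representation cost, and the function names are hypothetical. It sums the first n standard basis vectors and compares the quasi-norm of the sum with the sum of the individual quasi-norms.

```python
def lp_quasinorm(x, p):
    """(sum |x_i|^p)^(1/p); a norm for p >= 1, only a quasi-norm for 0 < p < 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def basis_family_ratio(n, p):
    """Ratio ||e_1 + ... + e_n||_p / (||e_1||_p + ... + ||e_n||_p) for basis vectors e_k."""
    family = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    total = [sum(column) for column in zip(*family)]  # coordinate-wise sum of the family
    return lp_quasinorm(total, p) / sum(lp_quasinorm(e, p) for e in family)


if __name__ == "__main__":
    p = 2.0 / 3.0  # exponent 2/L with L = 3
    for n in (2, 4, 16, 64):
        # In exact arithmetic the ratio equals n**(1/p - 1), i.e. sqrt(n) for p = 2/3.
        print(n, basis_family_ratio(n, p))
```

For a pair the ratio stays at 2^(1/p - 1) = √2, but over families of size n it grows like √n, so no single constant can turn the quasi-norm into an equivalent norm. With p = 1 the same ratio is identically 1.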

Original abstract

We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops an abstract framework for representation costs of parametric data-fitting methods defined via infima of parameter-space regularizers. From this it defines induced native function spaces, proves representer theorems in the abstract setting, and establishes equivalences to nonparametric formulations under sufficient overparameterization. Classical cases (RKHS for kernels, Besov spaces for wavelets, variation spaces for shallow networks) are recovered as special cases. A central new result is that depth-L feedforward ReLU networks induce p-normable quasi-Banach native spaces with p = 2/L, implying that their representation-cost inductive bias cannot be captured by a norm when L > 2.

Significance. If the derivations hold, the framework supplies a unified language for representation costs across parametric models and yields a concrete, non-norm characterization of the function space bias of deep ReLU networks. Recovery of known spaces plus the explicit p = 2/L quasi-Banach claim for DNNs would constitute a substantive theoretical contribution to the function-space analysis of deep learning.

major comments (2)
  1. [Abstract / DNN native-space section] Abstract and the section introducing the DNN result: the claim that the native space is p-normable with p = 2/L is asserted as an immediate byproduct of the axiomatization, yet the manuscript must exhibit the explicit verification that the representation-cost functional satisfies the p-triangle inequality with this precise exponent; without that step the central claim for L > 2 rests on an uninspected derivation.
  2. [Framework definition] Definition of representation cost (the infimum over parameters realizing a given function): it is not yet shown that this functional is lower semi-continuous or satisfies the requisite quasi-norm axioms on the function space it induces; this property is load-bearing for both the representer theorem and the quasi-Banach conclusion.
minor comments (2)
  1. Notation for the parameter-space regularizer and the induced native-space quasi-norm should be introduced with a single consistent symbol and clearly distinguished from the classical norm case.
  2. The statement that 'many natural results hold in this abstract setting' would benefit from an enumerated list of the theorems proved, with pointers to their statements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and positive assessment of the framework's potential. We address the two major comments below, agreeing that explicit verifications strengthen the presentation and will be incorporated in revision.

Point-by-point responses
  1. Referee: [Abstract / DNN native-space section] Abstract and the section introducing the DNN result: the claim that the native space is p-normable with p = 2/L is asserted as an immediate byproduct of the axiomatization, yet the manuscript must exhibit the explicit verification that the representation-cost functional satisfies the p-triangle inequality with this precise exponent; without that step the central claim for L > 2 rests on an uninspected derivation.

    Authors: We agree that the p-triangle inequality requires explicit verification for the DNN case rather than relying solely on the abstract axiomatization. In the revised manuscript we will insert a new lemma (in the DNN native-space section) that directly computes the representation cost for depth-L ReLU networks and verifies that the p-triangle inequality holds with exponent p = 2/L. This step-by-step derivation will be self-contained and independent of the general framework. revision: yes

  2. Referee: [Framework definition] Definition of representation cost (the infimum over parameters realizing a given function): it is not yet shown that this functional is lower semi-continuous or satisfies the requisite quasi-norm axioms on the function space it induces; this property is load-bearing for both the representer theorem and the quasi-Banach conclusion.

    Authors: The abstract framework assumes the parameter regularizer satisfies standard conditions that imply the induced representation cost is a quasi-norm; however, the referee is correct that lower semi-continuity with respect to the induced function-space topology is not stated explicitly. We will add a short lemma immediately after the definition of the representation cost that proves lower semi-continuity under the maintained hypotheses on the parameter map. This will also confirm the quasi-norm axioms hold on the native space. revision: yes

Circularity Check

0 steps flagged

No significant circularity; framework is definitional and self-contained

Full rationale

The paper defines representation cost as the infimum of a parameter regularizer over realizations of a function, then derives the induced native space and its properties (including the p=2/L quasi-Banach structure for depth-L ReLU nets) directly from that definition applied to specific models. This recovers known spaces (RKHS, Besov, variation) as special cases without any reduction of the central claim to a fitted parameter, self-citation chain, or input-by-construction. The derivation is axiomatic rather than predictive, with no load-bearing step that collapses to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Review performed from abstract only; the ledger records the definitional starting points stated in the abstract.

axioms (2)
  • [domain assumption] Representation costs of parametric models are defined through infima of parameter-space regularizers.
    This is the explicit starting definition of the framework in the abstract.
  • [domain assumption] The induced native spaces admit representer theorems and nonparametric equivalences under sufficient overparameterization.
    The abstract states that these results hold in the abstract setting.
invented entities (1)
  • [no independent evidence] Native quasi-Banach spaces for depth-L ReLU networks with p = 2/L
    purpose: To characterize the inductive bias induced by the representation cost of deep networks
    The abstract introduces these spaces as the output of applying the framework to DNNs; no independent evidence outside the derivation is mentioned.

pith-pipeline@v0.9.1-grok · 5744 in / 1441 out tokens · 42348 ms · 2026-06-27T04:21:08.780066+00:00 · methodology

