Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

Greg Ongie; Rahul Parhi

arxiv: 2606.14954 · v3 · pith:JA3NE6CJnew · submitted 2026-06-12 · 🧮 math.FA · cs.LG · math.OC · stat.ML

Pith reviewed 2026-06-27 04:21 UTC · model grok-4.3

classification 🧮 math.FA · cs.LG · math.OC · stat.ML

keywords representation costs · native function spaces · quasi-Banach spaces · deep neural networks · ReLU networks · representer theorems · parametric models · inductive bias

The pith

Deep ReLU networks of depth L induce p-normable quasi-Banach native spaces with p equal to 2/L.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an abstract framework that defines the representation cost of any parametric model as the infimum of a parameter-space regularizer over all parameters realizing a given function. From this definition the framework extracts the model's induced native function space and proves that representer theorems and equivalences to nonparametric descriptions hold in the abstract setting. Classical cases recover reproducing kernel Hilbert spaces from kernel methods, Besov spaces from wavelets, and variation spaces from shallow networks. For feedforward ReLU networks of depth L the same construction yields p-normable quasi-Banach spaces with p = 2/L, showing that the representation-cost bias cannot be expressed by a norm once L exceeds 2.

Core claim

The central claim is that the representation cost of depth-L feedforward ReLU networks induces native spaces that are p-normable quasi-Banach spaces with p = 2/L. Consequently the inductive bias expressed by this cost cannot be captured by any norm when the depth is greater than 2.
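For readers unfamiliar with the term, the following block spells out what p-normability means and what the exponent p = 2/L gives at small depths. The symbol ‖·‖ for the native-space quasi-norm is notation assumed here for illustration and is not taken from the paper. The second inequality is a standard consequence of the first; on its own it shows only that the triangle inequality may fail by a bounded factor, so the paper's stronger statement that no norm captures the bias for L > 2 still rests on its own proofs.

```latex
\begin{align*}
  \|f + g\|^{p} &\le \|f\|^{p} + \|g\|^{p},
    && 0 < p \le 1 \quad \text{($p$-triangle inequality)} \\
  \|f + g\| &\le 2^{1/p - 1}\,\bigl(\|f\| + \|g\|\bigr)
    && \text{(quasi-triangle inequality that follows from it)} \\
  p = \tfrac{2}{L}: \quad
    & L = 2 \Rightarrow p = 1,\ 2^{1/p-1} = 1; \\
    & L = 3 \Rightarrow p = \tfrac{2}{3},\ 2^{1/p-1} = \sqrt{2}; \\
    & L = 4 \Rightarrow p = \tfrac{1}{2},\ 2^{1/p-1} = 2 .
\end{align*}
```

At L = 2 the exponent is 1 and the p-triangle inequality is the ordinary triangle inequality, which matches the claim that norms suffice up to depth 2.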

What carries the argument

The representation cost, defined as the infimum of a parameter-space regularizer over all parameter vectors that realize a given function; this cost induces the native function space on which representer theorems are proved.
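In symbols, the definition above can be sketched as follows. All notation is assumed for illustration and is not taken from the paper: Θ is the parameter space, θ ↦ f_θ is the map from parameters to functions, Ω is the parameter-space regularizer, and R is the resulting representation cost. The convention that the infimum over an empty set is +∞ is the usual one and is also an assumption about how the paper treats unrealizable functions.

```latex
\[
  R(f) \;=\; \inf\bigl\{\, \Omega(\theta) \;:\; \theta \in \Theta,\ f_\theta = f \,\bigr\},
  \qquad
  R(f) = +\infty \ \text{ if no } \theta \in \Theta \text{ realizes } f .
\]
```

Read this way, the cost depends only on the function f and not on the particular parameters used to realize it, which is what lets it induce a space of functions.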

If this is right

Representer theorems hold for arbitrary parametric methods on their native spaces.
Sufficiently overparameterized parametric models become equivalent to their nonparametric counterparts on the native space.
Kernel methods, wavelets, and shallow networks appear as special cases inside the same abstract framework.
The native spaces of deep ReLU networks are quasi-Banach but not Banach when L > 2.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Optimization algorithms for deep networks may need to be redesigned around quasi-norms rather than norms once depth exceeds two.
Generalization bounds derived from norm-based complexity measures may need replacement by quasi-norm analogues for deep architectures.
The same abstract construction could be applied to other activation functions or architectures to classify their native spaces.

Load-bearing premise

Defining representation cost as the infimum of a parameter regularizer is assumed to produce a well-behaved native function space in which representer theorems and nonparametric equivalences hold.

What would settle it

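An explicit family, for depth-3 ReLU networks, of finite sums f_1 + ... + f_n whose native-space quasi-norm exceeds the sum of the individual quasi-norms by a factor that grows without bound in n would show the space fails to be normable. Two functions alone cannot do this, because every quasi-norm obeys the triangle inequality up to a fixed constant.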
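As a toy illustration of the kind of blow-up that rules out normability, the sketch below uses the finite-dimensional sequence space ℓ^p with p = 2/3 (the value of 2/L at L = 3) as a stand-in. It is not the native space of a ReLU network, and the function names are hypothetical. It compares the quasi-norm of a sum of n unit vectors with the sum of their quasi-norms; the ratio is n^(1/p - 1), which grows without bound when p < 1 and stays at 1 when p = 1.

```python
import numpy as np


def lp_quasinorm(x, p):
    """(sum_i |x_i|^p)^(1/p): a norm for p >= 1, only a quasi-norm for 0 < p < 1."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))


def triangle_ratio(n, p):
    """Ratio ||e_1 + ... + e_n|| / (||e_1|| + ... + ||e_n||) in l^p on R^n."""
    basis = np.eye(n)  # rows are the standard unit vectors e_1, ..., e_n
    whole = lp_quasinorm(basis.sum(axis=0), p)
    parts = sum(lp_quasinorm(e, p) for e in basis)
    return whole / parts


if __name__ == "__main__":
    p = 2.0 / 3.0  # stand-in for the exponent 2/L at depth L = 3
    for n in (2, 8, 32, 128):
        # The second printed number is the predicted ratio n^(1/p - 1).
        print(n, triangle_ratio(n, p), n ** (1.0 / p - 1.0))
```

An equivalent norm would force this ratio to stay bounded over all n, so an analogous unbounded family inside the depth-3 native space is the kind of evidence the paragraph above asks for.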

Original abstract

We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper sets up an abstract framework for representation costs that recovers the classical cases and derives quasi-Banach native spaces with p=2/L for depth-L ReLU nets as a direct consequence.

The core contribution is an axiomatic setup that defines representation cost for any parametric model via infimum over parameters realizing a function, then identifies the induced native space. This recovers RKHS for kernels, Besov spaces for wavelets, and variation spaces for shallow nets without extra work. The new piece is the application to feedforward ReLU networks of depth L, where the native space turns out to be p-normable quasi-Banach with p exactly 2/L. That immediately implies the inductive bias cannot be a norm once L exceeds 2.

The framework also yields representer theorems and nonparametric equivalences under overparameterization in the general setting. These look like straightforward consequences of the definitions rather than deep new analysis, but they are cleanly stated and unify several lines that had been treated separately.

The main uncertainty is whether the p=2/L identification survives the details of how the representation cost is instantiated for the network parameters. The abstract presents it as immediate from the axiomatization, which would be strong if the steps check out, but the derivation itself is not visible here. If the parameter-space regularizer is chosen in a way that forces the quasi-norm structure, the result is structural rather than surprising; if it follows from the network architecture alone, it is more interesting. Either way, the claim is load-bearing and needs the full proofs.
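One way to see how a depth-dependent exponent can come out of a regularizer is a scalar toy model, offered here as a sketch under stated assumptions and not as the paper's construction. Assume a "network" that is just a product of L scalar factors, w = u_1 · ... · u_L, and a squared-ℓ2 (weight-decay style) regularizer, the sum of the u_l squared; the abstract does not say which regularizer the paper uses. By the AM-GM inequality the cheapest factorization costs L·|w|^(2/L), attained when all factors have equal magnitude, so doubling w multiplies the cost by 2^(2/L) instead of 2. All function names below are hypothetical.

```python
import numpy as np


def chain_cost(factors):
    """Assumed regularizer: sum of squared scalar factors (weight decay)."""
    return float(np.sum(np.square(factors)))


def closed_form_cost(w, depth):
    """AM-GM lower bound depth * |w|^(2/depth) on the cost of realizing w."""
    return depth * abs(w) ** (2.0 / depth)


def balanced_factors(w, depth):
    """Factors of equal magnitude whose product is w (attains the bound)."""
    factors = np.full(depth, abs(w) ** (1.0 / depth))
    if w < 0:
        factors[0] = -factors[0]
    return factors


def random_factorization(w, depth, rng):
    """Some other realization of w: draw depth-1 factors, solve for the last."""
    head = rng.uniform(0.2, 3.0, size=depth - 1)
    return np.append(head, w / np.prod(head))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = 5.0
    for depth in (2, 3, 4):
        best = chain_cost(balanced_factors(w, depth))
        sampled = min(
            chain_cost(random_factorization(w, depth, rng)) for _ in range(1000)
        )
        # Doubling the target scales the optimal cost by 2^(2/depth), not by 2.
        scale = closed_form_cost(2 * w, depth) / closed_form_cost(w, depth)
        print(depth, best, closed_form_cost(w, depth), sampled, scale)
```

In this toy model the exponent 2/L is produced jointly by the choice of regularizer and the multiplicative depth, which is the distinction the letter raises; whether the same mechanism drives the ReLU result can only be checked against the full proofs.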

This is aimed at researchers already working on function-space characterizations of learning methods. Someone who wants a single language that covers kernels through deep nets will find the unification useful. The paper shows clear engagement with the literature on the cited special cases and does not appear internally contradictory. It deserves a serious referee to check the derivations and see whether the quasi-Banach observation leads to new consequences beyond the statement itself.

Referee Report

2 major / 2 minor

Summary. The manuscript develops an abstract framework for representation costs of parametric data-fitting methods defined via infima of parameter-space regularizers. From this it defines induced native function spaces, proves representer theorems in the abstract setting, and establishes equivalences to nonparametric formulations under sufficient overparameterization. Classical cases (RKHS for kernels, Besov spaces for wavelets, variation spaces for shallow networks) are recovered as special cases. A central new result is that depth-L feedforward ReLU networks induce p-normable quasi-Banach native spaces with p = 2/L, implying that their representation-cost inductive bias cannot be captured by a norm when L > 2.

Significance. If the derivations hold, the framework supplies a unified language for representation costs across parametric models and yields a concrete, non-norm characterization of the function space bias of deep ReLU networks. Recovery of known spaces plus the explicit p = 2/L quasi-Banach claim for DNNs would constitute a substantive theoretical contribution to the function-space analysis of deep learning.

major comments (2)

[Abstract / DNN native-space section] Abstract and the section introducing the DNN result: the claim that the native space is p-normable with p = 2/L is asserted as an immediate byproduct of the axiomatization, yet the manuscript must exhibit the explicit verification that the representation-cost functional satisfies the p-triangle inequality with this precise exponent; without that step the central claim for L > 2 rests on an uninspected derivation.
[Framework definition] Definition of representation cost (the infimum over parameters realizing a given function): it is not yet shown that this functional is lower semi-continuous or satisfies the requisite quasi-norm axioms on the function space it induces; this property is load-bearing for both the representer theorem and the quasi-Banach conclusion.
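The verification requested in the first major comment could take roughly the following shape. This is a sketch under two hypotheses that the abstract does not confirm: that the representation cost R is subadditive, and that it is positively homogeneous of degree p. The symbols R and ‖·‖ are notation assumed here, not the paper's.

```latex
\begin{align*}
  \text{Hypotheses:}\quad
    & R(f + g) \le R(f) + R(g), \qquad R(c f) = |c|^{p}\, R(f), \qquad 0 < p \le 1 . \\
  \text{Define:}\quad
    & \|f\| := R(f)^{1/p} . \\
  \text{Homogeneity:}\quad
    & \|c f\| = \bigl(|c|^{p} R(f)\bigr)^{1/p} = |c|\,\|f\| . \\
  \text{$p$-triangle inequality:}\quad
    & \|f + g\|^{p} = R(f + g) \le R(f) + R(g) = \|f\|^{p} + \|g\|^{p} .
\end{align*}
```

Under these hypotheses the exponent in the p-triangle inequality is exactly the homogeneity degree of the cost, so for the deep ReLU case the checks reduce to confirming subadditivity and degree-2/L homogeneity, plus definiteness and completeness, which this sketch does not address.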

minor comments (2)

Notation for the parameter-space regularizer and the induced native-space quasi-norm should be introduced with a single consistent symbol and clearly distinguished from the classical norm case.
The statement that 'many natural results hold in this abstract setting' would benefit from an enumerated list of the theorems proved, with pointers to their statements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and positive assessment of the framework's potential. We address the two major comments below, agreeing that explicit verifications strengthen the presentation and will be incorporated in revision.

Referee: [Abstract / DNN native-space section] Abstract and the section introducing the DNN result: the claim that the native space is p-normable with p = 2/L is asserted as an immediate byproduct of the axiomatization, yet the manuscript must exhibit the explicit verification that the representation-cost functional satisfies the p-triangle inequality with this precise exponent; without that step the central claim for L > 2 rests on an uninspected derivation.

Authors: We agree that the p-triangle inequality requires explicit verification for the DNN case rather than relying solely on the abstract axiomatization. In the revised manuscript we will insert a new lemma (in the DNN native-space section) that directly computes the representation cost for depth-L ReLU networks and verifies that the p-triangle inequality holds with exponent p = 2/L. This step-by-step derivation will be self-contained and independent of the general framework.

revision: yes

Referee: [Framework definition] Definition of representation cost (the infimum over parameters realizing a given function): it is not yet shown that this functional is lower semi-continuous or satisfies the requisite quasi-norm axioms on the function space it induces; this property is load-bearing for both the representer theorem and the quasi-Banach conclusion.

Authors: The abstract framework assumes the parameter regularizer satisfies standard conditions that imply the induced representation cost is a quasi-norm; however, the referee is correct that lower semi-continuity with respect to the induced function-space topology is not stated explicitly. We will add a short lemma immediately after the definition of the representation cost that proves lower semi-continuity under the maintained hypotheses on the parameter map. This will also confirm that the quasi-norm axioms hold on the native space.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; framework is definitional and self-contained

The paper defines representation cost as the infimum of a parameter regularizer over realizations of a function, then derives the induced native space and its properties (including the p=2/L quasi-Banach structure for depth-L ReLU nets) directly from that definition applied to specific models. This recovers known spaces (RKHS, Besov, variation) as special cases without any reduction of the central claim to a fitted parameter, self-citation chain, or input-by-construction. The derivation is axiomatic rather than predictive, with no load-bearing step that collapses to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Review performed from abstract only; the ledger records the definitional starting points stated in the abstract.

axioms (2)

domain assumption: Representation costs of parametric models are defined through infima of parameter-space regularizers.
This is the explicit starting definition of the framework in the abstract.
domain assumption: The induced native spaces admit representer theorems and nonparametric equivalences under sufficient overparameterization.
The abstract states that these results hold in the abstract setting.

invented entities (1)

Native quasi-Banach spaces for depth-L ReLU networks with p=2/L · no independent evidence
purpose: To characterize the inductive bias induced by the representation cost of deep networks
The abstract introduces these spaces as the output of applying the framework to DNNs; no independent evidence outside the derivation is mentioned.

Reference graph

Works this paper leans on

161 extracted references · 2 linked inside Pith

[1]

Fernando Albiac and Nigel J. Kalton. Lipschitz structure of quasi-Banach spaces.Israel Journal of Mathematics, 170(1):317–335, 2009

2009
[2]

Locally bounded linear topological spaces.Proceedings of the Imperial Academy, 18(10):588–594, 1942

Tosio Aoki. Locally bounded linear topological spaces.Proceedings of the Imperial Academy, 18(10):588–594, 1942

1942
[3]

Theory of reproducing kernels.Transactions of the American Mathe- matical Society, 68(3):337–404, 1950

Nachman Aronszajn. Theory of reproducing kernels.Transactions of the American Mathe- matical Society, 68(3):337–404, 1950

1950
[4]

Understanding deep neural networks with rectified linear units

Raman Arora, Amitabh Basu, Poorya Mianjy, and Anirbit Mukherjee. Understanding deep neural networks with rectified linear units. InInternational Conference on Learning Representations (ICLR), 2018

2018
[5]

Implicit regularization in deep matrix factorization.Advances in Neural Information Processing Systems (NeurIPS), 32, 2019

Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization.Advances in Neural Information Processing Systems (NeurIPS), 32, 2019

2019
[6]

Breaking the curse of dimensionality with convex neural networks.Journal of Machine Learning Research, 18(1):629–681, 2017

Francis Bach. Breaking the curse of dimensionality with convex neural networks.Journal of Machine Learning Research, 18(1):629–681, 2017

2017
[7]

Optimization with sparsity-inducing penalties.Foundations and Trends®in Machine Learning, 4(1):1–106, 2012

Francis Bach, Rodolphe Jenatton, Julien Mairal, and Guillaume Obozinski. Optimization with sparsity-inducing penalties.Foundations and Trends®in Machine Learning, 4(1):1–106, 2012

2012
[8]

Better neural network expressivity: subdividing the simplex.arXiv preprint arXiv:2505.14338, 2025

Egor Bakaev, Florestan Brunck, Christoph Hertrich, Jack Stade, and Amir Yehudayoff. Better neural network expressivity: subdividing the simplex.arXiv preprint arXiv:2505.14338, 2025. 73

arXiv 2025
[9]

Andrew R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993

1993
[10]

Andrew R. Barron. Approximation and estimation bounds for artificial neural networks. Machine Learning, 14(1):115–133, 1994

1994
[11]

Barron, Albert Cohen, Wolfgang Dahmen, and Ronald A

Andrew R. Barron, Albert Cohen, Wolfgang Dahmen, and Ronald A. DeVore. Approximation and learning by greedy algorithms.Annals of Statistics, 36(1):64–94, 2008

2008
[12]

A Lipschitz spaces view of infinitely wide shallow neural networks.SIAM Journal on Mathematical Analysis, 58(3):2786–2828, 2026

Francesca Bartolucci, Marcello Carioni, Jos´ e A Iglesias, Yury Korolev, Emanuele Naldi, and Stefano Vigogna. A Lipschitz spaces view of infinitely wide shallow neural networks.SIAM Journal on Mathematical Analysis, 58(3):2786–2828, 2026

2026
[13]

Understanding neural networks with reproducing kernel Banach spaces.Applied and Computational Harmonic Analysis, 62:194–236, 2023

Francesca Bartolucci, Ernesto De Vito, Lorenzo Rosasco, and Stefano Vigogna. Understanding neural networks with reproducing kernel Banach spaces.Applied and Computational Harmonic Analysis, 62:194–236, 2023

2023
[14]

Neural reproducing kernel Banach spaces and representer theorems for deep networks.arXiv preprint arXiv:2403.08750, 2024

Francesca Bartolucci, Ernesto De Vito, Lorenzo Rosasco, and Stefano Vigogna. Neural reproducing kernel Banach spaces and representer theorems for deep networks.arXiv preprint arXiv:2403.08750, 2024

arXiv 2024
[15]

On deep learning as a remedy for the curse of dimen- sionality in nonparametric regression.Annals of Statistics, 47(4):2261–2285, 2019

Benedikt Bauer and Michael Kohler. On deep learning as a remedy for the curse of dimen- sionality in nonparametric regression.Annals of Statistics, 47(4):2261–2285, 2019

2019
[16]

On the inductive bias of infinite-depth ResNets and the bottleneck rank

Enric Boix-Adsera. On the inductive bias of infinite-depth ResNets and the bottleneck rank. arXiv preprint arXiv:2501.19149, 2025

arXiv 2025
[17]

Bandeira

Nicolas Boumal, Vladislav Voroninski, and Afonso S. Bandeira. Deterministic guarantees for Burer–Monteiro factorizations of smooth semidefinite programs.Communications on Pure and Applied Mathematics, 73(3):581–608, 2020

2020
[18]

On representer theorems and convex regularization.SIAM Journal on Optimization, 29(2):1260–1281, 2019

Claire Boyer, Antonin Chambolle, Yohann De Castro, Vincent Duval, Fr´ ed´ eric De Gournay, and Pierre Weiss. On representer theorems and convex regularization.SIAM Journal on Optimization, 29(2):1260–1281, 2019

2019
[19]

Sparsity of solutions for variational inverse problems with finite-dimensional data.Calculus of Variations and Partial Differential Equations, 59(1):1–26, 2020

Kristian Bredies and Marcello Carioni. Sparsity of solutions for variational inverse problems with finite-dimensional data.Calculus of Variations and Partial Differential Equations, 59(1):1–26, 2020

2020
[20]

Inverse problems in spaces of measures

Kristian Bredies and Hanna Katriina Pikkarainen. Inverse problems in spaces of measures. ESAIM: Control, Optimisation and Calculus of Variations, 19(1):190–218, 2013

2013
[21]

Univer- sitext

Haim Brezis.Functional Analysis, Sobolev Spaces and Partial Differential Equations. Univer- sitext. Springer, 2011

2011
[22]

Monteiro

Samuel Burer and Renato D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization.Mathematical Programming, 95(2):329–357, 2003

2003
[23]

Monteiro

Samuel Burer and Renato D.C. Monteiro. Local minima and convergence in low-rank semidef- inite programming.Mathematical Programming, 103(3):427–444, 2005

2005
[24]

Pitman Research Notes in Mathematics 207

Giuseppe Buttazzo.Semicontinuity, relaxation and integral representation in the calculus of variations. Pitman Research Notes in Mathematics 207. Longman, Harlow, 1989. 74

1989
[25]

Optimal approximation with sparsely connected deep neural networks.SIAM Journal on Mathematics of Data Science, 1(1):8–45, 2019

Helmut B¨ olcskei, Philipp Grohs, Gitta Kutyniok, and Philipp Petersen. Optimal approximation with sparsely connected deep neural networks.SIAM Journal on Mathematics of Data Science, 1(1):8–45, 2019

2019
[26]

Parrilo, and Alan S

Venkat Chandrasekaran, Benjamin Recht, Pablo A. Parrilo, and Alan S. Willsky. The convex geometry of linear inverse problems.Foundations of Computational Mathematics, 12(6):805–849, 2012

2012
[27]

Multi-layer neural networks as trainable ladders of Hilbert spaces

Zhengdao Chen. Multi-layer neural networks as trainable ladders of Hilbert spaces. In International Conference on Machine Learning, pages 4294–4329. PMLR, 2023

2023
[28]

Neural Hilbert ladders: Multi-layer neural networks in function space.Journal of Machine Learning Research, 25(109):1–65, 2024

Zhengdao Chen. Neural Hilbert ladders: Multi-layer neural networks in function space.Journal of Machine Learning Research, 25(109):1–65, 2024

2024
[29]

On the representation of solutions to elliptic PDEs in Barron spaces.Advances in Neural Information Processing Systems, 34:6454–6465, 2021

Ziang Chen, Jianfeng Lu, and Yulong Lu. On the representation of solutions to elliptic PDEs in Barron spaces.Advances in Neural Information Processing Systems, 34:6454–6465, 2021

2021
[30]

On the global convergence of gradient descent for over- parameterized models using optimal transport.Advances in Neural Information Processing Systems, 31, 2018

Lenaic Chizat and Francis Bach. On the global convergence of gradient descent for over- parameterized models using optimal transport.Advances in Neural Information Processing Systems, 31, 2018

2018
[31]

Springer, 2 edition, 1990

John B Conway.A Course in Functional Analysis, volume 96 ofGraduate Texts in Mathematics. Springer, 2 edition, 1990

1990
[32]

Compositional sparsity, approximation classes, and parametric transport equations.Constructive Approximation, 61(2):219–283, 2025

Wolfgang Dahmen. Compositional sparsity, approximation classes, and parametric transport equations.Constructive Approximation, 61(2):219–283, 2025

2025
[33]

Representation costs of linear neural networks: Analysis and design.Advances in Neural Information Processing Systems, 34:26884–26896, 2021

Zhen Dai, Mina Karzand, and Nathan Srebro. Representation costs of linear neural networks: Analysis and design.Advances in Neural Information Processing Systems, 34:26884–26896, 2021

2021
[34]

Birkh¨ auser Boston, 1993

Gianni Dal Maso.An introduction toΓ-convergence, volume 8 ofProgress in Nonlinear Differential Equations and Their Applications. Birkh¨ auser Boston, 1993

1993
[35]

SIAM, Philadelphia, PA, 1992

Ingrid Daubechies.Ten Lectures on Wavelets. SIAM, Philadelphia, PA, 1992

1992
[36]

An iterative thresholding algorithm for linear inverse problems with a sparsity constraint.Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004

Ingrid Daubechies, Michel Defrise, and Christine De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint.Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004

2004
[37]

Nonlinear approximation and (deep) ReLU networks.Constructive Approximation, 55(1):127– 172, 2022

Ingrid Daubechies, Ronald DeVore, Simon Foucart, Boris Hanin, and Guergana Petrova. Nonlinear approximation and (deep) ReLU networks.Constructive Approximation, 55(1):127– 172, 2022

2022
[38]

Carl de Boor and Robert E. Lynch. On splines and their minimum properties.Journal of Mathematics and Mechanics, 15(6):953–969, 1966

1966
[39]

Neural network approximation.Acta Numerica, 30:327–444, 2021

Ronald DeVore, Boris Hanin, and Guergana Petrova. Neural network approximation.Acta Numerica, 30:327–444, 2021

2021
[40]

Nowak, Rahul Parhi, and Jonathan W

Ronald DeVore, Robert D. Nowak, Rahul Parhi, and Jonathan W. Siegel. Weighted variation spaces and approximation by shallow ReLU networks.Applied and Computational Harmonic Analysis, 74(101713), 2025. 75

2025
[41]

Ronald A. DeVore. Nonlinear approximation.Acta Numerica, 7:51–150, 1998

1998
[42]

DeVore and George G

Ronald A. DeVore and George G. Lorentz.Constructive Approximation, volume 303 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, Heidelberg, 1993

1993
[43]

DeVore and Robert C

Ronald A. DeVore and Robert C. Sharpley. Besov spaces on domains in Rd.Transactions of the American Mathematical Society, 335(2):843–864, 1993

1993
[44]

David L. Donoho. Unconditional bases are optimal bases for data compression and for statistical estimation.Applied and Computational Harmonic Analysis, 1(1):100–115, 1993

1993
[45]

David L. Donoho. High-dimensional data analysis: The curses and blessings of dimensionality,
[46]

AMS Mathematical Challenges Lecture
[47]

Donoho and Iain M

David L. Donoho and Iain M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994

1994
[48]

Donoho and Iain M

David L. Donoho and Iain M. Johnstone. Adapting to unknown smoothness via wavelet shrinkage.Journal of the American Statistical Association, 90(432):1200–1224, 1995

1995
[49]

Donoho and Iain M

David L. Donoho and Iain M. Johnstone. Minimax estimation via wavelet shrinkage.Annals of Statistics, 26(3):879–921, 1998

1998
[50]

The Barron space and the flow-induced function spaces for neural network models.Constructive Approximation, 55(1):369–406, 2022

Weinan E, Chao Ma, and Lei Wu. The Barron space and the flow-induced function spaces for neural network models.Constructive Approximation, 55(1):369–406, 2022

2022
[51]

On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

Weinan E and Stephan Wojtowytsch. On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics. CSIAM Transactions on Applied Mathematics, 1(3):387–440, 2020

2020
[52]

Representation formulas and pointwise properties for Barron functions.Calculus of Variations and Partial Differential Equations, 61(2):46, 2022

Weinan E and Stephan Wojtowytsch. Representation formulas and pointwise properties for Barron functions.Calculus of Variations and Partial Differential Equations, 61(2):46, 2022

2022
[53]

SIAM, 1999

Ivar Ekeland and Roger Temam.Convex analysis and variational problems. SIAM, 1999

1999
[54]

Deep neural network approximation theory.IEEE Transactions on Information Theory, 67(5):2581–2623, 2021

Dennis Elbr¨ achter, Dmytro Perekrestenko, Philipp Grohs, and Helmut B¨ olcskei. Deep neural network approximation theory.IEEE Transactions on Information Theory, 67(5):2581–2623, 2021

2021
[55]

Elbr¨ achter, Julius Berner, and Philipp Grohs

Dennis M. Elbr¨ achter, Julius Berner, and Philipp Grohs. How degenerate is the parametrization of neural networks with the ReLU activation function?Advances in Neural Information Processing Systems, 32, 2019

2019
[56]

American mathematical society, 2nd edition, 2010

Lawrence C Evans.Partial differential equations, volume 19. American mathematical society, 2nd edition, 2010

2010
[57]

PhD thesis, Stanford University, 2002

Maryam Fazel.Matrix Rank Minimization with Applications. PhD thesis, Stanford University, 2002

2002
[58]

Fisher and Joseph W

Stephen D. Fisher and Joseph W. Jerome. Spline solutions to L1 extremal problems in one and several variables.Journal of Approximation Theory, 13(1):73–83, 1975

1975
[59]

Exact solutions of infinite dimensional total-variation regularized problems.Information and Inference: A Journal of the IMA, 8(3):407–443, 2019

Axel Flinth and Pierre Weiss. Exact solutions of infinite dimensional total-variation regularized problems.Information and Inference: A Journal of the IMA, 8(3):407–443, 2019. 76

2019
[60]

Sriperumbudur

Kenji Fukumizu, Gert Lanckriet, and Bharath K. Sriperumbudur. Learning in Hilbert vs. Banach spaces: A measure embedding viewpoint.Advances in Neural Information Processing Systems, 24, 2011

2011
[61]

A survey on Lipschitz-free Banach spaces.Commentationes Mathematicae, 55(2):89–118, 2015

Gilles Godefroy. A survey on Lipschitz-free Banach spaces.Commentationes Mathematicae, 55(2):89–118, 2015

2015
[62]

Least absolute shrinkage is equivalent to quadratic penalization

Yves Grandvalet. Least absolute shrinkage is equivalent to quadratic penalization. In International Conference on Artificial Neural Networks, pages 201–206. Springer, 1998

1998
[63]

Approximation spaces of deep neural networks.Constructive Approximation, 55(1):259–367, 2022

R´ emi Gribonval, Gitta Kutyniok, Morten Nielsen, and Felix Voigtlaender. Approximation spaces of deep neural networks.Constructive Approximation, 55(1):259–367, 2022

2022
[64]

Lee, Daniel Soudry, and Nati Srebro

Suriya Gunasekar, Jason D. Lee, Daniel Soudry, and Nati Srebro. Implicit bias of gradient descent on linear convolutional networks.Advances in Neural Information Processing Systems, 31, 2018

2018
[65]

Implicit regularization in matrix factorization.Advances in Neural Information Processing Systems, 30, 2017

Suriya Gunasekar, Blake E Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, and Nati Srebro. Implicit regularization in matrix factorization.Advances in Neural Information Processing Systems, 30, 2017

2017
[66]

Comparing biases for minimal network construction with back-propagation.Advances in Neural Information Processing Systems, 1, 1988

Stephen Hanson and Lorien Pratt. Comparing biases for minimal network construction with back-propagation.Advances in Neural Information Processing Systems, 1, 1988

1988
[67]

ReLU deep neural networks and linear finite elements.Journal of Computational Mathematics, 38(3):502–527, 2020

Juncai He, Lin Li, Jinchao Xu, and Chunyue Zheng. ReLU deep neural networks and linear finite elements.Journal of Computational Mathematics, 38(3):502–527, 2020

2020
[68]

Deep networks are reproducing kernel chains.arXiv preprint arXiv:2501.03697, 2025

Tjeerd Jan Heeringa, Len Spek, and Christoph Brune. Deep networks are reproducing kernel chains.arXiv preprint arXiv:2501.03697, 2025

arXiv 2025
[69]

Towards lower bounds on the depth of ReLU neural networks

Christoph Hertrich, Amitabh Basu, Marco Di Summa, and Martin Skutella. Towards lower bounds on the depth of ReLU neural networks. InAdvances in Neural Information Processing Systems, volume 34, pages 3336–3348, 2021

2021
[70]

Learning sparse compo- sitional functions with norm-constrained neural networks.arXiv preprint arXiv:2605.25608, 2026

Shuo Huang, Lorenzo Fiorito, Lorenzo Rosasco, and Tomaso Poggio. Learning sparse compo- sitional functions with norm-constrained neural networks.arXiv preprint arXiv:2605.25608, 2026

Pith/arXiv arXiv 2026
[71]

Hunter and Bruno Nachtergaele.Applied Analysis

John K. Hunter and Bruno Nachtergaele.Applied Analysis. World Scientific Publishing Company, 2001

2001
[72]

Bottleneck structure in learned features: Low-dimension vs regularity tradeoff

Arthur Jacot. Bottleneck structure in learned features: Low-dimension vs regularity tradeoff. Advances in Neural Information Processing Systems, 36:23607–23629, 2023

2023
[73]

Implicit bias of large depth networks: a notion of rank for nonlinear functions

Arthur Jacot. Implicit bias of large depth networks: a notion of rank for nonlinear functions. InInternational Conference on Learning Representations (ICLR), 2023

2023
[74]

Feature learning in L2-regularized DNNs: Attraction/repulsion and sparsity.Advances in Neural Information Processing Systems, 35:6763–6774, 2022

Arthur Jacot, Eugene Golikov, Cl´ ement Hongler, and Franck Gabriel. Feature learning in L2-regularized DNNs: Attraction/repulsion and sparsity.Advances in Neural Information Processing Systems, 35:6763–6774, 2022

2022
[75]

George S. Kimeldorf and Grace Wahba. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics, 41(2):495–502, 1970
[76]

George S. Kimeldorf and Grace Wahba. Spline functions and stochastic processes. Sankhyā: The Indian Journal of Statistics, Series A, pages 173–180, 1970
[77]

George S. Kimeldorf and Grace Wahba. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82–95, 1971
[78]

Yury Korolev. Two-layer neural networks with values in a Banach space. SIAM Journal on Mathematical Analysis, 54(6):6358–6389, 2022
[79]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 2012
[80]

Anders Krogh and John Hertz. A simple weight decay can improve generalization. Advances in Neural Information Processing Systems, 4, 1991

