Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces

A. Martina Neuman; Anastasis Kratsios; Gudmund Pammer

arxiv: 2402.05576 · v4 · pith:V3ZUVYL4new · submitted 2024-02-08 · 💻 cs.LG

Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces

Anastasis Kratsios , A. Martina Neuman , Gudmund Pammer This is my paper

Pith reviewed 2026-05-24 03:26 UTC · model grok-4.3

classification 💻 cs.LG

keywords generalization boundsconcentration of measurefinite metric spacesdigital computersmachine learninggeometric representation dimensionmetric embeddings

0 comments

The pith

Learning models on digital computers admit generalization bounds that adapt to a geometric representation dimension m, achieving rates of order 1 over N to the power 1 over 2 or m with constants of order square root of m.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a family of generalization bounds for machine learning models implemented on digital computers. These bounds take the form c_m over N raised to the power 1 over 2 or m, where m indexes the geometric representation dimension of the underlying discrete problem. Adjusting m to the sample size N produces tighter guarantees for realistic values of N, while small m recovers the optimal dimension-free rate of order 1 over square root of N. The bounds rest on a new non-asymptotic concentration-of-measure inequality for finite metric spaces obtained by metric embedding arguments. Classical bounds suffer from constants that grow with ambient dimension and machine precision, so the new family addresses a practical gap when N is small to moderate.

Core claim

We derive a family of generalization bounds {c_m/N^{1/(2∨m)}}_{m=1}^∞ tailored for learning models on digital computers, which adapt to both the sample size N and the geometric representation dimension m of the discrete learning problem. Adjusting the parameter m according to N results in significantly tighter generalization bounds for practical sample sizes N, while setting m small maintains the optimal dimension-free worst-case rate of O(1/N^{1/2}). Notably, c_m ∈ O(m^{1/2}) for learning models on discretized Euclidean domains. Furthermore, our adaptive generalization bounds are formulated based on our new non-asymptotic result for concentration of measure in finite metric spaces, obtained

What carries the argument

A new non-asymptotic concentration of measure inequality for finite metric spaces, derived via metric embedding arguments, which directly yields the parameterized family of generalization bounds.

If this is right

For any fixed m the generalization gap converges at rate 1/N^{1/(2∨m)}
Choosing m depending on N produces tighter numerical bounds for practical sample sizes
Small m recovers the dimension-free rate O(1/N^{1/2})
On discretized Euclidean domains the prefactor c_m scales at most like sqrt(m)
The bounds apply directly to models whose inputs are Euclidean but whose implementation is finite-precision

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The embedding technique may allow similar adaptive bounds for other finite-precision or quantized architectures beyond Euclidean discretization
Optimal m could be chosen from data alone by monitoring how the observed gap behaves across a grid of m values
The same concentration result may tighten analyses of generalization for models trained with finite-precision arithmetic even when the target function is continuous
If the inequality holds, it suggests that the effective dimension m can serve as a practical proxy for the complexity induced by machine precision

Load-bearing premise

The new concentration of measure result for finite metric spaces holds when applied to the discrete learning problem arising from digital implementation.

What would settle it

An explicit finite metric space arising from discretization of a Euclidean domain together with a learning model whose generalization gap exceeds every c_m / N^{1/(2∨m)} for all m, or a direct counterexample to the claimed concentration inequality on that space.

Figures

Figures reproduced from arXiv: 2402.05576 by A. Martina Neuman, Anastasis Kratsios, Gudmund Pammer.

**Figure 1.** Figure 1: When the sample size (N) is small-to-realistically-large, our non-asymptotic risk bounds are tighter than the classical bounds, e.g. [25, Corollary 4.6]). For massive sample sizes N, both bounds yield the parametric rate of O(1/N1/2 ). See Subsection IV-B for theoretical and numerical demonstrations of this phenomenon. To lay the foundation, we define the agnostic, probably-approximately-correct (PAC) stat… view at source ↗

**Figure 2.** Figure 2: The distortion incurred when compressing at [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

read the original abstract

Machine learning models with inputs in a Euclidean space $\mathbb{R}^d$, when implemented on digital computers, generalize, and their generalization gap converges to $0$ at a rate of $c/N^{1/2}$ concerning the sample size $N$. However, the constant $c>0$ obtained through classical methods can be large in terms of the ambient dimension $d$ and machine precision, posing a challenge when $N$ is small to realistically large. In this paper, we derive a family of generalization bounds $\{c_m/N^{1/(2\vee m)}\}_{m=1}^{\infty}$ tailored for learning models on digital computers, which adapt to both the sample size $N$ and the so-called geometric representation dimension $m$ of the discrete learning problem. Adjusting the parameter $m$ according to $N$ results in significantly tighter generalization bounds for practical sample sizes $N$, while setting $m$ small maintains the optimal dimension-free worst-case rate of $\mathcal{O}(1/N^{1/2})$. Notably, $c_{m}\in \mathcal{O}(m^{1/2})$ for learning models on discretized Euclidean domains. Furthermore, our adaptive generalization bounds are formulated based on our new non-asymptotic result for concentration of measure in finite metric spaces, established via leveraging metric embedding arguments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives an adaptive family of generalization bounds indexed by a geometric representation dimension m, trading rate for smaller constants on discretized domains via a new finite-space concentration result.

read the letter

The main takeaway is that the authors give an adaptive family of generalization bounds for learning models on digital computers. These bounds are indexed by a parameter m they call the geometric representation dimension, yielding rates of order 1/N to the power 1 over 2 or m, with the prefactor c_m scaling as O(sqrt(m)) on discretized Euclidean domains. This is positioned as a way to get tighter bounds for practical N by choosing m appropriately while recovering the dimension-free rate when m is kept small. What is actually new is the specific family of rates that interpolate between the dimension-free 1/sqrt(N) and slower rates, together with the supporting non-asymptotic concentration result on finite metric spaces derived from metric embedding arguments. The paper does a good job highlighting the problem with large constants in classical bounds when dimension and machine precision are high. The O(sqrt(m)) claim is consistent with typical costs in embedding theory, and the overall approach avoids obvious circularity or self-referential fitting. The soft spots are around the new concentration result. The abstract presents it as established independently, but the details of how the embedding is applied to the learning problem and whether it preserves the finite-space advantages need verification in the full text. If the proof holds without extra factors, the bounds look solid. A minor point is that practical selection or estimation of m is not discussed, which could limit immediate applicability beyond theory. This paper is aimed at researchers in statistical learning theory interested in finite-sample guarantees for discrete or finite-precision models. A reader who values dimension-free bounds or concentration on finite spaces will get value from thinking about the m-parameterized family. The work demonstrates clear thinking and honest engagement with the literature on generalization bounds. I would bring this to a reading group to go over the concentration proof. I would not cite it in my own work until the new theorem is checked, but it deserves peer review because the idea is grounded and the problem it addresses is real.

Referee Report

1 major / 2 minor

Summary. The paper claims to derive a family of generalization bounds {c_m / N^{1/(2∨m)}} for learning models on digital computers. These adapt to sample size N and a geometric representation dimension m of the discrete problem, obtained from a new non-asymptotic concentration-of-measure inequality on finite metric spaces derived via metric-embedding arguments. For discretized Euclidean domains the prefactor satisfies c_m = O(m^{1/2}); choosing m as a function of N is asserted to produce tighter bounds than the classical O(1/sqrt(N)) while preserving the optimal worst-case rate for small m.

Significance. If the new concentration inequality is valid and correctly applied, the work supplies a concrete mechanism for obtaining sample-size-adaptive and representation-dimension-adaptive generalization bounds that can be substantially tighter than dimension-dependent classical bounds for moderate N. The metric-embedding route to the finite-space concentration result is a methodological strength that, if rigorous, could be reusable beyond the learning setting.

major comments (1)

[new concentration theorem and its application to generalization bounds] The central claim rests on the new non-asymptotic concentration-of-measure theorem for finite metric spaces (the result stated after the abstract and proved in the body). The proof must be checked for the precise assumptions on the metric, the embedding map, and the dependence of the constants on the representation dimension m; without an explicit verification that the discrete learning problem satisfies those assumptions, the claimed O(m^{1/2}) scaling and the applicability of the bound to digital-computer models cannot be confirmed.

minor comments (2)

[Introduction / notation] The definition and construction of the geometric representation dimension m should be stated with an explicit example (e.g., for a discretized Euclidean domain) before the statement of the main bounds.
[Preliminaries] Notation for the finite metric space and the embedding should be introduced once and used consistently; several symbols appear to be overloaded between the concentration result and the learning setting.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and for recommending major revision. The single major comment is addressed point-by-point below.

read point-by-point responses

Referee: The central claim rests on the new non-asymptotic concentration-of-measure theorem for finite metric spaces (the result stated after the abstract and proved in the body). The proof must be checked for the precise assumptions on the metric, the embedding map, and the dependence of the constants on the representation dimension m; without an explicit verification that the discrete learning problem satisfies those assumptions, the claimed O(m^{1/2}) scaling and the applicability of the bound to digital-computer models cannot be confirmed.

Authors: We thank the referee for highlighting the need for explicit verification. Theorem 1 (stated immediately after the abstract) lists the assumptions: (X, d) must be a finite metric space that admits a bi-Lipschitz embedding into a space whose concentration properties are known, with the embedding distortion and dimension controlling the dependence on m. The proof in the body uses the embedding to transfer concentration inequalities, deriving the prefactor scaling directly from the embedding parameters. Section 4 applies the result to digital-computer models by equipping the finite grid of machine-representable points with the restricted Euclidean metric; this grid is finite by construction and satisfies the embedding assumption when m is taken as the effective geometric representation dimension of the discretization. The O(m^{1/2}) bound follows from the embedding dimension and the 1-Lipschitz property of the loss. While the application section already connects the setting to the theorem assumptions, we agree that a more explicit cross-reference would strengthen the manuscript. We will therefore add a short dedicated paragraph (or table) in Section 4 that enumerates each assumption of Theorem 1 and confirms it holds for the discretized Euclidean case. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives a new non-asymptotic concentration-of-measure inequality for finite metric spaces via metric embedding arguments and applies it to obtain the family of generalization bounds {c_m / N^{1/(2∨m)}}. No load-bearing step reduces by construction to its own inputs, fitted parameters renamed as predictions, or self-citation chains; the claimed O(m^{1/2}) prefactor scaling follows from standard embedding-dimension costs rather than tautology. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on a new concentration result whose validity is not independently evidenced in the provided abstract, plus the introduced parameter m.

free parameters (1)

m (geometric representation dimension)
Tunable integer parameter chosen according to N to obtain the adaptive rate; no explicit fitting procedure given in abstract.

axioms (1)

domain assumption Concentration of measure holds in finite metric spaces via metric embedding arguments
Invoked as the foundation for the new non-asymptotic result (abstract).

invented entities (1)

geometric representation dimension m no independent evidence
purpose: Parameter that encodes the discrete structure of the learning problem on digital computers and controls the rate exponent
Newly introduced to define the family of bounds; no independent evidence supplied in abstract.

pith-pipeline@v0.9.0 · 5776 in / 1331 out tokens · 30037 ms · 2026-05-24T03:26:57.340413+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 1 internal anchor

[1]

Deep vs. shallow networks: An approximation theory perspective,

H. N. Mhaskar and T. Poggio, “Deep vs. shallow networks: An approximation theory perspective,” Analysis and Applications, vol. 14, no. 06, pp. 829–848, 2016

work page 2016
[2]

Error bounds for approximations with deep relu networks,

D. Yarotsky, “Error bounds for approximations with deep relu networks,” Neural Networks, vol. 94, pp. 103–114, 2017

work page 2017
[3]

Optimal approximation of piecewise smooth functions using deep relu neural networks,

P. Petersen and F. V oigtlaender, “Optimal approximation of piecewise smooth functions using deep relu neural networks,” Neural Networks, vol. 108, pp. 296–330, 2018

work page 2018
[4]

Optimal approximation rate of relu networks in terms of width and depth,

Z. Shen, H. Yang, and S. Zhang, “Optimal approximation rate of relu networks in terms of width and depth,” Journal de Math´ematiques Pures et Appliqu ´ees, vol. 157, pp. 101–135, 2022

work page 2022
[5]

Designing universal causal deep learning models: The geometric (hyper) transformer,

B. Acciaio, A. Kratsios, and G. Pammer, “Designing universal causal deep learning models: The geometric (hyper) transformer,” Mathematical Finance, 2023

work page 2023
[6]

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

B. Neyshabur, S. Bhojanapalli, and N. Srebro, “A pac-bayesian approach to spectrally-normalized margin bounds for neural networks,” arXiv preprint arXiv:1707.09564 , 2017. 19

work page internal anchor Pith review Pith/arXiv arXiv 2017
[7]

A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks,

——, “A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks,” in iclr, 2018

work page 2018
[8]

Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks,

P. L. Bartlett, N. Harvey, C. Liaw, and A. Mehrabian, “Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks,” Journal of Machine Learning Research , vol. 20, no. 63, pp. 1–17, 2019. [Online]. Available: http://jmlr.org/papers/v20/17-612.html

work page 2019
[9]

Spectrally-normalized margin bounds for neural networks,

P. L. Bartlett, D. J. Foster, and M. J. Telgarsky, “Spectrally-normalized margin bounds for neural networks,” Advances in neural information processing systems , vol. 30, 2017

work page 2017
[10]

Generalization error bounds via R ´enyi-, f-divergences and maximal leakage,

A. R. Esposito, M. Gastpar, and I. Issa, “Generalization error bounds via R ´enyi-, f-divergences and maximal leakage,” IEEE Trans. Inform. Theory , vol. 67, no. 8, pp. 4986–5004, 2021. [Online]. Available: https://doi.org/10.1109/TIT.2021.3085190

work page doi:10.1109/tit.2021.3085190 2021
[11]

Information-theoretic bounds on the moments of the generalization error of learning algorithms,

G. Aminian, L. Toni, and M. R. Rodrigues, “Information-theoretic bounds on the moments of the generalization error of learning algorithms,” in 2021 IEEE International Symposium on Information Theory (ISIT) . IEEE, 2021, pp. 682–687

work page 2021
[12]

The generalization error of random features regression: precise asymptotics and the double descent curve,

S. Mei and A. Montanari, “The generalization error of random features regression: precise asymptotics and the double descent curve,” Comm. Pure Appl. Math. , vol. 75, no. 4, pp. 667–766, 2022. [Online]. Available: https://doi.org/10.1002/cpa.22008

work page doi:10.1002/cpa.22008 2022
[13]

Individually conditional individual mutual information bound on generalization error,

R. Zhou, C. Tian, and T. Liu, “Individually conditional individual mutual information bound on generalization error,” IEEE Trans. Inform. Theory, vol. 68, no. 5, pp. 3304–3316, 2022. [Online]. Available: https://doi.org/10.1109/tit.2022.3144615

work page doi:10.1109/tit.2022.3144615 2022
[14]

Exactly tight information-theoretic generalization error bound for the quadratic gaussian problem,

——, “Exactly tight information-theoretic generalization error bound for the quadratic gaussian problem,” in 2023 IEEE International Symposium on Information Theory (ISIT) . IEEE, 2023, pp. 903–908

work page 2023
[15]

Approximation bounds for random neural networks and reservoir systems,

L. Gonon, L. Grigoryeva, and J.-P. Ortega, “Approximation bounds for random neural networks and reservoir systems,” The Annals of Applied Probability , vol. 33, no. 1, pp. 28–69, 2023

work page 2023
[16]

A theoretical analysis of the test error of finite-rank kernel ridge regression,

T. S. Cheng, A. Lucchi, I. Dokmani ´c, A. Kratsios, and D. Belius, “A theoretical analysis of the test error of finite-rank kernel ridge regression,” Advances in Neural Information Processing Systems , 2023

work page 2023
[17]

Stability and generalization of graph convolutional networks in eigen-domains,

M. K. Ng and A. Yip, “Stability and generalization of graph convolutional networks in eigen-domains,” Anal. Appl. (Singap.), vol. 21, no. 3, pp. 819–840, 2023. [Online]. Available: https://doi.org/10.1142/S0219530523500021

work page doi:10.1142/s0219530523500021 2023
[18]

Instance-dependent generalization bounds via optimal transport,

S. Hou, P. Kassraie, A. Kratsios, A. Krause, and J. Rothfuss, “Instance-dependent generalization bounds via optimal transport,” J. Mach. Learn. Res. , 2023

work page 2023
[19]

Benign overfitting in ridge regression,

A. Tsigler and P. L. Bartlett, “Benign overfitting in ridge regression,” J. Mach. Learn. Res. , vol. 24, pp. Paper No. [123], 76, 2023

work page 2023
[20]

On generalization bounds for deep networks based on loss surface implicit regularization,

M. Imaizumi and J. Schmidt-Hieber, “On generalization bounds for deep networks based on loss surface implicit regularization,” IEEE Trans. Inform. Theory , vol. 69, no. 2, pp. 1203–1223, 2023

work page 2023
[21]

Generalization in kernel regression under realistic assumptions,

D. Barzilai and O. Shamir, “Generalization in kernel regression under realistic assumptions,” arXiv preprint arXiv:2312.15995, 2023

work page arXiv 2023
[22]

Characterizing overfitting in kernel ridgeless regression through the eigenspectrum,

T. S. Cheng, A. Lucchi, A. Kratsios, and D. Belius, “Characterizing overfitting in kernel ridgeless regression through the eigenspectrum,” arxiv, 2024

work page 2024
[23]

What every computer scientist should know about floating-point arithmetic,

D. Goldberg, “What every computer scientist should know about floating-point arithmetic,” ACM Comput. Surv., vol. 23, no. 1, p. 5–48, mar 1991. [Online]. Available: https://doi.org/10.1145/103162.103163

work page doi:10.1145/103162.103163 1991
[24]

Muller, N

J.-M. Muller, N. Brisebarre, F. De Dinechin, C.-P. Jeannerod, V . Lefevre, G. Melquiond, N. Revol, D. Stehl ´e, S. Torres et al., Handbook of floating-point arithmetic . Springer, 2018

work page 2018
[25]

Shalev-Shwartz and S

S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms. Cambridge university press, 2014

work page 2014
[26]

A. W. van der Vaart and J. A. Wellner, Weak convergence and empirical processes , ser. Springer Series in Statistics. Springer-Verlag, New York, 1996, with applications to statistics. [Online]. Available: https: //doi.org/10.1007/978-1-4757-2545-2

work page doi:10.1007/978-1-4757-2545-2 1996
[27]

On a space of totally additive functions, vestn,

L. Kantorovich and G. Rubinstein, “On a space of totally additive functions, vestn,” Vestnik Leningrad. Univ, 1958

work page 1958
[28]

Empirical measures: regularity is a counter-curse to dimensionality,

B. R. Kloeckner, “Empirical measures: regularity is a counter-curse to dimensionality,” ESAIM Probab. Stat. , vol. 24, pp. 408–434, 2020. [Online]. Available: https://doi.org/10.1051/ps/2019025

work page doi:10.1051/ps/2019025 2020
[29]

On the rate of convergence in wasserstein distance of the empirical measure,

N. Fournier and A. Guillin, “On the rate of convergence in wasserstein distance of the empirical measure,” Probability theory and related fields , vol. 162, no. 3-4, pp. 707–738, 2015

work page 2015
[30]

On optimal matchings,

M. Ajtai, J. Koml ´os, and G. Tusn ´ady, “On optimal matchings,” Combinatorica, vol. 4, no. 4, pp. 259–264, 1984. [Online]. Available: https://doi.org/10.1007/BF02579135

work page doi:10.1007/bf02579135 1984
[31]

Graf and H

S. Graf and H. Luschgy, Foundations of quantization for probability distributions , ser. Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2000, vol. 1730. [Online]. Available: https://doi.org/10.1007/BFb0103945

work page doi:10.1007/bfb0103945 2000
[32]

Approximation by finitely supported measures,

B. Kloeckner, “Approximation by finitely supported measures,” ESAIM Control Optim. Calc. Var. , vol. 18, no. 2, pp. 343–359, 2012. [Online]. Available: https://doi.org/10.1051/cocv/2010100

work page doi:10.1051/cocv/2010100 2012
[33]

Convergence rate of optimal quantization and application to the clustering performance of the empirical measure,

Y . Liu and G. Pag `es, “Convergence rate of optimal quantization and application to the clustering performance of the empirical measure,” J. Mach. Learn. Res. , vol. 21, pp. Paper No. 86, 36, 2020

work page 2020
[34]

Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance,

J. Weed and F. Bach, “Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance,” Bernoulli, vol. 25, no. 4A, pp. 2620–2648, 2019. [Online]. Available: https://doi.org/10.3150/18-BEJ1065 20

work page doi:10.3150/18-bej1065 2019
[35]

On Lipschitz embedding of finite metric spaces in Hilbert space,

J. Bourgain, “On Lipschitz embedding of finite metric spaces in Hilbert space,” Israel J. Math. , vol. 52, no. 1-2, pp. 46–52, 1985. [Online]. Available: https://doi.org/10.1007/BF02776078

work page doi:10.1007/bf02776078 1985
[36]

On the distortion required for embedding finite metric spaces into normed spaces,

J. Matou ˇsek, “On the distortion required for embedding finite metric spaces into normed spaces,” Israel J. Math., vol. 93, pp. 333–344, 1996

work page 1996
[37]

Link prediction based on graph neural networks,

M. Zhang and Y . Chen, “Link prediction based on graph neural networks,” Advances in neural information processing systems, vol. 31, 2018

work page 2018
[38]

Wasserstein generative adversarial networks,

M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in International conference on machine learning. PMLR, 2017, pp. 214–223

work page 2017
[39]

Multi-marginal wasserstein gan,

J. Cao, L. Mo, Y . Zhang, K. Jia, C. Shen, and M. Tan, “Multi-marginal wasserstein gan,” Advances in Neural Information Processing Systems, vol. 32, 2019

work page 2019
[40]

Cot-gan: Generating sequential data via causal optimal transport,

T. Xu, L. K. Wenliang, M. Munn, and B. Acciaio, “Cot-gan: Generating sequential data via causal optimal transport,” Advances in neural information processing systems , vol. 33, pp. 8798–8809, 2020

work page 2020
[41]

Neural ordinary differential equations,

R. T. Chen, Y . Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neural ordinary differential equations,” Advances in neural information processing systems , vol. 31, 2018

work page 2018
[42]

Neural jump ordinary differential equations: Consistent continuous-time prediction and filtering,

C. Herrera, F. Krach, and J. Teichmann, “Neural jump ordinary differential equations: Consistent continuous-time prediction and filtering,” in International Conference on Learning Representations , 2020

work page 2020
[43]

Neural rough differential equations for long time series,

J. Morrill, C. Salvi, P. Kidger, and J. Foster, “Neural rough differential equations for long time series,” in International Conference on Machine Learning . PMLR, 2021, pp. 7829–7838

work page 2021
[44]

On universal approximation and error bounds for fourier neural operators,

N. Kovachki, S. Lanthaler, and S. Mishra, “On universal approximation and error bounds for fourier neural operators,” The Journal of Machine Learning Research , vol. 22, no. 1, pp. 13 237–13 312, 2021

work page 2021
[45]

Physics-informed machine learning,

G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics , vol. 3, no. 6, pp. 422–440, 2021

work page 2021
[46]

Sharper bounds for Gaussian and empirical processes,

M. Talagrand, “Sharper bounds for Gaussian and empirical processes,” Ann. Probab., vol. 22, no. 1, pp. 28–76, 1994. [Online]. Available: http://links.jstor.org/sici?sici=0091-1798(199401)22:1⟨28:SBFGAE⟩2.0.CO;2-W&origin=MSN

work page 1994
[47]

The complexity of learning according to two models of a drifting environment,

P. M. Long, “The complexity of learning according to two models of a drifting environment,” in Proceedings of the Eleventh Annual Conference on Computational Learning Theory (Madison, WI, 1998) . ACM, New York, 1998, pp. 116–125. [Online]. Available: https://doi.org/10.1145/279943.279968

work page doi:10.1145/279943.279968 1998
[48]

Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model,

A. Kontorovich and I. Pinelis, “Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model,” Ann. Statist. , vol. 47, no. 5, pp. 2822–2854, 2019. [Online]. Available: https: //doi.org/10.1214/18-AOS1766

work page doi:10.1214/18-aos1766 2019
[49]

Concentration inequalities for the empirical distribution of discrete distributions: beyond the method of types,

J. Mardia, J. Jiao, E. T ´anczos, R. D. Nowak, and T. Weissman, “Concentration inequalities for the empirical distribution of discrete distributions: beyond the method of types,” Inf. Inference , vol. 9, no. 4, pp. 813–850, 2020. [Online]. Available: https://doi.org/10.1093/imaiai/iaz025

work page doi:10.1093/imaiai/iaz025 2020
[50]

f-divergence inequalities,

I. Sason and S. Verd ´u, “f-divergence inequalities,” IEEE Transactions on Information Theory , vol. 62, no. 11, pp. 5973– 6006, 2016

work page 2016
[51]

Inference for empirical wasserstein distances on finite spaces,

M. Sommerfeld and A. Munk, “Inference for empirical wasserstein distances on finite spaces,” Journal of the Royal Statistical Society. Series B (Statistical Methodology) , vol. 80, no. 1, pp. 219–238, 2018

work page 2018
[52]

On the mean speed of convergence of empirical and occupation measures in wasserstein distance,

E. Boissard and T. Le Gouic, “On the mean speed of convergence of empirical and occupation measures in wasserstein distance,” Annales de l’IHP Probabilit ´es et statistiques , vol. 50, no. 2, pp. 539–563, 2014

work page 2014
[53]

The least doubling constant of a path graph,

E. Durand-Cartagena, J. Soria, and P. Tradacete, “The least doubling constant of a path graph,” arXiv preprint arXiv:2111.09196, 2021

work page arXiv 2021
[54]

Doubling constants and spectral theory on graphs,

——, “Doubling constants and spectral theory on graphs,” Discrete Math., vol. 346, no. 6, pp. Paper No. 113 354, 17,

work page
[55]

Available: https://doi.org/10.1016/j.disc.2023.113354

[Online]. Available: https://doi.org/10.1016/j.disc.2023.113354

work page doi:10.1016/j.disc.2023.113354 2023
[56]

Small transformers compute universal metric embeddings,

A. Kratsios, V . Debarnot, and I. Dokmani ´c, “Small transformers compute universal metric embeddings,” Journal of Machine Learning Research , 2023

work page 2023
[57]

Approximating a bandlimited function using very coarsely quantized data: a family of stable sigma-delta modulators of arbitrary order,

I. Daubechies and R. DeV ore, “Approximating a bandlimited function using very coarsely quantized data: a family of stable sigma-delta modulators of arbitrary order,” Ann. of Math. (2) , vol. 158, no. 2, pp. 679–710, 2003. [Online]. Available: https://doi.org/10.4007/annals.2003.158.679

work page doi:10.4007/annals.2003.158.679 2003
[58]

Approximating a bandlimited function using very coarsely quantized data: improved error estimates in sigma-delta modulation,

C. S. G ¨unt¨urk, “Approximating a bandlimited function using very coarsely quantized data: improved error estimates in sigma-delta modulation,” J. Amer. Math. Soc. , vol. 17, no. 1, pp. 229–242, 2004. [Online]. Available: https://doi.org/10.1090/S0894-0347-03-00436-3

work page doi:10.1090/s0894-0347-03-00436-3 2004
[59]

Phase transitions in rate distortion theory and deep learning,

P. Grohs, A. Klotz, and F. V oigtlaender, “Phase transitions in rate distortion theory and deep learning,” Found. Comput. Math., vol. 23, no. 1, pp. 329–392, 2023. [Online]. Available: https://doi.org/10.1007/s10208-021-09546-4

work page doi:10.1007/s10208-021-09546-4 2023
[60]

Inverse problems are solvable on real number signal processing hardware,

H. Boche, A. Fono, and G. Kutyniok, “Inverse problems are solvable on real number signal processing hardware,” arXiv preprint arXiv:2204.02066, 2022

work page arXiv 2022
[61]

Limitations of deep learning for inverse problems on digital hardware,

——, “Limitations of deep learning for inverse problems on digital hardware,” arXiv preprint arXiv:2202.13490 , 2022

work page arXiv 2022
[62]

Non-computability of the pseudoinverse on digital computers,

——, “Non-computability of the pseudoinverse on digital computers,” arXiv preprint arXiv:2212.02940 , 2022

work page arXiv 2022
[63]

Expressive power of relu and step networks under floating-point operations,

Y . Park, G. Hwang, W. Lee, and S. Park, “Expressive power of relu and step networks under floating-point operations,” 21 arXiv preprint arXiv:2401.15121 , 2024

work page arXiv 2024
[64]

Bi-Lipschitz embeddings into low-dimensional Euclidean spaces,

J. Matou ˇsek, “Bi-Lipschitz embeddings into low-dimensional Euclidean spaces,” Comment. Math. Univ. Carolin., vol. 31, no. 3, pp. 589–600, 1990

work page 1990
[65]

, year =

C. Villani, Optimal transport , ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2009, vol. 338, old and new. [Online]. Available: https://doi.org/10.1007/978-3-540-71050-9

work page doi:10.1007/978-3-540-71050-9 2009
[66]

D. P. Dubhashi and A. Panconesi, Concentration of measure for the analysis of randomized algorithms . Cambridge University Press, Cambridge, 2009. [Online]. Available: https://doi-org.libaccess.lib.mcmaster.ca/10.1017/ CBO9780511581274

work page 2009
[67]

Extensions of Lipschitz ma ps into a Hilbert space.Conference in modern analysis and pr obability (New Haven, Conn., 1982), Contemp

W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” in Conference in modern analysis and probability (New Haven, Conn., 1982) , ser. Contemp. Math. Amer. Math. Soc., Providence, RI, 1984, vol. 26, pp. 189–206. [Online]. Available: https://doi.org/10.1090/conm/026/737400

work page doi:10.1090/conm/026/737400 1982
[68]

Matou ˇsek, Lectures on discrete geometry, ser

J. Matou ˇsek, Lectures on discrete geometry, ser. Graduate Texts in Mathematics. Springer-Verlag, New York, 2002, vol. 212

work page 2002
[69]

Rademacher and Gaussian complexities: risk bounds and structural results,

P. L. Bartlett and S. Mendelson, “Rademacher and Gaussian complexities: risk bounds and structural results,” J. Mach. Learn. Res. , vol. 3, no. Spec. Issue Comput. Learn. Theory, pp. 463–482, 2002. [Online]. Available: https://doi.org/10.1162/153244303321897690

work page doi:10.1162/153244303321897690 2002
[70]

Local Rademacher complexities,

P. L. Bartlett, O. Bousquet, and S. Mendelson, “Local Rademacher complexities,” Ann. Statist. , vol. 33, no. 4, pp. 1497–1537, 2005. [Online]. Available: https://doi.org/10.1214/009053605000000282

work page doi:10.1214/009053605000000282 2005
[71]

Designing universal causal deep learning models: The case of infinite- dimensional dynamical systems from stochastic analysis,

L. Galimberti, A. Kratsios, and G. Livieri, “Designing universal causal deep learning models: The case of infinite- dimensional dynamical systems from stochastic analysis,” arXiv preprint arXiv:2210.13300 , 2022

work page arXiv 2022
[72]

G. G. Lorentz, M. v. Golitschek, and Y . Makovoz, Constructive approximation , ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1996, vol. 304, advanced problems. [Online]. Available: https://doi.org/10.1007/978-3-642-60932-9

work page doi:10.1007/978-3-642-60932-9 1996
[73]

Optimal learners for realizable regression: Pac learning and online learning,

I. Attias, S. Hanneke, A. Kalavasis, A. Karbasi, and G. Velegkas, “Optimal learners for realizable regression: Pac learning and online learning,” in Thirty-seventh Conference on Neural Information Processing Systems , 2023

work page 2023
[74]

Nonlinear approximation and (deep) relu networks,

I. Daubechies, R. DeV ore, S. Foucart, B. Hanin, and G. Petrova, “Nonlinear approximation and (deep) relu networks,” Constructive Approximation, vol. 55, no. 1, pp. 127–172, 2022

work page 2022
[75]

Vershynin, High-dimensional probability, ser

R. Vershynin, High-dimensional probability, ser. Cambridge Series in Statistical and Probabilistic Mathematics. Cam- bridge University Press, Cambridge, 2018, vol. 47, an introduction with applications in data science, With a foreword by Sara van de Geer

work page 2018
[76]

Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data,

G. K. Dziugaite and D. M. Roy, “Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data,” Uncertainty in Artificial Intelligence , 2017

work page 2017
[77]

Matou ˇsek, Lectures on discrete geometry , ser

J. Matou ˇsek, Lectures on discrete geometry , ser. Graduate Texts in Mathematics. Springer-Verlag, New York, 2002, vol. 212. [Online]. Available: https://doi.org/10.1007/978-1-4613-0039-7

work page doi:10.1007/978-1-4613-0039-7 2002

[1] [1]

Deep vs. shallow networks: An approximation theory perspective,

H. N. Mhaskar and T. Poggio, “Deep vs. shallow networks: An approximation theory perspective,” Analysis and Applications, vol. 14, no. 06, pp. 829–848, 2016

work page 2016

[2] [2]

Error bounds for approximations with deep relu networks,

D. Yarotsky, “Error bounds for approximations with deep relu networks,” Neural Networks, vol. 94, pp. 103–114, 2017

work page 2017

[3] [3]

Optimal approximation of piecewise smooth functions using deep relu neural networks,

P. Petersen and F. V oigtlaender, “Optimal approximation of piecewise smooth functions using deep relu neural networks,” Neural Networks, vol. 108, pp. 296–330, 2018

work page 2018

[4] [4]

Optimal approximation rate of relu networks in terms of width and depth,

Z. Shen, H. Yang, and S. Zhang, “Optimal approximation rate of relu networks in terms of width and depth,” Journal de Math´ematiques Pures et Appliqu ´ees, vol. 157, pp. 101–135, 2022

work page 2022

[5] [5]

Designing universal causal deep learning models: The geometric (hyper) transformer,

B. Acciaio, A. Kratsios, and G. Pammer, “Designing universal causal deep learning models: The geometric (hyper) transformer,” Mathematical Finance, 2023

work page 2023

[6] [6]

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

B. Neyshabur, S. Bhojanapalli, and N. Srebro, “A pac-bayesian approach to spectrally-normalized margin bounds for neural networks,” arXiv preprint arXiv:1707.09564 , 2017. 19

work page internal anchor Pith review Pith/arXiv arXiv 2017

[7] [7]

A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks,

——, “A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks,” in iclr, 2018

work page 2018

[8] [8]

Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks,

P. L. Bartlett, N. Harvey, C. Liaw, and A. Mehrabian, “Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks,” Journal of Machine Learning Research , vol. 20, no. 63, pp. 1–17, 2019. [Online]. Available: http://jmlr.org/papers/v20/17-612.html

work page 2019

[9] [9]

Spectrally-normalized margin bounds for neural networks,

P. L. Bartlett, D. J. Foster, and M. J. Telgarsky, “Spectrally-normalized margin bounds for neural networks,” Advances in neural information processing systems , vol. 30, 2017

work page 2017

[10] [10]

Generalization error bounds via R ´enyi-, f-divergences and maximal leakage,

A. R. Esposito, M. Gastpar, and I. Issa, “Generalization error bounds via R ´enyi-, f-divergences and maximal leakage,” IEEE Trans. Inform. Theory , vol. 67, no. 8, pp. 4986–5004, 2021. [Online]. Available: https://doi.org/10.1109/TIT.2021.3085190

work page doi:10.1109/tit.2021.3085190 2021

[11] [11]

Information-theoretic bounds on the moments of the generalization error of learning algorithms,

G. Aminian, L. Toni, and M. R. Rodrigues, “Information-theoretic bounds on the moments of the generalization error of learning algorithms,” in 2021 IEEE International Symposium on Information Theory (ISIT) . IEEE, 2021, pp. 682–687

work page 2021

[12] [12]

The generalization error of random features regression: precise asymptotics and the double descent curve,

S. Mei and A. Montanari, “The generalization error of random features regression: precise asymptotics and the double descent curve,” Comm. Pure Appl. Math. , vol. 75, no. 4, pp. 667–766, 2022. [Online]. Available: https://doi.org/10.1002/cpa.22008

work page doi:10.1002/cpa.22008 2022

[13] [13]

Individually conditional individual mutual information bound on generalization error,

R. Zhou, C. Tian, and T. Liu, “Individually conditional individual mutual information bound on generalization error,” IEEE Trans. Inform. Theory, vol. 68, no. 5, pp. 3304–3316, 2022. [Online]. Available: https://doi.org/10.1109/tit.2022.3144615

work page doi:10.1109/tit.2022.3144615 2022

[14] [14]

Exactly tight information-theoretic generalization error bound for the quadratic gaussian problem,

——, “Exactly tight information-theoretic generalization error bound for the quadratic gaussian problem,” in 2023 IEEE International Symposium on Information Theory (ISIT) . IEEE, 2023, pp. 903–908

work page 2023

[15] [15]

Approximation bounds for random neural networks and reservoir systems,

L. Gonon, L. Grigoryeva, and J.-P. Ortega, “Approximation bounds for random neural networks and reservoir systems,” The Annals of Applied Probability , vol. 33, no. 1, pp. 28–69, 2023

work page 2023

[16] [16]

A theoretical analysis of the test error of finite-rank kernel ridge regression,

T. S. Cheng, A. Lucchi, I. Dokmani ´c, A. Kratsios, and D. Belius, “A theoretical analysis of the test error of finite-rank kernel ridge regression,” Advances in Neural Information Processing Systems , 2023

work page 2023

[17] [17]

Stability and generalization of graph convolutional networks in eigen-domains,

M. K. Ng and A. Yip, “Stability and generalization of graph convolutional networks in eigen-domains,” Anal. Appl. (Singap.), vol. 21, no. 3, pp. 819–840, 2023. [Online]. Available: https://doi.org/10.1142/S0219530523500021

work page doi:10.1142/s0219530523500021 2023

[18] [18]

Instance-dependent generalization bounds via optimal transport,

S. Hou, P. Kassraie, A. Kratsios, A. Krause, and J. Rothfuss, “Instance-dependent generalization bounds via optimal transport,” J. Mach. Learn. Res. , 2023

work page 2023

[19] [19]

Benign overfitting in ridge regression,

A. Tsigler and P. L. Bartlett, “Benign overfitting in ridge regression,” J. Mach. Learn. Res. , vol. 24, pp. Paper No. [123], 76, 2023

work page 2023

[20] [20]

On generalization bounds for deep networks based on loss surface implicit regularization,

M. Imaizumi and J. Schmidt-Hieber, “On generalization bounds for deep networks based on loss surface implicit regularization,” IEEE Trans. Inform. Theory , vol. 69, no. 2, pp. 1203–1223, 2023

work page 2023

[21] [21]

Generalization in kernel regression under realistic assumptions,

D. Barzilai and O. Shamir, “Generalization in kernel regression under realistic assumptions,” arXiv preprint arXiv:2312.15995, 2023

work page arXiv 2023

[22] [22]

Characterizing overfitting in kernel ridgeless regression through the eigenspectrum,

T. S. Cheng, A. Lucchi, A. Kratsios, and D. Belius, “Characterizing overfitting in kernel ridgeless regression through the eigenspectrum,” arxiv, 2024

work page 2024

[23] [23]

What every computer scientist should know about floating-point arithmetic,

D. Goldberg, “What every computer scientist should know about floating-point arithmetic,” ACM Comput. Surv., vol. 23, no. 1, p. 5–48, mar 1991. [Online]. Available: https://doi.org/10.1145/103162.103163

work page doi:10.1145/103162.103163 1991

[24] [24]

Muller, N

J.-M. Muller, N. Brisebarre, F. De Dinechin, C.-P. Jeannerod, V . Lefevre, G. Melquiond, N. Revol, D. Stehl ´e, S. Torres et al., Handbook of floating-point arithmetic . Springer, 2018

work page 2018

[25] [25]

Shalev-Shwartz and S

S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms. Cambridge university press, 2014

work page 2014

[26] [26]

A. W. van der Vaart and J. A. Wellner, Weak convergence and empirical processes , ser. Springer Series in Statistics. Springer-Verlag, New York, 1996, with applications to statistics. [Online]. Available: https: //doi.org/10.1007/978-1-4757-2545-2

work page doi:10.1007/978-1-4757-2545-2 1996

[27] [27]

On a space of totally additive functions, vestn,

L. Kantorovich and G. Rubinstein, “On a space of totally additive functions, vestn,” Vestnik Leningrad. Univ, 1958

work page 1958

[28] [28]

Empirical measures: regularity is a counter-curse to dimensionality,

B. R. Kloeckner, “Empirical measures: regularity is a counter-curse to dimensionality,” ESAIM Probab. Stat. , vol. 24, pp. 408–434, 2020. [Online]. Available: https://doi.org/10.1051/ps/2019025

work page doi:10.1051/ps/2019025 2020

[29] [29]

On the rate of convergence in wasserstein distance of the empirical measure,

N. Fournier and A. Guillin, “On the rate of convergence in wasserstein distance of the empirical measure,” Probability theory and related fields , vol. 162, no. 3-4, pp. 707–738, 2015

work page 2015

[30] [30]

On optimal matchings,

M. Ajtai, J. Koml ´os, and G. Tusn ´ady, “On optimal matchings,” Combinatorica, vol. 4, no. 4, pp. 259–264, 1984. [Online]. Available: https://doi.org/10.1007/BF02579135

work page doi:10.1007/bf02579135 1984

[31] [31]

Graf and H

S. Graf and H. Luschgy, Foundations of quantization for probability distributions , ser. Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2000, vol. 1730. [Online]. Available: https://doi.org/10.1007/BFb0103945

work page doi:10.1007/bfb0103945 2000

[32] [32]

Approximation by finitely supported measures,

B. Kloeckner, “Approximation by finitely supported measures,” ESAIM Control Optim. Calc. Var. , vol. 18, no. 2, pp. 343–359, 2012. [Online]. Available: https://doi.org/10.1051/cocv/2010100

work page doi:10.1051/cocv/2010100 2012

[33] [33]

Convergence rate of optimal quantization and application to the clustering performance of the empirical measure,

Y . Liu and G. Pag `es, “Convergence rate of optimal quantization and application to the clustering performance of the empirical measure,” J. Mach. Learn. Res. , vol. 21, pp. Paper No. 86, 36, 2020

work page 2020

[34] [34]

Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance,

J. Weed and F. Bach, “Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance,” Bernoulli, vol. 25, no. 4A, pp. 2620–2648, 2019. [Online]. Available: https://doi.org/10.3150/18-BEJ1065 20

work page doi:10.3150/18-bej1065 2019

[35] [35]

On Lipschitz embedding of finite metric spaces in Hilbert space,

J. Bourgain, “On Lipschitz embedding of finite metric spaces in Hilbert space,” Israel J. Math. , vol. 52, no. 1-2, pp. 46–52, 1985. [Online]. Available: https://doi.org/10.1007/BF02776078

work page doi:10.1007/bf02776078 1985

[36] [36]

On the distortion required for embedding finite metric spaces into normed spaces,

J. Matou ˇsek, “On the distortion required for embedding finite metric spaces into normed spaces,” Israel J. Math., vol. 93, pp. 333–344, 1996

work page 1996

[37] [37]

Link prediction based on graph neural networks,

M. Zhang and Y . Chen, “Link prediction based on graph neural networks,” Advances in neural information processing systems, vol. 31, 2018

work page 2018

[38] [38]

Wasserstein generative adversarial networks,

M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in International conference on machine learning. PMLR, 2017, pp. 214–223

work page 2017

[39] [39]

Multi-marginal wasserstein gan,

J. Cao, L. Mo, Y . Zhang, K. Jia, C. Shen, and M. Tan, “Multi-marginal wasserstein gan,” Advances in Neural Information Processing Systems, vol. 32, 2019

work page 2019

[40] [40]

Cot-gan: Generating sequential data via causal optimal transport,

T. Xu, L. K. Wenliang, M. Munn, and B. Acciaio, “Cot-gan: Generating sequential data via causal optimal transport,” Advances in neural information processing systems , vol. 33, pp. 8798–8809, 2020

work page 2020

[41] [41]

Neural ordinary differential equations,

R. T. Chen, Y . Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neural ordinary differential equations,” Advances in neural information processing systems , vol. 31, 2018

work page 2018

[42] [42]

Neural jump ordinary differential equations: Consistent continuous-time prediction and filtering,

C. Herrera, F. Krach, and J. Teichmann, “Neural jump ordinary differential equations: Consistent continuous-time prediction and filtering,” in International Conference on Learning Representations , 2020

work page 2020

[43] [43]

Neural rough differential equations for long time series,

J. Morrill, C. Salvi, P. Kidger, and J. Foster, “Neural rough differential equations for long time series,” in International Conference on Machine Learning . PMLR, 2021, pp. 7829–7838

work page 2021

[44] [44]

On universal approximation and error bounds for fourier neural operators,

N. Kovachki, S. Lanthaler, and S. Mishra, “On universal approximation and error bounds for fourier neural operators,” The Journal of Machine Learning Research , vol. 22, no. 1, pp. 13 237–13 312, 2021

work page 2021

[45] [45]

Physics-informed machine learning,

G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics , vol. 3, no. 6, pp. 422–440, 2021

work page 2021

[46] [46]

Sharper bounds for Gaussian and empirical processes,

M. Talagrand, “Sharper bounds for Gaussian and empirical processes,” Ann. Probab., vol. 22, no. 1, pp. 28–76, 1994. [Online]. Available: http://links.jstor.org/sici?sici=0091-1798(199401)22:1⟨28:SBFGAE⟩2.0.CO;2-W&origin=MSN

work page 1994

[47] [47]

The complexity of learning according to two models of a drifting environment,

P. M. Long, “The complexity of learning according to two models of a drifting environment,” in Proceedings of the Eleventh Annual Conference on Computational Learning Theory (Madison, WI, 1998) . ACM, New York, 1998, pp. 116–125. [Online]. Available: https://doi.org/10.1145/279943.279968

work page doi:10.1145/279943.279968 1998

[48] [48]

Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model,

A. Kontorovich and I. Pinelis, “Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model,” Ann. Statist. , vol. 47, no. 5, pp. 2822–2854, 2019. [Online]. Available: https: //doi.org/10.1214/18-AOS1766

work page doi:10.1214/18-aos1766 2019

[49] [49]

Concentration inequalities for the empirical distribution of discrete distributions: beyond the method of types,

J. Mardia, J. Jiao, E. T ´anczos, R. D. Nowak, and T. Weissman, “Concentration inequalities for the empirical distribution of discrete distributions: beyond the method of types,” Inf. Inference , vol. 9, no. 4, pp. 813–850, 2020. [Online]. Available: https://doi.org/10.1093/imaiai/iaz025

work page doi:10.1093/imaiai/iaz025 2020

[50] [50]

f-divergence inequalities,

I. Sason and S. Verd ´u, “f-divergence inequalities,” IEEE Transactions on Information Theory , vol. 62, no. 11, pp. 5973– 6006, 2016

work page 2016

[51] [51]

Inference for empirical wasserstein distances on finite spaces,

M. Sommerfeld and A. Munk, “Inference for empirical wasserstein distances on finite spaces,” Journal of the Royal Statistical Society. Series B (Statistical Methodology) , vol. 80, no. 1, pp. 219–238, 2018

work page 2018

[52] [52]

On the mean speed of convergence of empirical and occupation measures in wasserstein distance,

E. Boissard and T. Le Gouic, “On the mean speed of convergence of empirical and occupation measures in wasserstein distance,” Annales de l’IHP Probabilit ´es et statistiques , vol. 50, no. 2, pp. 539–563, 2014

work page 2014

[53] [53]

The least doubling constant of a path graph,

E. Durand-Cartagena, J. Soria, and P. Tradacete, “The least doubling constant of a path graph,” arXiv preprint arXiv:2111.09196, 2021

work page arXiv 2021

[54] [54]

Doubling constants and spectral theory on graphs,

——, “Doubling constants and spectral theory on graphs,” Discrete Math., vol. 346, no. 6, pp. Paper No. 113 354, 17,

work page

[55] [55]

Available: https://doi.org/10.1016/j.disc.2023.113354

[Online]. Available: https://doi.org/10.1016/j.disc.2023.113354

work page doi:10.1016/j.disc.2023.113354 2023

[56] [56]

Small transformers compute universal metric embeddings,

A. Kratsios, V . Debarnot, and I. Dokmani ´c, “Small transformers compute universal metric embeddings,” Journal of Machine Learning Research , 2023

work page 2023

[57] [57]

Approximating a bandlimited function using very coarsely quantized data: a family of stable sigma-delta modulators of arbitrary order,

I. Daubechies and R. DeV ore, “Approximating a bandlimited function using very coarsely quantized data: a family of stable sigma-delta modulators of arbitrary order,” Ann. of Math. (2) , vol. 158, no. 2, pp. 679–710, 2003. [Online]. Available: https://doi.org/10.4007/annals.2003.158.679

work page doi:10.4007/annals.2003.158.679 2003

[58] [58]

Approximating a bandlimited function using very coarsely quantized data: improved error estimates in sigma-delta modulation,

C. S. G ¨unt¨urk, “Approximating a bandlimited function using very coarsely quantized data: improved error estimates in sigma-delta modulation,” J. Amer. Math. Soc. , vol. 17, no. 1, pp. 229–242, 2004. [Online]. Available: https://doi.org/10.1090/S0894-0347-03-00436-3

work page doi:10.1090/s0894-0347-03-00436-3 2004

[59] [59]

Phase transitions in rate distortion theory and deep learning,

P. Grohs, A. Klotz, and F. V oigtlaender, “Phase transitions in rate distortion theory and deep learning,” Found. Comput. Math., vol. 23, no. 1, pp. 329–392, 2023. [Online]. Available: https://doi.org/10.1007/s10208-021-09546-4

work page doi:10.1007/s10208-021-09546-4 2023

[60] [60]

Inverse problems are solvable on real number signal processing hardware,

H. Boche, A. Fono, and G. Kutyniok, “Inverse problems are solvable on real number signal processing hardware,” arXiv preprint arXiv:2204.02066, 2022

work page arXiv 2022

[61] [61]

Limitations of deep learning for inverse problems on digital hardware,

——, “Limitations of deep learning for inverse problems on digital hardware,” arXiv preprint arXiv:2202.13490 , 2022

work page arXiv 2022

[62] [62]

Non-computability of the pseudoinverse on digital computers,

——, “Non-computability of the pseudoinverse on digital computers,” arXiv preprint arXiv:2212.02940 , 2022

work page arXiv 2022

[63] [63]

Expressive power of relu and step networks under floating-point operations,

Y . Park, G. Hwang, W. Lee, and S. Park, “Expressive power of relu and step networks under floating-point operations,” 21 arXiv preprint arXiv:2401.15121 , 2024

work page arXiv 2024

[64] [64]

Bi-Lipschitz embeddings into low-dimensional Euclidean spaces,

J. Matou ˇsek, “Bi-Lipschitz embeddings into low-dimensional Euclidean spaces,” Comment. Math. Univ. Carolin., vol. 31, no. 3, pp. 589–600, 1990

work page 1990

[65] [65]

, year =

C. Villani, Optimal transport , ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2009, vol. 338, old and new. [Online]. Available: https://doi.org/10.1007/978-3-540-71050-9

work page doi:10.1007/978-3-540-71050-9 2009

[66] [66]

D. P. Dubhashi and A. Panconesi, Concentration of measure for the analysis of randomized algorithms . Cambridge University Press, Cambridge, 2009. [Online]. Available: https://doi-org.libaccess.lib.mcmaster.ca/10.1017/ CBO9780511581274

work page 2009

[67] [67]

Extensions of Lipschitz ma ps into a Hilbert space.Conference in modern analysis and pr obability (New Haven, Conn., 1982), Contemp

W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” in Conference in modern analysis and probability (New Haven, Conn., 1982) , ser. Contemp. Math. Amer. Math. Soc., Providence, RI, 1984, vol. 26, pp. 189–206. [Online]. Available: https://doi.org/10.1090/conm/026/737400

work page doi:10.1090/conm/026/737400 1982

[68] [68]

Matou ˇsek, Lectures on discrete geometry, ser

J. Matou ˇsek, Lectures on discrete geometry, ser. Graduate Texts in Mathematics. Springer-Verlag, New York, 2002, vol. 212

work page 2002

[69] [69]

Rademacher and Gaussian complexities: risk bounds and structural results,

P. L. Bartlett and S. Mendelson, “Rademacher and Gaussian complexities: risk bounds and structural results,” J. Mach. Learn. Res. , vol. 3, no. Spec. Issue Comput. Learn. Theory, pp. 463–482, 2002. [Online]. Available: https://doi.org/10.1162/153244303321897690

work page doi:10.1162/153244303321897690 2002

[70] [70]

Local Rademacher complexities,

P. L. Bartlett, O. Bousquet, and S. Mendelson, “Local Rademacher complexities,” Ann. Statist. , vol. 33, no. 4, pp. 1497–1537, 2005. [Online]. Available: https://doi.org/10.1214/009053605000000282

work page doi:10.1214/009053605000000282 2005

[71] [71]

Designing universal causal deep learning models: The case of infinite- dimensional dynamical systems from stochastic analysis,

L. Galimberti, A. Kratsios, and G. Livieri, “Designing universal causal deep learning models: The case of infinite- dimensional dynamical systems from stochastic analysis,” arXiv preprint arXiv:2210.13300 , 2022

work page arXiv 2022

[72] [72]

G. G. Lorentz, M. v. Golitschek, and Y . Makovoz, Constructive approximation , ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1996, vol. 304, advanced problems. [Online]. Available: https://doi.org/10.1007/978-3-642-60932-9

work page doi:10.1007/978-3-642-60932-9 1996

[73] [73]

Optimal learners for realizable regression: Pac learning and online learning,

I. Attias, S. Hanneke, A. Kalavasis, A. Karbasi, and G. Velegkas, “Optimal learners for realizable regression: Pac learning and online learning,” in Thirty-seventh Conference on Neural Information Processing Systems , 2023

work page 2023

[74] [74]

Nonlinear approximation and (deep) relu networks,

I. Daubechies, R. DeV ore, S. Foucart, B. Hanin, and G. Petrova, “Nonlinear approximation and (deep) relu networks,” Constructive Approximation, vol. 55, no. 1, pp. 127–172, 2022

work page 2022

[75] [75]

Vershynin, High-dimensional probability, ser

R. Vershynin, High-dimensional probability, ser. Cambridge Series in Statistical and Probabilistic Mathematics. Cam- bridge University Press, Cambridge, 2018, vol. 47, an introduction with applications in data science, With a foreword by Sara van de Geer

work page 2018

[76] [76]

Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data,

G. K. Dziugaite and D. M. Roy, “Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data,” Uncertainty in Artificial Intelligence , 2017

work page 2017

[77] [77]

Matou ˇsek, Lectures on discrete geometry , ser

J. Matou ˇsek, Lectures on discrete geometry , ser. Graduate Texts in Mathematics. Springer-Verlag, New York, 2002, vol. 212. [Online]. Available: https://doi.org/10.1007/978-1-4613-0039-7

work page doi:10.1007/978-1-4613-0039-7 2002