pith. sign in

arxiv: 2402.05576 · v4 · pith:V3ZUVYL4new · submitted 2024-02-08 · 💻 cs.LG

Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces

Pith reviewed 2026-05-24 03:26 UTC · model grok-4.3

classification 💻 cs.LG
keywords generalization boundsconcentration of measurefinite metric spacesdigital computersmachine learninggeometric representation dimensionmetric embeddings
0
0 comments X

The pith

Learning models on digital computers admit generalization bounds that adapt to a geometric representation dimension m, achieving rates of order 1 over N to the power 1 over 2 or m with constants of order square root of m.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a family of generalization bounds for machine learning models implemented on digital computers. These bounds take the form c_m over N raised to the power 1 over 2 or m, where m indexes the geometric representation dimension of the underlying discrete problem. Adjusting m to the sample size N produces tighter guarantees for realistic values of N, while small m recovers the optimal dimension-free rate of order 1 over square root of N. The bounds rest on a new non-asymptotic concentration-of-measure inequality for finite metric spaces obtained by metric embedding arguments. Classical bounds suffer from constants that grow with ambient dimension and machine precision, so the new family addresses a practical gap when N is small to moderate.

Core claim

We derive a family of generalization bounds {c_m/N^{1/(2∨m)}}_{m=1}^∞ tailored for learning models on digital computers, which adapt to both the sample size N and the geometric representation dimension m of the discrete learning problem. Adjusting the parameter m according to N results in significantly tighter generalization bounds for practical sample sizes N, while setting m small maintains the optimal dimension-free worst-case rate of O(1/N^{1/2}). Notably, c_m ∈ O(m^{1/2}) for learning models on discretized Euclidean domains. Furthermore, our adaptive generalization bounds are formulated based on our new non-asymptotic result for concentration of measure in finite metric spaces, obtained

What carries the argument

A new non-asymptotic concentration of measure inequality for finite metric spaces, derived via metric embedding arguments, which directly yields the parameterized family of generalization bounds.

If this is right

  • For any fixed m the generalization gap converges at rate 1/N^{1/(2∨m)}
  • Choosing m depending on N produces tighter numerical bounds for practical sample sizes
  • Small m recovers the dimension-free rate O(1/N^{1/2})
  • On discretized Euclidean domains the prefactor c_m scales at most like sqrt(m)
  • The bounds apply directly to models whose inputs are Euclidean but whose implementation is finite-precision

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The embedding technique may allow similar adaptive bounds for other finite-precision or quantized architectures beyond Euclidean discretization
  • Optimal m could be chosen from data alone by monitoring how the observed gap behaves across a grid of m values
  • The same concentration result may tighten analyses of generalization for models trained with finite-precision arithmetic even when the target function is continuous
  • If the inequality holds, it suggests that the effective dimension m can serve as a practical proxy for the complexity induced by machine precision

Load-bearing premise

The new concentration of measure result for finite metric spaces holds when applied to the discrete learning problem arising from digital implementation.

What would settle it

An explicit finite metric space arising from discretization of a Euclidean domain together with a learning model whose generalization gap exceeds every c_m / N^{1/(2∨m)} for all m, or a direct counterexample to the claimed concentration inequality on that space.

Figures

Figures reproduced from arXiv: 2402.05576 by A. Martina Neuman, Anastasis Kratsios, Gudmund Pammer.

Figure 1
Figure 1. Figure 1: When the sample size (N) is small-to-realistically-large, our non-asymptotic risk bounds are tighter than the classical bounds, e.g. [25, Corollary 4.6]). For massive sample sizes N, both bounds yield the parametric rate of O(1/N1/2 ). See Subsection IV-B for theoretical and numerical demonstrations of this phenomenon. To lay the foundation, we define the agnostic, probably-approximately-correct (PAC) stat… view at source ↗
Figure 2
Figure 2. Figure 2: The distortion incurred when compressing at [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
read the original abstract

Machine learning models with inputs in a Euclidean space $\mathbb{R}^d$, when implemented on digital computers, generalize, and their generalization gap converges to $0$ at a rate of $c/N^{1/2}$ concerning the sample size $N$. However, the constant $c>0$ obtained through classical methods can be large in terms of the ambient dimension $d$ and machine precision, posing a challenge when $N$ is small to realistically large. In this paper, we derive a family of generalization bounds $\{c_m/N^{1/(2\vee m)}\}_{m=1}^{\infty}$ tailored for learning models on digital computers, which adapt to both the sample size $N$ and the so-called geometric representation dimension $m$ of the discrete learning problem. Adjusting the parameter $m$ according to $N$ results in significantly tighter generalization bounds for practical sample sizes $N$, while setting $m$ small maintains the optimal dimension-free worst-case rate of $\mathcal{O}(1/N^{1/2})$. Notably, $c_{m}\in \mathcal{O}(m^{1/2})$ for learning models on discretized Euclidean domains. Furthermore, our adaptive generalization bounds are formulated based on our new non-asymptotic result for concentration of measure in finite metric spaces, established via leveraging metric embedding arguments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims to derive a family of generalization bounds {c_m / N^{1/(2∨m)}} for learning models on digital computers. These adapt to sample size N and a geometric representation dimension m of the discrete problem, obtained from a new non-asymptotic concentration-of-measure inequality on finite metric spaces derived via metric-embedding arguments. For discretized Euclidean domains the prefactor satisfies c_m = O(m^{1/2}); choosing m as a function of N is asserted to produce tighter bounds than the classical O(1/sqrt(N)) while preserving the optimal worst-case rate for small m.

Significance. If the new concentration inequality is valid and correctly applied, the work supplies a concrete mechanism for obtaining sample-size-adaptive and representation-dimension-adaptive generalization bounds that can be substantially tighter than dimension-dependent classical bounds for moderate N. The metric-embedding route to the finite-space concentration result is a methodological strength that, if rigorous, could be reusable beyond the learning setting.

major comments (1)
  1. [new concentration theorem and its application to generalization bounds] The central claim rests on the new non-asymptotic concentration-of-measure theorem for finite metric spaces (the result stated after the abstract and proved in the body). The proof must be checked for the precise assumptions on the metric, the embedding map, and the dependence of the constants on the representation dimension m; without an explicit verification that the discrete learning problem satisfies those assumptions, the claimed O(m^{1/2}) scaling and the applicability of the bound to digital-computer models cannot be confirmed.
minor comments (2)
  1. [Introduction / notation] The definition and construction of the geometric representation dimension m should be stated with an explicit example (e.g., for a discretized Euclidean domain) before the statement of the main bounds.
  2. [Preliminaries] Notation for the finite metric space and the embedding should be introduced once and used consistently; several symbols appear to be overloaded between the concentration result and the learning setting.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and for recommending major revision. The single major comment is addressed point-by-point below.

read point-by-point responses
  1. Referee: The central claim rests on the new non-asymptotic concentration-of-measure theorem for finite metric spaces (the result stated after the abstract and proved in the body). The proof must be checked for the precise assumptions on the metric, the embedding map, and the dependence of the constants on the representation dimension m; without an explicit verification that the discrete learning problem satisfies those assumptions, the claimed O(m^{1/2}) scaling and the applicability of the bound to digital-computer models cannot be confirmed.

    Authors: We thank the referee for highlighting the need for explicit verification. Theorem 1 (stated immediately after the abstract) lists the assumptions: (X, d) must be a finite metric space that admits a bi-Lipschitz embedding into a space whose concentration properties are known, with the embedding distortion and dimension controlling the dependence on m. The proof in the body uses the embedding to transfer concentration inequalities, deriving the prefactor scaling directly from the embedding parameters. Section 4 applies the result to digital-computer models by equipping the finite grid of machine-representable points with the restricted Euclidean metric; this grid is finite by construction and satisfies the embedding assumption when m is taken as the effective geometric representation dimension of the discretization. The O(m^{1/2}) bound follows from the embedding dimension and the 1-Lipschitz property of the loss. While the application section already connects the setting to the theorem assumptions, we agree that a more explicit cross-reference would strengthen the manuscript. We will therefore add a short dedicated paragraph (or table) in Section 4 that enumerates each assumption of Theorem 1 and confirms it holds for the discretized Euclidean case. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives a new non-asymptotic concentration-of-measure inequality for finite metric spaces via metric embedding arguments and applies it to obtain the family of generalization bounds {c_m / N^{1/(2∨m)}}. No load-bearing step reduces by construction to its own inputs, fitted parameters renamed as predictions, or self-citation chains; the claimed O(m^{1/2}) prefactor scaling follows from standard embedding-dimension costs rather than tautology. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on a new concentration result whose validity is not independently evidenced in the provided abstract, plus the introduced parameter m.

free parameters (1)
  • m (geometric representation dimension)
    Tunable integer parameter chosen according to N to obtain the adaptive rate; no explicit fitting procedure given in abstract.
axioms (1)
  • domain assumption Concentration of measure holds in finite metric spaces via metric embedding arguments
    Invoked as the foundation for the new non-asymptotic result (abstract).
invented entities (1)
  • geometric representation dimension m no independent evidence
    purpose: Parameter that encodes the discrete structure of the learning problem on digital computers and controls the rate exponent
    Newly introduced to define the family of bounds; no independent evidence supplied in abstract.

pith-pipeline@v0.9.0 · 5776 in / 1331 out tokens · 30037 ms · 2026-05-24T03:26:57.340413+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 1 internal anchor

  1. [1]

    Deep vs. shallow networks: An approximation theory perspective,

    H. N. Mhaskar and T. Poggio, “Deep vs. shallow networks: An approximation theory perspective,” Analysis and Applications, vol. 14, no. 06, pp. 829–848, 2016

  2. [2]

    Error bounds for approximations with deep relu networks,

    D. Yarotsky, “Error bounds for approximations with deep relu networks,” Neural Networks, vol. 94, pp. 103–114, 2017

  3. [3]

    Optimal approximation of piecewise smooth functions using deep relu neural networks,

    P. Petersen and F. V oigtlaender, “Optimal approximation of piecewise smooth functions using deep relu neural networks,” Neural Networks, vol. 108, pp. 296–330, 2018

  4. [4]

    Optimal approximation rate of relu networks in terms of width and depth,

    Z. Shen, H. Yang, and S. Zhang, “Optimal approximation rate of relu networks in terms of width and depth,” Journal de Math´ematiques Pures et Appliqu ´ees, vol. 157, pp. 101–135, 2022

  5. [5]

    Designing universal causal deep learning models: The geometric (hyper) transformer,

    B. Acciaio, A. Kratsios, and G. Pammer, “Designing universal causal deep learning models: The geometric (hyper) transformer,” Mathematical Finance, 2023

  6. [6]

    A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

    B. Neyshabur, S. Bhojanapalli, and N. Srebro, “A pac-bayesian approach to spectrally-normalized margin bounds for neural networks,” arXiv preprint arXiv:1707.09564 , 2017. 19

  7. [7]

    A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks,

    ——, “A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks,” in iclr, 2018

  8. [8]

    Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks,

    P. L. Bartlett, N. Harvey, C. Liaw, and A. Mehrabian, “Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks,” Journal of Machine Learning Research , vol. 20, no. 63, pp. 1–17, 2019. [Online]. Available: http://jmlr.org/papers/v20/17-612.html

  9. [9]

    Spectrally-normalized margin bounds for neural networks,

    P. L. Bartlett, D. J. Foster, and M. J. Telgarsky, “Spectrally-normalized margin bounds for neural networks,” Advances in neural information processing systems , vol. 30, 2017

  10. [10]

    Generalization error bounds via R ´enyi-, f-divergences and maximal leakage,

    A. R. Esposito, M. Gastpar, and I. Issa, “Generalization error bounds via R ´enyi-, f-divergences and maximal leakage,” IEEE Trans. Inform. Theory , vol. 67, no. 8, pp. 4986–5004, 2021. [Online]. Available: https://doi.org/10.1109/TIT.2021.3085190

  11. [11]

    Information-theoretic bounds on the moments of the generalization error of learning algorithms,

    G. Aminian, L. Toni, and M. R. Rodrigues, “Information-theoretic bounds on the moments of the generalization error of learning algorithms,” in 2021 IEEE International Symposium on Information Theory (ISIT) . IEEE, 2021, pp. 682–687

  12. [12]

    The generalization error of random features regression: precise asymptotics and the double descent curve,

    S. Mei and A. Montanari, “The generalization error of random features regression: precise asymptotics and the double descent curve,” Comm. Pure Appl. Math. , vol. 75, no. 4, pp. 667–766, 2022. [Online]. Available: https://doi.org/10.1002/cpa.22008

  13. [13]

    Individually conditional individual mutual information bound on generalization error,

    R. Zhou, C. Tian, and T. Liu, “Individually conditional individual mutual information bound on generalization error,” IEEE Trans. Inform. Theory, vol. 68, no. 5, pp. 3304–3316, 2022. [Online]. Available: https://doi.org/10.1109/tit.2022.3144615

  14. [14]

    Exactly tight information-theoretic generalization error bound for the quadratic gaussian problem,

    ——, “Exactly tight information-theoretic generalization error bound for the quadratic gaussian problem,” in 2023 IEEE International Symposium on Information Theory (ISIT) . IEEE, 2023, pp. 903–908

  15. [15]

    Approximation bounds for random neural networks and reservoir systems,

    L. Gonon, L. Grigoryeva, and J.-P. Ortega, “Approximation bounds for random neural networks and reservoir systems,” The Annals of Applied Probability , vol. 33, no. 1, pp. 28–69, 2023

  16. [16]

    A theoretical analysis of the test error of finite-rank kernel ridge regression,

    T. S. Cheng, A. Lucchi, I. Dokmani ´c, A. Kratsios, and D. Belius, “A theoretical analysis of the test error of finite-rank kernel ridge regression,” Advances in Neural Information Processing Systems , 2023

  17. [17]

    Stability and generalization of graph convolutional networks in eigen-domains,

    M. K. Ng and A. Yip, “Stability and generalization of graph convolutional networks in eigen-domains,” Anal. Appl. (Singap.), vol. 21, no. 3, pp. 819–840, 2023. [Online]. Available: https://doi.org/10.1142/S0219530523500021

  18. [18]

    Instance-dependent generalization bounds via optimal transport,

    S. Hou, P. Kassraie, A. Kratsios, A. Krause, and J. Rothfuss, “Instance-dependent generalization bounds via optimal transport,” J. Mach. Learn. Res. , 2023

  19. [19]

    Benign overfitting in ridge regression,

    A. Tsigler and P. L. Bartlett, “Benign overfitting in ridge regression,” J. Mach. Learn. Res. , vol. 24, pp. Paper No. [123], 76, 2023

  20. [20]

    On generalization bounds for deep networks based on loss surface implicit regularization,

    M. Imaizumi and J. Schmidt-Hieber, “On generalization bounds for deep networks based on loss surface implicit regularization,” IEEE Trans. Inform. Theory , vol. 69, no. 2, pp. 1203–1223, 2023

  21. [21]

    Generalization in kernel regression under realistic assumptions,

    D. Barzilai and O. Shamir, “Generalization in kernel regression under realistic assumptions,” arXiv preprint arXiv:2312.15995, 2023

  22. [22]

    Characterizing overfitting in kernel ridgeless regression through the eigenspectrum,

    T. S. Cheng, A. Lucchi, A. Kratsios, and D. Belius, “Characterizing overfitting in kernel ridgeless regression through the eigenspectrum,” arxiv, 2024

  23. [23]

    What every computer scientist should know about floating-point arithmetic,

    D. Goldberg, “What every computer scientist should know about floating-point arithmetic,” ACM Comput. Surv., vol. 23, no. 1, p. 5–48, mar 1991. [Online]. Available: https://doi.org/10.1145/103162.103163

  24. [24]

    Muller, N

    J.-M. Muller, N. Brisebarre, F. De Dinechin, C.-P. Jeannerod, V . Lefevre, G. Melquiond, N. Revol, D. Stehl ´e, S. Torres et al., Handbook of floating-point arithmetic . Springer, 2018

  25. [25]

    Shalev-Shwartz and S

    S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms. Cambridge university press, 2014

  26. [26]

    A. W. van der Vaart and J. A. Wellner, Weak convergence and empirical processes , ser. Springer Series in Statistics. Springer-Verlag, New York, 1996, with applications to statistics. [Online]. Available: https: //doi.org/10.1007/978-1-4757-2545-2

  27. [27]

    On a space of totally additive functions, vestn,

    L. Kantorovich and G. Rubinstein, “On a space of totally additive functions, vestn,” Vestnik Leningrad. Univ, 1958

  28. [28]

    Empirical measures: regularity is a counter-curse to dimensionality,

    B. R. Kloeckner, “Empirical measures: regularity is a counter-curse to dimensionality,” ESAIM Probab. Stat. , vol. 24, pp. 408–434, 2020. [Online]. Available: https://doi.org/10.1051/ps/2019025

  29. [29]

    On the rate of convergence in wasserstein distance of the empirical measure,

    N. Fournier and A. Guillin, “On the rate of convergence in wasserstein distance of the empirical measure,” Probability theory and related fields , vol. 162, no. 3-4, pp. 707–738, 2015

  30. [30]

    On optimal matchings,

    M. Ajtai, J. Koml ´os, and G. Tusn ´ady, “On optimal matchings,” Combinatorica, vol. 4, no. 4, pp. 259–264, 1984. [Online]. Available: https://doi.org/10.1007/BF02579135

  31. [31]

    Graf and H

    S. Graf and H. Luschgy, Foundations of quantization for probability distributions , ser. Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2000, vol. 1730. [Online]. Available: https://doi.org/10.1007/BFb0103945

  32. [32]

    Approximation by finitely supported measures,

    B. Kloeckner, “Approximation by finitely supported measures,” ESAIM Control Optim. Calc. Var. , vol. 18, no. 2, pp. 343–359, 2012. [Online]. Available: https://doi.org/10.1051/cocv/2010100

  33. [33]

    Convergence rate of optimal quantization and application to the clustering performance of the empirical measure,

    Y . Liu and G. Pag `es, “Convergence rate of optimal quantization and application to the clustering performance of the empirical measure,” J. Mach. Learn. Res. , vol. 21, pp. Paper No. 86, 36, 2020

  34. [34]

    Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance,

    J. Weed and F. Bach, “Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance,” Bernoulli, vol. 25, no. 4A, pp. 2620–2648, 2019. [Online]. Available: https://doi.org/10.3150/18-BEJ1065 20

  35. [35]

    On Lipschitz embedding of finite metric spaces in Hilbert space,

    J. Bourgain, “On Lipschitz embedding of finite metric spaces in Hilbert space,” Israel J. Math. , vol. 52, no. 1-2, pp. 46–52, 1985. [Online]. Available: https://doi.org/10.1007/BF02776078

  36. [36]

    On the distortion required for embedding finite metric spaces into normed spaces,

    J. Matou ˇsek, “On the distortion required for embedding finite metric spaces into normed spaces,” Israel J. Math., vol. 93, pp. 333–344, 1996

  37. [37]

    Link prediction based on graph neural networks,

    M. Zhang and Y . Chen, “Link prediction based on graph neural networks,” Advances in neural information processing systems, vol. 31, 2018

  38. [38]

    Wasserstein generative adversarial networks,

    M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in International conference on machine learning. PMLR, 2017, pp. 214–223

  39. [39]

    Multi-marginal wasserstein gan,

    J. Cao, L. Mo, Y . Zhang, K. Jia, C. Shen, and M. Tan, “Multi-marginal wasserstein gan,” Advances in Neural Information Processing Systems, vol. 32, 2019

  40. [40]

    Cot-gan: Generating sequential data via causal optimal transport,

    T. Xu, L. K. Wenliang, M. Munn, and B. Acciaio, “Cot-gan: Generating sequential data via causal optimal transport,” Advances in neural information processing systems , vol. 33, pp. 8798–8809, 2020

  41. [41]

    Neural ordinary differential equations,

    R. T. Chen, Y . Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neural ordinary differential equations,” Advances in neural information processing systems , vol. 31, 2018

  42. [42]

    Neural jump ordinary differential equations: Consistent continuous-time prediction and filtering,

    C. Herrera, F. Krach, and J. Teichmann, “Neural jump ordinary differential equations: Consistent continuous-time prediction and filtering,” in International Conference on Learning Representations , 2020

  43. [43]

    Neural rough differential equations for long time series,

    J. Morrill, C. Salvi, P. Kidger, and J. Foster, “Neural rough differential equations for long time series,” in International Conference on Machine Learning . PMLR, 2021, pp. 7829–7838

  44. [44]

    On universal approximation and error bounds for fourier neural operators,

    N. Kovachki, S. Lanthaler, and S. Mishra, “On universal approximation and error bounds for fourier neural operators,” The Journal of Machine Learning Research , vol. 22, no. 1, pp. 13 237–13 312, 2021

  45. [45]

    Physics-informed machine learning,

    G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics , vol. 3, no. 6, pp. 422–440, 2021

  46. [46]

    Sharper bounds for Gaussian and empirical processes,

    M. Talagrand, “Sharper bounds for Gaussian and empirical processes,” Ann. Probab., vol. 22, no. 1, pp. 28–76, 1994. [Online]. Available: http://links.jstor.org/sici?sici=0091-1798(199401)22:1⟨28:SBFGAE⟩2.0.CO;2-W&origin=MSN

  47. [47]

    The complexity of learning according to two models of a drifting environment,

    P. M. Long, “The complexity of learning according to two models of a drifting environment,” in Proceedings of the Eleventh Annual Conference on Computational Learning Theory (Madison, WI, 1998) . ACM, New York, 1998, pp. 116–125. [Online]. Available: https://doi.org/10.1145/279943.279968

  48. [48]

    Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model,

    A. Kontorovich and I. Pinelis, “Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model,” Ann. Statist. , vol. 47, no. 5, pp. 2822–2854, 2019. [Online]. Available: https: //doi.org/10.1214/18-AOS1766

  49. [49]

    Concentration inequalities for the empirical distribution of discrete distributions: beyond the method of types,

    J. Mardia, J. Jiao, E. T ´anczos, R. D. Nowak, and T. Weissman, “Concentration inequalities for the empirical distribution of discrete distributions: beyond the method of types,” Inf. Inference , vol. 9, no. 4, pp. 813–850, 2020. [Online]. Available: https://doi.org/10.1093/imaiai/iaz025

  50. [50]

    f-divergence inequalities,

    I. Sason and S. Verd ´u, “f-divergence inequalities,” IEEE Transactions on Information Theory , vol. 62, no. 11, pp. 5973– 6006, 2016

  51. [51]

    Inference for empirical wasserstein distances on finite spaces,

    M. Sommerfeld and A. Munk, “Inference for empirical wasserstein distances on finite spaces,” Journal of the Royal Statistical Society. Series B (Statistical Methodology) , vol. 80, no. 1, pp. 219–238, 2018

  52. [52]

    On the mean speed of convergence of empirical and occupation measures in wasserstein distance,

    E. Boissard and T. Le Gouic, “On the mean speed of convergence of empirical and occupation measures in wasserstein distance,” Annales de l’IHP Probabilit ´es et statistiques , vol. 50, no. 2, pp. 539–563, 2014

  53. [53]

    The least doubling constant of a path graph,

    E. Durand-Cartagena, J. Soria, and P. Tradacete, “The least doubling constant of a path graph,” arXiv preprint arXiv:2111.09196, 2021

  54. [54]

    Doubling constants and spectral theory on graphs,

    ——, “Doubling constants and spectral theory on graphs,” Discrete Math., vol. 346, no. 6, pp. Paper No. 113 354, 17,

  55. [55]

    Available: https://doi.org/10.1016/j.disc.2023.113354

    [Online]. Available: https://doi.org/10.1016/j.disc.2023.113354

  56. [56]

    Small transformers compute universal metric embeddings,

    A. Kratsios, V . Debarnot, and I. Dokmani ´c, “Small transformers compute universal metric embeddings,” Journal of Machine Learning Research , 2023

  57. [57]

    Approximating a bandlimited function using very coarsely quantized data: a family of stable sigma-delta modulators of arbitrary order,

    I. Daubechies and R. DeV ore, “Approximating a bandlimited function using very coarsely quantized data: a family of stable sigma-delta modulators of arbitrary order,” Ann. of Math. (2) , vol. 158, no. 2, pp. 679–710, 2003. [Online]. Available: https://doi.org/10.4007/annals.2003.158.679

  58. [58]

    Approximating a bandlimited function using very coarsely quantized data: improved error estimates in sigma-delta modulation,

    C. S. G ¨unt¨urk, “Approximating a bandlimited function using very coarsely quantized data: improved error estimates in sigma-delta modulation,” J. Amer. Math. Soc. , vol. 17, no. 1, pp. 229–242, 2004. [Online]. Available: https://doi.org/10.1090/S0894-0347-03-00436-3

  59. [59]

    Phase transitions in rate distortion theory and deep learning,

    P. Grohs, A. Klotz, and F. V oigtlaender, “Phase transitions in rate distortion theory and deep learning,” Found. Comput. Math., vol. 23, no. 1, pp. 329–392, 2023. [Online]. Available: https://doi.org/10.1007/s10208-021-09546-4

  60. [60]

    Inverse problems are solvable on real number signal processing hardware,

    H. Boche, A. Fono, and G. Kutyniok, “Inverse problems are solvable on real number signal processing hardware,” arXiv preprint arXiv:2204.02066, 2022

  61. [61]

    Limitations of deep learning for inverse problems on digital hardware,

    ——, “Limitations of deep learning for inverse problems on digital hardware,” arXiv preprint arXiv:2202.13490 , 2022

  62. [62]

    Non-computability of the pseudoinverse on digital computers,

    ——, “Non-computability of the pseudoinverse on digital computers,” arXiv preprint arXiv:2212.02940 , 2022

  63. [63]

    Expressive power of relu and step networks under floating-point operations,

    Y . Park, G. Hwang, W. Lee, and S. Park, “Expressive power of relu and step networks under floating-point operations,” 21 arXiv preprint arXiv:2401.15121 , 2024

  64. [64]

    Bi-Lipschitz embeddings into low-dimensional Euclidean spaces,

    J. Matou ˇsek, “Bi-Lipschitz embeddings into low-dimensional Euclidean spaces,” Comment. Math. Univ. Carolin., vol. 31, no. 3, pp. 589–600, 1990

  65. [65]

    , year =

    C. Villani, Optimal transport , ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2009, vol. 338, old and new. [Online]. Available: https://doi.org/10.1007/978-3-540-71050-9

  66. [66]

    D. P. Dubhashi and A. Panconesi, Concentration of measure for the analysis of randomized algorithms . Cambridge University Press, Cambridge, 2009. [Online]. Available: https://doi-org.libaccess.lib.mcmaster.ca/10.1017/ CBO9780511581274

  67. [67]

    Extensions of Lipschitz ma ps into a Hilbert space.Conference in modern analysis and pr obability (New Haven, Conn., 1982), Contemp

    W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” in Conference in modern analysis and probability (New Haven, Conn., 1982) , ser. Contemp. Math. Amer. Math. Soc., Providence, RI, 1984, vol. 26, pp. 189–206. [Online]. Available: https://doi.org/10.1090/conm/026/737400

  68. [68]

    Matou ˇsek, Lectures on discrete geometry, ser

    J. Matou ˇsek, Lectures on discrete geometry, ser. Graduate Texts in Mathematics. Springer-Verlag, New York, 2002, vol. 212

  69. [69]

    Rademacher and Gaussian complexities: risk bounds and structural results,

    P. L. Bartlett and S. Mendelson, “Rademacher and Gaussian complexities: risk bounds and structural results,” J. Mach. Learn. Res. , vol. 3, no. Spec. Issue Comput. Learn. Theory, pp. 463–482, 2002. [Online]. Available: https://doi.org/10.1162/153244303321897690

  70. [70]

    Local Rademacher complexities,

    P. L. Bartlett, O. Bousquet, and S. Mendelson, “Local Rademacher complexities,” Ann. Statist. , vol. 33, no. 4, pp. 1497–1537, 2005. [Online]. Available: https://doi.org/10.1214/009053605000000282

  71. [71]

    Designing universal causal deep learning models: The case of infinite- dimensional dynamical systems from stochastic analysis,

    L. Galimberti, A. Kratsios, and G. Livieri, “Designing universal causal deep learning models: The case of infinite- dimensional dynamical systems from stochastic analysis,” arXiv preprint arXiv:2210.13300 , 2022

  72. [72]

    G. G. Lorentz, M. v. Golitschek, and Y . Makovoz, Constructive approximation , ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1996, vol. 304, advanced problems. [Online]. Available: https://doi.org/10.1007/978-3-642-60932-9

  73. [73]

    Optimal learners for realizable regression: Pac learning and online learning,

    I. Attias, S. Hanneke, A. Kalavasis, A. Karbasi, and G. Velegkas, “Optimal learners for realizable regression: Pac learning and online learning,” in Thirty-seventh Conference on Neural Information Processing Systems , 2023

  74. [74]

    Nonlinear approximation and (deep) relu networks,

    I. Daubechies, R. DeV ore, S. Foucart, B. Hanin, and G. Petrova, “Nonlinear approximation and (deep) relu networks,” Constructive Approximation, vol. 55, no. 1, pp. 127–172, 2022

  75. [75]

    Vershynin, High-dimensional probability, ser

    R. Vershynin, High-dimensional probability, ser. Cambridge Series in Statistical and Probabilistic Mathematics. Cam- bridge University Press, Cambridge, 2018, vol. 47, an introduction with applications in data science, With a foreword by Sara van de Geer

  76. [76]

    Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data,

    G. K. Dziugaite and D. M. Roy, “Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data,” Uncertainty in Artificial Intelligence , 2017

  77. [77]

    Matou ˇsek, Lectures on discrete geometry , ser

    J. Matou ˇsek, Lectures on discrete geometry , ser. Graduate Texts in Mathematics. Springer-Verlag, New York, 2002, vol. 212. [Online]. Available: https://doi.org/10.1007/978-1-4613-0039-7