Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces
Pith reviewed 2026-05-24 03:26 UTC · model grok-4.3
The pith
Learning models on digital computers admit generalization bounds that adapt to a geometric representation dimension m, achieving rates of order 1 over N to the power 1 over 2 or m with constants of order square root of m.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We derive a family of generalization bounds {c_m/N^{1/(2∨m)}}_{m=1}^∞ tailored for learning models on digital computers, which adapt to both the sample size N and the geometric representation dimension m of the discrete learning problem. Adjusting the parameter m according to N results in significantly tighter generalization bounds for practical sample sizes N, while setting m small maintains the optimal dimension-free worst-case rate of O(1/N^{1/2}). Notably, c_m ∈ O(m^{1/2}) for learning models on discretized Euclidean domains. Furthermore, our adaptive generalization bounds are formulated based on our new non-asymptotic result for concentration of measure in finite metric spaces, obtained
What carries the argument
A new non-asymptotic concentration of measure inequality for finite metric spaces, derived via metric embedding arguments, which directly yields the parameterized family of generalization bounds.
If this is right
- For any fixed m the generalization gap converges at rate 1/N^{1/(2∨m)}
- Choosing m depending on N produces tighter numerical bounds for practical sample sizes
- Small m recovers the dimension-free rate O(1/N^{1/2})
- On discretized Euclidean domains the prefactor c_m scales at most like sqrt(m)
- The bounds apply directly to models whose inputs are Euclidean but whose implementation is finite-precision
Where Pith is reading between the lines
- The embedding technique may allow similar adaptive bounds for other finite-precision or quantized architectures beyond Euclidean discretization
- Optimal m could be chosen from data alone by monitoring how the observed gap behaves across a grid of m values
- The same concentration result may tighten analyses of generalization for models trained with finite-precision arithmetic even when the target function is continuous
- If the inequality holds, it suggests that the effective dimension m can serve as a practical proxy for the complexity induced by machine precision
Load-bearing premise
The new concentration of measure result for finite metric spaces holds when applied to the discrete learning problem arising from digital implementation.
What would settle it
An explicit finite metric space arising from discretization of a Euclidean domain together with a learning model whose generalization gap exceeds every c_m / N^{1/(2∨m)} for all m, or a direct counterexample to the claimed concentration inequality on that space.
Figures
read the original abstract
Machine learning models with inputs in a Euclidean space $\mathbb{R}^d$, when implemented on digital computers, generalize, and their generalization gap converges to $0$ at a rate of $c/N^{1/2}$ concerning the sample size $N$. However, the constant $c>0$ obtained through classical methods can be large in terms of the ambient dimension $d$ and machine precision, posing a challenge when $N$ is small to realistically large. In this paper, we derive a family of generalization bounds $\{c_m/N^{1/(2\vee m)}\}_{m=1}^{\infty}$ tailored for learning models on digital computers, which adapt to both the sample size $N$ and the so-called geometric representation dimension $m$ of the discrete learning problem. Adjusting the parameter $m$ according to $N$ results in significantly tighter generalization bounds for practical sample sizes $N$, while setting $m$ small maintains the optimal dimension-free worst-case rate of $\mathcal{O}(1/N^{1/2})$. Notably, $c_{m}\in \mathcal{O}(m^{1/2})$ for learning models on discretized Euclidean domains. Furthermore, our adaptive generalization bounds are formulated based on our new non-asymptotic result for concentration of measure in finite metric spaces, established via leveraging metric embedding arguments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to derive a family of generalization bounds {c_m / N^{1/(2∨m)}} for learning models on digital computers. These adapt to sample size N and a geometric representation dimension m of the discrete problem, obtained from a new non-asymptotic concentration-of-measure inequality on finite metric spaces derived via metric-embedding arguments. For discretized Euclidean domains the prefactor satisfies c_m = O(m^{1/2}); choosing m as a function of N is asserted to produce tighter bounds than the classical O(1/sqrt(N)) while preserving the optimal worst-case rate for small m.
Significance. If the new concentration inequality is valid and correctly applied, the work supplies a concrete mechanism for obtaining sample-size-adaptive and representation-dimension-adaptive generalization bounds that can be substantially tighter than dimension-dependent classical bounds for moderate N. The metric-embedding route to the finite-space concentration result is a methodological strength that, if rigorous, could be reusable beyond the learning setting.
major comments (1)
- [new concentration theorem and its application to generalization bounds] The central claim rests on the new non-asymptotic concentration-of-measure theorem for finite metric spaces (the result stated after the abstract and proved in the body). The proof must be checked for the precise assumptions on the metric, the embedding map, and the dependence of the constants on the representation dimension m; without an explicit verification that the discrete learning problem satisfies those assumptions, the claimed O(m^{1/2}) scaling and the applicability of the bound to digital-computer models cannot be confirmed.
minor comments (2)
- [Introduction / notation] The definition and construction of the geometric representation dimension m should be stated with an explicit example (e.g., for a discretized Euclidean domain) before the statement of the main bounds.
- [Preliminaries] Notation for the finite metric space and the embedding should be introduced once and used consistently; several symbols appear to be overloaded between the concentration result and the learning setting.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for recommending major revision. The single major comment is addressed point-by-point below.
read point-by-point responses
-
Referee: The central claim rests on the new non-asymptotic concentration-of-measure theorem for finite metric spaces (the result stated after the abstract and proved in the body). The proof must be checked for the precise assumptions on the metric, the embedding map, and the dependence of the constants on the representation dimension m; without an explicit verification that the discrete learning problem satisfies those assumptions, the claimed O(m^{1/2}) scaling and the applicability of the bound to digital-computer models cannot be confirmed.
Authors: We thank the referee for highlighting the need for explicit verification. Theorem 1 (stated immediately after the abstract) lists the assumptions: (X, d) must be a finite metric space that admits a bi-Lipschitz embedding into a space whose concentration properties are known, with the embedding distortion and dimension controlling the dependence on m. The proof in the body uses the embedding to transfer concentration inequalities, deriving the prefactor scaling directly from the embedding parameters. Section 4 applies the result to digital-computer models by equipping the finite grid of machine-representable points with the restricted Euclidean metric; this grid is finite by construction and satisfies the embedding assumption when m is taken as the effective geometric representation dimension of the discretization. The O(m^{1/2}) bound follows from the embedding dimension and the 1-Lipschitz property of the loss. While the application section already connects the setting to the theorem assumptions, we agree that a more explicit cross-reference would strengthen the manuscript. We will therefore add a short dedicated paragraph (or table) in Section 4 that enumerates each assumption of Theorem 1 and confirms it holds for the discretized Euclidean case. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper derives a new non-asymptotic concentration-of-measure inequality for finite metric spaces via metric embedding arguments and applies it to obtain the family of generalization bounds {c_m / N^{1/(2∨m)}}. No load-bearing step reduces by construction to its own inputs, fitted parameters renamed as predictions, or self-citation chains; the claimed O(m^{1/2}) prefactor scaling follows from standard embedding-dimension costs rather than tautology. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- m (geometric representation dimension)
axioms (1)
- domain assumption Concentration of measure holds in finite metric spaces via metric embedding arguments
invented entities (1)
-
geometric representation dimension m
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Deep vs. shallow networks: An approximation theory perspective,
H. N. Mhaskar and T. Poggio, “Deep vs. shallow networks: An approximation theory perspective,” Analysis and Applications, vol. 14, no. 06, pp. 829–848, 2016
work page 2016
-
[2]
Error bounds for approximations with deep relu networks,
D. Yarotsky, “Error bounds for approximations with deep relu networks,” Neural Networks, vol. 94, pp. 103–114, 2017
work page 2017
-
[3]
Optimal approximation of piecewise smooth functions using deep relu neural networks,
P. Petersen and F. V oigtlaender, “Optimal approximation of piecewise smooth functions using deep relu neural networks,” Neural Networks, vol. 108, pp. 296–330, 2018
work page 2018
-
[4]
Optimal approximation rate of relu networks in terms of width and depth,
Z. Shen, H. Yang, and S. Zhang, “Optimal approximation rate of relu networks in terms of width and depth,” Journal de Math´ematiques Pures et Appliqu ´ees, vol. 157, pp. 101–135, 2022
work page 2022
-
[5]
Designing universal causal deep learning models: The geometric (hyper) transformer,
B. Acciaio, A. Kratsios, and G. Pammer, “Designing universal causal deep learning models: The geometric (hyper) transformer,” Mathematical Finance, 2023
work page 2023
-
[6]
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
B. Neyshabur, S. Bhojanapalli, and N. Srebro, “A pac-bayesian approach to spectrally-normalized margin bounds for neural networks,” arXiv preprint arXiv:1707.09564 , 2017. 19
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[7]
A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks,
——, “A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks,” in iclr, 2018
work page 2018
-
[8]
Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks,
P. L. Bartlett, N. Harvey, C. Liaw, and A. Mehrabian, “Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks,” Journal of Machine Learning Research , vol. 20, no. 63, pp. 1–17, 2019. [Online]. Available: http://jmlr.org/papers/v20/17-612.html
work page 2019
-
[9]
Spectrally-normalized margin bounds for neural networks,
P. L. Bartlett, D. J. Foster, and M. J. Telgarsky, “Spectrally-normalized margin bounds for neural networks,” Advances in neural information processing systems , vol. 30, 2017
work page 2017
-
[10]
Generalization error bounds via R ´enyi-, f-divergences and maximal leakage,
A. R. Esposito, M. Gastpar, and I. Issa, “Generalization error bounds via R ´enyi-, f-divergences and maximal leakage,” IEEE Trans. Inform. Theory , vol. 67, no. 8, pp. 4986–5004, 2021. [Online]. Available: https://doi.org/10.1109/TIT.2021.3085190
-
[11]
Information-theoretic bounds on the moments of the generalization error of learning algorithms,
G. Aminian, L. Toni, and M. R. Rodrigues, “Information-theoretic bounds on the moments of the generalization error of learning algorithms,” in 2021 IEEE International Symposium on Information Theory (ISIT) . IEEE, 2021, pp. 682–687
work page 2021
-
[12]
S. Mei and A. Montanari, “The generalization error of random features regression: precise asymptotics and the double descent curve,” Comm. Pure Appl. Math. , vol. 75, no. 4, pp. 667–766, 2022. [Online]. Available: https://doi.org/10.1002/cpa.22008
-
[13]
Individually conditional individual mutual information bound on generalization error,
R. Zhou, C. Tian, and T. Liu, “Individually conditional individual mutual information bound on generalization error,” IEEE Trans. Inform. Theory, vol. 68, no. 5, pp. 3304–3316, 2022. [Online]. Available: https://doi.org/10.1109/tit.2022.3144615
-
[14]
Exactly tight information-theoretic generalization error bound for the quadratic gaussian problem,
——, “Exactly tight information-theoretic generalization error bound for the quadratic gaussian problem,” in 2023 IEEE International Symposium on Information Theory (ISIT) . IEEE, 2023, pp. 903–908
work page 2023
-
[15]
Approximation bounds for random neural networks and reservoir systems,
L. Gonon, L. Grigoryeva, and J.-P. Ortega, “Approximation bounds for random neural networks and reservoir systems,” The Annals of Applied Probability , vol. 33, no. 1, pp. 28–69, 2023
work page 2023
-
[16]
A theoretical analysis of the test error of finite-rank kernel ridge regression,
T. S. Cheng, A. Lucchi, I. Dokmani ´c, A. Kratsios, and D. Belius, “A theoretical analysis of the test error of finite-rank kernel ridge regression,” Advances in Neural Information Processing Systems , 2023
work page 2023
-
[17]
Stability and generalization of graph convolutional networks in eigen-domains,
M. K. Ng and A. Yip, “Stability and generalization of graph convolutional networks in eigen-domains,” Anal. Appl. (Singap.), vol. 21, no. 3, pp. 819–840, 2023. [Online]. Available: https://doi.org/10.1142/S0219530523500021
-
[18]
Instance-dependent generalization bounds via optimal transport,
S. Hou, P. Kassraie, A. Kratsios, A. Krause, and J. Rothfuss, “Instance-dependent generalization bounds via optimal transport,” J. Mach. Learn. Res. , 2023
work page 2023
-
[19]
Benign overfitting in ridge regression,
A. Tsigler and P. L. Bartlett, “Benign overfitting in ridge regression,” J. Mach. Learn. Res. , vol. 24, pp. Paper No. [123], 76, 2023
work page 2023
-
[20]
On generalization bounds for deep networks based on loss surface implicit regularization,
M. Imaizumi and J. Schmidt-Hieber, “On generalization bounds for deep networks based on loss surface implicit regularization,” IEEE Trans. Inform. Theory , vol. 69, no. 2, pp. 1203–1223, 2023
work page 2023
-
[21]
Generalization in kernel regression under realistic assumptions,
D. Barzilai and O. Shamir, “Generalization in kernel regression under realistic assumptions,” arXiv preprint arXiv:2312.15995, 2023
-
[22]
Characterizing overfitting in kernel ridgeless regression through the eigenspectrum,
T. S. Cheng, A. Lucchi, A. Kratsios, and D. Belius, “Characterizing overfitting in kernel ridgeless regression through the eigenspectrum,” arxiv, 2024
work page 2024
-
[23]
What every computer scientist should know about floating-point arithmetic,
D. Goldberg, “What every computer scientist should know about floating-point arithmetic,” ACM Comput. Surv., vol. 23, no. 1, p. 5–48, mar 1991. [Online]. Available: https://doi.org/10.1145/103162.103163
- [24]
-
[25]
S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms. Cambridge university press, 2014
work page 2014
-
[26]
A. W. van der Vaart and J. A. Wellner, Weak convergence and empirical processes , ser. Springer Series in Statistics. Springer-Verlag, New York, 1996, with applications to statistics. [Online]. Available: https: //doi.org/10.1007/978-1-4757-2545-2
-
[27]
On a space of totally additive functions, vestn,
L. Kantorovich and G. Rubinstein, “On a space of totally additive functions, vestn,” Vestnik Leningrad. Univ, 1958
work page 1958
-
[28]
Empirical measures: regularity is a counter-curse to dimensionality,
B. R. Kloeckner, “Empirical measures: regularity is a counter-curse to dimensionality,” ESAIM Probab. Stat. , vol. 24, pp. 408–434, 2020. [Online]. Available: https://doi.org/10.1051/ps/2019025
-
[29]
On the rate of convergence in wasserstein distance of the empirical measure,
N. Fournier and A. Guillin, “On the rate of convergence in wasserstein distance of the empirical measure,” Probability theory and related fields , vol. 162, no. 3-4, pp. 707–738, 2015
work page 2015
-
[30]
M. Ajtai, J. Koml ´os, and G. Tusn ´ady, “On optimal matchings,” Combinatorica, vol. 4, no. 4, pp. 259–264, 1984. [Online]. Available: https://doi.org/10.1007/BF02579135
-
[31]
S. Graf and H. Luschgy, Foundations of quantization for probability distributions , ser. Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2000, vol. 1730. [Online]. Available: https://doi.org/10.1007/BFb0103945
-
[32]
Approximation by finitely supported measures,
B. Kloeckner, “Approximation by finitely supported measures,” ESAIM Control Optim. Calc. Var. , vol. 18, no. 2, pp. 343–359, 2012. [Online]. Available: https://doi.org/10.1051/cocv/2010100
-
[33]
Y . Liu and G. Pag `es, “Convergence rate of optimal quantization and application to the clustering performance of the empirical measure,” J. Mach. Learn. Res. , vol. 21, pp. Paper No. 86, 36, 2020
work page 2020
-
[34]
J. Weed and F. Bach, “Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance,” Bernoulli, vol. 25, no. 4A, pp. 2620–2648, 2019. [Online]. Available: https://doi.org/10.3150/18-BEJ1065 20
-
[35]
On Lipschitz embedding of finite metric spaces in Hilbert space,
J. Bourgain, “On Lipschitz embedding of finite metric spaces in Hilbert space,” Israel J. Math. , vol. 52, no. 1-2, pp. 46–52, 1985. [Online]. Available: https://doi.org/10.1007/BF02776078
-
[36]
On the distortion required for embedding finite metric spaces into normed spaces,
J. Matou ˇsek, “On the distortion required for embedding finite metric spaces into normed spaces,” Israel J. Math., vol. 93, pp. 333–344, 1996
work page 1996
-
[37]
Link prediction based on graph neural networks,
M. Zhang and Y . Chen, “Link prediction based on graph neural networks,” Advances in neural information processing systems, vol. 31, 2018
work page 2018
-
[38]
Wasserstein generative adversarial networks,
M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in International conference on machine learning. PMLR, 2017, pp. 214–223
work page 2017
-
[39]
Multi-marginal wasserstein gan,
J. Cao, L. Mo, Y . Zhang, K. Jia, C. Shen, and M. Tan, “Multi-marginal wasserstein gan,” Advances in Neural Information Processing Systems, vol. 32, 2019
work page 2019
-
[40]
Cot-gan: Generating sequential data via causal optimal transport,
T. Xu, L. K. Wenliang, M. Munn, and B. Acciaio, “Cot-gan: Generating sequential data via causal optimal transport,” Advances in neural information processing systems , vol. 33, pp. 8798–8809, 2020
work page 2020
-
[41]
Neural ordinary differential equations,
R. T. Chen, Y . Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neural ordinary differential equations,” Advances in neural information processing systems , vol. 31, 2018
work page 2018
-
[42]
Neural jump ordinary differential equations: Consistent continuous-time prediction and filtering,
C. Herrera, F. Krach, and J. Teichmann, “Neural jump ordinary differential equations: Consistent continuous-time prediction and filtering,” in International Conference on Learning Representations , 2020
work page 2020
-
[43]
Neural rough differential equations for long time series,
J. Morrill, C. Salvi, P. Kidger, and J. Foster, “Neural rough differential equations for long time series,” in International Conference on Machine Learning . PMLR, 2021, pp. 7829–7838
work page 2021
-
[44]
On universal approximation and error bounds for fourier neural operators,
N. Kovachki, S. Lanthaler, and S. Mishra, “On universal approximation and error bounds for fourier neural operators,” The Journal of Machine Learning Research , vol. 22, no. 1, pp. 13 237–13 312, 2021
work page 2021
-
[45]
Physics-informed machine learning,
G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics , vol. 3, no. 6, pp. 422–440, 2021
work page 2021
-
[46]
Sharper bounds for Gaussian and empirical processes,
M. Talagrand, “Sharper bounds for Gaussian and empirical processes,” Ann. Probab., vol. 22, no. 1, pp. 28–76, 1994. [Online]. Available: http://links.jstor.org/sici?sici=0091-1798(199401)22:1⟨28:SBFGAE⟩2.0.CO;2-W&origin=MSN
work page 1994
-
[47]
The complexity of learning according to two models of a drifting environment,
P. M. Long, “The complexity of learning according to two models of a drifting environment,” in Proceedings of the Eleventh Annual Conference on Computational Learning Theory (Madison, WI, 1998) . ACM, New York, 1998, pp. 116–125. [Online]. Available: https://doi.org/10.1145/279943.279968
-
[48]
Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model,
A. Kontorovich and I. Pinelis, “Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model,” Ann. Statist. , vol. 47, no. 5, pp. 2822–2854, 2019. [Online]. Available: https: //doi.org/10.1214/18-AOS1766
-
[49]
J. Mardia, J. Jiao, E. T ´anczos, R. D. Nowak, and T. Weissman, “Concentration inequalities for the empirical distribution of discrete distributions: beyond the method of types,” Inf. Inference , vol. 9, no. 4, pp. 813–850, 2020. [Online]. Available: https://doi.org/10.1093/imaiai/iaz025
-
[50]
I. Sason and S. Verd ´u, “f-divergence inequalities,” IEEE Transactions on Information Theory , vol. 62, no. 11, pp. 5973– 6006, 2016
work page 2016
-
[51]
Inference for empirical wasserstein distances on finite spaces,
M. Sommerfeld and A. Munk, “Inference for empirical wasserstein distances on finite spaces,” Journal of the Royal Statistical Society. Series B (Statistical Methodology) , vol. 80, no. 1, pp. 219–238, 2018
work page 2018
-
[52]
On the mean speed of convergence of empirical and occupation measures in wasserstein distance,
E. Boissard and T. Le Gouic, “On the mean speed of convergence of empirical and occupation measures in wasserstein distance,” Annales de l’IHP Probabilit ´es et statistiques , vol. 50, no. 2, pp. 539–563, 2014
work page 2014
-
[53]
The least doubling constant of a path graph,
E. Durand-Cartagena, J. Soria, and P. Tradacete, “The least doubling constant of a path graph,” arXiv preprint arXiv:2111.09196, 2021
-
[54]
Doubling constants and spectral theory on graphs,
——, “Doubling constants and spectral theory on graphs,” Discrete Math., vol. 346, no. 6, pp. Paper No. 113 354, 17,
-
[55]
Available: https://doi.org/10.1016/j.disc.2023.113354
[Online]. Available: https://doi.org/10.1016/j.disc.2023.113354
-
[56]
Small transformers compute universal metric embeddings,
A. Kratsios, V . Debarnot, and I. Dokmani ´c, “Small transformers compute universal metric embeddings,” Journal of Machine Learning Research , 2023
work page 2023
-
[57]
I. Daubechies and R. DeV ore, “Approximating a bandlimited function using very coarsely quantized data: a family of stable sigma-delta modulators of arbitrary order,” Ann. of Math. (2) , vol. 158, no. 2, pp. 679–710, 2003. [Online]. Available: https://doi.org/10.4007/annals.2003.158.679
-
[58]
C. S. G ¨unt¨urk, “Approximating a bandlimited function using very coarsely quantized data: improved error estimates in sigma-delta modulation,” J. Amer. Math. Soc. , vol. 17, no. 1, pp. 229–242, 2004. [Online]. Available: https://doi.org/10.1090/S0894-0347-03-00436-3
-
[59]
Phase transitions in rate distortion theory and deep learning,
P. Grohs, A. Klotz, and F. V oigtlaender, “Phase transitions in rate distortion theory and deep learning,” Found. Comput. Math., vol. 23, no. 1, pp. 329–392, 2023. [Online]. Available: https://doi.org/10.1007/s10208-021-09546-4
-
[60]
Inverse problems are solvable on real number signal processing hardware,
H. Boche, A. Fono, and G. Kutyniok, “Inverse problems are solvable on real number signal processing hardware,” arXiv preprint arXiv:2204.02066, 2022
-
[61]
Limitations of deep learning for inverse problems on digital hardware,
——, “Limitations of deep learning for inverse problems on digital hardware,” arXiv preprint arXiv:2202.13490 , 2022
-
[62]
Non-computability of the pseudoinverse on digital computers,
——, “Non-computability of the pseudoinverse on digital computers,” arXiv preprint arXiv:2212.02940 , 2022
-
[63]
Expressive power of relu and step networks under floating-point operations,
Y . Park, G. Hwang, W. Lee, and S. Park, “Expressive power of relu and step networks under floating-point operations,” 21 arXiv preprint arXiv:2401.15121 , 2024
-
[64]
Bi-Lipschitz embeddings into low-dimensional Euclidean spaces,
J. Matou ˇsek, “Bi-Lipschitz embeddings into low-dimensional Euclidean spaces,” Comment. Math. Univ. Carolin., vol. 31, no. 3, pp. 589–600, 1990
work page 1990
-
[65]
C. Villani, Optimal transport , ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2009, vol. 338, old and new. [Online]. Available: https://doi.org/10.1007/978-3-540-71050-9
-
[66]
D. P. Dubhashi and A. Panconesi, Concentration of measure for the analysis of randomized algorithms . Cambridge University Press, Cambridge, 2009. [Online]. Available: https://doi-org.libaccess.lib.mcmaster.ca/10.1017/ CBO9780511581274
work page 2009
-
[67]
W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” in Conference in modern analysis and probability (New Haven, Conn., 1982) , ser. Contemp. Math. Amer. Math. Soc., Providence, RI, 1984, vol. 26, pp. 189–206. [Online]. Available: https://doi.org/10.1090/conm/026/737400
-
[68]
Matou ˇsek, Lectures on discrete geometry, ser
J. Matou ˇsek, Lectures on discrete geometry, ser. Graduate Texts in Mathematics. Springer-Verlag, New York, 2002, vol. 212
work page 2002
-
[69]
Rademacher and Gaussian complexities: risk bounds and structural results,
P. L. Bartlett and S. Mendelson, “Rademacher and Gaussian complexities: risk bounds and structural results,” J. Mach. Learn. Res. , vol. 3, no. Spec. Issue Comput. Learn. Theory, pp. 463–482, 2002. [Online]. Available: https://doi.org/10.1162/153244303321897690
-
[70]
Local Rademacher complexities,
P. L. Bartlett, O. Bousquet, and S. Mendelson, “Local Rademacher complexities,” Ann. Statist. , vol. 33, no. 4, pp. 1497–1537, 2005. [Online]. Available: https://doi.org/10.1214/009053605000000282
-
[71]
L. Galimberti, A. Kratsios, and G. Livieri, “Designing universal causal deep learning models: The case of infinite- dimensional dynamical systems from stochastic analysis,” arXiv preprint arXiv:2210.13300 , 2022
-
[72]
G. G. Lorentz, M. v. Golitschek, and Y . Makovoz, Constructive approximation , ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1996, vol. 304, advanced problems. [Online]. Available: https://doi.org/10.1007/978-3-642-60932-9
-
[73]
Optimal learners for realizable regression: Pac learning and online learning,
I. Attias, S. Hanneke, A. Kalavasis, A. Karbasi, and G. Velegkas, “Optimal learners for realizable regression: Pac learning and online learning,” in Thirty-seventh Conference on Neural Information Processing Systems , 2023
work page 2023
-
[74]
Nonlinear approximation and (deep) relu networks,
I. Daubechies, R. DeV ore, S. Foucart, B. Hanin, and G. Petrova, “Nonlinear approximation and (deep) relu networks,” Constructive Approximation, vol. 55, no. 1, pp. 127–172, 2022
work page 2022
-
[75]
Vershynin, High-dimensional probability, ser
R. Vershynin, High-dimensional probability, ser. Cambridge Series in Statistical and Probabilistic Mathematics. Cam- bridge University Press, Cambridge, 2018, vol. 47, an introduction with applications in data science, With a foreword by Sara van de Geer
work page 2018
-
[76]
G. K. Dziugaite and D. M. Roy, “Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data,” Uncertainty in Artificial Intelligence , 2017
work page 2017
-
[77]
Matou ˇsek, Lectures on discrete geometry , ser
J. Matou ˇsek, Lectures on discrete geometry , ser. Graduate Texts in Mathematics. Springer-Verlag, New York, 2002, vol. 212. [Online]. Available: https://doi.org/10.1007/978-1-4613-0039-7
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.