Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees

Coralia Cartis; Edward Tansley; Estelle Massart; Roy Makhlouf

arxiv: 2605.06519 · v1 · submitted 2026-05-07 · 💻 cs.LG

Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees

Edward Tansley , Roy Makhlouf , Estelle Massart , Coralia Cartis This is my paper

Pith reviewed 2026-05-08 12:31 UTC · model grok-4.3

classification 💻 cs.LG

keywords data reconstructionprivacy attacksrandom feature modelfinite-width guaranteesPAC boundssubspace approximationneural networksoptimization

0 comments

The pith

A unified optimization formulation provably reconstructs training data with high probability in random feature models of sufficient width.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce a single optimization problem that recovers training examples from a network's initial random parameters and its final trained parameters. In the random feature model they prove this problem succeeds with high probability once the width exceeds a finite threshold derived via PAC-style arguments. When the data lie in a low-dimensional subspace the required width depends on the subspace dimension rather than ambient dimension. They further supply an efficient algorithm that approximates the subspace from first-layer weight changes and reconstructs using only the final layer, reducing the search-space dimension for general networks. Experiments on synthetic data and CIFAR-10 confirm that the subspace-aware method yields higher-quality recoveries than full-space baselines.

Core claim

In the random feature model the unified optimization formulation recovers the exact training data with high probability once the network width is sufficiently large, with the required width bounded using PAC-style arguments; when the data lie in a low-dimensional subspace the width requirement relaxes to a function of the subspace dimension, and an efficient reconstruction algorithm approximates this subspace from first-layer weight changes to enable practical recovery in general models using only last-layer weights.

What carries the argument

The unified optimization formulation that matches initial and trained network parameters to candidate data points, together with the subspace approximation obtained from the change in first-layer weights.

If this is right

Exact recovery of training data becomes possible in the random feature model once width exceeds the PAC-derived threshold.
The subspace relaxation permits successful reconstruction at smaller widths when data intrinsic dimension is low.
The efficient algorithm lowers the effective search-space dimension and the minimal width needed for high-quality results.
Numerical tests on CIFAR-10 show improved reconstruction quality relative to non-subspace methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Defenses could target the leakage of subspace orientation through first-layer weight changes.
The PAC bounds offer a concrete way to set minimum widths for privacy-preserving random-feature training.
Similar finite-width analysis might extend to other architectures if the subspace structure is preserved.
Testing the algorithm on datasets with known sensitive labels would quantify real-world privacy leakage.

Load-bearing premise

The optimization formulation must exactly encode the relationship between parameter changes and the training data, and the random feature model with fixed first-layer weights must accurately describe the network.

What would settle it

An experiment on a random feature network whose width meets the derived PAC bound yet the optimization fails to recover the training data with probability higher than the allowed failure rate.

Figures

Figures reproduced from arXiv: 2605.06519 by Coralia Cartis, Edward Tansley, Estelle Massart, Roy Makhlouf.

**Figure 1.** Figure 1: The quality of reconstruction improves as the network width view at source ↗

**Figure 2.** Figure 2: Left: The spectral decay of ∆W1 in a 2-layer network of width p = 103 , trained on data drawn from a 30-dimensional subspace of R 60; Right: Algorithm 1 (“Full space”) and Algorithm 2 (“Subspace (∆W1)” and “True subspace”) for synthetic data reconstruction using last-layer parameters only (see Results section). 102 103 104 105 Last-layer parameters p (L) = output dim · p 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8… view at source ↗

**Figure 3.** Figure 3: Examining the effect of network depth. Algorithm 1 (“Full space”) and Algorithm 2 (“Subspace view at source ↗

**Figure 4.** Figure 4: Comparing parameter choices with varying network depths (2, 3, 5 layers). Algorithm 1 (“Full view at source ↗

**Figure 5.** Figure 5: Comparing the reconstruction performance of the last layer reconstruction scheme with the all view at source ↗

**Figure 6.** Figure 6: How the quality of reconstructed images varies along with the reconstruction error view at source ↗

read the original abstract

Data reconstruction attacks on trained neural networks aim to recover the data on which the network has been trained and pose a significant threat to privacy, especially if the training dataset contains sensitive information. Here, we propose a unified optimization formulation of the data reconstruction problem based on initial and trained parameter values, incorporating state-of-the-art proposals. We show that in the random feature model, this formulation provably leads to training data reconstruction with high probability, provided the network width is sufficiently large; this unprecedented finite-width result uses PAC-style bounds. Furthermore, when the data lies in a low-dimensional subspace, we show that the network width requirement for successful reconstruction can be relaxed, with bounds depending on the subspace dimension rather than the ambient dimension. For general neural network models and unknown data orientations, we propose an efficient reconstruction algorithm that approximates the low-dimensional data subspace through the change in the first-layer weights during training and uses only the last-layer weights for reconstruction, thus reducing the search space dimension and the required network width for high-quality reconstructions. Our numerical experiments on synthetic datasets and CIFAR-10 confirm that our subspace-aware reconstruction approach outperforms standard full-space techniques.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives finite-width PAC bounds for data reconstruction in the random feature model plus a subspace-aware algorithm that beats standard methods on CIFAR-10.

read the letter

The main takeaway is that they prove high-probability training data reconstruction for finite-width random feature networks using PAC-style bounds, and they relax the width requirement when the data sits in a low-dimensional subspace. They also give a practical algorithm that estimates the subspace from first-layer weight shifts and reconstructs using only the last layer weights. Experiments on synthetic data and CIFAR-10 show clearer gains over full-space baselines. That finite-width guarantee is the piece that stands out from prior infinite-width or purely empirical reconstruction work. The unified optimization formulation that folds in both initial and trained parameters is a clean way to set up the inverse problem, and the subspace relaxation ties the bound directly to subspace dimension rather than ambient dimension. The algorithm reduces search space in a way that matches the theory for the general case. The theory holds together under the stated random feature assumptions with fixed first-layer weights and trainable second layer. The PAC bounds appear derived from model properties without obvious circularity. Experiments back the claims on both synthetic and real image data. The soft spots are the strong modeling assumptions. The clean bounds apply only to the random feature model; for general networks the subspace estimation from weight changes is an approximation whose accuracy depends on how cleanly the first-layer shift reveals the data orientation. I would want to check how tight the PAC bounds are in practice and whether the width scaling matches the theory on larger models. The CIFAR-10 results are confirmatory but would benefit from more detail on hyperparameter sensitivity and failure cases. This is for researchers working on privacy attacks, model auditing, or finite-width analysis in machine learning. A reader who cares about rigorous reconstruction guarantees or efficient attack algorithms will get concrete value from the bounds and the method. I would send it to peer review. The combination of the new finite-width result and the working algorithm is substantial enough to merit referee attention even if the assumptions need tightening.

Referee Report

2 major / 3 minor

Summary. The paper proposes a unified optimization formulation for data reconstruction attacks that incorporates initial and trained network parameters. In the random feature model it derives finite-width recovery guarantees via PAC-style bounds, showing high-probability exact reconstruction when width is sufficiently large. When data lies in a low-dimensional subspace the width requirement is relaxed to depend on subspace dimension rather than ambient dimension. For general networks an efficient algorithm recovers an approximate subspace from first-layer weight changes and performs reconstruction using only last-layer weights. Experiments on synthetic data and CIFAR-10 demonstrate that the subspace-aware method outperforms full-space baselines.

Significance. If the finite-width PAC bounds hold under the stated assumptions, the work supplies the first explicit high-probability reconstruction guarantees for finite-width random-feature models, a concrete advance over prior asymptotic or infinite-width analyses. The subspace relaxation and the practical algorithm that reduces search-space dimension are directly useful for privacy auditing. The combination of a clean optimization formulation, PAC bounds, and reproducible numerical confirmation on CIFAR-10 strengthens the contribution.

major comments (2)

[§3, Theorem 1] §3 (Random Feature Model, Theorem 1): the PAC bound statement must explicitly list the boundedness assumptions on the random features and the precise form of the reconstruction objective; without these the high-probability claim cannot be verified from the given derivation sketch.
[§4.2] §4.2 (Subspace Relaxation): the claim that the subspace orientation is recoverable from first-layer weight changes is load-bearing for the relaxed width bound, yet the argument appears to rely on an unstated concentration result; a self-contained lemma showing that the estimated subspace converges to the true one with high probability is required.

minor comments (3)

[Abstract] The abstract and introduction should clarify that all theoretical results are stated for the random-feature model with fixed first-layer weights; the transition to general networks is only algorithmic.
[Experiments] Table 1 and Figure 3: axis labels and legend entries should be enlarged for readability; the current font size makes quantitative comparison of reconstruction error difficult.
[§5] The CIFAR-10 experiment description should specify the exact architecture (depth, width, activation) and the precise baseline methods against which the subspace-aware algorithm is compared.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments on our manuscript. We address each major comment below and will incorporate the requested clarifications in the revised version.

read point-by-point responses

Referee: [§3, Theorem 1] §3 (Random Feature Model, Theorem 1): the PAC bound statement must explicitly list the boundedness assumptions on the random features and the precise form of the reconstruction objective; without these the high-probability claim cannot be verified from the given derivation sketch.

Authors: We agree that the theorem statement should be self-contained. The derivation implicitly relies on the random features satisfying ||φ(x)||_2 ≤ 1 almost surely and on the reconstruction objective being the convex program min_w ||W_1 w - y||_2^2 subject to the trained last-layer weights matching the observed output. In the revision we will restate Theorem 1 with these assumptions listed explicitly at the outset, followed by the precise objective, so that the PAC bound and its high-probability claim can be verified directly from the text. revision: yes
Referee: [§4.2] §4.2 (Subspace Relaxation): the claim that the subspace orientation is recoverable from first-layer weight changes is load-bearing for the relaxed width bound, yet the argument appears to rely on an unstated concentration result; a self-contained lemma showing that the estimated subspace converges to the true one with high probability is required.

Authors: We acknowledge the need for an explicit supporting lemma. The current argument invokes a matrix concentration bound on the first-layer weight change ΔW ≈ (1/n) X X^T projected onto the random features, but does not isolate the result. We will add a new self-contained Lemma 4.1 that proves: under the assumption that the data lie in a d-dimensional subspace, the principal angles between the estimated subspace (top-d singular vectors of ΔW) and the true subspace converge to zero at rate O(√(d log m / n)) with probability 1-δ, using a matrix Bernstein inequality. This lemma will be placed immediately before the relaxed-width theorem and will make the dependence on subspace dimension fully rigorous. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained under model assumptions

full rationale

The paper proposes an optimization formulation for data reconstruction from initial and trained parameters, then derives high-probability finite-width recovery guarantees in the random feature model via PAC-style bounds. These bounds are obtained by applying standard concentration arguments to the random feature model (fixed random first layer, trainable second layer) and the optimization objective; they are not defined in terms of the target reconstruction success probability. The subspace relaxation similarly derives width requirements from the subspace dimension via the same PAC machinery rather than by construction. No equations reduce a prediction to a fitted parameter, no load-bearing self-citations are invoked for uniqueness or ansatz, and the central claim does not rename a known empirical pattern. The argument structure is therefore internally consistent and independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claims rest on the random feature model being a faithful proxy for real networks and on standard concentration inequalities from PAC learning theory; no new entities are postulated and no parameters appear to be fitted inside the bounds themselves.

axioms (2)

domain assumption Random feature model: first-layer weights are fixed random, only second layer is trained.
The provable reconstruction result is stated only for this model.
standard math PAC-style concentration bounds apply to the reconstruction optimization.
The finite-width high-probability guarantee is derived via these bounds.

pith-pipeline@v0.9.0 · 5506 in / 1371 out tokens · 31599 ms · 2026-05-08T12:31:04.820980+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1]

Reconstructing training data from model gradient, provably

Zihan Wang, Jason Lee, and Qi Lei. Reconstructing training data from model gradient, provably. In International Conference on Artificial Intelligence and Statistics, pages 6595–6612. PMLR, 2023

work page 2023
[2]

Reconstructing training data with informed adver- saries

Borja Balle, Giovanni Cherubin, and Jamie Hayes. Reconstructing training data with informed adver- saries. In2022 IEEE Symposium on Security and Privacy (SP), pages 1138–1156. IEEE, 2022

work page 2022
[3]

Reconstructing training data from trained neural networks

Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, and Michal Irani. Reconstructing training data from trained neural networks. InAdvances in neural information processing systems, volume 35, pages 22911–22924. Curran Associates, Inc., 2022

work page 2022
[4]

arXiv preprint arXiv:1906.05890 , year=

Kaifeng Lyu and Jian Li. Gradient descent maximizes the margin of homogeneous neural networks. arXiv preprint arXiv:1906.05890, 2019

work page arXiv 1906
[5]

Directional convergence and alignment in deep learning

Ziwei Ji and Matus Telgarsky. Directional convergence and alignment in deep learning. InAdvances in Neural Information Processing Systems, volume 33, pages 17176–17186. Curran Associates, Inc., 2020

work page 2020
[6]

Deconstructing data reconstruction: Multiclass, weight decay and general losses.Advances in Neural Information Processing Systems, 36:51515–51535, 2023

Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Yakir Oz, Yaniv Nikankin, and Michal Irani. Deconstructing data reconstruction: Multiclass, weight decay and general losses.Advances in Neural Information Processing Systems, 36:51515–51535, 2023

work page 2023
[7]

Understand- ing reconstruction attacks with the neural tangent kernel and dataset distillation.arXiv preprint arXiv:2302.01428, 2023

Noel Loo, Ramin Hasani, Mathias Lechner, Alexander Amini, and Daniela Rus. Understand- ing reconstruction attacks with the neural tangent kernel and dataset distillation.arXiv preprint arXiv:2302.01428, 2023

work page arXiv 2023
[8]

arXiv preprint arXiv:2509.22214 , year=

Leonardo Iurada, Simone Bombari, Tatiana Tommasi, and Marco Mondelli. A Law of Data Reconstruc- tion for Random Features (and Beyond), September 2025. arXiv:2509.22214 [cs]

work page arXiv 2025
[9]

Neural tangent kernel: Convergence and general- ization in neural networks.Advances in neural information processing systems, 31, 2018

Arthur Jacot, Franck Gabriel, and Cl´ ement Hongler. Neural tangent kernel: Convergence and general- ization in neural networks.Advances in neural information processing systems, 31, 2018

work page 2018
[10]

Random Features for Large-Scale Kernel Machines

Ali Rahimi and Benjamin Recht. Random Features for Large-Scale Kernel Machines. InAdvances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2007

work page 2007
[11]

Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning.Advances in neural information processing systems, 21, 2008

Ali Rahimi and Benjamin Recht. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning.Advances in neural information processing systems, 21, 2008

work page 2008
[12]

The manifold tangent classifier.Advances in neural information processing systems, 24, 2011

Salah Rifai, Yann N Dauphin, Pascal Vincent, Yoshua Bengio, and Xavier Muller. The manifold tangent classifier.Advances in neural information processing systems, 24, 2011

work page 2011
[13]

Springer, 2007

John A Lee, Michel Verleysen, et al.Nonlinear dimensionality reduction, volume 1. Springer, 2007

work page 2007
[14]

Generative adversarial networks.Communications of the ACM, 63(11):139–144, 2020

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Communications of the ACM, 63(11):139–144, 2020

work page 2020
[15]

Adversarial examples exist in two-layer relu networks for low dimensional linear subspaces.Advances in Neural Information Processing Systems, 36:5028–5049, 2023

Odelia Melamed, Gilad Yehudai, and Gal Vardi. Adversarial examples exist in two-layer relu networks for low dimensional linear subspaces.Advances in Neural Information Processing Systems, 36:5028–5049, 2023

work page 2023
[16]

Adversarial vulnerability for any classifier, 2018

Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier, 2018

work page 2018
[17]

arXiv preprint arXiv:1810.04374 , year=

Yitong Sun, Anna Gilbert, and Ambuj Tewari. On the Approximation Properties of Random ReLU Features, August 2019. arXiv:1810.04374 [stat]

work page arXiv 2019
[18]

Deep learning: a statistical viewpoint

Peter L Bartlett, Andrea Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. Acta numerica, 30:87–201, 2021

work page 2021
[19]

A kernel two-sample test.The journal of machine learning research, 13(1):723–773, 2012

Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Sch¨ olkopf, and Alexander Smola. A kernel two-sample test.The journal of machine learning research, 13(1):723–773, 2012. 11

work page 2012
[20]

Universality, characteristic kernels and rkhs embedding of measures.Journal of Machine Learning Research, 12(7), 2011

Bharath K Sriperumbudur, Kenji Fukumizu, and Gert RG Lanckriet. Universality, characteristic kernels and rkhs embedding of measures.Journal of Machine Learning Research, 12(7), 2011

work page 2011
[21]

Controlling wasserstein distances by kernel norms with application to compressive statistical learning.Journal of Machine Learning Research, 24(149):1–51, 2023

Titouan Vayer and R´ emi Gribonval. Controlling wasserstein distances by kernel norms with application to compressive statistical learning.Journal of Machine Learning Research, 24(149):1–51, 2023

work page 2023
[22]

Lecun, L

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recogni- tion.Proceedings of the IEEE, 86(11):2278–2324, November 1998

work page 1998
[23]

Probability Inequalities for Sums of Bounded Random Variables.Journal of the American Statistical Association, 58(301):13–30, March 1963

Wassily Hoeffding. Probability Inequalities for Sums of Bounded Random Variables.Journal of the American Statistical Association, 58(301):13–30, March 1963

work page 1963
[24]

last layer

Roman Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2018. 12 A Proof of theoretical results We recall Hoeffding’s inequality, which will be useful to bound, with high probability, the deviation of the finite-wi...

work page 2018

[1] [1]

Reconstructing training data from model gradient, provably

Zihan Wang, Jason Lee, and Qi Lei. Reconstructing training data from model gradient, provably. In International Conference on Artificial Intelligence and Statistics, pages 6595–6612. PMLR, 2023

work page 2023

[2] [2]

Reconstructing training data with informed adver- saries

Borja Balle, Giovanni Cherubin, and Jamie Hayes. Reconstructing training data with informed adver- saries. In2022 IEEE Symposium on Security and Privacy (SP), pages 1138–1156. IEEE, 2022

work page 2022

[3] [3]

Reconstructing training data from trained neural networks

Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, and Michal Irani. Reconstructing training data from trained neural networks. InAdvances in neural information processing systems, volume 35, pages 22911–22924. Curran Associates, Inc., 2022

work page 2022

[4] [4]

arXiv preprint arXiv:1906.05890 , year=

Kaifeng Lyu and Jian Li. Gradient descent maximizes the margin of homogeneous neural networks. arXiv preprint arXiv:1906.05890, 2019

work page arXiv 1906

[5] [5]

Directional convergence and alignment in deep learning

Ziwei Ji and Matus Telgarsky. Directional convergence and alignment in deep learning. InAdvances in Neural Information Processing Systems, volume 33, pages 17176–17186. Curran Associates, Inc., 2020

work page 2020

[6] [6]

Deconstructing data reconstruction: Multiclass, weight decay and general losses.Advances in Neural Information Processing Systems, 36:51515–51535, 2023

Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Yakir Oz, Yaniv Nikankin, and Michal Irani. Deconstructing data reconstruction: Multiclass, weight decay and general losses.Advances in Neural Information Processing Systems, 36:51515–51535, 2023

work page 2023

[7] [7]

Understand- ing reconstruction attacks with the neural tangent kernel and dataset distillation.arXiv preprint arXiv:2302.01428, 2023

Noel Loo, Ramin Hasani, Mathias Lechner, Alexander Amini, and Daniela Rus. Understand- ing reconstruction attacks with the neural tangent kernel and dataset distillation.arXiv preprint arXiv:2302.01428, 2023

work page arXiv 2023

[8] [8]

arXiv preprint arXiv:2509.22214 , year=

Leonardo Iurada, Simone Bombari, Tatiana Tommasi, and Marco Mondelli. A Law of Data Reconstruc- tion for Random Features (and Beyond), September 2025. arXiv:2509.22214 [cs]

work page arXiv 2025

[9] [9]

Neural tangent kernel: Convergence and general- ization in neural networks.Advances in neural information processing systems, 31, 2018

Arthur Jacot, Franck Gabriel, and Cl´ ement Hongler. Neural tangent kernel: Convergence and general- ization in neural networks.Advances in neural information processing systems, 31, 2018

work page 2018

[10] [10]

Random Features for Large-Scale Kernel Machines

Ali Rahimi and Benjamin Recht. Random Features for Large-Scale Kernel Machines. InAdvances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2007

work page 2007

[11] [11]

Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning.Advances in neural information processing systems, 21, 2008

Ali Rahimi and Benjamin Recht. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning.Advances in neural information processing systems, 21, 2008

work page 2008

[12] [12]

The manifold tangent classifier.Advances in neural information processing systems, 24, 2011

Salah Rifai, Yann N Dauphin, Pascal Vincent, Yoshua Bengio, and Xavier Muller. The manifold tangent classifier.Advances in neural information processing systems, 24, 2011

work page 2011

[13] [13]

Springer, 2007

John A Lee, Michel Verleysen, et al.Nonlinear dimensionality reduction, volume 1. Springer, 2007

work page 2007

[14] [14]

Generative adversarial networks.Communications of the ACM, 63(11):139–144, 2020

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Communications of the ACM, 63(11):139–144, 2020

work page 2020

[15] [15]

Adversarial examples exist in two-layer relu networks for low dimensional linear subspaces.Advances in Neural Information Processing Systems, 36:5028–5049, 2023

Odelia Melamed, Gilad Yehudai, and Gal Vardi. Adversarial examples exist in two-layer relu networks for low dimensional linear subspaces.Advances in Neural Information Processing Systems, 36:5028–5049, 2023

work page 2023

[16] [16]

Adversarial vulnerability for any classifier, 2018

Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier, 2018

work page 2018

[17] [17]

arXiv preprint arXiv:1810.04374 , year=

Yitong Sun, Anna Gilbert, and Ambuj Tewari. On the Approximation Properties of Random ReLU Features, August 2019. arXiv:1810.04374 [stat]

work page arXiv 2019

[18] [18]

Deep learning: a statistical viewpoint

Peter L Bartlett, Andrea Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. Acta numerica, 30:87–201, 2021

work page 2021

[19] [19]

A kernel two-sample test.The journal of machine learning research, 13(1):723–773, 2012

Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Sch¨ olkopf, and Alexander Smola. A kernel two-sample test.The journal of machine learning research, 13(1):723–773, 2012. 11

work page 2012

[20] [20]

Universality, characteristic kernels and rkhs embedding of measures.Journal of Machine Learning Research, 12(7), 2011

Bharath K Sriperumbudur, Kenji Fukumizu, and Gert RG Lanckriet. Universality, characteristic kernels and rkhs embedding of measures.Journal of Machine Learning Research, 12(7), 2011

work page 2011

[21] [21]

Controlling wasserstein distances by kernel norms with application to compressive statistical learning.Journal of Machine Learning Research, 24(149):1–51, 2023

Titouan Vayer and R´ emi Gribonval. Controlling wasserstein distances by kernel norms with application to compressive statistical learning.Journal of Machine Learning Research, 24(149):1–51, 2023

work page 2023

[22] [22]

Lecun, L

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recogni- tion.Proceedings of the IEEE, 86(11):2278–2324, November 1998

work page 1998

[23] [23]

Probability Inequalities for Sums of Bounded Random Variables.Journal of the American Statistical Association, 58(301):13–30, March 1963

Wassily Hoeffding. Probability Inequalities for Sums of Bounded Random Variables.Journal of the American Statistical Association, 58(301):13–30, March 1963

work page 1963

[24] [24]

last layer

Roman Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2018. 12 A Proof of theoretical results We recall Hoeffding’s inequality, which will be useful to bound, with high probability, the deviation of the finite-wi...

work page 2018