The distribution of the Lasso: Uniform control over sparse balls and adaptive parameter tuning
read the original abstract
The Lasso is a popular regression method for high-dimensional problems in which the number of parameters $\theta_1,\dots,\theta_N$, is larger than the number $n$ of samples: $N>n$. A useful heuristics relates the statistical properties of the Lasso estimator to that of a simple soft-thresholding denoiser,in a denoising problem in which the parameters $(\theta_i)_{i\le N}$ are observed in Gaussian noise, with a carefully tuned variance. Earlier work confirmed this picture in the limit $n,N\to\infty$, pointwise in the parameters $\theta$, and in the value of the regularization parameter. Here, we consider a standard random design model and prove exponential concentration of its empirical distribution around the prediction provided by the Gaussian denoising model. Crucially, our results are uniform with respect to $\theta$ belonging to $\ell_q$ balls, $q\in [0,1]$, and with respect to the regularization parameter. This allows to derive sharp results for the performances of various data-driven procedures to tune the regularization. Our proofs make use of Gaussian comparison inequalities, and in particular of a version of Gordon's minimax theorem developed by Thrampoulidis, Oymak, and Hassibi, which controls the optimum value of the Lasso optimization problem. Crucially, we prove a stability property of the minimizer in Wasserstein distance, that allows to characterize properties of the minimizer itself.
This paper has not been read by Pith yet.
Forward citations
Cited by 3 Pith papers
-
When Does $\ell_2$-Boosting Overfit Benignly? High-Dimensional Risk Asymptotics and the $\ell_1$ Implicit Bias
ℓ₂-Boosting exhibits benign overfitting with logarithmic excess variance decay Θ(σ²/log(p/n)) under isotropic noise due to ℓ₁ bias, and a subdifferential early stopping rule recovers minimax-optimal ℓ₁ rates.
-
When Does $\ell_2$-Boosting Overfit Benignly? High-Dimensional Risk Asymptotics and the $\ell_1$ Implicit Bias
ℓ₂-boosting localizes noise into sparse sets under isotropic pure-noise models, yielding excess variance Θ(σ²/log(p/n)) instead of linear decay, with a tuning-free early stopping rule attaining minimax ℓ₁ rates.
-
Approximate separability of symmetrically penalized least squares in high dimensions: characterization and consequences
Symmetrically penalized least squares with non-separable penalties approximately matches separable penalties in high-dimensional Gaussian models, quantified by finite-sample concentration inequalities, with limited ad...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.