Asymptotic Theory of K-fold Cross-validation in Lasso and the validity of Bootstrap
Pith reviewed 2026-05-19 04:15 UTC · model grok-4.3
The pith
Under moment conditions, the K-fold CV-tuned Lasso is root-n consistent but not variable-selection consistent, and the bootstrap validly approximates its distribution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under a heteroscedastic linear regression model and unspecified moment conditions, the Lasso estimator with penalty selected by K-fold cross-validation is n^{1/2}-consistent for the regression parameter vector, but is not variable-selection consistent; moreover, the bootstrap is valid for approximating the distribution of this estimator.
What carries the argument
K-fold cross-validation procedure for selecting the Lasso penalty parameter in a heteroscedastic linear model, together with bootstrap resampling of the CV-tuned estimator.
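To make the objects in this argument concrete, the block below writes out one common formulation of the model, the Lasso criterion, and the K-fold CV choice of penalty. The notation ($x_i$, $\sigma_i^2$, $I_k$, $\hat\beta_{-k}$, $\hat\lambda$) and the normalization of the penalty are assumptions made for illustration; the paper's own scaling may differ.

```latex
% Heteroscedastic linear model (notation assumed for illustration)
y_i = x_i^\top \beta + \varepsilon_i, \qquad
\mathbb{E}[\varepsilon_i] = 0, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma_i^2, \qquad i = 1, \dots, n.

% Lasso estimator at a fixed penalty level \lambda
\hat\beta_n(\lambda) = \operatorname*{arg\,min}_{b \in \mathbb{R}^p}
  \sum_{i=1}^{n} \bigl(y_i - x_i^\top b\bigr)^2 + \lambda \sum_{j=1}^{p} |b_j|.

% K-fold CV: I_1, \dots, I_K partition \{1, \dots, n\};
% \hat\beta_{-k}(\lambda) is the Lasso fit that leaves out fold I_k.
\mathrm{CV}(\lambda) = \sum_{k=1}^{K} \sum_{i \in I_k}
  \bigl(y_i - x_i^\top \hat\beta_{-k}(\lambda)\bigr)^2, \qquad
\hat\lambda = \operatorname*{arg\,min}_{\lambda} \mathrm{CV}(\lambda).

% The estimator studied is the full-sample fit at the CV-chosen penalty.
\hat\beta_n^{\mathrm{CV}} = \hat\beta_n(\hat\lambda).
```

The point of writing it this way is that $\hat\lambda$ is itself a function of the data, so the distribution of $\hat\beta_n^{\mathrm{CV}}$ cannot simply be read off from fixed-penalty results.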
If this is right
- The CV-tuned Lasso can be used for root-n consistent estimation and for constructing asymptotically valid confidence intervals via bootstrap in heteroscedastic linear regression.
- Variable selection consistency cannot be claimed for this estimator, so it should not be treated as a method that reliably identifies the exact set of relevant predictors.
- Bootstrap-based inference remains asymptotically valid even though the penalty is chosen from the data, because the bootstrap approximates the distribution of the CV-tuned estimator itself.
- The results extend justification for using K-fold CV Lasso in applied linear regression work without requiring stronger tail or design assumptions.
Where Pith is reading between the lines
- The lack of variable-selection consistency suggests that post-selection inference after CV-Lasso may still require additional adjustments beyond simple bootstrap.
- It would be natural to test whether the same moment conditions suffice for CV-tuned Lasso in generalized linear models or with other cross-validation schemes such as leave-one-out.
- The bootstrap validity result opens the door to using resampling-based methods for comparing different CV-tuned regularized estimators in finite samples.
Load-bearing premise
The regression errors obey certain unspecified moment conditions that are sufficient for the consistency and bootstrap results to hold.
What would settle it
A simulation or data example in which the CV-tuned Lasso estimator exhibits slower than root-n convergence rates or in which bootstrap intervals fail to achieve correct coverage under the paper's stated moment conditions.
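A rough sketch of the rate half of such a check, using only NumPy. Everything specific here is an assumption made for illustration and is not taken from the paper: the coordinate-descent solver, the `1/(2n)` loss normalization, the penalty grid, the variance function, and the names `lasso_cd`, `cv_lambda` and `scaled_error` are all hypothetical. The bootstrap coverage half is left out because the paper's resampling scheme is not described on this page.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/(2n))||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()              # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]             # partial residual without coordinate j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def cv_lambda(X, y, lambdas, K, rng):
    """Return the grid value minimising the K-fold squared prediction error."""
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), K)
    cv_err = np.zeros(len(lambdas))
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        for i, lam in enumerate(lambdas):
            b = lasso_cd(X[train], y[train], lam)
            cv_err[i] += np.sum((y[test] - X[test] @ b) ** 2)
    return lambdas[int(np.argmin(cv_err))]

def scaled_error(n, beta, reps, rng, K=5):
    """Median over replications of sqrt(n) * ||beta_hat_CV - beta||."""
    lambdas = np.geomspace(1e-3, 1.0, 8)    # hypothetical penalty grid
    out = []
    for _ in range(reps):
        X = rng.normal(size=(n, beta.size))
        sigma = 0.5 + np.abs(X[:, 0])       # assumed heteroscedastic error scale
        y = X @ beta + sigma * rng.normal(size=n)
        b = lasso_cd(X, y, cv_lambda(X, y, lambdas, K, rng))
        out.append(np.sqrt(n) * np.linalg.norm(b - beta))
    return float(np.median(out))

rng = np.random.default_rng(0)
beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0])  # sparse truth, chosen arbitrarily
for n in (100, 400, 1600):
    print(n, round(scaled_error(n, beta, reps=10, rng=rng), 3))
```

If root-n consistency holds in this design, the printed scaled errors should stay roughly stable as n grows rather than drift upward; a small run like this is only suggestive and does not replace a proper Monte Carlo study.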
Original abstract
Least absolute shrinkage and selection operator or Lasso is one of the widely used regularization methods in regression. Statisticians usually implement Lasso in practice by choosing the penalty parameter in a data-dependent way, the most popular being the $K$-fold cross-validation (or $K$-fold CV). However, inferential properties, such as the variable selection consistency and $n^{1/2}$-consistency, of the $K$-fold CV based Lasso estimator and validity of the Bootstrap approximation are still unknown. In this paper, we consider the heteroscedastic linear regression model and show only under some moment type conditions that the Lasso estimator with $K$-fold CV based penalty is $n^{1/2}$-consistent, but not variable selection consistent. Additionally, we establish the validity of Bootstrap in approximating the distribution of the $K$-fold CV based Lasso estimator. Therefore, our results theoretically justify the use of $K$-fold CV based Lasso estimator to perform statistical inference in linear regression. We validate our Bootstrap method for the $K$-fold CV based Lasso estimator in finite samples based on simulations. We also implement our Bootstrap based inference on a real data set.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper considers the heteroscedastic linear regression model and shows that, under unspecified moment-type conditions, the Lasso estimator with penalty parameter chosen by K-fold cross-validation is n^{1/2}-consistent but not variable selection consistent. It further claims that the bootstrap consistently approximates the distribution of this K-fold CV Lasso estimator. The results are supported by simulation studies and an application to a real dataset.
Significance. If the bootstrap validity result holds under precisely stated conditions, the work would provide useful theoretical justification for performing inference with the practically common K-fold CV tuned Lasso in heteroscedastic settings, where variable selection consistency fails but root-n consistency and bootstrap approximation succeed.
major comments (1)
- [Abstract and the statement of the bootstrap theorem] The bootstrap consistency claim (central to the second half of the main result) rests on 'some moment type conditions' whose precise order and uniformity are not stated explicitly. In the heteroscedastic model, where observation-specific variances may be unbounded, conditions such as uniform boundedness of E[|X_i ε_i|^{2+δ}] for some δ>0 are typically required to control the remainder term after substituting the data-dependent λ_CV and to verify the Lindeberg condition for the bootstrap. If only second-moment assumptions are imposed, the argument may fail to go through.
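The role of the $(2+\delta)$-moment bound in this comment can be seen from the standard Lyapunov-to-Lindeberg step, sketched below. The symbols $a$, $\xi_i$, $s_n$, $M$ and $c$ are introduced here for illustration, and the variance lower bound $s_n^2 \ge c\,n$ is an added assumption, not something taken from the paper.

```latex
% Setup (assumed): fixed a \neq 0, \xi_i = a^\top X_i \varepsilon_i with
% E[\xi_i] = 0, independent across i, s_n^2 = \sum_{i=1}^n Var(\xi_i),
% \sup_i E|\xi_i|^{2+\delta} \le M < \infty, and s_n^2 \ge c n for some c > 0.
%
% On the event \{|\xi_i| > \eta s_n\} we have
% \xi_i^2 \le |\xi_i|^{2+\delta} / (\eta s_n)^{\delta}, so for every \eta > 0:
\frac{1}{s_n^2} \sum_{i=1}^{n}
  \mathbb{E}\bigl[\xi_i^2 \, \mathbf{1}\{|\xi_i| > \eta s_n\}\bigr]
\;\le\;
\frac{1}{\eta^{\delta} s_n^{2+\delta}} \sum_{i=1}^{n}
  \mathbb{E}|\xi_i|^{2+\delta}
\;\le\;
\frac{M n}{\eta^{\delta} (c n)^{1+\delta/2}}
\;=\;
\frac{M}{\eta^{\delta} c^{1+\delta/2}} \, n^{-\delta/2}
\;\longrightarrow\; 0 .
```

With only second moments the first inequality is unavailable, so the Lindeberg sum has to be controlled by some other uniformity argument; that is the gap the comment points to.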
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comment point by point below.
Referee: [Abstract and the statement of the bootstrap theorem] The bootstrap consistency claim (central to the second half of the main result) rests on 'some moment type conditions' whose precise order and uniformity are not stated explicitly. In the heteroscedastic model, where observation-specific variances may be unbounded, conditions such as uniform boundedness of E[|X_i ε_i|^{2+δ}] for some δ>0 are typically required to control the remainder term after substituting the data-dependent λ_CV and to verify the Lindeberg condition for the bootstrap. If only second-moment assumptions are imposed, the argument may fail to go through.
Authors: We agree that the abstract and theorem statement would benefit from greater explicitness regarding the moment conditions. The full paper states the assumptions in detail (including a uniform bound on E[|X_i ε_i|^{2+δ}] for δ > 0 to ensure the Lindeberg condition holds after substitution of the random λ_CV). However, the abstract's phrasing 'some moment type conditions' is indeed imprecise. We will revise the abstract to briefly indicate the key moment requirements and add a clarifying remark to the bootstrap theorem statement specifying the order and uniformity of the moments. This revision improves readability without changing the results or proofs.
Revision: yes
Circularity Check
No circularity: asymptotic derivations are self-contained under stated moment conditions.
The paper derives sqrt(n)-consistency and bootstrap validity for the K-fold CV Lasso estimator in a heteroscedastic linear model via standard empirical process and concentration arguments under unspecified but fixed moment conditions. No step reduces a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction; the central claims rest on external probabilistic tools rather than re-labeling inputs. The reader's assessment of score 2 aligns with the absence of any load-bearing self-definition or ansatz smuggling.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: heteroscedastic linear regression model
- domain assumption: some moment-type conditions