
arxiv: 2507.12457 · v5 · submitted 2025-07-16 · 📊 stat.ME

Asymptotic Theory of K-fold Cross-validation in Lasso and the validity of Bootstrap

Pith reviewed 2026-05-19 04:15 UTC · model grok-4.3

classification 📊 stat.ME
keywords Lasso, K-fold cross-validation, bootstrap, asymptotic consistency, variable selection, heteroscedastic regression, statistical inference

The pith

K-fold CV tuned Lasso is root-n consistent but not variable selection consistent under moment conditions, with bootstrap valid for its distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In a heteroscedastic linear regression model the authors establish that Lasso with the penalty chosen by K-fold cross-validation is square-root consistent for the regression coefficients, yet fails to recover the exact support of nonzero coefficients. This matters because data-dependent tuning via cross-validation is the default practical choice, and without such results its use for estimation and inference rested on unproven assumptions. The paper further shows that bootstrap resampling consistently approximates the limiting distribution of this estimator. The proofs rely only on moment-type conditions rather than stronger assumptions like sub-Gaussian tails or exact sparsity.

Core claim

Under a heteroscedastic linear regression model and unspecified moment conditions, the Lasso estimator with penalty selected by K-fold cross-validation is n^{1/2}-consistent for the regression parameter vector, but is not variable-selection consistent; moreover, the bootstrap is valid for approximating the distribution of this estimator.
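In symbols, the claim can be written roughly as follows. This is only a sketch: the notation ($\beta$, $\hat\beta_n(\lambda)$, $\hat\lambda_{n,K}$, the fold index sets $I_k$, and the leave-fold-out fit $\hat\beta_{n,-k}$) is assumed here for illustration and is not taken from the paper, whose scaling of the penalty and exact conditions may differ.

```latex
% Assumed notation; the summary above does not reproduce the paper's symbols.
\begin{align*}
y_i &= x_i^\top \beta + \varepsilon_i,
  \qquad \mathbb{E}[\varepsilon_i] = 0,
  \quad \operatorname{Var}(\varepsilon_i) = \sigma_i^2,
  \qquad i = 1, \dots, n, \\
\hat\beta_n(\lambda) &\in \operatorname*{arg\,min}_{b \in \mathbb{R}^p}
  \; \sum_{i=1}^{n} \bigl(y_i - x_i^\top b\bigr)^2 + \lambda \sum_{j=1}^{p} |b_j|, \\
\hat\lambda_{n,K} &\in \operatorname*{arg\,min}_{\lambda \ge 0}
  \; \sum_{k=1}^{K} \sum_{i \in I_k}
  \bigl(y_i - x_i^\top \hat\beta_{n,-k}(\lambda)\bigr)^2, \\
\sqrt{n}\,\bigl(\hat\beta_n(\hat\lambda_{n,K}) - \beta\bigr) &= O_P(1),
  \qquad
  P\bigl(\{j : \hat\beta_{n,j}(\hat\lambda_{n,K}) \neq 0\} = \{j : \beta_j \neq 0\}\bigr) \not\to 1 .
\end{align*}
```

The first line is the heteroscedastic model (the variances $\sigma_i^2$ may differ across observations), the second and third define the Lasso path and the K-fold choice of penalty, and the last line states root-n consistency together with the failure of exact support recovery.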

What carries the argument

K-fold cross-validation procedure for selecting the Lasso penalty parameter in a heteroscedastic linear model, together with bootstrap resampling of the CV-tuned estimator.
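As a concrete picture of the procedure being analysed, the following is a minimal sketch of K-fold cross-validation over a penalty grid for the Lasso. Everything in it is an illustrative assumption rather than the authors' implementation: the coordinate-descent solver `lasso_cd`, the helper names `cv_lambda` and `simulate`, the penalty grid, the `1/(2n)` scaling of the squared loss, and the toy error scale that grows with the first covariate.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding: closed-form solution of the one-coordinate Lasso problem."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * sum_i (y_i - x_i'b)^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, with beta starting at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def cv_lambda(X, y, lambdas, K=5, seed=0):
    """Return the grid value minimising the summed held-out squared error over K folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::K] for k in range(K)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held = set(fold)
            X_tr = [X[i] for i in idx if i not in held]
            y_tr = [y[i] for i in idx if i not in held]
            b = lasso_cd(X_tr, y_tr, lam)
            err += sum((y[i] - sum(x * bj for x, bj in zip(X[i], b))) ** 2 for i in fold)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

def simulate(n, beta, seed=1):
    """Heteroscedastic toy data: the error scale grows with |x_i1| (illustrative choice)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in beta] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta)) + (0.5 + abs(row[0])) * rng.gauss(0, 1)
         for row in X]
    return X, y

X, y = simulate(80, [2.0, 0.0, -1.0])
grid = [0.01, 0.05, 0.1, 0.5, 1.0]
lam_hat = cv_lambda(X, y, grid)      # data-dependent penalty
beta_hat = lasso_cd(X, y, lam_hat)   # refit on the full sample at the chosen penalty
print(lam_hat, beta_hat)
```

The selected penalty is itself a random quantity, since it depends on the data through the held-out errors. That dependence is what separates the CV-tuned estimator from a Lasso fit with a deterministic penalty sequence.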

If this is right

  • The CV-tuned Lasso can be used for root-n consistent estimation and for constructing asymptotically valid confidence intervals via bootstrap in heteroscedastic linear regression.
  • Variable selection consistency cannot be claimed for this estimator, so it should not be treated as a method that reliably identifies the exact set of relevant predictors.
  • Bootstrap inference for the CV-tuned estimator is justified even though the penalty is chosen from the data: the validity result is stated for the estimator with the data-dependent penalty, not for a fixed-penalty surrogate.
  • The results extend justification for using K-fold CV Lasso in applied linear regression work without requiring stronger tail or design assumptions.
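To make the bootstrap-interval point concrete, here is a minimal sketch of percentile intervals for a CV-tuned Lasso, with the penalty re-selected inside every resample. It is hedged in several ways. The summary does not say which bootstrap scheme the paper analyses, so a pairs bootstrap (resampling rows `(x_i, y_i)`) is used purely as a stand-in, and the paper's scheme may include modifications that this sketch omits. The helper names, the penalty grid, and the small values of `B`, `K`, and `n` are all assumptions chosen to keep the example fast, not recommendations.

```python
import random

def lasso_cd(X, y, lam, n_iter=40):
    """Coordinate descent for (1/(2n)) * sum_i (y_i - x_i'b)^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta, resid = [0.0] * p, list(y)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            # Soft-threshold rho at lam, then rescale by the column's mean square.
            new = (max(rho - lam, 0.0) - max(-rho - lam, 0.0)) / col_sq[j]
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            beta[j] = new
    return beta

def cv_lasso(X, y, lambdas, K, rng):
    """Pick lambda by K-fold CV on the grid, then refit on all rows."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[k::K] for k in range(K)]

    def cv_error(lam):
        err = 0.0
        for fold in folds:
            held = set(fold)
            b = lasso_cd([X[i] for i in idx if i not in held],
                         [y[i] for i in idx if i not in held], lam)
            err += sum((y[i] - sum(x * bj for x, bj in zip(X[i], b))) ** 2 for i in fold)
        return err

    return lasso_cd(X, y, min(lambdas, key=cv_error))

def pairs_bootstrap_ci(X, y, lambdas, K=4, B=30, alpha=0.10, seed=0):
    """Percentile intervals from resampled (x_i, y_i) pairs, re-tuning lambda each time."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    draws = []
    for _ in range(B):
        rows = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        draws.append(cv_lasso([X[i] for i in rows], [y[i] for i in rows], lambdas, K, rng))
    lo_idx = round(alpha / 2 * B)
    hi_idx = B - 1 - lo_idx
    intervals = []
    for j in range(p):
        col = sorted(d[j] for d in draws)
        intervals.append((col[lo_idx], col[hi_idx]))
    return intervals

# Toy heteroscedastic data (illustrative): the error scale grows with |x_i1|.
rng = random.Random(1)
beta_true = [2.0, 0.0, -1.0]
X = [[rng.gauss(0, 1) for _ in beta_true] for _ in range(50)]
y = [sum(x * b for x, b in zip(row, beta_true)) + (0.5 + abs(row[0])) * rng.gauss(0, 1)
     for row in X]

intervals = pairs_bootstrap_ci(X, y, lambdas=[0.02, 0.1, 0.5])
for b, (lo, hi) in zip(beta_true, intervals):
    print(f"true {b:+.1f}  interval [{lo:+.3f}, {hi:+.3f}]")
```

Re-running the fold split and penalty selection inside each resample is the design choice that lets the resampling distribution reflect the randomness of the tuned penalty. A realistic analysis would use far more than 30 resamples and would check coverage by repeated simulation.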

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The lack of variable-selection consistency suggests that post-selection inference after CV-Lasso may still require additional adjustments beyond simple bootstrap.
  • It would be natural to test whether the same moment conditions suffice for CV-tuned Lasso in generalized linear models or with other cross-validation schemes such as leave-one-out.
  • The bootstrap validity result opens the door to using resampling-based methods for comparing different CV-tuned regularized estimators in finite samples.

Load-bearing premise

The regression errors obey certain unspecified moment conditions that are sufficient for the consistency and bootstrap results to hold.

What would settle it

A simulation or data example in which the CV-tuned Lasso estimator exhibits slower than root-n convergence rates or in which bootstrap intervals fail to achieve correct coverage under the paper's stated moment conditions.

Figures

Figures reproduced from arXiv: 2507.12457 by Debraj Das, Mayukh Choudhury.

Figure 1: Uniqueness of $\hat{\Lambda}_{\infty,K}$ over different S.

Original abstract

Least absolute shrinkage and selection operator or Lasso is one of the widely used regularization methods in regression. Statisticians usually implement Lasso in practice by choosing the penalty parameter in a data-dependent way, the most popular being the $K$-fold cross-validation (or $K$-fold CV). However, inferential properties, such as the variable selection consistency and $n^{1/2}$-consistency, of the $K$-fold CV based Lasso estimator and validity of the Bootstrap approximation are still unknown. In this paper, we consider the heteroscedastic linear regression model and show only under some moment type conditions that the Lasso estimator with $K$-fold CV based penalty is $n^{1/2}$-consistent, but not variable selection consistent. Additionally, we establish the validity of Bootstrap in approximating the distribution of the $K$-fold CV based Lasso estimator. Therefore, our results theoretically justify the use of $K$-fold CV based Lasso estimator to perform statistical inference in linear regression. We validate our Bootstrap method for the $K$-fold CV based Lasso estimator in finite samples based on simulations. We also implement our Bootstrap based inference on a real data set.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper considers the heteroscedastic linear regression model and shows that, under unspecified moment-type conditions, the Lasso estimator with penalty parameter chosen by K-fold cross-validation is n^{1/2}-consistent but not variable selection consistent. It further claims that the bootstrap consistently approximates the distribution of this K-fold CV Lasso estimator. The results are supported by simulation studies and an application to a real dataset.

Significance. If the bootstrap validity result holds under precisely stated conditions, the work would provide useful theoretical justification for performing inference with the practically common K-fold CV tuned Lasso in heteroscedastic settings, where variable selection consistency fails but root-n consistency and bootstrap approximation succeed.

major comments (1)
  1. [Abstract and the statement of the bootstrap theorem] The bootstrap consistency claim (central to the second half of the main result) rests on 'some moment type conditions' whose precise order and uniformity are not stated explicitly. In the heteroscedastic model, where observation-specific variances may be unbounded, conditions such as uniform boundedness of E[|X_i ε_i|^{2+δ}] for some δ>0 are typically required to control the remainder term after substituting the data-dependent λ_CV and to verify the Lindeberg condition for the bootstrap. If only second-moment assumptions are imposed, the argument may fail to go through.
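The role of the $(2+\delta)$-moment bound in this comment can be seen from the standard step that a Lyapunov-type condition implies the Lindeberg condition. The following is a sketch under assumed notation, not the paper's argument: $\xi_i$ stands for independent mean-zero summands (for example a fixed linear combination of $X_i \varepsilon_i$), $s_n^2 = \sum_{i=1}^n \operatorname{Var}(\xi_i)$, $M$ is the uniform moment bound, and the lower bound $s_n^2 \ge c\,n$ for some $c > 0$ is an extra assumption introduced here.

```latex
% Assumed: \sup_i E|\xi_i|^{2+\delta} \le M and s_n^2 \ge c n for some c > 0.
\begin{align*}
\frac{1}{s_n^2} \sum_{i=1}^{n}
  \mathbb{E}\bigl[\xi_i^2 \,\mathbf{1}\{|\xi_i| > \eta s_n\}\bigr]
&\le \frac{1}{\eta^{\delta} s_n^{2+\delta}} \sum_{i=1}^{n} \mathbb{E}|\xi_i|^{2+\delta}
  && \text{since } |\xi_i|^{\delta} > (\eta s_n)^{\delta} \text{ on the event} \\
&\le \frac{n M}{\eta^{\delta} s_n^{2+\delta}}
 \;\le\; \frac{M}{\eta^{\delta}\, c^{1+\delta/2}\, n^{\delta/2}}
 \;\longrightarrow\; 0
  && \text{for every } \eta > 0 .
\end{align*}
```

With only second moments, the first inequality is unavailable and the truncated sum need not vanish, which is the gap the referee is pointing at.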

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comment point by point below.

  1. Referee: [Abstract and the statement of the bootstrap theorem] The bootstrap consistency claim (central to the second half of the main result) rests on 'some moment type conditions' whose precise order and uniformity are not stated explicitly. In the heteroscedastic model, where observation-specific variances may be unbounded, conditions such as uniform boundedness of E[|X_i ε_i|^{2+δ}] for some δ>0 are typically required to control the remainder term after substituting the data-dependent λ_CV and to verify the Lindeberg condition for the bootstrap. If only second-moment assumptions are imposed, the argument may fail to go through.

    Authors: We agree that the abstract and theorem statement would benefit from greater explicitness regarding the moment conditions. The full paper states the assumptions in detail (including a uniform bound on E[|X_i ε_i|^{2+δ}] for δ > 0 to ensure the Lindeberg condition holds after substitution of the random λ_CV). However, the abstract's phrasing 'some moment type conditions' is indeed imprecise. We will revise the abstract to briefly indicate the key moment requirements and add a clarifying remark to the bootstrap theorem statement specifying the order and uniformity of the moments. This revision improves readability without changing the results or proofs.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: asymptotic derivations are self-contained under stated moment conditions.


The paper derives sqrt(n)-consistency and bootstrap validity for the K-fold CV Lasso estimator in a heteroscedastic linear model via standard empirical process and concentration arguments under unspecified but fixed moment conditions. No step reduces a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction; the central claims rest on external probabilistic tools rather than re-labeling inputs. The reader's assessment of score 2 aligns with the absence of any load-bearing self-definition or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the heteroscedastic linear regression model and unspecified moment type conditions; no free parameters or invented entities are mentioned in the abstract.

axioms (2)
  • domain assumption: Heteroscedastic linear regression model
    The entire analysis is conducted under this model as stated in the abstract.
  • domain assumption: Some moment type conditions
    All consistency and bootstrap results hold only under these conditions, per the abstract.


