Lasso-Ridge Refitting: A Two-Stage Estimator for High-Dimensional Linear Regression
Pith reviewed 2026-05-16 23:01 UTC · model grok-4.3
The pith
Lasso followed by Ridge refitting on the selected variables lowers prediction error while preserving selection consistency in high-dimensional linear models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Lasso-Ridge refitting estimator, formed by applying Ridge regression to the variables chosen by the Lasso, is claimed to achieve better prediction performance than the Lasso itself in a range of settings, while retaining prediction consistency and reliable variable selection under mild conditions.
What carries the argument
Two-stage procedure: Lasso variable selection followed by ordinary Ridge regression restricted to the Lasso-selected support.
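The two-stage procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the objective scaling (1/(2n) on the squared loss), the coordinate-descent solver, and the names `lasso_cd`, `lasso_ridge_refit`, `lam`, and `mu` are all assumptions made here for concreteness.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent (hypothetical helper).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # copy of y; equals y - X @ beta while beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes coordinate j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


def lasso_ridge_refit(X, y, lam, mu):
    """Stage 1: Lasso support. Stage 2: Ridge restricted to that support."""
    n, p = X.shape
    support = np.flatnonzero(lasso_cd(X, y, lam))
    beta = np.zeros(p)
    if support.size > 0:
        Xs = X[:, support]
        # Closed-form Ridge: (Xs'Xs / n + mu * I)^{-1} Xs'y / n.
        gram = Xs.T @ Xs / n + mu * np.eye(support.size)
        beta[support] = np.linalg.solve(gram, Xs.T @ y / n)
    return beta, support
```

Calling `lasso_ridge_refit(X, y, lam, mu)` returns a length-p coefficient vector that is zero outside the Lasso-selected support, together with the support indices. Setting `mu = 0` would recover ordinary least squares after Lasso selection, provided the restricted Gram matrix is invertible.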
If this is right
- Out-of-sample prediction error is lower even when the Lasso is tuned at its theoretical optimal rate \(\sqrt{\log(p)/n}\).
- Variable selection consistency of the Lasso is inherited by the refitted estimator.
- Estimation accuracy for the nonzero coefficients improves relative to plain Lasso.
- The procedure remains computationally comparable to a single Lasso fit plus a small Ridge solve.
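The claimed gain in estimation accuracy can be made concrete in the simplest textbook case. The derivation below assumes an orthonormal design, \((1/n) X^\top X = I_p\), and a Ridge penalty written as \((\mu/2)\|b\|_2^2\); both are assumptions chosen for illustration, and the result is not the paper's bound.

```latex
% Assumption: orthonormal design, (1/n) X^T X = I_p, and z = X^T y / n.
\[
\hat\beta^{\mathrm{L}}_j = \operatorname{sign}(z_j)\,(|z_j| - \lambda)_+ ,
\qquad
\hat S = \{\, j : |z_j| > \lambda \,\}.
\]
% Ridge refit on \hat S with penalty (\mu/2)\|b\|_2^2:
\[
\hat\beta^{\mathrm{LR}}_j = \frac{z_j}{1+\mu} \quad (j \in \hat S),
\qquad
\hat\beta^{\mathrm{LR}}_j = 0 \quad (j \notin \hat S).
\]
% Shrinkage relative to z_j on the selected set:
\[
|\hat\beta^{\mathrm{L}}_j - z_j| = \lambda,
\qquad
|\hat\beta^{\mathrm{LR}}_j - z_j| = \frac{\mu}{1+\mu}\,|z_j| ,
\]
% so the refit shrinks coordinate j less than the Lasso exactly when
\[
\mu < \frac{\lambda}{|z_j| - \lambda}.
\]
```

In this special case the Lasso shifts every selected coefficient by the fixed amount λ, whereas the Ridge refit shrinks proportionally, so a small second-stage penalty removes most of the Lasso's shrinkage on the selected coordinates.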
Where Pith is reading between the lines
- The same refitting idea could be tested with other initial selectors such as forward stepwise or elastic net.
- In settings with moderate predictor correlation the bias reduction from Ridge may be larger than in orthogonal designs.
- Practitioners could use cross-validation on the second-stage Ridge penalty without needing to retune the first-stage Lasso.
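The last suggestion, tuning only the second-stage Ridge penalty, might look like the following K-fold sketch. It assumes the support has already been produced by some first-stage selector; the function names, the penalty grid `mus`, and the 1/n scaling are hypothetical choices made here.

```python
import numpy as np


def ridge_fit(X, y, mu):
    """Closed-form Ridge for (1/(2n))||y - Xb||^2 + (mu/2)||b||^2."""
    n, k = X.shape
    return np.linalg.solve(X.T @ X / n + mu * np.eye(k), X.T @ y / n)


def cv_ridge_penalty(X, y, support, mus, n_folds=5, seed=0):
    """Pick the Ridge penalty by K-fold CV, holding the support fixed."""
    Xs = X[:, support]
    n = Xs.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    cv_err = np.zeros(len(mus))
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        for i, mu in enumerate(mus):
            b = ridge_fit(Xs[train], y[train], mu)
            cv_err[i] += np.mean((y[test_idx] - Xs[test_idx] @ b) ** 2)
    cv_err /= n_folds
    return mus[int(np.argmin(cv_err))], cv_err
```

Because the support is chosen once on the full sample, the cross-validated error here is likely to be somewhat optimistic; a stricter variant would rerun the selector inside each fold, which is exactly the retuning cost this shortcut avoids.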
Load-bearing premise
The Lasso must select a support that is close enough to the true support for the subsequent Ridge step to reduce bias without reintroducing large errors.
What would settle it
A simulation or real dataset in which the two-stage estimator records higher mean-squared prediction error than the plain Lasso tuned at the same rate.
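A harness for this kind of check could be sketched as follows. Everything specific here is an assumption rather than the paper's setup: the Gaussian design, the sparsity `s`, the signal size, the constant `c` in λ = c·σ·√(log p / n), the Ridge penalty `mu`, and the use of ISTA as the Lasso solver.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via proximal gradient (ISTA) for (1/(2n))||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta + step * (X.T @ (y - X @ beta)) / n
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


def ridge_on_support(X, y, support, mu):
    """Ridge restricted to the given columns; zero elsewhere."""
    n, p = X.shape
    beta = np.zeros(p)
    if support.size:
        Xs = X[:, support]
        beta[support] = np.linalg.solve(
            Xs.T @ Xs / n + mu * np.eye(support.size), Xs.T @ y / n
        )
    return beta


def compare_test_mse(n=100, p=200, s=5, signal=2.0, sigma=1.0,
                     c=2.0, mu=0.05, n_rep=20, seed=0):
    """Return mean test MSE as [lasso, lasso_ridge] over n_rep replications."""
    rng = np.random.default_rng(seed)
    beta_true = np.zeros(p)
    beta_true[:s] = signal
    lam = c * sigma * np.sqrt(np.log(p) / n)  # same rate for both estimators
    mse = np.zeros((n_rep, 2))
    for r in range(n_rep):
        X = rng.standard_normal((n, p))
        y = X @ beta_true + sigma * rng.standard_normal(n)
        X_new = rng.standard_normal((n, p))
        y_new = X_new @ beta_true + sigma * rng.standard_normal(n)
        b_lasso = lasso_ista(X, y, lam)
        b_refit = ridge_on_support(X, y, np.flatnonzero(b_lasso), mu)
        mse[r, 0] = np.mean((y_new - X_new @ b_lasso) ** 2)
        mse[r, 1] = np.mean((y_new - X_new @ b_refit) ** 2)
    return mse.mean(axis=0)
```

A configuration in which the second entry of the returned array exceeds the first would be a candidate counterexample, subject to checking that ISTA has converged (raise `n_iter` if in doubt) and that the gap is larger than the Monte Carlo noise.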
The least absolute shrinkage and selection operator (Lasso) is a popular method for high-dimensional statistics. However, it is known that the Lasso often has estimation bias and prediction error. To address such disadvantages, many alternatives and refitting strategies have been proposed and studied. This work introduces a novel Lasso-Ridge method. Our analysis indicates that the proposed estimator achieves improved prediction performance in a range of settings, including cases where the Lasso is tuned at its theoretical optimal rate \(\sqrt{\log(p)/n}\). Moreover, the proposed method retains several key advantages of the Lasso, such as prediction consistency and reliable variable selection under mild conditions. Through extensive simulations, we further demonstrate that our estimator outperforms the Lasso in both prediction and estimation accuracy, highlighting its potential as a powerful tool for high-dimensional linear regression.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-stage Lasso-Ridge refitting estimator for high-dimensional linear regression: the Lasso is first applied for variable selection, after which Ridge regression is performed on the selected variables to produce the final coefficient estimates. The central claims are that the procedure yields improved prediction performance relative to the Lasso (including when the Lasso penalty is set at its theoretical rate λ ∼ √(log p / n)), while preserving prediction consistency and reliable variable selection under mild conditions; these properties are supported by theoretical analysis and extensive simulations demonstrating gains in both prediction and estimation accuracy.
Significance. If the theoretical claims hold under conditions that are genuinely milder than those required for standard post-Lasso estimators, the method would provide a simple, computationally attractive way to reduce Lasso bias while retaining its selection advantages, which could be useful in applied high-dimensional regression settings.
Major comments (2)
- [Abstract and theoretical results] Abstract and main theoretical results section: the claim that the estimator 'retains ... reliable variable selection under mild conditions' even at the Lasso's theoretical optimal rate √(log(p)/n) is load-bearing for the paper's contribution. Standard Lasso theory requires either the irrepresentable condition on the Gram matrix or a beta-min lower bound that scales with λ; the manuscript must explicitly state the precise assumptions used, show how they differ from (or relax) these standard requirements, and indicate whether support recovery is proved at the minimal rate or only under stronger conditions.
- [Theoretical results] Theoretical results section (likely §3): if the analysis invokes the same irrepresentable or beta-min conditions as the Lasso without relaxation, then the 'improved prediction' guarantee reduces to a standard post-selection Ridge step whose advantage is primarily empirical; the paper should clarify whether any new theoretical relaxation is obtained or whether the improvement is demonstrated only via simulations.
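For readers who want to see what the irrepresentable condition demands of a concrete design, the standard (strong) form can be checked numerically. The sketch below assumes the support and the sign pattern of the true coefficients are known, which is only the case in simulations; the function name and the returned gap are hypothetical conveniences.

```python
import numpy as np


def irrepresentable_gap(X, support, signs):
    """Return 1 - || X_Sc' X_S (X_S' X_S)^{-1} sign(beta_S) ||_inf.

    A positive value means the strong irrepresentable condition holds
    for this design, support, and sign pattern. `signs` must be listed
    in the same order as `support`.
    """
    p = X.shape[1]
    mask = np.zeros(p, dtype=bool)
    mask[support] = True
    if mask.all():
        return 1.0  # no off-support columns, so the condition is vacuous
    Xs, Xc = X[:, support], X[:, ~mask]
    w = np.linalg.solve(Xs.T @ Xs, np.asarray(signs, dtype=float))
    return 1.0 - np.max(np.abs(Xc.T @ Xs @ w))
```

For example, if an off-support column is the sum of two support columns with equal signs, the gap is negative and the condition fails, whereas an orthogonal design gives a gap of 1.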
Minor comments (2)
- [Simulations] Simulation section: the reported designs should be supplemented with a table or figure showing performance under varying levels of feature correlation to confirm robustness beyond the presented cases.
- [Notation] Notation: ensure the Lasso tuning parameter is denoted consistently (e.g., λ) throughout the theoretical statements and simulation descriptions.
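One common way to produce the correlated designs requested in the first minor comment is an AR(1) (Toeplitz) covariance, where the correlation between features i and j is ρ^|i-j|. This is a sketch of that generic construction, not the paper's simulation design; `ar1_design` and its parameters are hypothetical names.

```python
import numpy as np


def ar1_design(n, p, rho, rng):
    """Draw an n-by-p Gaussian design with Cov(x_i, x_j) = rho ** |i - j|.

    Requires |rho| < 1 so that the covariance is positive definite.
    """
    idx = np.arange(p)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    # Rows z @ L.T have covariance L @ L.T = cov, with L the Cholesky factor.
    return rng.standard_normal((n, p)) @ np.linalg.cholesky(cov).T
```

Sweeping `rho` over a grid such as 0, 0.3, 0.6, and 0.9 and reporting prediction error for each value would give the kind of robustness table the comment asks for.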
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address the major points below and will revise the manuscript to explicitly detail the assumptions and clarify the scope of the theoretical guarantees.
Referee: [Abstract and theoretical results] Abstract and main theoretical results section: the claim that the estimator 'retains ... reliable variable selection under mild conditions' even at the Lasso's theoretical optimal rate √(log(p)/n) is load-bearing for the paper's contribution. Standard Lasso theory requires either the irrepresentable condition on the Gram matrix or a beta-min lower bound that scales with λ; the manuscript must explicitly state the precise assumptions used, show how they differ from (or relax) these standard requirements, and indicate whether support recovery is proved at the minimal rate or only under stronger conditions.
Authors: We agree that the assumptions require more explicit statement. In the revised manuscript we will insert a dedicated paragraph in Section 3 that lists the precise conditions: the irrepresentable condition on the Gram matrix together with a beta-min condition of order λ. These are the same conditions used for Lasso support recovery at rate √(log p / n). We do not relax the irrepresentable condition; support recovery is established at the minimal rate under exactly these standard assumptions. The refitting step yields a strictly smaller prediction-error bound than Lasso while preserving the same selection consistency.
Revision: yes
Referee: [Theoretical results] Theoretical results section (likely §3): if the analysis invokes the same irrepresentable or beta-min conditions as the Lasso without relaxation, then the 'improved prediction' guarantee reduces to a standard post-selection Ridge step whose advantage is primarily empirical; the paper should clarify whether any new theoretical relaxation is obtained or whether the improvement is demonstrated only via simulations.
Authors: The analysis uses the standard irrepresentable condition without relaxation. We will revise the text to state this clearly and to emphasize that the theoretical contribution lies in the improved finite-sample prediction bound obtained after Ridge refitting (via a direct comparison of the bias-variance trade-off), not in a weaker selection condition. The prediction improvement is therefore both theoretical and empirical; the simulations illustrate the magnitude of the gain under the same conditions.
Revision: yes
Circularity Check
No circularity detected; claims rest on external simulations and standard Lasso theory
The abstract and description contain no equations, derivations, or explicit proof steps. Claims of improved prediction at the Lasso's optimal rate and retention of consistency under mild conditions are presented as results of analysis and simulations, without any self-definitional reduction, fitted-input-as-prediction, or load-bearing self-citation that collapses the central estimator back to its inputs by construction. The proposal is therefore self-contained against external benchmarks such as standard Lasso support-recovery conditions.