Lasso-Ridge Refitting: A Two-Stage Estimator for High-Dimensional Linear Regression

Guo Liu (Waseda University)

arXiv: 2512.10632 · v1 · submitted 2025-12-11 · 📊 stat.ME

Pith reviewed 2026-05-16 23:01 UTC · model grok-4.3

classification 📊 stat.ME

keywords Lasso, Ridge regression, refitting, high-dimensional regression, variable selection, prediction error, bias correction

The pith

Lasso followed by Ridge refitting on the selected variables lowers prediction error while preserving selection consistency in high-dimensional linear models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a two-stage estimator that first runs Lasso to identify a support and then applies Ridge regression to the selected variables. This refitting step reduces the estimation bias that plain Lasso typically carries, yielding lower prediction error across a range of tuning choices, including the Lasso's theoretically optimal rate √(log p / n). The method keeps the Lasso's prediction consistency and reliable variable selection under the same conditions the Lasso already requires. Simulations show gains in both out-of-sample prediction and coefficient-estimation accuracy over standard Lasso.

Core claim

The Lasso-Ridge refitting estimator, formed by applying Ridge regression to the variables chosen by Lasso, achieves better prediction performance than the Lasso estimator itself in a range of settings, while retaining prediction consistency and variable-selection reliability under mild conditions.

What carries the argument

Two-stage procedure: Lasso variable selection followed by ordinary Ridge regression restricted to the Lasso-selected support.
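For reference, the two stages can be written out explicitly. The display below is a sketch under our own conventions rather than the paper's notation: the 1/(2n) loss scaling, the symbol μ for the second-stage Ridge penalty, and setting coefficients outside the selected set Ŝ to zero are assumptions.

```latex
\begin{aligned}
\hat\beta^{\mathrm{L}} &= \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad \hat S = \{\, j : \hat\beta^{\mathrm{L}}_j \neq 0 \,\}, \\
\hat\beta^{\mathrm{LR}}_{\hat S} &= \bigl( X_{\hat S}^\top X_{\hat S} + n\mu I_{|\hat S|} \bigr)^{-1} X_{\hat S}^\top y,
\qquad \hat\beta^{\mathrm{LR}}_{\hat S^c} = 0.
\end{aligned}
```

Letting μ → 0 gives the ordinary least-squares refit on Ŝ, provided X restricted to Ŝ has full column rank.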

If this is right

Lower out-of-sample prediction error holds even when Lasso is tuned at its theoretically optimal rate √(log p / n).
Variable selection consistency of the Lasso is inherited by the refitted estimator.
Estimation accuracy for the nonzero coefficients improves relative to plain Lasso.
The computational cost is essentially that of one Lasso fit plus a small Ridge solve on the selected columns.
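The two-stage procedure is simple enough to sketch directly. The following is a minimal NumPy illustration under our own assumptions, not the author's code: the Lasso is solved by cyclic coordinate descent on the (1/(2n))‖y − Xβ‖² + λ‖β‖₁ objective, the second-stage penalty is called mu, and the helper names (soft_threshold, lasso_cd, lasso_ridge) are hypothetical. The usage lines set λ = √(log p / n) with the constant taken as 1, which is also an assumption.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Lasso by cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * ||X_j||^2 for each column
    resid = np.array(y, dtype=float)    # y - X @ beta, with beta = 0 at the start
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # correlation of column j with the partial residual (coordinate j removed)
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def lasso_ridge(X, y, lam, mu):
    """Stage 1: Lasso selects a support. Stage 2: Ridge refit on those columns only."""
    n, p = X.shape
    support = np.flatnonzero(lasso_cd(X, y, lam))
    beta = np.zeros(p)
    if support.size > 0:
        Xs = X[:, support]
        # closed-form minimizer of (1/2n)||y - Xs b||^2 + (mu/2)||b||^2
        A = Xs.T @ Xs + n * mu * np.eye(support.size)
        beta[support] = np.linalg.solve(A, Xs.T @ y)
    return beta, support


# Usage on synthetic data (p > n, 5 nonzero coefficients).
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 2.0
y = X @ beta_true + rng.standard_normal(n)
lam = np.sqrt(np.log(p) / n)   # rate-level choice; the constant 1 is an assumption
beta_hat, support = lasso_ridge(X, y, lam, mu=0.01)
```

Keeping mu > 0 keeps the Ridge system well-posed even if the selected set has more columns than observations. A production version would normally swap lasso_cd for an established Lasso solver.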

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same refitting idea could be tested with other initial selectors such as forward stepwise or elastic net.
In settings with moderate predictor correlation, the bias reduction from Ridge may be larger than in orthogonal designs.
Practitioners could use cross-validation on the second-stage Ridge penalty without needing to retune the first-stage Lasso.
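The cross-validation idea in the last item can be sketched as K-fold cross-validation over a grid of second-stage penalties with the stage-one support held fixed. This is an illustrative assumption, not a procedure from the paper; the function names, the fold scheme, the candidate grid mus, and the stand-in support np.arange(5) are hypothetical.

```python
import numpy as np


def ridge_fit(Xs, y, mu):
    """Closed-form minimizer of (1/2n)||y - Xs b||^2 + (mu/2)||b||^2."""
    n, k = Xs.shape
    return np.linalg.solve(Xs.T @ Xs + n * mu * np.eye(k), Xs.T @ y)


def cv_ridge_penalty(X, y, support, mus, n_folds=5, seed=0):
    """K-fold CV over candidate ridge penalties, keeping the stage-one support fixed."""
    n = X.shape[0]
    Xs = X[:, support]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    cv_err = np.zeros(len(mus))
    for i, mu in enumerate(mus):
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            b = ridge_fit(Xs[train_idx], y[train_idx], mu)
            cv_err[i] += np.sum((y[test_idx] - Xs[test_idx] @ b) ** 2)
    cv_err /= n   # mean squared held-out prediction error per candidate
    return mus[int(np.argmin(cv_err))], cv_err


# Usage: a hypothetical stage-one support stands in for the Lasso output.
rng = np.random.default_rng(1)
n, p = 80, 150
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.full(5, 1.5) + rng.standard_normal(n)
support = np.arange(5)
mus = np.array([1e-3, 1e-2, 1e-1, 1.0, 10.0])
best_mu, cv_err = cv_ridge_penalty(X, y, support, mus)
```

Because the support was chosen on all of the data, the held-out error here is optimistic; re-running the Lasso inside each fold is the more conservative alternative, at the cost of retuning stage one.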

Load-bearing premise

The Lasso must select a support that is close enough to the true support for the subsequent Ridge step to reduce bias without reintroducing large errors.

What would settle it

A simulation or real dataset in which the two-stage estimator records higher mean-squared prediction error than the plain Lasso tuned at the same rate.
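A check of this kind is straightforward to set up. The sketch below is our own simplified simulation, not the paper's: it uses an orthogonal design (XᵀX = nI, which requires n ≥ p, so it is not a p ≫ n setting), where the Lasso reduces exactly to soft-thresholding of z = Xᵀy / n and the Ridge refit on the selected coordinates reduces to z_j / (1 + μ). The choice λ = σ√(2 log p / n), the parameter values, and the function name simulate are assumptions.

```python
import numpy as np


def simulate(n=200, p=100, s=10, beta_val=1.0, sigma=1.0, mu=0.01, reps=200, seed=0):
    """Mean in-sample prediction error of Lasso vs. Lasso-Ridge at the same lambda."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[:s] = beta_val
    lam = sigma * np.sqrt(2.0 * np.log(p) / n)   # sqrt(log p / n) rate; constant assumed
    # Orthogonal design with X^T X = n I (requires n >= p).
    Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
    X = np.sqrt(n) * Q
    err_lasso, err_refit = [], []
    for _ in range(reps):
        y = X @ beta + sigma * rng.standard_normal(n)
        z = X.T @ y / n                                           # beta + N(0, sigma^2/n) noise
        b_lasso = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)   # exact Lasso in this design
        b_refit = np.where(b_lasso != 0, z / (1.0 + mu), 0.0)     # Ridge refit on the support
        err_lasso.append(np.sum((X @ (b_lasso - beta)) ** 2) / n)
        err_refit.append(np.sum((X @ (b_refit - beta)) ** 2) / n)
    return float(np.mean(err_lasso)), float(np.mean(err_refit))


lasso_err, refit_err = simulate()
print(f"Lasso: {lasso_err:.3f}  Lasso-Ridge: {refit_err:.3f}")
```

With strong signals every true coefficient is selected and the Lasso pays roughly λ² per coefficient in squared bias, so the refit is expected to win here. Lowering beta_val toward lam, or replacing the orthogonal design with a correlated one, is a natural way to search for the counterexample described above.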

Original abstract

The least absolute shrinkage and selection operator (Lasso) is a popular method for high-dimensional statistics. However, it is known that the Lasso often has estimation bias and prediction error. To address such disadvantages, many alternatives and refitting strategies have been proposed and studied. This work introduces a novel Lasso-Ridge method. Our analysis indicates that the proposed estimator achieves improved prediction performance in a range of settings, including cases where the Lasso is tuned at its theoretical optimal rate √(log(p)/n). Moreover, the proposed method retains several key advantages of the Lasso, such as prediction consistency and reliable variable selection under mild conditions. Through extensive simulations, we further demonstrate that our estimator outperforms the Lasso in both prediction and estimation accuracy, highlighting its potential as a powerful tool for high-dimensional linear regression.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

Lasso-Ridge refitting is a clean practical tweak with decent simulations, but the claim of improved performance under mild conditions at the minimal Lasso rate rests on assumptions that standard theory does not treat as mild.

The paper puts forward a two-stage procedure: run Lasso at the usual rate for selection, then refit the selected variables with Ridge. The simulations are the strongest part. They report lower prediction error and better coefficient estimates than plain Lasso across several high-dimensional regimes, including cases where p greatly exceeds n. That kind of empirical check is useful for applied people who already trust Lasso for screening but want less shrinkage bias afterward.

Referee Report

2 major / 2 minor

Summary. The paper proposes a two-stage Lasso-Ridge refitting estimator for high-dimensional linear regression: the Lasso is first applied for variable selection, after which Ridge regression is performed on the selected variables to produce the final coefficient estimates. The central claims are that the procedure yields improved prediction performance relative to the Lasso (including when the Lasso penalty is set at its theoretical rate λ ∼ √(log p / n)), while preserving prediction consistency and reliable variable selection under mild conditions; these properties are supported by theoretical analysis and extensive simulations demonstrating gains in both prediction and estimation accuracy.

Significance. If the theoretical claims hold under conditions that are genuinely milder than those required for standard post-Lasso estimators, the method would provide a simple, computationally attractive way to reduce Lasso bias while retaining its selection advantages, which could be useful in applied high-dimensional regression settings.

major comments (2)

[Abstract and theoretical results] Abstract and main theoretical results section: the claim that the estimator 'retains ... reliable variable selection under mild conditions' even at the Lasso's theoretical optimal rate √(log(p)/n) is load-bearing for the paper's contribution. Standard Lasso theory requires either the irrepresentable condition on the Gram matrix or a beta-min lower bound that scales with λ; the manuscript must explicitly state the precise assumptions used, show how they differ from (or relax) these standard requirements, and indicate whether support recovery is proved at the minimal rate or only under stronger conditions.
[Theoretical results] Theoretical results section (likely §3): if the analysis invokes the same irrepresentable or beta-min conditions as the Lasso without relaxation, then the 'improved prediction' guarantee reduces to a standard post-selection Ridge step whose advantage is primarily empirical; the paper should clarify whether any new theoretical relaxation is obtained or whether the improvement is demonstrated only via simulations.

minor comments (2)

[Simulations] Simulation section: the reported designs should be supplemented with a table or figure showing performance under varying levels of feature correlation to confirm robustness beyond the presented cases.
[Notation] Notation: ensure the Lasso tuning parameter is denoted consistently (e.g., λ) throughout the theoretical statements and simulation descriptions.
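The first minor comment asks for results across feature-correlation levels. A common way to parametrize this in Lasso simulations is an AR(1), or Toeplitz, covariance Σ_ij = ρ^|i−j|; the sketch below generates such designs. It is our assumption of a reasonable design family, not a description of the paper's simulations, and the name toeplitz_design is hypothetical.

```python
import numpy as np


def toeplitz_design(n, p, rho, seed=0):
    """Draw an n x p design with i.i.d. rows ~ N(0, Sigma), Sigma[i, j] = rho ** |i - j|."""
    idx = np.arange(p)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :]).astype(float)
    chol = np.linalg.cholesky(sigma)              # Sigma = chol @ chol.T
    Z = np.random.default_rng(seed).standard_normal((n, p))
    return Z @ chol.T, sigma                      # each row is chol @ z, covariance Sigma


# Usage: sweep the correlation level and check the empirical lag-1 correlation.
for rho in (0.0, 0.5, 0.9):
    X, sigma = toeplitz_design(n=5000, p=20, rho=rho)
    lag1 = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
    print(f"rho={rho}: empirical corr(X_0, X_1) = {lag1:.3f}")
```

Each X could then be paired with a sparse coefficient vector and passed to both estimators, with prediction error reported as a function of rho.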

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the major points below and will revise the manuscript to explicitly detail the assumptions and clarify the scope of the theoretical guarantees.

Referee: [Abstract and theoretical results] Abstract and main theoretical results section: the claim that the estimator 'retains ... reliable variable selection under mild conditions' even at the Lasso's theoretical optimal rate √(log(p)/n) is load-bearing for the paper's contribution. Standard Lasso theory requires either the irrepresentable condition on the Gram matrix or a beta-min lower bound that scales with λ; the manuscript must explicitly state the precise assumptions used, show how they differ from (or relax) these standard requirements, and indicate whether support recovery is proved at the minimal rate or only under stronger conditions.

Authors: We agree that the assumptions require more explicit statement. In the revised manuscript we will insert a dedicated paragraph in Section 3 that lists the precise conditions: the irrepresentable condition on the Gram matrix together with a beta-min condition of order λ. These are the same conditions used for Lasso support recovery at rate √(log p / n). We do not relax the irrepresentable condition; support recovery is established at the minimal rate under exactly these standard assumptions. The refitting step yields a strictly smaller prediction-error bound than Lasso while preserving the same selection consistency. revision: yes
Referee: [Theoretical results] Theoretical results section (likely §3): if the analysis invokes the same irrepresentable or beta-min conditions as the Lasso without relaxation, then the 'improved prediction' guarantee reduces to a standard post-selection Ridge step whose advantage is primarily empirical; the paper should clarify whether any new theoretical relaxation is obtained or whether the improvement is demonstrated only via simulations.

Authors: The analysis uses the standard irrepresentable condition without relaxation. We will revise the text to state this clearly and to emphasize that the theoretical contribution lies in the improved finite-sample prediction bound obtained after Ridge refitting (via a direct comparison of the bias-variance trade-off), not in a weaker selection condition. The prediction improvement is therefore both theoretical and empirical; the simulations illustrate the magnitude of the gain under the same conditions. revision: yes
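What a "direct comparison of the bias-variance trade-off" can look like is easiest to see in the simplest case. The calculation below is our own heuristic, not the paper's bound: it assumes an orthonormal design XᵀX = nI with noise variance σ², treats the selected set Ŝ as fixed (ignoring the conditioning on the selection event), uses the 1/(2n) loss scaling with μ as the Ridge penalty, and writes z_j = X_jᵀy / n = β_j + ξ_j with ξ_j ~ N(0, σ²/n). For a truly active coordinate j ∈ Ŝ with sign(z_j) = sign(β_j):

```latex
\begin{aligned}
\hat\beta^{\mathrm{L}}_j &= z_j - \lambda\,\operatorname{sign}(\beta_j), &
\hat\beta^{\mathrm{LR}}_j &= \frac{z_j}{1+\mu}, \\
\mathbb{E}\bigl(\hat\beta^{\mathrm{L}}_j - \beta_j\bigr)^2 &= \lambda^2 + \frac{\sigma^2}{n}, &
\mathbb{E}\bigl(\hat\beta^{\mathrm{LR}}_j - \beta_j\bigr)^2 &= \frac{\mu^2\beta_j^2 + \sigma^2/n}{(1+\mu)^2}.
\end{aligned}
```

The refit trades the Lasso's λ² bias term for a bias of order μ²β_j², so on such coordinates it has lower risk for every sufficiently small μ > 0. For a falsely selected coordinate (β_j = 0) the comparison reverses: with μ small, the refit's z_j²/(1+μ)² exceeds the Lasso's (|z_j| − λ)², which is one concrete reason the load-bearing premise about support quality matters.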

Circularity Check

0 steps flagged

No circularity detected; claims rest on external simulations and standard Lasso theory

full rationale

The abstract and description contain no equations, derivations, or explicit proof steps. Claims of improved prediction at the Lasso's optimal rate and retention of consistency under mild conditions are presented as results of analysis and simulations, without any self-definitional reduction, fitted-input-as-prediction, or load-bearing self-citation that collapses the central estimator back to its inputs by construction. The proposal is therefore self-contained against external benchmarks such as standard Lasso support-recovery conditions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; all details are deferred to the full manuscript.

pith-pipeline@v0.9.0 · 5433 in / 950 out tokens · 46260 ms · 2026-05-16T23:01:53.437504+00:00
