2D Stability Selection: Design Jittering for Doubly Stable Feature Selection
Pith reviewed 2026-05-08 19:19 UTC · model grok-4.3
The pith
Doubly stable feature selection finds predictors whose inclusion holds up under both sampling variability and added design noise by jittering the matrix and aggregating selections.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study feature selection in high-dimensional regression under two distinct sources of instability: sampling variability and measurement error in the design matrix. Stability Selection addresses the former through sub-sampling and aggregation, but does not explicitly stress-test robustness to noisy predictors. We introduce doubly stable feature selection, a perturb-and-aggregate framework that targets features whose inclusion is stable both across randomization and across increasing levels of design noise. The method injects controlled additive noise into the design matrix, fits a fixed base selector such as the Lasso on the perturbed data, and aggregates selection frequencies.
What carries the argument
Doubly stable feature selection, which uses design jittering to inject additive noise at multiple levels into the design matrix, applies a base selector, and aggregates frequencies to produce a stability path across noise strengths.
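The perturb-and-aggregate loop described here can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the names `lasso_cd` and `stability_path`, the toy data, the noise grid, the penalty `lam`, and the number of noise draws are all assumptions made for the example. A small coordinate-descent Lasso (no intercept) stands in for the base selector so that the block needs only the standard library, and the sketch follows the abstract's full-sample variant, where the only randomization is the jitter itself.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=50):
    """Lasso by cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                beta[j] = new
    return beta

def stability_path(X, y, deltas, lam, n_draws, rng):
    """Selection frequency of each feature at each noise level delta.

    For every delta, draws n_draws jittered copies X + delta * E (E i.i.d.
    standard Gaussian, an assumption), fits the base selector on the full
    sample, and averages the selection indicators.
    """
    n, p = len(X), len(X[0])
    path = []
    for delta in deltas:
        counts = [0] * p
        for _ in range(n_draws):
            Xj = [[X[i][j] + delta * rng.gauss(0.0, 1.0) for j in range(p)]
                  for i in range(n)]
            beta = lasso_cd(Xj, y, lam)
            for j in range(p):
                if abs(beta[j]) > 1e-10:
                    counts[j] += 1
        path.append([c / n_draws for c in counts])
    return path

# Toy data (hypothetical): features 0 and 1 carry signal, the rest are noise.
rng = random.Random(0)
n, p = 60, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + 0.5 * rng.gauss(0.0, 1.0) for row in X]
deltas = [0.0, 0.5, 1.0]
path = stability_path(X, y, deltas, lam=0.1, n_draws=10, rng=rng)
for delta, freqs in zip(deltas, path):
    print(delta, [round(f, 2) for f in freqs])
```

Each row of `path` is one slice of the stability path: the fraction of jittered fits in which a feature was selected at that noise level. In practice a library Lasso could replace `lasso_cd`; the aggregation logic would stay the same.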
Load-bearing premise
The added perturbations remain small enough that classical model-selection conditions such as the irrepresentable condition or restricted eigenvalue condition continue to hold.
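One way to probe this premise numerically is to compute the irrepresentable quantity before and after jittering. The sketch below is illustrative only: the helper names `gram`, `solve`, and `irrepresentable_norm`, the choice of support, and the noise levels are assumptions, and it uses the matrix-norm form of the condition (maximum absolute row sum of the cross-block times the inverse support block) rather than any definition confirmed from the paper.

```python
import random

def gram(X):
    """Sample Gram matrix X^T X / n for a design stored as a list of rows."""
    n, p = len(X), len(X[0])
    return [[sum(X[i][a] * X[i][b] for i in range(n)) / n for b in range(p)]
            for a in range(p)]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[k:] for row in M]

def irrepresentable_norm(sigma, support):
    """Infinity norm (max absolute row sum) of Sigma_{S^c S} Sigma_{SS}^{-1}."""
    p = len(sigma)
    rest = [j for j in range(p) if j not in support]
    s_ss = [[sigma[a][b] for b in support] for a in support]
    s_cross = [[sigma[a][b] for b in rest] for a in support]
    # By symmetry, Sigma_{S^c S} Sigma_{SS}^{-1} is the transpose of Z below,
    # so its rows are the columns of Z.
    Z = solve(s_ss, s_cross)
    return max(sum(abs(Z[a][c]) for a in range(len(support)))
               for c in range(len(rest)))

# Hypothetical check: how the quantity moves as jitter of size delta is added.
rng = random.Random(1)
n, p, support = 200, 5, [0, 1]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
for delta in (0.0, 0.05, 0.5):
    Xj = [[x + delta * rng.gauss(0.0, 1.0) for x in row] for row in X]
    print(delta, round(irrepresentable_norm(gram(Xj), support), 3))
```

If the printed value stays below 1 with some margin across the grid, the condition is plausibly intact for that design and support; a value approaching or exceeding 1 would signal that the small-perturbation regime has been left.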
What would settle it
A dataset with known measurement-error variances where features selected by the method become unstable or change when actual noise at the swept levels is introduced, violating the small-perturbation preservation of selection conditions.
Original abstract
We study feature selection in high-dimensional regression under two distinct sources of instability: sampling variability and measurement error in the design matrix. Stability Selection addresses the former through sub-sampling and aggregation, but does not explicitly stress-test robustness to noisy predictors. We introduce doubly stable feature selection, a perturb-and-aggregate framework that targets features whose inclusion is stable both across randomization and across increasing levels of design noise. The method injects controlled additive noise into the design matrix, fits a fixed base selector such as the Lasso on the perturbed data, and aggregates selection frequencies. Sweeping over a grid of noise levels yields a stability path that summarizes robustness to measurement error while using the full sample size and isolating the effect of design perturbations. On the theory side, we show that classical model-selection conditions are preserved under sufficiently small perturbations, with a high-probability extension for Gaussian noise. Empirically, experiments on synthetic and real datasets show improved robustness compared with Stability Selection and standard base selectors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes doubly stable feature selection, a perturb-and-aggregate procedure that extends stability selection by injecting additive noise into the design matrix at multiple levels, fitting a base selector (e.g., Lasso) on each perturbed copy, and aggregating selection frequencies across both subsamples and noise levels. The resulting stability path is intended to identify features robust to both sampling variability and measurement error. Theoretically, the authors claim that standard model-selection conditions (irrepresentable condition, restricted eigenvalue) are preserved under sufficiently small perturbations, with a high-probability extension when the noise is Gaussian. Empirical results on synthetic and real data are reported to show gains in robustness relative to ordinary stability selection.
Significance. If the central claims hold, the work supplies a practical, computationally straightforward way to stress-test feature stability against design-matrix noise while retaining the full sample size. The explicit preservation result for small perturbations is a clear technical contribution that could be useful for theoretical analysis of other jittering schemes. The empirical demonstration of improved robustness is potentially valuable for applications with noisy covariates, though its scope depends on how well the additive-noise model matches real measurement processes.
Major comments (2)
- [Abstract / theoretical results] Abstract and theoretical results section: the preservation of the irrepresentable condition and restricted-eigenvalue bounds is shown only for sufficiently small perturbations, yet the doubly stable procedure explicitly constructs a stability path by sweeping a grid of increasing noise levels. No analysis or bounds are supplied for the regime in which these conditions begin to fail, so the theoretical justification for aggregating selection frequencies at larger noise amplitudes is missing; it is therefore unclear whether observed stability at high noise levels reflects genuine robustness or simply the breakdown of the base selector.
- [Methods] Methods and aggregation step: the procedure isolates the effect of design perturbations while using the full sample, but the final selection rule (thresholding or ranking of the two-dimensional stability path) is defined procedurally without a derivation showing that the aggregated frequencies remain consistent estimators of the population selection probabilities once the perturbation size exceeds the small-noise regime.
Minor comments (2)
- [Methods] The precise definition of the stability path (how the two dimensions—subsampling and noise level—are combined into a single summary statistic) should be stated formally with notation, rather than described only procedurally.
- [Experiments] Figure captions and experimental details should explicitly list the noise grid values, the number of perturbations per level, and the exact aggregation function used to produce the reported stability paths.
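As an illustration of what the formal statement requested in the first minor comment might look like, the following LaTeX sketch writes the per-level selection frequency and one possible selection rule. All notation here is hypothetical and not taken from the paper: $B$ noise draws $E^{(b)}$, a noise grid $\delta_1 < \dots < \delta_K$, a base selector $\hat{S}_\lambda$, and a threshold $\tau$. The min-over-levels rule is only one of several plausible ways to collapse the path into a selected set.

```latex
% Hypothetical notation, offered as a sketch; the paper may define these differently.
\[
\hat{\pi}_j(\delta_k) \;=\; \frac{1}{B}\sum_{b=1}^{B}
  \mathbf{1}\bigl\{\, j \in \hat{S}_\lambda\bigl(X + \delta_k E^{(b)},\, y\bigr) \bigr\},
\qquad j = 1,\dots,p,\quad k = 1,\dots,K,
\]
\[
\hat{S}_{\mathrm{stable}} \;=\; \bigl\{\, j \,:\, \min_{1 \le k \le K} \hat{\pi}_j(\delta_k) \ge \tau \,\bigr\}.
\]
```

Under this reading, the stability path of feature $j$ is the map $k \mapsto \hat{\pi}_j(\delta_k)$, and the caption details asked for in the second minor comment correspond to reporting $\{\delta_k\}$, $B$, and the aggregation rule.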
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will incorporate.
Point-by-point responses
- Referee: [Abstract / theoretical results] Abstract and theoretical results section: the preservation of the irrepresentable condition and restricted-eigenvalue bounds is shown only for sufficiently small perturbations, yet the doubly stable procedure explicitly constructs a stability path by sweeping a grid of increasing noise levels. No analysis or bounds are supplied for the regime in which these conditions begin to fail, so the theoretical justification for aggregating selection frequencies at larger noise amplitudes is missing; it is therefore unclear whether observed stability at high noise levels reflects genuine robustness or simply the breakdown of the base selector.
  Authors: We agree that the theoretical results establish preservation of the irrepresentable condition and restricted eigenvalue bounds only under sufficiently small perturbations. The stability path is constructed empirically by sweeping noise levels to observe how selection frequencies change, with the goal of identifying features that remain stable over a wider range. We acknowledge that the manuscript does not supply analysis or bounds once the conditions begin to fail, making it difficult to interpret high stability at large noise as genuine robustness versus selector breakdown. In the revision we will add a dedicated discussion paragraph clarifying the scope of the theory, stating that guarantees apply only in the small-noise regime, and interpreting the path as an empirical diagnostic tool whose behavior at larger amplitudes should be viewed with the appropriate caveat.
  Revision: partial
- Referee: [Methods] Methods and aggregation step: the procedure isolates the effect of design perturbations while using the full sample, but the final selection rule (thresholding or ranking of the two-dimensional stability path) is defined procedurally without a derivation showing that the aggregated frequencies remain consistent estimators of the population selection probabilities once the perturbation size exceeds the small-noise regime.
  Authors: The two-dimensional stability path is formed by averaging selection indicators across subsamples and noise levels, extending the aggregation idea of stability selection. The manuscript presents this aggregation procedurally and supports it with empirical results rather than a formal consistency derivation for the large-perturbation regime. We recognize that showing the aggregated frequencies remain consistent estimators beyond the small-noise regime would require additional arguments not currently provided. In the revision we will insert a short remark in the methods section noting this limitation and indicating that establishing such consistency results constitutes an interesting direction for future work.
  Revision: partial
- A complete theoretical derivation establishing consistency of the aggregated selection frequencies once perturbation sizes exceed the small-noise regime where classical conditions hold.
Circularity Check
No circularity: procedural definition and independent theory
Full rationale
The paper introduces a perturb-and-aggregate procedure that injects additive noise into the design matrix at a grid of levels and aggregates selection frequencies from a base selector such as Lasso. The central theoretical claim is that classical conditions (irrepresentable, restricted eigenvalue) are preserved for sufficiently small perturbations, with a high-probability Gaussian extension; this is a standard perturbation analysis that does not reduce to any fitted parameter or self-citation chain within the paper. The stability path is generated directly from the aggregation step without any equation or definition that equates the output to an input quantity by construction. No self-definitional loops, fitted-input predictions, uniqueness theorems imported from the authors, or ansatz smuggling appear. The derivation remains self-contained against external benchmarks such as existing stability selection.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- Cost.FunctionalEquation / Foundation.AlphaCoordinateFixation (tag: unclear)
  No parallel: RS perturbation theory concerns the J-cost functional equation and α-coordinate fixation, not Gram-matrix sub-block norms.
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Lemma 1 (Small perturbation preserves irrepresentability): ... $\|\Sigma_{S^c S}\,\Sigma_{SS}^{-1}\|_\infty \le 1 - \eta$ ... $\Sigma^{(\delta)} = \Sigma + \Delta$ satisfy $\|\Delta\|_\infty \le C_1 \delta + C_2 \delta^2$
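The linear-plus-quadratic form of the bound in Lemma 1 is what one would expect from expanding the Gram matrix of a jittered design. The short derivation below is a sketch under two assumptions that the excerpt does not confirm: that $\Sigma$ is the sample Gram matrix $X^\top X / n$, and that the jittered design is $\tilde{X} = X + \delta E$ for a noise matrix $E$. The identification of $C_1$ and $C_2$ is illustrative; the paper's constants may be defined differently.

```latex
% Sketch under stated assumptions: Sigma = X^T X / n and \tilde{X} = X + \delta E.
\[
\Sigma^{(\delta)} \;=\; \tfrac{1}{n}\,\tilde{X}^\top \tilde{X}
\;=\; \Sigma \;+\;
\underbrace{\tfrac{\delta}{n}\bigl(X^\top E + E^\top X\bigr)
           \;+\; \tfrac{\delta^{2}}{n}\,E^\top E}_{\Delta},
\]
\[
\|\Delta\|_\infty \;\le\;
\delta \,\underbrace{\tfrac{1}{n}\,\bigl\|X^\top E + E^\top X\bigr\|_\infty}_{\text{role of } C_1}
\;+\;
\delta^{2}\,\underbrace{\tfrac{1}{n}\,\bigl\|E^\top E\bigr\|_\infty}_{\text{role of } C_2}.
\]
```

The second line is just the triangle inequality, so it holds for any fixed $E$. A high-probability statement of the kind the abstract mentions for Gaussian noise would then follow from controlling the two norm terms when $E$ is random.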
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.