
arxiv: 2605.02205 · v1 · submitted 2026-05-04 · 📊 stat.ME · stat.ML

2D Stability Selection: Design Jittering for Doubly Stable Feature Selection

Pith reviewed 2026-05-08 19:19 UTC · model grok-4.3

classification 📊 stat.ME stat.ML
keywords feature selection · stability selection · high-dimensional regression · measurement error · perturb-and-aggregate · Lasso · robustness to noise · design matrix

The pith

Doubly stable feature selection finds predictors whose inclusion holds up under both sampling variability and added design noise by jittering the matrix and aggregating selections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a perturb-and-aggregate procedure for high-dimensional regression that addresses instability from two sources: random sampling of observations and measurement error in the predictors. It adds controlled additive noise to the design matrix at increasing levels, applies a base selector such as the Lasso to each noisy version, and tracks selection frequencies to identify features stable across both dimensions. Sweeping the noise grid produces a stability path that isolates the effect of design perturbations while retaining the full sample. If the claims hold, the resulting selections are more reliable than those from subsampling alone when predictors are measured with error. The authors also prove that standard selection conditions remain valid under sufficiently small perturbations, with a high-probability extension for Gaussian noise.
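A minimal sketch of the jittering loop described above, under stated assumptions: i.i.d. Gaussian noise added to every entry of the design matrix, a fixed Lasso penalty, and a small hand-rolled coordinate-descent solver standing in for whichever Lasso implementation the authors actually use. The names `lasso_cd`, `jitter_path`, `sigmas`, and `n_rep` are illustrative and do not come from the paper.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent Lasso for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Illustrative stand-in for a production solver; no intercept, no scaling.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-column curvature
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)   # keep resid = y - X beta
            beta[j] = new
    return beta

def jitter_path(X, y, lam, sigmas, n_rep=30, seed=0):
    """Selection frequency of each feature at each noise level.

    Assumption: noise is N(0, sigma^2) per entry, drawn fresh for every
    repetition, and the full sample is used each time.
    Returns an array of shape (len(sigmas), p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros((len(sigmas), p))
    for k, sigma in enumerate(sigmas):
        for _ in range(n_rep):
            X_noisy = X + sigma * rng.standard_normal((n, p))
            freq[k] += lasso_cd(X_noisy, y, lam) != 0
        freq[k] /= n_rep
    return freq
```

Each row of the returned array is one point on the stability path. Reading a column from top to bottom shows how one feature's selection frequency changes as the design noise grows.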

Core claim

We study feature selection in high-dimensional regression under two distinct sources of instability: sampling variability and measurement error in the design matrix. Stability Selection addresses the former through sub-sampling and aggregation, but does not explicitly stress-test robustness to noisy predictors. We introduce doubly stable feature selection, a perturb-and-aggregate framework that targets features whose inclusion is stable both across randomization and across increasing levels of design noise. The method injects controlled additive noise into the design matrix, fits a fixed base selector such as the Lasso on the perturbed data, and aggregates selection frequencies.

What carries the argument

Doubly stable feature selection: design jittering injects additive noise into the design matrix at multiple levels, a base selector is applied to each perturbed copy, and selection frequencies are aggregated into a stability path across noise strengths.

Load-bearing premise

The added perturbations remain small enough that classical model-selection conditions such as the irrepresentable condition or restricted eigenvalue condition continue to hold.
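To make the premise concrete, the sketch below computes the quantity that the irrepresentable condition bounds, in its common textbook form: the largest absolute entry of X_Sc^T X_S (X_S^T X_S)^{-1} sign(beta_S), which must stay below 1 by some margin. This is an assumption about the form of the condition; the paper's exact statement may differ. The function name and arguments are hypothetical.

```python
import numpy as np

def irrepresentable_value(X, support, signs):
    """Return max |X_Sc^T X_S (X_S^T X_S)^{-1} s| for support S and signs s.

    support: indices of the assumed true features (S).
    signs:   +1/-1 for each index in `support`, in the same order.
    Requires at least one feature outside the support.
    """
    X = np.asarray(X, dtype=float)
    support = np.asarray(support)
    complement = np.setdiff1d(np.arange(X.shape[1]), support)
    X_S, X_Sc = X[:, support], X[:, complement]
    # solve (X_S^T X_S) w = s instead of forming an explicit inverse
    w = np.linalg.solve(X_S.T @ X_S, np.asarray(signs, dtype=float))
    return float(np.max(np.abs(X_Sc.T @ (X_S @ w))))
```

Evaluating this on `X` and on jittered copies `X + sigma * E` for growing `sigma` gives a rough empirical view of where the small-perturbation regime might end for a given design.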

What would settle it

A dataset with known measurement-error variances in which the features selected by the method become unstable, or change, once real noise at the swept levels is introduced. That outcome would contradict the claimed preservation of selection conditions under small perturbations.

Original abstract

We study feature selection in high-dimensional regression under two distinct sources of instability: sampling variability and measurement error in the design matrix. Stability Selection addresses the former through sub-sampling and aggregation, but does not explicitly stress-test robustness to noisy predictors. We introduce doubly stable feature selection, a perturb-and-aggregate framework that targets features whose inclusion is stable both across randomization and across increasing levels of design noise. The method injects controlled additive noise into the design matrix, fits a fixed base selector such as the Lasso on the perturbed data, and aggregates selection frequencies. Sweeping over a grid of noise levels yields a stability path that summarizes robustness to measurement error while using the full sample size and isolating the effect of design perturbations. On the theory side, we show that classical model-selection conditions are preserved under sufficiently small perturbations, with a high-probability extension for Gaussian noise. Empirically, experiments on synthetic and real datasets show improved robustness compared with Stability Selection and standard base selectors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes doubly stable feature selection, a perturb-and-aggregate procedure that extends stability selection by injecting additive noise into the design matrix at multiple levels, fitting a base selector (e.g., Lasso) on each perturbed copy, and aggregating selection frequencies across both subsamples and noise levels. The resulting stability path is intended to identify features robust to both sampling variability and measurement error. Theoretically, the authors claim that standard model-selection conditions (irrepresentable condition, restricted eigenvalue) are preserved under sufficiently small perturbations, with a high-probability extension when the noise is Gaussian. Empirical results on synthetic and real data are reported to show gains in robustness relative to ordinary stability selection.

Significance. If the central claims hold, the work supplies a practical, computationally straightforward way to stress-test feature stability against design-matrix noise while retaining the full sample size. The explicit preservation result for small perturbations is a clear technical contribution that could be useful for theoretical analysis of other jittering schemes. The empirical demonstration of improved robustness is potentially valuable for applications with noisy covariates, though its scope depends on how well the additive-noise model matches real measurement processes.

major comments (2)
  1. [Abstract / theoretical results] Abstract and theoretical results section: the preservation of the irrepresentable condition and restricted-eigenvalue bounds is shown only for sufficiently small perturbations, yet the doubly stable procedure explicitly constructs a stability path by sweeping a grid of increasing noise levels. No analysis or bounds are supplied for the regime in which these conditions begin to fail, so the theoretical justification for aggregating selection frequencies at larger noise amplitudes is missing; it is therefore unclear whether observed stability at high noise levels reflects genuine robustness or simply the breakdown of the base selector.
  2. [Methods] Methods and aggregation step: the procedure isolates the effect of design perturbations while using the full sample, but the final selection rule (thresholding or ranking of the two-dimensional stability path) is defined procedurally without a derivation showing that the aggregated frequencies remain consistent estimators of the population selection probabilities once the perturbation size exceeds the small-noise regime.
minor comments (2)
  1. [Methods] The precise definition of the stability path (how the two dimensions—subsampling and noise level—are combined into a single summary statistic) should be stated formally with notation, rather than described only procedurally.
  2. [Experiments] Figure captions and experimental details should explicitly list the noise grid values, the number of perturbations per level, and the exact aggregation function used to produce the reported stability paths.
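On the first minor comment, one way the stability path could be written down formally is sketched below. The notation is hypothetical and not taken from the paper; in particular, whether each draw uses a subsample or the full sample is left open through the index set I_b.

```latex
% Hypothetical notation; not taken from the paper.
% \mathcal{G} = \{\sigma_1 < \dots < \sigma_K\}: noise grid;  b = 1,\dots,B: draws;
% I_b: row indices used in draw b (a subsample, or all n rows);
% E_b: |I_b| \times p noise matrix with i.i.d. N(0,1) entries;
% \hat{S}_\lambda: base selector (e.g. the Lasso support at penalty \lambda).
Z_{k,b,j} = \mathbf{1}\bigl\{\, j \in \hat{S}_\lambda\bigl(X_{I_b} + \sigma_k E_b,\; y_{I_b}\bigr) \bigr\},
\qquad
\hat{\Pi}_j(\sigma_k) = \frac{1}{B} \sum_{b=1}^{B} Z_{k,b,j},
\qquad
\text{path}_j : \sigma_k \mapsto \hat{\Pi}_j(\sigma_k), \quad k = 1,\dots,K.
```

With definitions of this kind in place, the noise grid, the number of draws B, and the aggregation function requested in the second minor comment become explicit parameters.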

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will incorporate.

Point-by-point responses
  1. Referee: [Abstract / theoretical results] Abstract and theoretical results section: the preservation of the irrepresentable condition and restricted-eigenvalue bounds is shown only for sufficiently small perturbations, yet the doubly stable procedure explicitly constructs a stability path by sweeping a grid of increasing noise levels. No analysis or bounds are supplied for the regime in which these conditions begin to fail, so the theoretical justification for aggregating selection frequencies at larger noise amplitudes is missing; it is therefore unclear whether observed stability at high noise levels reflects genuine robustness or simply the breakdown of the base selector.

    Authors: We agree that the theoretical results establish preservation of the irrepresentable condition and restricted eigenvalue bounds only under sufficiently small perturbations. The stability path is constructed empirically by sweeping noise levels to observe how selection frequencies change, with the goal of identifying features that remain stable over a wider range. We acknowledge that the manuscript does not supply analysis or bounds once the conditions begin to fail, making it difficult to interpret high stability at large noise as genuine robustness versus selector breakdown. In the revision we will add a dedicated discussion paragraph clarifying the scope of the theory, stating that guarantees apply only in the small-noise regime, and interpreting the path as an empirical diagnostic tool whose behavior at larger amplitudes should be viewed with the appropriate caveat. revision: partial

  2. Referee: [Methods] Methods and aggregation step: the procedure isolates the effect of design perturbations while using the full sample, but the final selection rule (thresholding or ranking of the two-dimensional stability path) is defined procedurally without a derivation showing that the aggregated frequencies remain consistent estimators of the population selection probabilities once the perturbation size exceeds the small-noise regime.

    Authors: The two-dimensional stability path is formed by averaging selection indicators across subsamples and noise levels, extending the aggregation idea of stability selection. The manuscript presents this aggregation procedurally and supports it with empirical results rather than a formal consistency derivation for the large-perturbation regime. We recognize that showing the aggregated frequencies remain consistent estimators beyond the small-noise regime would require additional arguments not currently provided. In the revision we will insert a short remark in the methods section noting this limitation and indicating that establishing such consistency results constitutes an interesting direction for future work. revision: partial
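The averaging step the authors describe can be sketched as follows, assuming the selection indicators have already been collected into a boolean array indexed by noise level, draw, and feature. The thresholding rule in `stable_features` is a hypothetical example of a final selection rule, not the one used in the paper.

```python
import numpy as np

def aggregate_path(indicators):
    """Average boolean selection indicators over draws.

    indicators: shape (n_levels, n_draws, p); entry [k, b, j] is True when
    feature j was selected on draw b at noise level k.
    Returns selection frequencies of shape (n_levels, p).
    """
    z = np.asarray(indicators, dtype=float)
    if z.ndim != 3:
        raise ValueError("expected shape (n_levels, n_draws, p)")
    return z.mean(axis=1)

def stable_features(freq, threshold=0.8, max_level=None):
    """Hypothetical rule: keep features whose frequency stays at or above
    `threshold` at every noise level up to index `max_level` (inclusive)."""
    freq = np.asarray(freq, dtype=float)
    upto = freq.shape[0] if max_level is None else max_level + 1
    return np.flatnonzero(freq[:upto].min(axis=0) >= threshold)
```

Capping `max_level` is one possible way to keep the rule inside the noise range where the theory applies, which is the limitation both the referee and the authors acknowledge.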

standing simulated objections not resolved
  • A complete theoretical derivation establishing consistency of the aggregated selection frequencies once perturbation sizes exceed the small-noise regime where classical conditions hold.

Circularity Check

0 steps flagged

No circularity: procedural definition and independent theory

Full rationale

The paper introduces a perturb-and-aggregate procedure that injects additive noise into the design matrix at a grid of levels and aggregates selection frequencies from a base selector such as Lasso. The central theoretical claim is that classical conditions (irrepresentable, restricted eigenvalue) are preserved for sufficiently small perturbations, with a high-probability Gaussian extension; this is a standard perturbation analysis that does not reduce to any fitted parameter or self-citation chain within the paper. The stability path is generated directly from the aggregation step without any equation or definition that equates the output to an input quantity by construction. No self-definitional loops, fitted-input predictions, uniqueness theorems imported from the authors, or ansatz smuggling appear. The derivation remains self-contained against external benchmarks such as existing stability selection.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated beyond the standard Lasso assumptions and the claim that small additive perturbations preserve selection consistency.




Showing first 80 references.