
arxiv: 2509.10853 · v2 · submitted 2025-09-13 · 📊 stat.ML · cs.LG

Variable Selection Using Relative Importance Rankings

Pith reviewed 2026-05-18 16:52 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords: variable selection · relative importance · CRI.Z · lasso · feature ranking · correlated predictors · high-dimensional data · gene expression

The pith

Relative importance rankings select variables as effectively as lasso methods, especially with highly correlated predictors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes using relative importance (RI) measures to rank and select variables before fitting a model, rather than only for post-hoc explanation. These measures capture both the direct contribution of each predictor and its combined effects with others, giving them an advantage over marginal correlations, which ignore dependencies among predictors. Simulations show that rankings from RI measures, including a new efficient variant called CRI.Z, produce predictive models that compete with or outperform the lasso and relaxed lasso. The advantage is clearest in datasets containing clusters of highly correlated predictors, and the approach is also demonstrated on real high-dimensional gene expression data.

Core claim

RI-based variable ranking and selection methods, including the new CRI.Z, are highly competitive with state-of-the-art linear-model methods such as the lasso and relaxed lasso, and particularly effective in cases involving clusters of highly correlated predictors because RI measures incorporate both direct and combined effects of predictors.

What carries the argument

Relative importance (RI) measures for pre-model variable ranking and filter-based selection, with CRI.Z as a computationally efficient implementation.

If this is right

  • RI rankings are more accurate than marginal correlation rankings, especially for suppressed or weak predictors.
  • Predictive models using RI-based selection are competitive with or better than lasso and relaxed lasso.
  • RI-based selection performs well in the presence of highly correlated predictor clusters.
  • RI methods provide practical utility for high-dimensional problems such as gene expression analysis.
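
The first bullet can be illustrated with a small, hand-built suppressor case. The sketch below is not the paper's code and does not implement CRI.Z, whose definition is not given on this page. It assumes a conventional RI measure, LMG (incremental R² averaged over all predictor orderings), and a made-up population correlation structure in which x2 has zero marginal correlation with y but a nonzero regression coefficient. All function names are hypothetical.

```python
from itertools import permutations


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(P, r, subset):
    """Population R^2 of y on the predictors in `subset` (standardized scale)."""
    if not subset:
        return 0.0
    idx = list(subset)
    A = [[P[i][j] for j in idx] for i in idx]
    b = [r[i] for i in idx]
    beta = solve(A, b)
    return sum(bi * ri for bi, ri in zip(beta, b))


def lmg(P, r):
    """LMG shares: average incremental R^2 of each predictor over all orderings."""
    p = len(r)
    shares = [0.0] * p
    orders = list(permutations(range(p)))
    for order in orders:
        seen = []
        for j in order:
            shares[j] += r_squared(P, r, seen + [j]) - r_squared(P, r, seen)
            seen.append(j)
    return [s / len(orders) for s in shares]


# Assumed toy structure: x1 and x2 correlate at 0.5, x3 is independent of both.
P = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
# Marginal correlations with y; x2 is a suppressor (zero marginal correlation,
# but its standardized coefficient is -0.4, versus 0.1 for x3).
r = [0.6, 0.0, 0.1]

marginal_rank = sorted(range(3), key=lambda j: -abs(r[j]))
lmg_shares = lmg(P, r)
lmg_rank = sorted(range(3), key=lambda j: -lmg_shares[j])

print("marginal ranking:", marginal_rank)
print("LMG shares:", [round(s, 4) for s in lmg_shares])
print("LMG ranking:", lmg_rank)
```

In this toy case the marginal ranking places the suppressor x2 last, while the LMG shares place it ahead of the weak independent predictor x3. Exact LMG enumerates every ordering, so it is only practical for small p; that cost is what motivates cheaper RI variants.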

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • RI ranking could be used as a fast pre-filter before applying more expensive selection procedures in very large feature spaces.
  • The method might be extended to non-linear base models where direct lasso equivalents are unavailable.
  • CRI.Z's efficiency opens the possibility of repeated RI-based selection inside cross-validation loops.

Load-bearing premise

Relative importance measures computed from a model fitted on training data will identify variables that improve out-of-sample prediction without post-hoc selection bias.

What would settle it

A simulation or real-data experiment with known true variables and correlated clusters in which models built from top RI-ranked variables show higher out-of-sample prediction error than models built from lasso-selected variables.
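
A data generator for such an experiment might look like the sketch below. It is an illustration under stated assumptions, not the paper's simulation design: the cluster is built from a single shared Gaussian factor so that clustered predictors are pairwise correlated at `rho`, and the noise variance is set from the convention SNR = Var(x'β) / σ². The coefficient vector, cluster membership and function names are all hypothetical.

```python
import random


def cluster_corr(p, cluster, rho):
    """Correlation matrix: predictors in `cluster` correlate at rho, the rest are independent."""
    return [[1.0 if i == j else (rho if i in cluster and j in cluster else 0.0)
             for j in range(p)] for i in range(p)]


def noise_variance(beta, sigma, snr):
    """Noise variance such that Var(x'beta) / noise variance equals the target SNR."""
    p = len(beta)
    signal = sum(beta[i] * sigma[i][j] * beta[j] for i in range(p) for j in range(p))
    return signal / snr


def simulate(n, beta, cluster, rho, snr, seed=0):
    """Draw n rows with one equicorrelated cluster and Gaussian noise at the target SNR."""
    rng = random.Random(seed)
    p = len(beta)
    sd = noise_variance(beta, cluster_corr(p, cluster, rho), snr) ** 0.5
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # shared factor that induces the within-cluster correlation
        row = [
            (rho ** 0.5) * z + ((1.0 - rho) ** 0.5) * rng.gauss(0.0, 1.0)
            if j in cluster else rng.gauss(0.0, 1.0)
            for j in range(p)
        ]
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, sd))
    return X, y


# Hypothetical setup: x0, x1, x2 form a cluster at rho = 0.9, but only x0 and x1 carry signal.
beta = [1.0, 1.0, 0.0, 0.0, 0.5]
cluster = {0, 1, 2}
sigma = cluster_corr(len(beta), cluster, 0.9)
sigma2 = noise_variance(beta, sigma, snr=1.22)
X, y = simulate(50, beta, cluster, rho=0.9, snr=1.22, seed=1)
print("noise variance:", round(sigma2, 4))
```

Because the true support is known here ({x0, x1, x4}), top-k RI selections and lasso selections can be scored against it, and out-of-sample error can be measured on fresh draws from `simulate`. The predictor x2 sits inside the cluster without carrying signal, which is the kind of case the test above is meant to stress.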

Figures

Figures reproduced from arXiv: 2509.10853 by Argon Chen, Tien-En Chang.

Figure 1: Boxplots for S for the GD, CRI, CRI.Z, and SIS methods for ρ ∈ {0.35, 0.7, 0.9} and SNR ∈ {0.05, 0.25, 1.22, 6} based on 100 replications under different examples with (n, p) = (100, 10).
Figure 2: Summary results for Pr(k) for the GD, CRI, CRI.Z, and SIS methods for ρ ∈ {0.35, 0.7, 0.9} and SNR ∈ {0.05, 0.25, 1.22, 6} based on 100 replications under different examples with (n, p) = (100, 10).
Figure 3: Boxplots for S for the GD, CRI, CRI.Z, and SIS methods for ρ ∈ {0.35, 0.7, 0.9} and SNR ∈ {0.05, 0.25, 1.22, 6} based on 100 replications under different examples with (n, p) = (100, 1000).
Figure 4: Summary results for Pr(k) for the CRI, CAR, CRI.Z, and SIS methods for ρ ∈ {0.35, 0.7, 0.9} and SNR ∈ {0.05, 0.25, 1.22, 6} based on 100 replications under different examples with (n, p) = (100, 1000).
Figure 5: F1-score as function of SNR in the low setting with …
Figure 6: RTE as function of SNR in the low setting with …
Figure 7: F1-score as function of SNR in the high-100 setting with …
Figure 8: RTE as function of SNR in the high-100 setting with …
Figure 9: Effective degrees of freedom for the benchmark methods, LS-SIS, and LS-RI

Text accompanying Figures 2 and 4 (pp. 12–13): In the high-100 dimension setting (Figs. 3–4), SIS again performs poorly in Examples 2 and 3. Among RI measures, CRI.Z and CAR consistently outperform CRI, particularly under higher correlations. That highlights that a simple identity reallocation from the orthogonal predictors to original pre…
Original abstract

Although conceptually related, variable selection and relative importance (RI) analysis have been treated quite differently in the literature. While RI is typically used for post-hoc model explanation, this paper explores its potential for variable or feature ranking and filter-based selection before model creation. Specifically, we anticipate strong performance from the RI measures because they incorporate both direct and combined effects of predictors, addressing a key limitation of marginal correlation, which ignores dependencies among predictors. We implement and evaluate the RI-based variable ranking and selection methods, including a newly proposed RI measure, CRI.Z, with improved computational efficiency relative to conventional RI measures. Through extensive simulations, we first demonstrate how the RI measures more accurately rank the variables than the marginal correlation, especially when there are suppressed or weak predictors. We then show that predictive models built on these rankings are highly competitive, often outperforming state-of-the-art linear-model methods such as the lasso and relaxed lasso. The proposed RI-based methods are particularly effective in challenging cases involving clusters of highly correlated predictors, a setting known to cause failures in many benchmark methods. The practical utility and efficiency of RI-based methods are further demonstrated through two high-dimensional gene expression datasets. Although lasso methods have dominated the recent literature on variable selection, our study reveals that the RI-based method is a powerful and competitive alternative. We believe these underutilized tools deserve greater attention in statistics and machine learning communities. The code is available at: https://github.com/tien-endotchang/RI-variable-selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes using relative importance (RI) measures—including a new computationally efficient variant CRI.Z—for variable ranking and pre-filter selection in linear models. It argues that these measures, by incorporating direct and combined predictor effects, outperform marginal correlation and are competitive with or superior to lasso and relaxed lasso, particularly in regimes with clusters of highly correlated predictors. Support comes from extensive simulations plus two high-dimensional gene-expression applications, with code released publicly.

Significance. If the empirical claims hold under properly isolated evaluation, the work would usefully redirect attention to RI tools as practical, interpretable alternatives to regularization-based selection in multicollinear high-dimensional settings common in genomics and elsewhere. Public code is a clear strength that supports reproducibility.

major comments (2)
  1. [§4] §4 (Simulation protocol) and associated tables/figures: the description of how RI rankings (including CRI.Z) are computed from a model fit to the full training set and then used as a hard pre-filter does not explicitly confirm use of nested cross-validation that isolates the ranking step from final performance evaluation. This is load-bearing for the headline claim of competitiveness with lasso, because any leakage would inflate the reported advantage in correlated-cluster regimes.
  2. [§5] §5 (Real-data experiments): the gene-expression results compare RI-based selection to lasso/relaxed lasso, yet the manuscript does not state whether the RI computation and variable ranking occur inside each CV fold or on the pooled training data before CV. Without this detail the superiority claims cannot be fully assessed for post-hoc selection bias.
minor comments (2)
  1. [Abstract] The abstract and §3 would benefit from a concise statement of the exact number of Monte Carlo replications, predictor dimensions, and noise levels used in the simulations so readers can gauge the scope of the reported advantages.
  2. [§3.1] Notation for the new CRI.Z measure in §3.1 is introduced without an immediate side-by-side complexity comparison to LMG or PMVD; adding a short table or sentence would clarify the efficiency gain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their detailed and constructive feedback. The concerns raised about the simulation and real-data evaluation protocols are important for ensuring the validity of our claims. We have revised the manuscript to explicitly describe the use of nested cross-validation in both settings and updated the relevant sections, tables, and figures. Our responses to the major comments are as follows.

  1. Referee: [§4] §4 (Simulation protocol) and associated tables/figures: the description of how RI rankings (including CRI.Z) are computed from a model fit to the full training set and then used as a hard pre-filter does not explicitly confirm use of nested cross-validation that isolates the ranking step from final performance evaluation. This is load-bearing for the headline claim of competitiveness with lasso, because any leakage would inflate the reported advantage in correlated-cluster regimes.

    Authors: We thank the referee for this critical observation. The original description indicated computation on the full training set, which could introduce leakage if not properly isolated. To strengthen the evaluation, we have revised the simulation protocol in Section 4 to employ nested cross-validation. Specifically, for each outer CV fold, the RI rankings (including CRI.Z) are computed exclusively on the training portion of that fold, variables are selected as a pre-filter, and the final model is trained and evaluated on the corresponding test fold. This ensures no information from the test data influences the ranking. The revised results continue to support the competitiveness of RI-based methods with lasso, especially in correlated predictor clusters, and we have updated the tables and figures to reflect these changes. revision: yes

  2. Referee: [§5] §5 (Real-data experiments): the gene-expression results compare RI-based selection to lasso/relaxed lasso, yet the manuscript does not state whether the RI computation and variable ranking occur inside each CV fold or on the pooled training data before CV. Without this detail the superiority claims cannot be fully assessed for post-hoc selection bias.

    Authors: We appreciate the referee highlighting the lack of explicit detail in the real-data section. Upon review, the RI computations and rankings in the gene-expression experiments were performed on the training data within each cross-validation fold to prevent post-hoc selection bias. We have now revised Section 5 to clearly state this nested procedure, confirming that variable selection via RI measures occurs inside the CV loop, with performance assessed on held-out folds. This clarification supports the validity of the reported superiority in the high-dimensional settings. revision: yes
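
The fold-internal procedure described in these responses can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ranker is a placeholder (absolute Pearson correlation), the number of retained variables `top_k` is fixed rather than tuned in an inner loop, the folds are interleaved without shuffling, and the data are a hypothetical noise-free toy. The point is only that the ranking function receives the training rows of each fold and nothing else.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def abs_corr_rank(X, y):
    """Placeholder ranker: order columns by |Pearson correlation| with y."""
    n = len(y)
    my = sum(y) / n
    syy = sum((v - my) ** 2 for v in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        sxx = sum((v - mx) ** 2 for v in col)
        sxy = sum((a - mx) * (b - my) for a, b in zip(col, y))
        scores.append(abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0)
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def fit_ols(X, y, cols):
    """Least squares with intercept on the selected columns, via the normal equations."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    q = len(Z[0])
    A = [[sum(z[a] * z[b] for z in Z) for b in range(q)] for a in range(q)]
    b = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(q)]
    return solve(A, b)


def predict(coef, row, cols):
    return coef[0] + sum(c * row[j] for c, j in zip(coef[1:], cols))


def cv_with_inner_ranking(X, y, rank_fn, top_k, n_folds):
    """Per-fold test MSE, with ranking and selection redone inside every fold."""
    n = len(y)
    fold_mse = []
    for f in range(n_folds):
        test = [i for i in range(n) if i % n_folds == f]
        train = [i for i in range(n) if i % n_folds != f]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        cols = rank_fn(X_tr, y_tr)[:top_k]  # ranking sees training rows only
        coef = fit_ols(X_tr, y_tr, cols)
        errs = [(y[i] - predict(coef, X[i], cols)) ** 2 for i in test]
        fold_mse.append(sum(errs) / len(errs))
    return fold_mse


# Hypothetical toy data: y depends on the first column only, with no noise.
X = [[float(i), float((7 * i) % 5), float((3 * i) % 4)] for i in range(12)]
y = [2.0 * row[0] + 1.0 for row in X]

rows_seen = []  # records how many rows the ranker was given in each fold


def spy_rank(X_tr, y_tr):
    rows_seen.append(len(X_tr))
    return abs_corr_rank(X_tr, y_tr)


fold_mse = cv_with_inner_ranking(X, y, spy_rank, top_k=1, n_folds=3)
print(rows_seen, fold_mse)
```

Swapping `abs_corr_rank` for an RI-based ranker leaves the loop unchanged. Ranking once on all rows before the loop would let held-out rows influence which variables are kept, which is the selection bias the referee comments describe.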

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation is independent of fitted inputs


The paper proposes and empirically evaluates RI-based ranking and selection methods (including the new CRI.Z) via simulations and gene-expression datasets. No derivation chain reduces predictions or rankings to the same fitted quantities by construction. Performance is assessed against external benchmarks (lasso, relaxed lasso) and against held-out data or simulated ground truth, so the evaluation is self-contained. No self-citations, ansatzes, or fitted-input renamings are load-bearing for the central claims.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on standard regression assumptions for computing RI measures and on the representativeness of the simulated and gene-expression datasets; no new free parameters or invented entities are introduced beyond the definition of CRI.Z.

axioms (1)
  • domain assumption: Relative importance measures computed from a fitted linear model capture both direct and combined effects of predictors.
    Stated in the abstract as the reason RI should outperform marginal correlation.
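
One concrete way to read this assumption is through CAR scores, an established RI measure that appears in the figure captions and in the paper's references (Zuber and Strimmer, 2011). The formula below is given for illustration only; it is not the definition of CRI.Z, which this page does not state. The symbols P (predictor correlation matrix) and P_{Xy} (vector of marginal correlations with the response) are notation introduced here.

```latex
% CAR scores: marginal correlations after decorrelating the predictors.
%   P       : p x p correlation matrix of the predictors
%   P_{Xy}  : p-vector of marginal correlations between each predictor and y
\[
  \omega = P^{-1/2} P_{Xy},
  \qquad
  \sum_{j=1}^{p} \omega_j^{2} = P_{Xy}^{\top} P^{-1} P_{Xy} = R^{2}.
\]
% If the predictors are uncorrelated (P = I), then \omega_j = (P_{Xy})_j,
% so the CAR score reduces to the marginal correlation.
```

Each score mixes the marginal correlations of all predictors through P^{-1/2}, so correlation among predictors enters the ranking, and the squared scores add up to the full-model R². Marginal-correlation screening uses P_{Xy} alone and has neither property.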

pith-pipeline@v0.9.0 · 5789 in / 1282 out tokens · 27730 ms · 2026-05-18T16:52:14.452241+00:00 · methodology

