Variable Selection Using Relative Importance Rankings
Pith reviewed 2026-05-18 16:52 UTC · model grok-4.3
The pith
Relative importance rankings select variables as effectively as lasso methods, especially with highly correlated predictors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RI-based variable ranking and selection methods, including the new CRI.Z, are highly competitive with state-of-the-art linear-model methods such as the lasso and relaxed lasso, and particularly effective in cases involving clusters of highly correlated predictors because RI measures incorporate both direct and combined effects of predictors.
What carries the argument
Relative importance (RI) measures for pre-model variable ranking and filter-based selection, with CRI.Z as a computationally efficient implementation.
If this is right
- RI rankings are more accurate than marginal correlation rankings, especially for suppressed or weak predictors.
- Predictive models using RI-based selection are competitive with or better than lasso and relaxed lasso.
- RI-based selection performs well in the presence of highly correlated predictor clusters.
- RI methods provide practical utility for high-dimensional problems such as gene expression analysis.
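The first bullet turns on suppressed predictors, which a small synthetic case can make concrete. The sketch below is illustrative only and is not taken from the paper: the data-generating process, the function names, and the use of an R² gain as a stand-in for "combined effect" are all assumptions. Here `x2` is nearly uncorrelated with `y`, yet adding it to a model that already contains `x1` removes the nuisance component and raises R² sharply.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X, with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def suppressor_demo(n=2000, seed=0):
    """Hypothetical toy design: x2 carries no signal itself but cleans up x1."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(size=n)
    nuisance = rng.normal(size=n)
    x1 = signal + nuisance          # signal contaminated by the nuisance term
    x2 = nuisance                   # suppressor: independent of y
    y = signal + 0.1 * rng.normal(size=n)
    marginal_r2_x2 = np.corrcoef(x2, y)[0, 1] ** 2
    r2_x1 = r_squared(x1, y)
    r2_both = r_squared(np.column_stack([x1, x2]), y)
    return marginal_r2_x2, r2_x1, r2_both

if __name__ == "__main__":
    marginal, alone, both = suppressor_demo()
    print(f"squared marginal correlation of x2 with y: {marginal:.4f}")
    print(f"R^2 with x1 only: {alone:.3f}; with x1 and x2: {both:.3f}")
```

A ranking based on marginal correlation alone would place `x2` last in this design, while any measure that credits the R² gain in combination with `x1` would not.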
Where Pith is reading between the lines
- RI ranking could be used as a fast pre-filter before applying more expensive selection procedures in very large feature spaces.
- The method might be extended to non-linear base models where direct lasso equivalents are unavailable.
- CRI.Z's efficiency opens the possibility of repeated RI-based selection inside cross-validation loops.
Load-bearing premise
Relative importance measures computed from a model fitted on training data will identify variables that improve out-of-sample prediction without post-hoc selection bias.
What would settle it
A simulation or real-data experiment with known true variables and correlated clusters in which models built from top RI-ranked variables show higher out-of-sample prediction error than models built from lasso-selected variables.
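A minimal sketch of the data side of such an experiment, assuming a simple latent-factor construction for the correlated clusters. The function name, default parameters, and the choice of one active variable per active cluster are hypothetical and are not the paper's simulation design. The comparison itself (RI-ranked models versus lasso) is not shown here.

```python
import numpy as np

def make_clustered_data(n=200, n_clusters=5, cluster_size=4, rho=0.9,
                        n_active_clusters=2, noise_sd=1.0, seed=0):
    """Generate predictors in equicorrelated clusters with a known true support.

    Within a cluster every pair of columns has population correlation rho;
    columns in different clusters are independent.
    """
    rng = np.random.default_rng(seed)
    p = n_clusters * cluster_size
    shared = rng.normal(size=(n, n_clusters))   # one latent factor per cluster
    own = rng.normal(size=(n, p))               # idiosyncratic part of each column
    X = (np.sqrt(rho) * np.repeat(shared, cluster_size, axis=1)
         + np.sqrt(1.0 - rho) * own)
    # Assumption: the first column of each active cluster is a true predictor.
    support = np.arange(n_active_clusters) * cluster_size
    beta = np.zeros(p)
    beta[support] = 1.0
    y = X @ beta + noise_sd * rng.normal(size=n)
    return X, y, support

if __name__ == "__main__":
    X, y, support = make_clustered_data()
    print("true support:", support)
    print("within-cluster correlation:", np.corrcoef(X[:, 0], X[:, 1])[0, 1])
```

Because the true support is returned alongside the data, both ranking accuracy and out-of-sample error can be scored against ground truth on fresh draws with a different seed.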
Original abstract
Although conceptually related, variable selection and relative importance (RI) analysis have been treated quite differently in the literature. While RI is typically used for post-hoc model explanation, this paper explores its potential for variable or feature ranking and filter-based selection before model creation. Specifically, we anticipate strong performance from the RI measures because they incorporate both direct and combined effects of predictors, addressing a key limitation of marginal correlation, which ignores dependencies among predictors. We implement and evaluate the RI-based variable ranking and selection methods, including a newly proposed RI measure, CRI.Z, with improved computational efficiency relative to conventional RI measures. Through extensive simulations, we first demonstrate how the RI measures more accurately rank the variables than the marginal correlation, especially when there are suppressed or weak predictors. We then show that predictive models built on these rankings are highly competitive, often outperforming state-of-the-art linear-model methods such as the lasso and relaxed lasso. The proposed RI-based methods are particularly effective in challenging cases involving clusters of highly correlated predictors, a setting known to cause failures in many benchmark methods. The practical utility and efficiency of RI-based methods are further demonstrated through two high-dimensional gene expression datasets. Although lasso methods have dominated the recent literature on variable selection, our study reveals that the RI-based method is a powerful and competitive alternative. We believe these underutilized tools deserve greater attention in statistics and machine learning communities. The code is available at: https://github.com/tien-endotchang/RI-variable-selection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes using relative importance (RI) measures—including a new computationally efficient variant CRI.Z—for variable ranking and pre-filter selection in linear models. It argues that these measures, by incorporating direct and combined predictor effects, outperform marginal correlation and are competitive with or superior to lasso and relaxed lasso, particularly in regimes with clusters of highly correlated predictors. Support comes from extensive simulations plus two high-dimensional gene-expression applications, with code released publicly.
Significance. If the empirical claims hold under properly isolated evaluation, the work would usefully redirect attention to RI tools as practical, interpretable alternatives to regularization-based selection in multicollinear high-dimensional settings common in genomics and elsewhere. Public code is a clear strength that supports reproducibility.
major comments (2)
- [§4] §4 (Simulation protocol) and associated tables/figures: the description of how RI rankings (including CRI.Z) are computed from a model fit to the full training set and then used as a hard pre-filter does not explicitly confirm use of nested cross-validation that isolates the ranking step from final performance evaluation. This is load-bearing for the headline claim of competitiveness with lasso, because any leakage would inflate the reported advantage in correlated-cluster regimes.
- [§5] §5 (Real-data experiments): the gene-expression results compare RI-based selection to lasso/relaxed lasso, yet the manuscript does not state whether the RI computation and variable ranking occur inside each CV fold or on the pooled training data before CV. Without this detail the superiority claims cannot be fully assessed for post-hoc selection bias.
minor comments (2)
- [Abstract] The abstract and §3 would benefit from a concise statement of the exact number of Monte Carlo replications, predictor dimensions, and noise levels used in the simulations so readers can gauge the scope of the reported advantages.
- [§3.1] Notation for the new CRI.Z measure in §3.1 is introduced without an immediate side-by-side complexity comparison to LMG or PMVD; adding a short table or sentence would clarify the efficiency gain.
Simulated Author's Rebuttal
We are grateful to the referee for their detailed and constructive feedback. The concerns raised about the simulation and real-data evaluation protocols are important for ensuring the validity of our claims. We have revised the manuscript to explicitly describe the use of nested cross-validation in both settings and updated the relevant sections, tables, and figures. Our responses to the major comments are as follows.
Point-by-point responses
- Referee: [§4] §4 (Simulation protocol) and associated tables/figures: the description of how RI rankings (including CRI.Z) are computed from a model fit to the full training set and then used as a hard pre-filter does not explicitly confirm use of nested cross-validation that isolates the ranking step from final performance evaluation. This is load-bearing for the headline claim of competitiveness with lasso, because any leakage would inflate the reported advantage in correlated-cluster regimes.
  Authors: We thank the referee for this critical observation. The original description indicated computation on the full training set, which could introduce leakage if not properly isolated. To strengthen the evaluation, we have revised the simulation protocol in Section 4 to employ nested cross-validation. Specifically, for each outer CV fold, the RI rankings (including CRI.Z) are computed exclusively on the training portion of that fold, variables are selected as a pre-filter, and the final model is trained and evaluated on the corresponding test fold. This ensures no information from the test data influences the ranking. The revised results continue to support the competitiveness of RI-based methods with lasso, especially in correlated predictor clusters, and we have updated the tables and figures to reflect these changes.
  Revision: yes
- Referee: [§5] §5 (Real-data experiments): the gene-expression results compare RI-based selection to lasso/relaxed lasso, yet the manuscript does not state whether the RI computation and variable ranking occur inside each CV fold or on the pooled training data before CV. Without this detail the superiority claims cannot be fully assessed for post-hoc selection bias.
  Authors: We appreciate the referee highlighting the lack of explicit detail in the real-data section. Upon review, the RI computations and rankings in the gene-expression experiments were performed on the training data within each cross-validation fold to prevent post-hoc selection bias. We have now revised Section 5 to clearly state this nested procedure, confirming that variable selection via RI measures occurs inside the CV loop, with performance assessed on held-out folds. This clarification supports the validity of the reported superiority in the high-dimensional settings.
  Revision: yes
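The fold-internal procedure described in the responses can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the final model is plain least squares, and the scoring function shown is squared marginal correlation, used only as a placeholder where an RI measure such as CRI.Z would be plugged in. The number of retained variables `k_top` is fixed here; tuning it would need an additional inner loop.

```python
import numpy as np

def squared_marginal_correlation(X, y):
    """Placeholder score (NOT an RI measure): squared correlation of each column with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return r ** 2

def cv_mse_with_fold_internal_selection(X, y, score_fn, k_top, n_folds=5, seed=0):
    """K-fold CV in which ranking and selection are redone inside every fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    fold_mse, fold_selected = [], []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Ranking uses training rows only, so the held-out fold cannot leak in.
        scores = score_fn(X[train_idx], y[train_idx])
        selected = np.sort(np.argsort(scores)[::-1][:k_top])
        # Fit least squares with an intercept on the selected columns.
        A_train = np.column_stack([np.ones(len(train_idx)),
                                   X[np.ix_(train_idx, selected)]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Evaluate on the held-out fold.
        A_test = np.column_stack([np.ones(len(test_idx)),
                                  X[np.ix_(test_idx, selected)]])
        fold_mse.append(np.mean((y[test_idx] - A_test @ coef) ** 2))
        fold_selected.append(selected)
    return float(np.mean(fold_mse)), fold_selected
```

Returning the selected set per fold makes it easy to check how stable the selection is across folds, which is one way to spot a ranking that depends heavily on a few rows.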
Circularity Check
No significant circularity; empirical evaluation is independent of fitted inputs
Full rationale
The paper proposes and empirically evaluates RI-based ranking and selection methods (including the new CRI.Z) through simulations and gene-expression datasets. No derivation chain reduces predictions or rankings to the same fitted quantities by construction. Performance is assessed against external benchmarks (lasso, relaxed lasso) and against held-out data or simulated ground truth, so the evaluation is self-contained. No self-citations, ansatzes, or fitted-input renamings are load-bearing for the central claims.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: relative importance measures computed from a fitted linear model capture both direct and combined effects of predictors.
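One way to see what "direct and combined effects" can mean is the textbook general dominance (Shapley-style) decomposition of R², sketched below by brute force. This is an assumption about which measure illustrates the point; it is not claimed to be the paper's exact definition or implementation, and the function names are hypothetical. The size-0 subsets contribute each predictor's stand-alone R² (the direct effect), and the larger subsets contribute its gain when combined with other predictors.

```python
from itertools import combinations
from math import comb

import numpy as np

def r_squared(X, y, cols):
    """R^2 of least squares of y on the given columns (with intercept); 0 for no columns."""
    if len(cols) == 0:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def general_dominance(X, y):
    """Average R^2 gain of each predictor over all subsets of the others.

    Gains are averaged within each subset size, then across sizes.
    The cost grows as 2^p, so this is only usable for small p.
    """
    p = X.shape[1]
    gd = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            weight = 1.0 / (p * comb(p - 1, size))
            for S in combinations(others, size):
                gd[i] += weight * (r_squared(X, y, S + (i,)) - r_squared(X, y, S))
    return gd
```

With these weights the scores sum to the full-model R², which gives a quick sanity check on any implementation. The exponential cost is also the reason cheaper measures are attractive in high dimensions.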
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: $GD(x_i) = \frac{1}{p} \sum_{S \subseteq P \setminus \{i\}} \left( R^2_{y \cdot X_{S \cup \{i\}}} - R^2_{y \cdot X_S} \right)$ … CRI via the reduced SVD $X = U_r S_r V_r^\top$ and the reallocation term $(V_r S_r V_r^\top) \odot (V_r S_r V_r^\top)$
- IndisputableMonolith/Foundation/BranchSelection.lean: RCLCombiner_isCoupling_iff (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: CRI.Z replaces the reallocation term with the identity, yielding $w_G^2 = (V_r U_r^\top y) \odot (V_r U_r^\top y)$
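The CRI.Z expression quoted above can be sketched numerically as below. This is a hedged reading of that one formula, not the authors' implementation: the function name, the centering and unit-norm scaling of `X` and `y`, and the rank tolerance are assumptions added for illustration, and columns are assumed to be non-constant.

```python
import numpy as np

def cri_z_weights(X, y, tol=1e-10):
    """Squared weights (V_r U_r^T y) ⊙ (V_r U_r^T y) from the reduced SVD of X.

    Assumptions (not from the source): columns of X and y are centered and
    scaled to unit norm first; no column of X is constant.
    """
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    yc = y - y.mean()
    ys = yc / np.linalg.norm(yc)
    # Thin SVD, then keep only the numerically nonzero singular directions.
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    w = Vt[:r].T @ (U[:, :r].T @ ys)   # V_r U_r^T y, one entry per predictor
    return w ** 2                      # elementwise square

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=50)
    w2 = cri_z_weights(X, y)
    print("ranking (most to least important):", np.argsort(w2)[::-1])
```

Under this standardization the weights are nonnegative and sum to the R² of the full least-squares fit, because the columns of V_r are orthonormal. A single SVD replaces the subset enumeration that general dominance requires, which is consistent with the efficiency claim made for CRI.Z.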
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.