Extended Comparisons of Best Subset Selection, Forward Stepwise Selection, and the Lasso
In exciting new work, Bertsimas et al. (2016) showed that the classical best subset selection problem in regression modeling can be formulated as a mixed integer optimization (MIO) problem. Using recent advances in MIO algorithms, they demonstrated that best subset selection can now be solved at much larger problem sizes than what was thought possible in the statistics community. They presented empirical comparisons of best subset selection with other popular variable selection procedures, in particular, the lasso and forward stepwise selection. Surprisingly (to us), their simulations suggested that best subset selection consistently outperformed both methods in terms of prediction accuracy. Here we present an expanded set of simulations to shed more light on these comparisons. The summary is roughly as follows: (a) neither best subset selection nor the lasso uniformly dominates the other, with best subset selection generally performing better in high signal-to-noise ratio (SNR) regimes, and the lasso better in low SNR regimes; (b) best subset selection and forward stepwise perform quite similarly throughout; (c) the relaxed lasso (actually, a simplified version of the original relaxed estimator defined in Meinshausen, 2007) is the overall winner, performing just about as well as the lasso in low SNR scenarios, and as well as best subset selection in high SNR scenarios.
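For readers who want the estimators in symbols, the block below sketches their standard textbook definitions; it is not text from the abstract. The notation is assumed here for illustration: response y, predictor matrix X, coefficient vector \beta, true coefficients \beta_0, subset size k, tuning parameters \lambda and \gamma, predictor covariance \Sigma, and noise variance \sigma^2. The relaxed-lasso form shown is one common simplified version, a blend of the lasso fit and least squares on the lasso's active set, and may differ in detail from the one used in the paper.

```latex
% Best subset selection: least squares with at most k nonzero coefficients
\hat\beta^{\mathrm{subset}}(k) \in \operatorname*{arg\,min}_{\beta}\; \|y - X\beta\|_2^2
  \quad \text{subject to} \quad \|\beta\|_0 \le k

% Lasso: least squares with an l1 penalty
\hat\beta^{\mathrm{lasso}}(\lambda) \in \operatorname*{arg\,min}_{\beta}\;
  \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1

% Simplified relaxed lasso (assumed form): \hat\beta^{\mathrm{LS}}(\lambda) is the
% least squares fit restricted to the lasso's active set, zero elsewhere
\hat\beta^{\mathrm{relax}}(\lambda, \gamma)
  = \gamma\, \hat\beta^{\mathrm{lasso}}(\lambda)
  + (1 - \gamma)\, \hat\beta^{\mathrm{LS}}(\lambda),
  \qquad \gamma \in [0, 1]

% Signal-to-noise ratio for a linear model y = x^\top \beta_0 + \varepsilon
\mathrm{SNR} = \frac{\operatorname{Var}(x^\top \beta_0)}{\operatorname{Var}(\varepsilon)}
  = \frac{\beta_0^\top \Sigma\, \beta_0}{\sigma^2}
```

Under this form, \gamma = 1 recovers the lasso, while \gamma = 0 removes the shrinkage on the selected variables, which is one way to read finding (c): the blend can lean toward the lasso at low SNR and toward an unshrunken subset fit at high SNR.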
Forward citations
Cited by 2 Pith papers
- Efficient compression of neural networks and datasets
  Refined probabilistic and smooth l0 pruning techniques approximate minimum description length for neural networks, achieving high compression with minimal accuracy loss and empirically verifying better sample efficien...
- A systematic review of metaheuristics-based and machine learning-driven intrusion detection systems in IoT
  A systematic literature review that maps metaheuristic algorithms to ML-based intrusion detection systems in IoT, with separate analysis of feature selection, parameter tuning, and hybrid applications plus a proposed ...