pith. sign in

arxiv: 1703.07285 · v2 · pith:BL5BGBCGnew · submitted 2017-03-21 · 📊 stat.ML · cs.LG· math.OC· stat.CO

From safe screening rules to working sets for faster Lasso-type solvers

classification 📊 stat.ML cs.LGmath.OCstat.CO
keywords lassoworkingconstraintslearningloopoptimizationproblemsrules
0
0 comments X
read the original abstract

Convex sparsity-promoting regularizations are ubiquitous in modern statistical learning. By construction, they yield solutions with few non-zero coefficients, which correspond to saturated constraints in the dual optimization formulation. Working set (WS) strategies are generic optimization techniques that consist in solving simpler problems that only consider a subset of constraints, whose indices form the WS. Working set methods therefore involve two nested iterations: the outer loop corresponds to the definition of the WS and the inner loop calls a solver for the subproblems. For the Lasso estimator a WS is a set of features, while for a Group Lasso it refers to a set of groups. In practice, WS are generally small in this context so the associated feature Gram matrix can fit in memory. Here we show that the Gauss-Southwell rule (a greedy strategy for block coordinate descent techniques) leads to fast solvers in this case. Combined with a working set strategy based on an aggressive use of so-called Gap Safe screening rules, we propose a solver achieving state-of-the-art performance on sparse learning problems. Results are presented on Lasso and multi-task Lasso estimators.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Large scale Lasso with windowed active set for convolutional spike sorting

    stat.ML 2019-06 unverdicted novelty 6.0

    A windowed active set algorithm solves large-scale Lasso for convolutional spike sorting with linear complexity in temporal dimension and parallel efficiency.