pith. sign in

High Dimensional Robust Sparse Regression

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

We provide a novel -- and to the best of our knowledge, the first -- algorithm for high dimensional sparse regression with constant fraction of corruptions in explanatory and/or response variables. Our algorithm recovers the true sparse parameters with sub-linear sample complexity, in the presence of a constant fraction of arbitrary corruptions. Our main contribution is a robust variant of Iterative Hard Thresholding. Using this, we provide accurate estimators: when the covariance matrix in sparse regression is identity, our error guarantee is near information-theoretically optimal. We then deal with robust sparse regression with unknown structured covariance matrix. We propose a filtering algorithm which consists of a novel randomized outlier removal technique for robust sparse mean estimation that may be of interest in its own right: the filtering algorithm is flexible enough to deal with unknown covariance. Also, it is orderwise more efficient computationally than the ellipsoid algorithm. Using sub-linear sample complexity, our algorithm achieves the best known (and first) error guarantee. We demonstrate the effectiveness on large-scale sparse regression problems with arbitrary corruptions.

fields

stat.ML 1

years

2019 1

verdicts

UNVERDICTED 1

representative citing papers

A Unified Approach to Robust Mean Estimation

stat.ML · 2019-07-01 · unverdicted · novelty 7.0

A connection between Huber's contamination and heavy-tailed models yields unified robust mean estimators that are both computationally efficient and statistically optimal under certain conditions.

citing papers explorer

Showing 1 of 1 citing paper.

  • A Unified Approach to Robust Mean Estimation stat.ML · 2019-07-01 · unverdicted · none · ref 28 · internal anchor

    A connection between Huber's contamination and heavy-tailed models yields unified robust mean estimators that are both computationally efficient and statistically optimal under certain conditions.