Provable Deterministic Leverage Score Sampling

Anastasios Kyrillidis; Christos Boutsidis; Dimitris Papailiopoulos

arxiv: 1404.1530 · v3 · pith:B4WK3RJ3new · submitted 2014-04-06 · 💻 cs.DS · cs.IT· cs.NA· math.IT· math.ST· stat.ML· stat.TH

Provable Deterministic Leverage Score Sampling

Dimitris Papailiopoulos , Anastasios Kyrillidis , Christos Boutsidis This is my paper

classification 💻 cs.DS cs.ITcs.NAmath.ITmath.STstat.MLstat.TH

keywords leveragesamplingdeterministicscorescorescolumnsdecayempirical

0 comments

read the original abstract

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". To obtain provable guarantees, previous work requires randomized sampling of the columns with probabilities proportional to their leverage scores. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such deterministic sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption by providing empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which many times matches or outperforms the state-of-the-art techniques.

This paper has not been read by Pith yet.

Provable Deterministic Leverage Score Sampling

discussion (0)