pith. sign in

arxiv: 1109.3843 · v2 · pith:MQLDYDARnew · submitted 2011-09-18 · 💻 cs.DS · cs.DM· cs.LG

Fast approximation of matrix coherence and statistical leverage

classification 💻 cs.DS cs.DMcs.LG
keywords matrixapproximationleveragestatisticalalgorithmscoresanalysiscoherence
0
0 comments X
read the original abstract

The statistical leverage scores of a matrix $A$ are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nystr\"{o}m-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary $n \times d$ matrix $A$, with $n \gg d$, and that returns as output relative-error approximations to all $n$ of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of $n$ and $d$) in $O(n d \log n)$ time, as opposed to the $O(nd^2)$ time required by the na\"{i}ve algorithm that involves computing an orthogonal basis for the range of $A$. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with $n \approx d$, and the extension to streaming environments.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.