arXiv: 1007.5510 · v2 · submitted 2010-07-30 · stat.CO · cs.NA

An algorithm for the principal component analysis of large data sets

keywords: data, large, algorithm, analysis, component, efficiently, methods, principal

Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy --- even on parallel processors --- unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently "out-of-core.") We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
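
The abstract does not spell out the procedure, so the following is only a minimal sketch of a generic two-pass randomized method applied to a matrix that is read one block of rows at a time. It is not the authors' algorithm. The function name `randomized_pca_blocks`, the `blocks` callable, and the `oversample` parameter are hypothetical names chosen for illustration, and the sketch assumes the columns are already centered and that an m-by-(k + oversample) intermediate matrix fits in RAM even when the data do not.

```python
import numpy as np

def randomized_pca_blocks(blocks, n_cols, k, oversample=10, seed=0):
    """Approximate the top-k singular triplets of a matrix A seen only in row blocks.

    blocks: hypothetical callable returning a fresh iterator over row blocks of A
            (a stand-in for reading chunks from disk).
    """
    rng = np.random.default_rng(seed)
    l = k + oversample
    omega = rng.standard_normal((n_cols, l))  # random test matrix

    # Pass 1 over the data: Y = A @ omega, assembled block by block.
    Y = np.vstack([blk @ omega for blk in blocks()])
    Q, _ = np.linalg.qr(Y)  # orthonormal basis containing the range of Y

    # Pass 2 over the data: B = Q.T @ A, accumulated block by block.
    B = np.zeros((l, n_cols))
    row = 0
    for blk in blocks():
        B += Q[row:row + blk.shape[0]].T @ blk
        row += blk.shape[0]

    # Small dense SVD of the l-by-n_cols matrix B, then lift back with Q.
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    return U[:, :k], s[:k], Vt[:k]

# Illustrative data: an exactly rank-5 matrix held in memory, served in 40-row blocks.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))

def blocks():
    for i in range(0, A.shape[0], 40):
        yield A[i:i + 40]

U, s, Vt = randomized_pca_blocks(blocks, n_cols=50, k=5)
```

In this sketch only one block of the data is touched at a time; everything else held in memory has k + oversample rows or columns. With real out-of-core data, `blocks` would read chunks from disk instead of slicing an in-memory array.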
