arxiv: 1708.06897 · v2 · pith:TEDR7ZRZnew · submitted 2017-08-23 · 📊 stat.ME

Projected support points: a new method for high-dimensional data reduction

classification 📊 stat.ME
keywords data · reduction · high-dimensional · method · psps · reducing · carlo · framework

In an era where big and high-dimensional data is readily available, data scientists are inevitably faced with the challenge of reducing this data for expensive downstream computation or analysis. To this end, we present here a new method for reducing high-dimensional big data into a representative point set, called projected support points (PSPs). A key ingredient in our method is the so-called sparsity-inducing (SpIn) kernel, which encourages the preservation of low-dimensional features when reducing high-dimensional data. We begin by introducing a unifying theoretical framework for data reduction, connecting PSPs with fundamental sampling principles from experimental design and Quasi-Monte Carlo. Through this framework, we then derive sparsity conditions under which the curse-of-dimensionality in data reduction can be lifted for our method. Next, we propose two algorithms for one-shot and sequential reduction via PSPs, both of which exploit big data subsampling and majorization-minimization for efficient optimization. Finally, we demonstrate the practical usefulness of PSPs in two real-world applications, the first for data reduction in kernel learning, and the second for reducing Markov Chain Monte Carlo (MCMC) chains.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BAMIFun: Bayesian Multiple Imputation for Functional Data

    stat.ME 2026-05 unverdicted novelty 7.0

    BAMIFun provides Bayesian multiple imputation for functional data via low-rank penalized spline models, achieving accurate imputation and improved coverage in simulations and real datasets compared to single-imputatio...