arXiv preprint arXiv:1708.06897

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

In an era where big and high-dimensional data is readily available, data scientists are inevitably faced with the challenge of reducing this data for expensive downstream computation or analysis. To this end, we present here a new method for reducing high-dimensional big data into a representative point set, called projected support points (PSPs). A key ingredient in our method is the so-called sparsity-inducing (SpIn) kernel, which encourages the preservation of low-dimensional features when reducing high-dimensional data. We begin by introducing a unifying theoretical framework for data reduction, connecting PSPs with fundamental sampling principles from experimental design and Quasi-Monte Carlo. Through this framework, we then derive sparsity conditions under which the curse-of-dimensionality in data reduction can be lifted for our method. Next, we propose two algorithms for one-shot and sequential reduction via PSPs, both of which exploit big data subsampling and majorization-minimization for efficient optimization. Finally, we demonstrate the practical usefulness of PSPs in two real-world applications, the first for data reduction in kernel learning, and the second for reducing Markov Chain Monte Carlo (MCMC) chains.
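The idea of reducing a large sample to a small representative point set under a kernel-based criterion can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the method of the paper: a Gaussian kernel stands in for the SpIn kernel (whose form is not given in the abstract), the squared maximum mean discrepancy (MMD) stands in for the paper's reduction criterion, and a greedy subset search stands in for the majorization-minimization algorithms. The names `gaussian_kernel`, `mmd_squared`, and `greedy_reduce` are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel; an assumed stand-in, NOT the paper's SpIn kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(points, data, bandwidth=1.0):
    """Squared MMD (V-statistic form) between a point set and the full data."""
    return (mean_kernel(points, points, bandwidth)
            - 2.0 * mean_kernel(points, data, bandwidth)
            + mean_kernel(data, data, bandwidth))


def greedy_reduce(data, n, bandwidth=1.0):
    """Greedily pick n data points, each time adding the candidate that gives
    the smallest squared MMD between the chosen set and the full data.
    This is an illustrative heuristic, not the paper's optimization."""
    chosen = []
    remaining = list(range(len(data)))
    for _ in range(n):
        best = min(
            remaining,
            key=lambda i: mmd_squared(
                [data[j] for j in chosen] + [data[i]], data, bandwidth
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return [data[i] for i in chosen]


# Toy data: 40 points from a 2-D standard normal, reduced to 4 points.
random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(40)]
reduced = greedy_reduce(data, 4)
print(reduced)
print(mmd_squared(reduced, data))
```

A smaller value of `mmd_squared(reduced, data)` means the reduced set matches the full data more closely under the chosen kernel. The sketch restricts the reduced set to a subset of the data for simplicity; that restriction is a choice of this example and is not taken from the abstract.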

representative citing papers

BAMIFun: Bayesian Multiple Imputation for Functional Data

stat.ME · 2026-05-08 · unverdicted · novelty 7.0

BAMIFun provides Bayesian multiple imputation for functional data via low-rank penalized spline models, achieving accurate imputation and improved coverage in simulations and real datasets compared to single-imputation FPCA methods.

Stein Kernelized Molecular Dynamics for Active Learning of Interatomic Potentials

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

SKMD adapts Stein variational gradient descent into molecular dynamics with asynchronous updates and global atomic descriptor kernels to acquire non-redundant training configurations while preserving the Boltzmann distribution, yielding higher MLIP accuracy with fewer samples than baselines.

citing papers explorer

  • BAMIFun: Bayesian Multiple Imputation for Functional Data · stat.ME · 2026-05-08 · unverdicted · none · ref 166

  • Stein Kernelized Molecular Dynamics for Active Learning of Interatomic Potentials · cs.LG · 2026-06-02 · unverdicted · none · ref 61 · internal anchor