
arxiv: 0811.2769 · v2 · submitted 2008-11-17 · math.PR · math.ST · stat.TH

Quantitative asymptotics of graphical projection pursuit

classification: math.PR, math.ST, stat.TH
keywords: theta, bound, random, sigma, data, epsilon, fixed, gaussian

A result of Diaconis and Freedman says that, in a limiting sense, for large collections of high-dimensional data most one-dimensional projections of the data are approximately Gaussian. This paper gives quantitative versions of that result. For a set of deterministic vectors $\{x_i\}_{i=1}^n$ in $\mathbb{R}^d$ with $n$ and $d$ fixed, let $\theta\in\mathbb{S}^{d-1}$ be a random point of the sphere and let $\mu_n^\theta$ denote the random measure which puts mass $\frac{1}{n}$ at each of the points $\langle x_1,\theta\rangle,\ldots,\langle x_n,\theta\rangle$. For a fixed bounded Lipschitz test function $f$, a standard Gaussian random variable $Z$, and a suitable constant $\sigma^2$, an explicit bound is derived for the quantity $\mathbb{P}\left[\left|\int f\,d\mu_n^\theta-\mathbb{E}f(\sigma Z)\right|>\epsilon\right]$. A bound is also given for $\mathbb{P}\left[d_{BL}(\mu_n^\theta, N(0,\sigma^2))>\epsilon\right]$, where $d_{BL}$ denotes the bounded-Lipschitz distance; this yields a lower bound on the waiting time to find a non-Gaussian projection of the $\{x_i\}$ when directions are tried independently and uniformly on $\mathbb{S}^{d-1}$.
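The quantity bounded in the abstract can be estimated numerically. The sketch below is a Monte Carlo illustration and not the paper's method. It makes three assumptions that are not in the abstract: directions are drawn uniformly on the sphere, the test function is $f(t)=\cos t$ (bounded and 1-Lipschitz, chosen because $\mathbb{E}\cos(\sigma Z)=e^{-\sigma^2/2}$ in closed form), and $\sigma^2=\frac{1}{nd}\sum_i |x_i|^2$, which is the average variance of a projection because each coordinate of a uniform direction has second moment $1/d$. The abstract only calls $\sigma^2$ "a suitable constant". The function names `sample_sphere` and `deviation_probability` are hypothetical.

```python
import numpy as np


def sample_sphere(d, size, rng):
    """Draw `size` points uniformly from the unit sphere in R^d
    by normalizing standard Gaussian vectors."""
    g = rng.standard_normal((size, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def deviation_probability(x, eps, trials, rng):
    """Monte Carlo estimate of P[ |int f dmu_n^theta - E f(sigma Z)| > eps ]
    for the illustrative choice f = cos.

    x      : (n, d) array of fixed data vectors
    eps    : deviation threshold
    trials : number of random directions theta to try
    """
    n, d = x.shape
    # Assumption: sigma^2 is the average variance of a random projection.
    sigma2 = np.sum(x ** 2) / (n * d)
    # Closed form: E cos(sigma Z) = exp(-sigma^2 / 2) for standard Gaussian Z.
    target = np.exp(-sigma2 / 2.0)
    thetas = sample_sphere(d, trials, rng)   # shape (trials, d)
    proj = thetas @ x.T                      # <x_i, theta>, shape (trials, n)
    integrals = np.cos(proj).mean(axis=1)    # int f dmu_n^theta per direction
    return float(np.mean(np.abs(integrals - target) > eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data set: drawn once, then treated as fixed.
    x = rng.standard_normal((500, 50))
    print(deviation_probability(x, eps=0.05, trials=2000, rng=rng))
```

If the estimated probability is $p$, then with independent directions the number of tries until a deviation larger than $\epsilon$ is geometric with mean $1/p$, which is the waiting-time reading of the bound.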
