pith. sign in

arxiv: 0906.0684 · v1 · submitted 2009-06-03 · 💻 cs.DB · cs.IR

New Instability Results for High Dimensional Nearest Neighbor Search

classification 💻 cs.DB cs.IR
keywords epsilondatasetdistancelesspointsprobabilityratioresults
0
0 comments X
read the original abstract

Consider a dataset of n(d) points generated independently from R^d according to a common p.d.f. f_d with support(f_d) = [0,1]^d and sup{f_d([0,1]^d)} growing sub-exponentially in d. We prove that: (i) if n(d) grows sub-exponentially in d, then, for any query point q^d in [0,1]^d and any epsilon>0, the ratio of the distance between any two dataset points and q^d is less that 1+epsilon with probability -->1 as d-->infinity; (ii) if n(d)>[4(1+epsilon)]^d for large d, then for all q^d in [0,1]^d (except a small subset) and any epsilon>0, the distance ratio is less than 1+epsilon with limiting probability strictly bounded away from one. Moreover, we provide preliminary results along the lines of (i) when f_d=N(mu_d,Sigma_d).

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.