Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

Tobias Christiani

arxiv: 1708.07586 · v2 · pith:2JZNE2OUnew · submitted 2017-08-25 · 💻 cs.DS

Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

Tobias Christiani This is my paper

classification 💻 cs.DS

keywords locality-sensitiveframeworkhashcomputationsdistancefunctionshashingmathcal

0 comments

read the original abstract

The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution $\mathcal{H}$ over locality-sensitive hash functions that partition space. For a collection of $n$ points, after preprocessing, the query time is dominated by $O(n^{\rho} \log n)$ evaluations of hash functions from $\mathcal{H}$ and $O(n^{\rho})$ hash table lookups and distance computations where $\rho \in (0,1)$ is determined by the locality-sensitivity properties of $\mathcal{H}$. It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to $O(\log^2 n)$, leaving the query time to be dominated by $O(n^{\rho})$ distance computations and $O(n^{\rho} \log n)$ additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely match the Indyk-Motwani framework, making it a viable replacement in practice. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of additional word-RAM operations to $O(n^\rho)$.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Algorithms for Similarity Search and Pseudorandomness
cs.DS 2019-06 unverdicted novelty 7.0

Improved LSH frameworks for ANN search with space-time tradeoffs and matching lower bounds, a novel set-based ANN approach, self-tuning experiments, and deterministic/randomized pseudorandom generators with near-optim...
PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors
cs.DS 2019-06 unverdicted novelty 6.0

PUFFINN is a parameterless probabilistic LSH index for k-NN search that combines heuristics into a rigorous method with guarantees and outperforms prior approaches on a new hard synthetic dataset.