Hashing for Similarity Search: A Survey

Heng Tao Shen; Jianqiu Ji; Jingdong Wang; Jingkuan Song

arXiv: 1408.2927 · v1 · pith:Q3ZQPPPUnew · submitted 2014-08-13 · cs.DS · cs.CV · cs.DB

Jingdong Wang, Heng Tao Shen, Jingkuan Song, Jianqiu Ji

classification: cs.DS · cs.CV · cs.DB

keywords: hash, hashing, search, been, data, distribution, functions, locality

Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
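As a rough illustration of the first category, the sketch below uses random-hyperplane (sign random projection) hashing, one well-known locality sensitive family for angular similarity. The abstract does not single out this family, and the function names, code length, and toy data are assumptions made for the example. It draws hash functions without looking at the data, encodes each item as a binary code, and ranks database items by Hamming distance to the query code.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes; no data is consulted."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(point, hyperplanes):
    """One bit per hyperplane: 1 if the point lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_search(query_code, db_codes, k=1):
    """Return (index, distance) for the k codes closest to query_code."""
    ranked = sorted(
        range(len(db_codes)),
        key=lambda i: hamming(query_code, db_codes[i]),
    )
    return [(i, hamming(query_code, db_codes[i])) for i in ranked[:k]]


# Toy data (assumed for illustration): 200 random 16-dimensional points.
data_rng = random.Random(1)
database = [[data_rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]

planes = make_hyperplanes(dim=16, n_bits=32)
db_codes = [hash_code(p, planes) for p in database]

# Query with an item that is already in the database.
query = database[42]
results = hamming_search(hash_code(query, planes), db_codes, k=5)
```

Learning-to-hash methods that use linear hash functions would, roughly speaking, replace `make_hyperplanes` with projections fitted to the data, while the encoding and Hamming-ranking steps could stay the same.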

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RNSG: A Range-Aware Graph Index for Efficient Range-Filtered Approximate Nearest Neighbor Search
cs.DB · 2026-03 · unverdicted · novelty 7.0

RNSG approximates the range-aware relative neighborhood graph (RRNG) to enable high-performance range-filtered ANN queries with one compact index instead of many.

Algorithms for Similarity Search and Pseudorandomness
cs.DS · 2019-06 · unverdicted · novelty 7.0

Improved LSH frameworks for ANN search with space-time tradeoffs and matching lower bounds, a novel set-based ANN approach, self-tuning experiments, and deterministic/randomized pseudorandom generators with near-optim...

Statistical Clear Sky Fitting Algorithm
eess.SY · 2019-07 · unverdicted · novelty 6.0

A statistical algorithm extracts a clear-sky performance signal from PV power measurements without external weather, irradiance, or configuration data.

Learning Compressed Sentence Representations for On-Device Text Processing
cs.CL · 2019-06 · unverdicted · novelty 5.0

Four binarization strategies turn continuous sentence embeddings into binary form, cutting storage by over 98% with only about 2% performance drop on downstream tasks.