pith. sign in

arxiv: 1603.09320 · v4 · pith:RIVF4YFHnew · submitted 2016-03-30 · 💻 cs.DS · cs.CV· cs.IR· cs.SI

Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

classification 💻 cs.DS cs.CVcs.IRcs.SI
keywords searchgraphshierarchicalallowsnavigableperformanceproximitysmall
0
0 comments X
read the original abstract

We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for additional search structures, which are typically used at the coarse search stage of the most proximity graph techniques. Hierarchical NSW incrementally builds a multi-layer structure consisting from hierarchical set of proximity graphs (layers) for nested subsets of the stored elements. The maximum layer in which an element is present is selected randomly with an exponentially decaying probability distribution. This allows producing graphs similar to the previously studied Navigable Small World (NSW) structures while additionally having the links separated by their characteristic distance scales. Starting search from the upper layer together with utilizing the scale separation boosts the performance compared to NSW and allows a logarithmic complexity scaling. Additional employment of a heuristic for selecting proximity graph neighbors significantly increases performance at high recall and in case of highly clustered data. Performance evaluation has demonstrated that the proposed general metric space search index is able to strongly outperform previous opensource state-of-the-art vector-only approaches. Similarity of the algorithm to the skip list structure allows straightforward balanced distributed implementation.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mycelium-Index: A Streaming Approximate Nearest Neighbor Index with Myelial Edge Decay, Traffic-Driven Reinforcement, and Adaptive Living Hierarchy

    cs.LG 2026-04 unverdicted novelty 7.0

    Mycelium-index matches state-of-the-art recall on streaming and static ANN benchmarks while using 5x less RAM and delivering higher query throughput on SIFT-1M.

  2. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    cs.CL 2020-05 accept novelty 7.0

    RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.

  3. Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

    cs.CL 2026-05 conditional novelty 6.0

    True Memory is a verbatim-event retrieval pipeline running on a single SQLite file that reaches 93% accuracy on LoCoMo multi-session questions, outperforming Mem0, Supermemory, Zep, and matching or exceeding EverMemOS...

  4. PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors

    cs.DS 2019-06 unverdicted novelty 6.0

    PUFFINN is a parameterless probabilistic LSH index for k-NN search that combines heuristics into a rigorous method with guarantees and outperforms prior approaches on a new hard synthetic dataset.

  5. Using predefined vector systems to speed up neural network multimillion class classification

    cs.LG 2026-04 unverdicted novelty 5.0

    Predefined vector systems structure neural network latent spaces to allow O(1) label prediction via index searches on embedding vectors, delivering up to 11.6x speedup on multimillion-class tasks while preserving accu...

  6. KadiAssistant: A conversational AI Agent for information retrieval in Kadi4Mat

    cs.IR 2026-05 unverdicted novelty 4.0

    KadiAssistant is a privacy-by-design conversational AI that pairs a self-hosted LLM with semantic search to retrieve and structure information from the Kadi4Mat research data platform while respecting fine-grained per...