PiPNN: Ultra-Scalable Graph-Based Nearest Neighbor Indexing
The fastest indexes for Approximate Nearest Neighbor Search today are also the slowest to build: graph-based methods like HNSW and Vamana achieve state-of-the-art query performance but have long construction times because they rely on random-access-heavy beam searches. We introduce PiPNN (Pick-in-Partitions Nearest Neighbors), an ultra-scalable graph construction algorithm that avoids this "search bottleneck" in existing graph-based methods. PiPNN's core innovation is HashPrune, a novel online pruning algorithm that dynamically maintains sparse collections of edges. HashPrune enables PiPNN to partition the dataset into overlapping sub-problems, efficiently perform bulk distance comparisons via dense matrix multiplication kernels, and stream a subset of the edges into HashPrune. HashPrune guarantees bounded memory during index construction, which permits PiPNN to build higher-quality indexes without the use of extra intermediate memory. PiPNN builds state-of-the-art indexes up to 11.6x faster than Vamana (DiskANN) and up to 12.9x faster than HNSW. PiPNN is also significantly more scalable than recent algorithms for fast graph construction: it builds indexes at least 19.1x faster than MIRAGE and 17.3x faster than FastKCNA while producing indexes that achieve higher query throughput. PiPNN enables us to build, for the first time, high-quality ANN indexes on billion-scale datasets in under 20 minutes using a single multicore machine.
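To make the "bulk distance comparisons via dense matrix multiplication" step concrete, here is a minimal sketch of how all pairwise squared Euclidean distances inside one partition can be obtained from a single matrix product, using the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y. This is an illustration under stated assumptions, not PiPNN's implementation: the function names `pairwise_sq_dists` and `candidate_edges`, the parameter `k`, and the choice of Euclidean distance are all hypothetical, and HashPrune itself is not shown because the abstract does not describe its pruning rule.

```python
import numpy as np


def pairwise_sq_dists(points):
    """All-pairs squared Euclidean distances for one partition.

    The O(n^2 * d) work is done by one dense matrix product (points @ points.T),
    which is the kind of operation BLAS kernels execute efficiently.
    """
    points = np.asarray(points, dtype=np.float64)
    sq_norms = np.einsum("ij,ij->i", points, points)  # ||x_i||^2 for each row
    gram = points @ points.T                          # x_i . x_j for all pairs
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    np.maximum(d2, 0.0, out=d2)  # clamp tiny negatives caused by rounding
    return d2


def candidate_edges(points, ids, k):
    """Hypothetical helper: emit (src_id, dst_id, sq_dist) for the k nearest
    neighbors of every point within its partition. In a full pipeline these
    edges would be streamed into a pruning structure (assumption)."""
    d2 = pairwise_sq_dists(points)
    np.fill_diagonal(d2, np.inf)  # exclude self-edges
    k = min(k, len(ids) - 1)
    if k <= 0:
        return []
    # argpartition places the k smallest entries of each row in the first k slots
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    edges = []
    for i, row in enumerate(nearest):
        for j in row:
            edges.append((int(ids[i]), int(ids[j]), float(d2[i, j])))
    return edges


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    partition = rng.standard_normal((1000, 64))  # one partition of 1000 points
    ids = np.arange(1000)                        # global ids of those points
    print(len(candidate_edges(partition, ids, k=8)))
```

The design point this illustrates is that distances are computed for a whole partition at once with sequential, cache-friendly memory access, in contrast to the random-access pattern of a beam search that visits one neighbor list at a time.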
Forward citations
Cited by 3 Pith papers
- QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization
  QuIVer constructs ANN graph indices entirely inside a 2-bit quantized metric space, delivering high recall and throughput on embedding datasets while using far less memory than standard HNSW implementations.
- QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization
  QuIVer performs Vamana-style graph construction entirely inside a 2-bit Sign-Magnitude BQ space, achieving >=88% Recall@10 on contrastive-learning embeddings and 2.5-5.5x higher throughput than DiskANN/HNSW at matched...
- QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization
  QuIVer constructs ANN graphs using only 2-bit sign-magnitude binary quantization for topology decisions, achieving at least 88% Recall@10 at high throughput with low memory on embedding datasets.