hub Mixed citations

Billion-scale similarity search with GPUs

Mixed citation behavior. Most common role is method (60%).

23 Pith papers citing it
Method 60% of classified citations

abstract

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. We apply it in different similarity search scenarios, by proposing optimized designs for brute-force, approximate, and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high-accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
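To make the role of k-selection in brute-force search concrete, here is a minimal CPU sketch in pure Python. It is not the paper's GPU design: it swaps in the standard-library heap-based `heapq.nsmallest` for the selection step, and the names `squared_l2` and `knn_brute_force`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_brute_force(queries, database, k):
    """Return, for each query, the k closest database vectors.

    Each result is a list of (distance, index) pairs sorted by
    increasing distance. Every query is compared against every
    database vector (brute force); heapq.nsmallest performs the
    k-selection over the resulting distances.
    """
    results = []
    for q in queries:
        scored = ((squared_l2(q, x), i) for i, x in enumerate(database))
        results.append(heapq.nsmallest(k, scored))
    return results


# Toy data (assumed for illustration only).
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.9, 0.1]]
neighbors = knn_brute_force(queries, database, k=2)
nearest_indices = [i for _, i in neighbors[0]]
```

In this sketch the distance computation is trivially parallel across database vectors, while the selection of the k smallest values is the step that the abstract identifies as harder to parallelize.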

citation-role summary

method 3 · background 2

representative citing papers

Dense Passage Retrieval for Open-Domain Question Answering

cs.CL · 2020-04-10 · accept · novelty 8.0

Dense dual-encoder retrievers outperform BM25 by 9-19% absolute in top-20 passage retrieval accuracy across open-domain QA datasets and enable new state-of-the-art end-to-end QA results.

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

cs.CL · 2019-08-27 · unverdicted · novelty 8.0

Sentence-BERT adapts BERT with siamese and triplet networks to produce sentence embeddings for efficient cosine-similarity comparisons, cutting computation time from hours to seconds on similarity search while matching BERT accuracy.

Unsupervised Adversarial Graph Alignment with Graph Embedding

cs.SI · 2019-07-01 · unverdicted · novelty 6.0

UAGA aligns two graph embedding spaces via adversarial training in a fully unsupervised setting, with an incremental extension iUAGA that uses discovered pseudo-anchors to refine both embeddings and alignments.

Pyramid: A General Framework for Distributed Similarity Search

cs.DC · 2019-06-25 · unverdicted · novelty 6.0

Pyramid is a distributed similarity search framework based on HNSW that partitions datasets into similar-item sub-datasets for efficient query processing and includes failure recovery and straggler mitigation.

Product Quantization for Surface Soil Similarity

cs.LG · 2025-06-03 · unverdicted · novelty 4.0

A pipeline using product quantization and systematic parameter evaluation creates data-driven soil taxonomies with higher specificity than human-derived classifications.
