Billion-scale similarity search with GPUs

Jeff Johnson, Matthijs Douze, Hervé Jégou

classification cs.CV, cs.DB, cs.DS, cs.IR

keywords gpus, search, similarity, design, graph, images, implementation, less

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. We apply it in different similarity search scenarios by proposing optimized designs for brute-force, approximate, and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high-accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
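To make the terms in the abstract concrete, here is a minimal CPU sketch of exact (brute-force) nearest-neighbor search followed by k-selection. It is an illustration only, not the paper's GPU design, which the abstract does not detail. The function names (`squared_l2`, `k_select`, `brute_force_knn`) are hypothetical, and the bounded-heap selection is a standard textbook technique swapped in for the paper's method.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_select(distances: Sequence[float], k: int) -> List[Tuple[float, int]]:
    """Return the k smallest (distance, index) pairs in ascending order.

    Assumption: a bounded heap of size k stands in for the GPU k-selection
    described in the abstract. Cost is O(n log k) instead of a full sort.
    """
    if k <= 0:
        return []
    # heapq is a min-heap; storing negated values keeps the current
    # worst (largest-distance) candidate at heap[0].
    heap: List[Tuple[float, int]] = []
    for idx, dist in enumerate(distances):
        item = (-dist, -idx)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # The new candidate beats the current worst one: swap it in.
            heapq.heapreplace(heap, item)
    return sorted((-d, -i) for d, i in heap)


def brute_force_knn(
    queries: Sequence[Sequence[float]],
    database: Sequence[Sequence[float]],
    k: int,
) -> List[List[int]]:
    """For each query, return the indices of its k nearest database vectors."""
    results = []
    for q in queries:
        # Exhaustive scan: one distance per database vector.
        distances = [squared_l2(q, v) for v in database]
        results.append([idx for _, idx in k_select(distances, k)])
    return results
```

Calling `brute_force_knn(queries, database, k)` with lists of equal-length float vectors returns one list of neighbor indices per query. The selection step is the part the abstract identifies as a bottleneck on GPUs, because a heap of this kind is inherently sequential.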

Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
cs.IR 2021-04 accept novelty 8.0

BEIR is a heterogeneous zero-shot benchmark showing BM25 as a robust baseline while re-ranking and late-interaction models perform best on average at higher cost, with dense and sparse models lagging in generalization.

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
cs.CL 2019-08 unverdicted novelty 8.0

Sentence-BERT adapts BERT with siamese and triplet networks to produce sentence embeddings for efficient cosine-similarity comparisons, cutting computation time from hours to seconds on similarity search while matchin...

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
cs.CL 2020-05 accept novelty 7.0

RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.

QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance
cs.MA 2026-04 unverdicted novelty 6.0

QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.

TRUST Agents: A Collaborative Multi-Agent Framework for Fake News Detection, Explainable Verification, and Logic-Aware Claim Reasoning
cs.AI 2026-04 unverdicted novelty 6.0

A multi-agent system for explainable fake news detection that decomposes claims, retrieves evidence, verifies with calibrated confidence, and aggregates logic verdicts, showing better interpretability than BERT/RoBERT...

Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval
cs.AI 2026-04 unverdicted novelty 6.0

A hybrid graph-text retrieval system for cyber threat intelligence improves multi-hop question answering by up to 35% over vector-based RAG on a 3,300-question benchmark.

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
cs.CL 2024-01 unverdicted novelty 6.0

RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.

Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use
cs.CR 2026-05 unverdicted novelty 5.0

A server-side architecture with policy-aware ingestion and ABAC-based retrieval gating prevents cross-tenant data leakage in multitenant enterprise RAG and agent systems.

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
cs.CR 2026-05 unverdicted novelty 5.0

A survey providing a taxonomy of TEE platforms, an agent-centric threat model, and open challenges for applying confidential computing to secure agentic AI systems.

Using predefined vector systems to speed up neural network multimillion class classification
cs.LG 2026-04 unverdicted novelty 5.0

Predefined vector systems structure neural network latent spaces to allow O(1) label prediction via index searches on embedding vectors, delivering up to 11.6x speedup on multimillion-class tasks while preserving accu...

ESGLens: An LLM-Based RAG Framework for Interactive ESG Report Analysis and Score Prediction
cs.CL 2026-03 conditional novelty 5.0

ESGLens applies RAG and LLM embeddings to extract GRI-aligned information from ESG reports and achieves 0.48 Pearson correlation when regressing environmental scores on 300 company reports.

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
cs.CR 2026-05 unverdicted novelty 4.0

A structured survey of confidential computing for agentic AI that catalogs TEE platforms, agent-specific threats, transferable defenses, and remaining gaps in end-to-end frameworks.

From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry
cond-mat.mtrl-sci 2026-05 unverdicted novelty 2.0

Hackathon submissions indicate LLMs are moving from general assistants toward composable multi-agent systems for structuring scientific knowledge and automating tasks in materials science and chemistry.