{"work":{"id":"53d778be-7bb9-4afb-bed9-2bd47f141488","openalex_id":null,"doi":null,"arxiv_id":"1702.08734","raw_key":null,"title":"Billion-scale similarity search with GPUs","authors":null,"authors_text":"J","year":2017,"venue":"cs.CV","abstract":"Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy.\n  We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. We apply it in different similarity search scenarios, by proposing optimized design for brute-force, approximate and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.","external_url":"https://arxiv.org/abs/1702.08734","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T17:26:05.303505+00:00","pith_arxiv_id":"1702.08734","created_at":"2026-05-09T06:55:44.467647+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"Billion-scale similarity search with GPUs","render_title":"Billion-scale similarity search with GPUs"},"hub":{"state":{"work_id":"53d778be-7bb9-4afb-bed9-2bd47f141488","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":23,"external_cited_by_count":null,"distinct_field_count":10,"first_pith_cited_at":"2019-06-24T17:24:11+00:00","last_pith_cited_at":"2026-05-21T09:06:13+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-09T18:25:32.772767+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"method","n":3},{"context_role":"background","n":2}],"polarity_counts":[{"context_polarity":"use_method","n":3},{"context_polarity":"background","n":1},{"context_polarity":"unclear","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}