QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization

Chengcheng Li; Wenxuan Xiao; Zhiyou Wang

arxiv: 2605.02171 · v3 · pith:AMOY4UFTnew · submitted 2026-05-04 · 💻 cs.DB

QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization

Wenxuan Xiao , Zhiyou Wang , Chengcheng Li This is my paper

Pith reviewed 2026-05-21 01:08 UTC · model grok-4.3

classification 💻 cs.DB

keywords approximate nearest neighborbinary quantizationANN graph indexvector retrievalVamanacontrastive embeddingsrecall performancememory efficiency

0 comments

The pith

Binary quantization can define ANN graph topology itself for cosine-native contrastive embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether binary quantization can construct the graph topology for approximate nearest neighbor search rather than serving only as a post-hoc distance estimator. It presents QuIVer, a training-free index that carries out Vamana edge selection, diversity pruning, and beam search navigation inside a 2-bit sign-magnitude binary quantization space, touching full-precision vectors only at final reranking. On contrastive-learning embeddings the method reaches at least 88 percent Recall@10 at ef=64 across five datasets while using far less memory than conventional indexes. Evaluation across twelve million-scale collections exposes a sharp boundary: the approach works well for cosine-native data, less so for multimodal CLIP vectors, and poorly for Euclidean or unstructured distributions.

Core claim

QuIVer shows that performing all topology construction and navigation inside a 2-bit sign-magnitude binary quantization metric space produces Recall@10 of 88 percent or higher at ef=64 on cosine-native contrastive-learning embeddings from 384 to 3072 dimensions, 71 to 78 percent on multimodal CLIP data, and under 15 percent on Euclidean-native or structureless distributions, while delivering 2.5 to 5.5 times higher multi-threaded throughput and 4.7 times less hot memory than DiskANN or HNSW at matched recall with no codebook training required.

What carries the argument

Vamana edge selection, diversity pruning, and beam-search navigation executed entirely inside the 2-bit sign-magnitude binary quantization metric space, with float32 vectors accessed only for final reranking.

If this is right

On compatible contrastive-learning workloads the BQ-native hot path uses less than 1.3 GB for one million vectors.
The method yields 2.5 to 5.5 times higher multi-threaded throughput than DiskANN Rust and HNSW variants at matched recall.
Results indicate an empirical impossible triangle among aggressive compression, high throughput, and universal data compatibility.
No codebook or rotation training is needed, unlike PQ, OPQ, or RaBitQ approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Systems could inspect embedding properties such as cosine versus Euclidean affinity to decide whether to apply BQ-native topology.
Hybrid indexes that switch quantization fidelity based on data distribution might extend the operating range beyond the observed boundary.
Additional tests on new embedding families would refine the concrete criteria for safe use of compressed navigation.

Load-bearing premise

The twelve million-scale datasets and the tested contrastive-learning and CLIP distributions represent the workloads where binary-quantized topology would be deployed in practice.

What would settle it

Observation of Recall@10 above 80 percent at ef=64 on a Euclidean-native dataset or below 50 percent on a contrastive-learning embedding dataset under identical BQ-native construction would falsify the reported applicability boundary.

Figures

Figures reproduced from arXiv: 2605.02171 by Chengcheng Li, Wenxuan Xiao, Zhiyou Wang.

**Figure 1.** Figure 1: QuIVer system overview. Blue-shaded components view at source ↗

**Figure 1.** Figure 1: QuIVer system overview. Blue-shaded components [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: Recall@10 vs. QPS Pareto curves on MiniLM-1M, view at source ↗

**Figure 2.** Figure 2: Recall@10 vs. MT-QPS Pareto curves on Cohere [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Recall@10 at ef=64 across nine datasets, illustrating view at source ↗

**Figure 3.** Figure 3: Recall@10 at ef=64 across twelve datasets, illustrat [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Empirical misranking rate vs. theoretical upper [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

read the original abstract

Approximate nearest neighbor (ANN) graph indices such as HNSW and Vamana construct their edge topology in full-precision or high-fidelity quantized metric spaces, relegating binary quantization (BQ) to a post-hoc distance estimator during search. This paper asks a different question: Can binary quantization define the graph topology itself -- and if so, under what conditions? We study this question through QuIVer (Quantized Index for Vector Retrieval), a training-free ANN graph index that performs Vamana edge selection, diversity pruning, and beam-search navigation entirely within a 2-bit Sign-Magnitude BQ metric space, accessing float32 vectors only for final reranking. Systematic evaluation on twelve million-scale datasets reveals a sharp applicability boundary: BQ-native topology is highly effective on cosine-native contrastive-learning embeddings (>=88% Recall@10 at ef=64 across five datasets, 384--3072 dimensions), moderately effective on multimodal CLIP data (71--78%), and empirically unsuitable for Euclidean-native or structureless distributions (<15%). Our results suggest an empirical "impossible triangle" between aggressive compression, high throughput, and universal data compatibility. The central contribution is not merely the system, but the boundary it reveals: falsifiable criteria for when industrial vector search systems can safely trade metric fidelity for compact BQ-native navigation. On compatible workloads, the system benefits are substantial: QuIVer's BQ-native hot path (<1.3 GB for 1M vectors) yields 2.5--5.5x higher multi-threaded throughput than DiskANN Rust and HNSW variants at matched recall, with 4.7x less hot memory and no codebook or rotation training (unlike PQ/OPQ/RaBitQ).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

QuIVer shows binary quantization can define Vamana graph topology itself and delivers strong recall and speed on cosine contrastive embeddings while failing on Euclidean data.

read the letter

The main thing to know is that this paper flips binary quantization from a search-time trick into the actual space where the graph edges get chosen and pruned. QuIVer runs Vamana-style selection and beam search inside a 2-bit sign-magnitude representation and only touches float32 vectors at the end for reranking. On five contrastive-learning cosine datasets it reaches at least 88% Recall@10 at ef=64, and the multi-threaded throughput is 2.5-5.5x higher than DiskANN Rust or HNSW at matched recall with roughly 4.7x less hot memory and no training or codebook steps. On CLIP multimodal data the recall sits in the low 70s; on Euclidean-native or structureless sets it collapses below 15%. That split is the clearest new signal they report across the twelve million-scale collections they tested.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces QuIVer, a training-free ANN graph index that performs Vamana edge selection, diversity pruning, and beam-search navigation entirely in a 2-bit Sign-Magnitude binary quantization (BQ) metric space, accessing float32 vectors only for final reranking. Systematic evaluation across twelve million-scale datasets identifies a sharp applicability boundary: BQ-native topology yields >=88% Recall@10 at ef=64 on cosine-native contrastive-learning embeddings (five datasets, 384-3072 dimensions), 71-78% on multimodal CLIP data, and <15% on Euclidean-native or structureless distributions. The work reports 2.5-5.5x higher multi-threaded throughput than DiskANN Rust and HNSW variants at matched recall on compatible workloads, with 4.7x less hot memory and no codebook or rotation training.

Significance. If the empirical applicability boundary holds and generalizes, the paper provides actionable, falsifiable criteria for when industrial vector search systems can safely adopt aggressive BQ-native navigation, trading metric fidelity for compactness and throughput. The training-free design and concrete gains on contrastive embeddings constitute a practical contribution to ANN index design, particularly for high-dimensional cosine-native workloads.

major comments (2)

[Experimental Evaluation] Experimental Evaluation section (around the twelve-dataset protocol): The criteria used to label datasets as 'cosine-native contrastive-learning embeddings' versus 'Euclidean-native or structureless distributions' are not explicitly defined (e.g., via correlation between BQ distortion and angular/Euclidean metrics, or training-objective metadata). This directly affects the load-bearing claim of a sharp applicability boundary and the 'impossible triangle' conclusion, as confounding factors such as normalization or dimensionality could explain the observed performance gap.
[§4.3] §4.3, Recall/Throughput Results: The reported >=88% Recall@10 figures lack error bars, multiple random seeds, or exact data-exclusion rules, which weakens confidence in the cross-dataset consistency of the BQ-native topology advantage.

minor comments (2)

[Abstract] The abstract and introduction would benefit from a brief explicit statement of the Sign-Magnitude BQ encoding scheme (e.g., how the 2 bits are allocated to sign and magnitude).
[Table 2] Table 2 (or equivalent throughput table): Clarify whether the 2.5-5.5x gains are measured at identical recall targets or at the same ef parameter.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, providing clarifications and committing to revisions where they strengthen the manuscript without altering its core claims.

read point-by-point responses

Referee: [Experimental Evaluation] Experimental Evaluation section (around the twelve-dataset protocol): The criteria used to label datasets as 'cosine-native contrastive-learning embeddings' versus 'Euclidean-native or structureless distributions' are not explicitly defined (e.g., via correlation between BQ distortion and angular/Euclidean metrics, or training-objective metadata). This directly affects the load-bearing claim of a sharp applicability boundary and the 'impossible triangle' conclusion, as confounding factors such as normalization or dimensionality could explain the observed performance gap.

Authors: We agree that explicit criteria will strengthen the presentation. Classification was performed according to the documented training objectives and preprocessing of each embedding: the five contrastive-learning datasets use objectives that optimize for cosine similarity on L2-normalized vectors (e.g., SimCLR-style or equivalent contrastive losses), the multimodal CLIP data follows the same cosine-native pattern but with different training data, while Euclidean-native datasets are drawn from sources whose original papers specify Euclidean distance and lack normalization, and structureless distributions are synthetic uniform or Gaussian samples. In the revision we will add a dedicated paragraph plus a summary table in the Experimental Evaluation section that lists each dataset, its source training objective, normalization status, and the resulting label. This will make the applicability boundary falsifiable and help address potential confounding factors. revision: yes
Referee: [§4.3] §4.3, Recall/Throughput Results: The reported >=88% Recall@10 figures lack error bars, multiple random seeds, or exact data-exclusion rules, which weakens confidence in the cross-dataset consistency of the BQ-native topology advantage.

Authors: We acknowledge the value of statistical reporting. The original experiments used a single fixed seed per dataset for reproducibility and computational efficiency on million-scale data. We will rerun the key Recall@10 measurements on the five contrastive-learning datasets using three independent random seeds, report means with standard deviations as error bars, and add an explicit statement of the data-exclusion rule (vectors with zero norm were removed prior to indexing as they produce undefined cosine distances). These additions will be incorporated into the revised §4.3 and associated figures. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical claims rest on direct measurements

full rationale

The paper constructs QuIVer and evaluates it empirically across twelve million-scale datasets, reporting recall@10, throughput, and an applicability boundary between cosine-native contrastive embeddings, CLIP data, and Euclidean/structureless distributions. No equations, fitted parameters, or derivations are presented that reduce any prediction or boundary to the inputs by construction. The central claims are falsifiable via the stated experimental protocol and do not rely on self-citation chains, uniqueness theorems, or ansatzes imported from prior work. This is the normal case of a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work introduces no new free parameters, axioms beyond standard ANN assumptions, or invented entities; the central contribution is an empirical demonstration rather than a theoretical derivation.

axioms (1)

domain assumption Vamana edge selection, diversity pruning, and beam search remain effective when performed entirely inside a 2-bit Sign-Magnitude BQ metric space on cosine-native data.
This premise is invoked to justify performing all topology construction in the quantized space.

pith-pipeline@v0.9.0 · 5853 in / 1322 out tokens · 46535 ms · 2026-05-21T01:08:16.139730+00:00 · methodology

Review history (3 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

BQ-native topology is highly effective on cosine-native contrastive-learning embeddings (>=88% Recall@10 at ef=64 across five datasets, 384--3072 dimensions), moderately effective on multimodal CLIP data (71--78%), and empirically unsuitable for Euclidean-native or structureless distributions (<15%).
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The resulting 2-bit binary signatures... preserve enough directional structure to support graph navigability—the property that greedy traversal can find monotonically improving paths to the target neighborhood.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 3 internal anchors

[1]

Lewis, E

P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. InNeurIPS, pages 9459–9474, 2020

work page 2020
[2]

Wang and P

T. Wang and P. Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. InICML, pages 9929–9939, 2020

work page 2020
[3]

Representation Learning with Contrastive Predictive Coding

A. van den Oord, Y. Li, and O. Vinyals. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[4]

Y. A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs.IEEE TPAMI, 42(4):824–836, 2020

work page 2020
[5]

S. J. Subramanya et al. DiskANN: Fast accurate billion-point nearest neighbor search on a single node. InNeurIPS, 2019

work page 2019
[6]

Kleinberg

J. Kleinberg. The small-world phenomenon: An algorithmic perspective. InSTOC, pages 163–170, 2000

work page 2000
[7]

Jégou, M

H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search.IEEE TPAMI, 33(1):117–128, 2011

work page 2011
[8]

T. Ge, K. He, Q. Ke, and J. Sun. Optimized product quantization for approximate nearest neighbor search. InCVPR, pages 2946–2953, 2013

work page 2013
[9]

R. Guo, P. Sun, E. Lindgren, Q. Geng, D. Simcha, F. Chern, and S. Kumar. Acceler- ating large-scale inference with anisotropic vector quantization. InICML, pages 3887–3896, 2020

work page 2020
[10]

M. S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, pages 380–388, 2002

work page 2002
[11]

Indyk and R

P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. InSTOC, pages 604–613, 1998

work page 1998
[12]

Gao and C

J. Gao and C. Long. RaBitQ: Quantizing high-dimensional vectors with a theo- retical error bound for approximate nearest neighbor search. InSIGMOD, 2024

work page 2024
[13]

Betser, E

R. Betser, E. Gofer, M. Y. Levi, and G. Gilboa. InfoNCE induces Gaussian distribu- tion. InICLR, 2026

work page 2026
[14]

M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming.Jour- nal of the ACM, 42(6):1115–1145, 1995

work page 1995
[15]

Datar, N

M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. InSoCG, pages 253–262, 2004

work page 2004
[16]

C. Fu, C. Xiang, C. Wang, and D. Cai. Fast approximate nearest neighbor search with the navigating spreading-out graph. InProc. VLDB Endowment, 12(5):461– 474, 2019

work page 2019
[17]

Zhu and C

M. Zhu and C. Zhang. Understanding and generalizing monotonic proximity graphs for approximate nearest neighbor search.arXiv preprint arXiv:2101.12631, 2021

work page arXiv 2021
[18]

S. N. Bernstein. On a modification of Chebyshev’s inequality and of the error formula of Laplace.Annals of Science of the Ukrainian National Academy of Sciences, 1:38–49, 1924

work page 1924
[19]

Boucheron, G

S. Boucheron, G. Lugosi, and P. Massart.Concentration Inequalities: A Nonasymp- totic Theory of Independence. Oxford University Press, 2013

work page 2013
[20]

Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science

R. Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018

work page 2018
[21]

J. Chen, S. Xiao, P. Zhang, K. Luo, D. Lian, and Z. Liu. BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation.arXiv preprint arXiv:2402.03216, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[22]

Wolt product embeddings dataset

Wolt Engineering. Wolt product embeddings dataset. https://huggingface.co/ datasets/Wolt/CLIP-ViT-B-32-Product-images-v1, 2024

work page 2024
[23]

Desai, G

K. Desai, G. Kaul, Z. Aysola, and J. Johnson. RedCaps: Web-curated image-text data created by the people, for the people. InNeurIPS Datasets and Benchmarks, 2021

work page 2021
[24]

Aumüller, E

M. Aumüller, E. Bernhardsson, and A. Faithfull. ANN-Benchmarks: A bench- marking tool for approximate nearest neighbor algorithms.Information Systems, 87:101374, 2020

work page 2020
[25]

Aumüller and M

M. Aumüller and M. Ceccarello. Recent approaches and trends in approximate nearest neighbor search, with remarks on benchmarking.Data Engineering, pages 27–38, 2023

work page 2023
[26]

W. Li, Y. Zhang, Y. Sun, W. Wang, M. Li, W. Zhang, and X. Lin. Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement.IEEE TKDE, 32(8):1475–1488, 2020

work page 2020
[27]

Johnson, M

J. Johnson, M. Douze, and H. Jégou. Billion-scale similarity search with GPUs. IEEE Trans. Big Data, 7(3):535–547, 2021

work page 2021
[28]

Vardanian

A. Vardanian. USearch: Smaller & faster single-file similarity search engine for vectors & strings, 2024. https://github.com/unum-cloud/usearch

work page 2024
[29]

D. R. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics. InSTOC, pages 741–750, 2002

work page 2002
[30]

V. W. Liang, Y. Zhang, Y. Kwon, S. Yeung, and J. Y. Zou. Mind the gap: Under- standing the modality gap in multi-modal contrastive representation learning. InNeurIPS, pages 17612–17625, 2022

work page 2022
[31]

M. Wang, X. Xu, Q. Yue, and Y. Wang. A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search.Proc. VLDB Endow., 14(11):1964–1978, 2021

work page 1964
[32]

Xiang, J

L. Xiang, J. Feng, Z. Yin, Z. Li, D. Xue, H. Qin, R. Li, and G. Wang. 𝛿-EMG: A monotonic graph index for approximate nearest neighbor search.arXiv preprint arXiv:2511.16921, 2025

work page arXiv 2025
[33]

Zhong, H

X. Zhong, H. Li, J. Jin, M. Yang, D. Chu, X. Wang, Z. Shen, W. Jia, G. Gu, Y. Xie, X. Lin, H. T. Shen, J. Song, and P. Cheng. VSAG: An optimized search frame- work for graph-based approximate nearest neighbor search.Proc. VLDB Endow., 18(12):5017–5030, 2025

work page 2025
[34]

Matsui, Y

Y. Matsui, Y. Uchida, H. Jégou, and S. Satoh. A survey of product quantization. ITE Trans. on Media Technology and Applications, 6(1):2–10, 2018

work page 2018
[35]

André, A.-M

F. André, A.-M. Kermarrec, and N. Le Scouarnec. Cache locality is not enough: High-performance nearest neighbor search with product quantization fast scan. Proc. VLDB Endow., 9(4):288–299, 2015

work page 2015
[36]

André, A.-M

F. André, A.-M. Kermarrec, and N. Le Scouarnec. Accelerated nearest neighbor search with quick ADC. InICMR, pages 159–166, 2017

work page 2017
[37]

Aguerrebere, I

C. Aguerrebere, I. S. Bhati, M. Hildebrand, M. Tepper, and T. L. Willke. Similar- ity search in the blink of an eye with compressed indices.Proc. VLDB Endow., 16(11):3433–3446, 2023

work page 2023
[38]

B. Trent. Better Binary Quantization (BBQ) in Lucene and Elasticsearch. Elas- tic Search Labs Blog, November 2024. https://www.elastic.co/search-labs/blog/ better-binary-quantization-lucene-elasticsearch

work page 2024
[39]

Vector quantization

OpenSearch Project. Vector quantization. OpenSearch Documentation, 2024. https://docs.opensearch.org/latest/vector-search/optimizing-storage/knn- vector-quantization/

work page 2024
[40]

Ailon and B

N. Ailon and B. Chazelle. The fast Johnson–Lindenstrauss transform and approx- imate nearest neighbors.SIAM J. Computing, 39(1):302–322, 2009

work page 2009
[41]

Z. Gong, Y. Zeng, and L. Chen. Accelerating approximate nearest neighbor search in hierarchical graphs: Efficient level navigation with shortcuts.Proc. VLDB Endow., 18(10), 2025

work page 2025
[42]

M. Wang, X. Xu, and Y. Wang. Accelerating graph indexing for ANNS on modern CPUs. InSIGMOD, 2025

work page 2025
[43]

W. Dong, M. Kandpal, and S. Mussmann. PiPNN: Ultra-scalable graph-based nearest neighbor indexing.arXiv preprint arXiv:2602.21247, 2026

work page internal anchor Pith review arXiv 2026
[44]

K. Lu, C. Gao, and Y. Cong. Approximate nearest neighbor search for modern AI: A projection-augmented graph approach.arXiv preprint arXiv:2603.00497, 2026

work page arXiv 2026

[1] [1]

Lewis, E

P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. InNeurIPS, pages 9459–9474, 2020

work page 2020

[2] [2]

Wang and P

T. Wang and P. Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. InICML, pages 9929–9939, 2020

work page 2020

[3] [3]

Representation Learning with Contrastive Predictive Coding

A. van den Oord, Y. Li, and O. Vinyals. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[4] [4]

Y. A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs.IEEE TPAMI, 42(4):824–836, 2020

work page 2020

[5] [5]

S. J. Subramanya et al. DiskANN: Fast accurate billion-point nearest neighbor search on a single node. InNeurIPS, 2019

work page 2019

[6] [6]

Kleinberg

J. Kleinberg. The small-world phenomenon: An algorithmic perspective. InSTOC, pages 163–170, 2000

work page 2000

[7] [7]

Jégou, M

H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search.IEEE TPAMI, 33(1):117–128, 2011

work page 2011

[8] [8]

T. Ge, K. He, Q. Ke, and J. Sun. Optimized product quantization for approximate nearest neighbor search. InCVPR, pages 2946–2953, 2013

work page 2013

[9] [9]

R. Guo, P. Sun, E. Lindgren, Q. Geng, D. Simcha, F. Chern, and S. Kumar. Acceler- ating large-scale inference with anisotropic vector quantization. InICML, pages 3887–3896, 2020

work page 2020

[10] [10]

M. S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, pages 380–388, 2002

work page 2002

[11] [11]

Indyk and R

P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. InSTOC, pages 604–613, 1998

work page 1998

[12] [12]

Gao and C

J. Gao and C. Long. RaBitQ: Quantizing high-dimensional vectors with a theo- retical error bound for approximate nearest neighbor search. InSIGMOD, 2024

work page 2024

[13] [13]

Betser, E

R. Betser, E. Gofer, M. Y. Levi, and G. Gilboa. InfoNCE induces Gaussian distribu- tion. InICLR, 2026

work page 2026

[14] [14]

M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming.Jour- nal of the ACM, 42(6):1115–1145, 1995

work page 1995

[15] [15]

Datar, N

M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. InSoCG, pages 253–262, 2004

work page 2004

[16] [16]

C. Fu, C. Xiang, C. Wang, and D. Cai. Fast approximate nearest neighbor search with the navigating spreading-out graph. InProc. VLDB Endowment, 12(5):461– 474, 2019

work page 2019

[17] [17]

Zhu and C

M. Zhu and C. Zhang. Understanding and generalizing monotonic proximity graphs for approximate nearest neighbor search.arXiv preprint arXiv:2101.12631, 2021

work page arXiv 2021

[18] [18]

S. N. Bernstein. On a modification of Chebyshev’s inequality and of the error formula of Laplace.Annals of Science of the Ukrainian National Academy of Sciences, 1:38–49, 1924

work page 1924

[19] [19]

Boucheron, G

S. Boucheron, G. Lugosi, and P. Massart.Concentration Inequalities: A Nonasymp- totic Theory of Independence. Oxford University Press, 2013

work page 2013

[20] [20]

Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science

R. Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018

work page 2018

[21] [21]

J. Chen, S. Xiao, P. Zhang, K. Luo, D. Lian, and Z. Liu. BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation.arXiv preprint arXiv:2402.03216, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[22] [22]

Wolt product embeddings dataset

Wolt Engineering. Wolt product embeddings dataset. https://huggingface.co/ datasets/Wolt/CLIP-ViT-B-32-Product-images-v1, 2024

work page 2024

[23] [23]

Desai, G

K. Desai, G. Kaul, Z. Aysola, and J. Johnson. RedCaps: Web-curated image-text data created by the people, for the people. InNeurIPS Datasets and Benchmarks, 2021

work page 2021

[24] [24]

Aumüller, E

M. Aumüller, E. Bernhardsson, and A. Faithfull. ANN-Benchmarks: A bench- marking tool for approximate nearest neighbor algorithms.Information Systems, 87:101374, 2020

work page 2020

[25] [25]

Aumüller and M

M. Aumüller and M. Ceccarello. Recent approaches and trends in approximate nearest neighbor search, with remarks on benchmarking.Data Engineering, pages 27–38, 2023

work page 2023

[26] [26]

W. Li, Y. Zhang, Y. Sun, W. Wang, M. Li, W. Zhang, and X. Lin. Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement.IEEE TKDE, 32(8):1475–1488, 2020

work page 2020

[27] [27]

Johnson, M

J. Johnson, M. Douze, and H. Jégou. Billion-scale similarity search with GPUs. IEEE Trans. Big Data, 7(3):535–547, 2021

work page 2021

[28] [28]

Vardanian

A. Vardanian. USearch: Smaller & faster single-file similarity search engine for vectors & strings, 2024. https://github.com/unum-cloud/usearch

work page 2024

[29] [29]

D. R. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics. InSTOC, pages 741–750, 2002

work page 2002

[30] [30]

V. W. Liang, Y. Zhang, Y. Kwon, S. Yeung, and J. Y. Zou. Mind the gap: Under- standing the modality gap in multi-modal contrastive representation learning. InNeurIPS, pages 17612–17625, 2022

work page 2022

[31] [31]

M. Wang, X. Xu, Q. Yue, and Y. Wang. A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search.Proc. VLDB Endow., 14(11):1964–1978, 2021

work page 1964

[32] [32]

Xiang, J

L. Xiang, J. Feng, Z. Yin, Z. Li, D. Xue, H. Qin, R. Li, and G. Wang. 𝛿-EMG: A monotonic graph index for approximate nearest neighbor search.arXiv preprint arXiv:2511.16921, 2025

work page arXiv 2025

[33] [33]

Zhong, H

X. Zhong, H. Li, J. Jin, M. Yang, D. Chu, X. Wang, Z. Shen, W. Jia, G. Gu, Y. Xie, X. Lin, H. T. Shen, J. Song, and P. Cheng. VSAG: An optimized search frame- work for graph-based approximate nearest neighbor search.Proc. VLDB Endow., 18(12):5017–5030, 2025

work page 2025

[34] [34]

Matsui, Y

Y. Matsui, Y. Uchida, H. Jégou, and S. Satoh. A survey of product quantization. ITE Trans. on Media Technology and Applications, 6(1):2–10, 2018

work page 2018

[35] [35]

André, A.-M

F. André, A.-M. Kermarrec, and N. Le Scouarnec. Cache locality is not enough: High-performance nearest neighbor search with product quantization fast scan. Proc. VLDB Endow., 9(4):288–299, 2015

work page 2015

[36] [36]

André, A.-M

F. André, A.-M. Kermarrec, and N. Le Scouarnec. Accelerated nearest neighbor search with quick ADC. InICMR, pages 159–166, 2017

work page 2017

[37] [37]

Aguerrebere, I

C. Aguerrebere, I. S. Bhati, M. Hildebrand, M. Tepper, and T. L. Willke. Similar- ity search in the blink of an eye with compressed indices.Proc. VLDB Endow., 16(11):3433–3446, 2023

work page 2023

[38] [38]

B. Trent. Better Binary Quantization (BBQ) in Lucene and Elasticsearch. Elas- tic Search Labs Blog, November 2024. https://www.elastic.co/search-labs/blog/ better-binary-quantization-lucene-elasticsearch

work page 2024

[39] [39]

Vector quantization

OpenSearch Project. Vector quantization. OpenSearch Documentation, 2024. https://docs.opensearch.org/latest/vector-search/optimizing-storage/knn- vector-quantization/

work page 2024

[40] [40]

Ailon and B

N. Ailon and B. Chazelle. The fast Johnson–Lindenstrauss transform and approx- imate nearest neighbors.SIAM J. Computing, 39(1):302–322, 2009

work page 2009

[41] [41]

Z. Gong, Y. Zeng, and L. Chen. Accelerating approximate nearest neighbor search in hierarchical graphs: Efficient level navigation with shortcuts.Proc. VLDB Endow., 18(10), 2025

work page 2025

[42] [42]

M. Wang, X. Xu, and Y. Wang. Accelerating graph indexing for ANNS on modern CPUs. InSIGMOD, 2025

work page 2025

[43] [43]

W. Dong, M. Kandpal, and S. Mussmann. PiPNN: Ultra-scalable graph-based nearest neighbor indexing.arXiv preprint arXiv:2602.21247, 2026

work page internal anchor Pith review arXiv 2026

[44] [44]

K. Lu, C. Gao, and Y. Cong. Approximate nearest neighbor search for modern AI: A projection-augmented graph approach.arXiv preprint arXiv:2603.00497, 2026

work page arXiv 2026