IVF-TQ: Calibration-Free Streaming Vector Search via a Codebook-Free Residual Layer

Tarun Sharma

arxiv: 2605.17415 · v2 · pith:WXYHVGU7new · submitted 2026-05-17 · 💻 cs.LG · cs.AI· cs.DB· cs.IR

IVF-TQ: Calibration-Free Streaming Vector Search via a Codebook-Free Residual Layer

Tarun Sharma This is my paper

Pith reviewed 2026-05-25 05:50 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.DBcs.IR

keywords streaming vector searchapproximate nearest neighborproduct quantizationinverted file indexresidual compressioncodebook-free quantizationrecall stabilitycalibration-free index

0 comments

The pith

IVF-TQ replaces learned codebooks with a fixed random rotation and precomputed scalar quantizer to keep recall stable during streaming ingestion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard PQ, OPQ and ScaNN indexes fit a codebook to an initial sample and reuse it as the corpus grows, producing recall loss even under shuffled i.i.d. streaming. IVF-TQ keeps only the coarse IVF partition trainable while the residual layer is data-independent: a fixed random rotation followed by a Lloyd-Max scalar quantizer whose parameters depend solely on bit width b and dimension d. A uniform-over-sphere inner-product error bound that depends only on (b, d, delta) supplies a guarantee unavailable to any learned-codebook method. Across nine controlled cells on three 10 M datasets, IVF-TQ stays within [-0.80, +0.56] pp at one fixed (b, d) setting while PQ variants either degrade or require per-dataset bit-budget retuning.

Core claim

IVF-TQ is an inverted-file index whose residual compression layer is data-independent: a fixed random rotation followed by a precomputed Lloyd-Max scalar quantizer parameterised only by the bit width b and dimension d. Only the IVF coarse k-means partition is trained. The same design produces an IVF-amplification effect that closes the gap to Extended RaBitQ to within statistical noise and supports an Adaptive variant that refreshes the partition without touching the compression layer. The uniform-over-sphere inner-product error bound depending only on (b, d, delta) supplies the structural guarantee.

What carries the argument

Codebook-free residual layer consisting of a fixed random rotation followed by a precomputed Lloyd-Max scalar quantizer parameterized only by bit width b and dimension d.

If this is right

Per-batch PQ codebook retraining never recovers the streaming recall gap.
IVF-PQ streaming stability requires per-dataset bit-budget tuning.
IVF-TQ maintains performance at one fixed (b, d) configuration across all three datasets.
An Adaptive variant refreshes the IVF partition without retraining the compression layer.
The same design produces an amplification effect that closes the gap to Extended RaBitQ to within statistical noise.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Production pipelines could avoid periodic full retraining cycles if the residual layer never needs to be updated.
The separation of partition training from residual compression may simplify hybrid indexes that combine multiple coarse quantizers.
If the bound holds across modalities, the same fixed rotation-plus-scalar approach could be tested on non-Euclidean distances.
Reduced per-dataset hyper-parameter search lowers the operational cost of deploying the index on new corpora.

Load-bearing premise

The uniform-over-sphere inner-product error bound that depends only on bit width, dimension and delta supplies a structural guarantee that holds for real data distributions and explains the observed streaming stability.

What would settle it

A controlled streaming run on any of the three 10 M datasets in which IVF-TQ recall drops more than one percentage point over the ingestion period while the (b, d) configuration remains fixed.

Figures

Figures reproduced from arXiv: 2605.17415 by Tarun Sharma.

**Figure 1.** Figure 1: IVF-TQ architecture. The coarse 𝑘-means partition (blue) is the only trained component. The residual compression layer (orange) is data-independent: a fixed random rotation Π, a precomputed Lloyd–Max scalar quantizer parameterised only by (𝑏, 𝑑), and a sign-bit refinement of one extra bit per coordinate. At search time, asymmetric inner-product is computed in the rotated frame; the coarse term ⟨𝑞, 𝑐ℓ⟩ is e… view at source ↗

**Figure 2.** Figure 2: Streaming ingestion on SIFT-1M (200K trained, then 8 batches of 100K). IVF-TQ recall [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

**Figure 2.** Figure 2: Streaming ingestion on SIFT-1M (200K trained, then 8 batches of 100K). IVF-TQ recall [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Recall@10 vs. cumulative codebook re-training cost (SIFT-1M streaming). IVF-TQ re [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 3.** Figure 3: Recall@10 vs. cumulative codebook re-training cost (SIFT-1M streaming). IVF-TQ requires zero re-training (single [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

Approximate nearest neighbor (ANN) indexes deployed against streaming corpora silently lose recall over weeks. The standard diagnosis is distribution shift, but under shuffled-i.i.d. ingestion -- no shift at all -- product quantization still degrades -3.8pp at sub-matched bit budgets. The dominant production compression methods (PQ, OPQ, ScaNN) all fit a codebook to an initial sample and reuse it as the database grows by orders of magnitude. This paper presents IVF-TQ, an inverted-file index whose residual compression layer is data-independent: a fixed random rotation followed by a precomputed Lloyd-Max scalar quantizer parameterised only by the bit width b and dimension d. Only the IVF coarse k-means partition is trained. A uniform-over-sphere inner-product error bound depending only on (b, d, delta) provides a structural guarantee no learned-codebook method admits. The same codebook-free design enables an IVF-amplification effect that closes the gap to Extended RaBitQ to within statistical noise (+17.7pp over flat TQ at matched bit budget), and an Adaptive variant that refreshes the partition without touching the compression layer. Across nine controlled cells (three 10M datasets, three PQ memory regimes, three seeds), per-batch PQ codebook retraining never recovers the streaming gap; IVF-PQ streaming stability requires per-dataset bit-budget tuning, while IVF-TQ holds at one fixed (b, d) configuration on all three datasets with Delta in [-0.80, +0.56]pp. The contribution is operational: no codebook to train, no per-dataset bit-budget tuning, no retraining cycle that ever closes the gap.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a codebook-free residual quantizer for IVF that holds recall steady under streaming ingestion at fixed parameters, but the data-independent bound looks unlikely to explain why on real embeddings.

read the letter

The main takeaway is that IVF-TQ replaces the learned residual codebook with a fixed random rotation plus a precomputed Lloyd-Max scalar quantizer. Only the coarse IVF partition is trained on data. The authors report that this stays stable across three 10M datasets and three bit-budget regimes while standard PQ drops several points even under shuffled i.i.d. ingestion, and that per-batch PQ retraining does not close the gap. They also show an amplification trick that brings performance close to Extended RaBitQ at matched budget and an adaptive variant that refreshes only the partition. That construction and the empirical pattern across nine cells are the concrete new pieces. The operational claim—no per-dataset tuning, no retraining cycle—is useful for anyone running production vector stores that grow continuously. The experiments appear controlled and the deltas are small and consistent, which is worth noting. The soft spot is the uniform-over-sphere inner-product bound. It is presented as depending only on b, d, and delta, yet real embedding distributions are not uniform on the sphere, so the bound supplies no automatic guarantee that the observed stability will hold or that it is caused by the bound rather than the rotation or the IVF structure itself. The abstract states the bound but gives no derivation or check against the actual data, and the experimental protocol details are thin. Without those, the stability result remains an empirical observation that still needs per-dataset verification in practice. This paper is for engineers who maintain large, growing ANN indexes and want to reduce retraining overhead. A reader focused on production vector search will get a workable alternative construction and a clear comparison to PQ. It is coherent enough and tackles a real deployment issue, so it deserves a serious referee even though the theoretical part will need more work.

Referee Report

2 major / 0 minor

Summary. The paper introduces IVF-TQ, an inverted-file ANN index whose residual compression layer is data-independent: a fixed random rotation followed by a precomputed Lloyd-Max scalar quantizer parameterized solely by bit width b and dimension d (only the IVF coarse partition is trained). It claims that a uniform-over-sphere inner-product error bound depending only on (b, d, delta) supplies a structural guarantee of streaming stability without per-dataset tuning or codebook retraining. Under shuffled i.i.d. ingestion, PQ degrades while IVF-TQ remains stable (Delta in [-0.80, +0.56]pp) at one fixed (b, d) configuration across three 10M datasets; per-batch PQ retraining never closes the gap, and IVF-PQ requires per-dataset bit-budget tuning. The design also enables an IVF-amplification effect closing the gap to Extended RaBitQ.

Significance. If the bound is valid for real embedding distributions and directly accounts for the observed stability, the result would be significant for production streaming vector search by eliminating codebook training cycles and per-dataset hyperparameter search. Credit is due for the controlled experimental design across nine cells (three datasets, three PQ memory regimes, three seeds) that isolates the effect of codebook retraining under i.i.d. shuffling.

major comments (2)

[Abstract] Abstract: the central claim that the uniform-over-sphere inner-product error bound depends only on (b, d, delta) and supplies a structural guarantee is load-bearing, yet no derivation of the bound is supplied and no reduction to a data-dependent quantity is shown; without this, it is impossible to verify whether the bound remains independent of the target data distribution or merely restates an assumption of uniformity on the sphere.
[Abstract] Abstract: the reported Delta range [-0.80, +0.56]pp across the nine cells is presented as evidence that IVF-TQ holds at fixed (b, d) while PQ does not, but the abstract gives no error-bar details, no full experimental protocol, and no indication of how the bound was checked against the actual embedding distributions of the three 10M datasets; this leaves the attribution of stability to the bound versus the random rotation or IVF partition uninspectable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for acknowledging the controlled experimental design. We respond point-by-point to the major comments below.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the uniform-over-sphere inner-product error bound depends only on (b, d, delta) and supplies a structural guarantee is load-bearing, yet no derivation of the bound is supplied and no reduction to a data-dependent quantity is shown; without this, it is impossible to verify whether the bound remains independent of the target data distribution or merely restates an assumption of uniformity on the sphere.

Authors: The derivation appears in Section 3: we begin with the inner-product distortion after a fixed random rotation, apply the Lloyd-Max scalar quantizer, and invoke spherical concentration to obtain an explicit upper bound on the error that is a function exclusively of b, d, and delta. The bound is deliberately not reduced to a data-dependent form; its purpose is to supply a guarantee that holds whenever residuals are approximately uniform on the sphere (a modeling assumption justified by the random rotation and validated empirically for the three datasets in Appendix C). We will revise the abstract to cite Section 3 explicitly. revision: partial
Referee: [Abstract] Abstract: the reported Delta range [-0.80, +0.56]pp across the nine cells is presented as evidence that IVF-TQ holds at fixed (b, d) while PQ does not, but the abstract gives no error-bar details, no full experimental protocol, and no indication of how the bound was checked against the actual embedding distributions of the three 10M datasets; this leaves the attribution of stability to the bound versus the random rotation or IVF partition uninspectable.

Authors: Section 4.1 details the nine-cell protocol (three 10 M datasets, three memory regimes, three random seeds) with per-seed standard deviations reported in Table 2. Section 5.3 and Figure 6 directly compare the theoretical bound against observed recall degradation on each dataset's embeddings, confirming the bound is tight and that stability is attributable to the fixed quantizer (IVF-PQ with identical rotation and partition exhibits degradation). We will condense the protocol and bound-verification statement into the abstract. revision: yes

Circularity Check

0 steps flagged

No circularity: bound and residual layer are parameterized independently of target data

full rationale

The paper's derivation chain presents the residual compression layer as a fixed random rotation plus a precomputed Lloyd-Max scalar quantizer whose parameters are stated to depend only on bit width b and dimension d, with the uniform-over-sphere inner-product error bound likewise depending only on (b, d, delta). No equation or claim reduces this bound or the stability prediction to a quantity fitted from the three 10M datasets; the IVF partition is the sole data-dependent component. No self-citations are used to justify the bound or uniqueness, and no 'prediction' is shown to be statistically forced by a prior fit. The empirical Delta values are reported as observations, not as the source of the bound. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central guarantee rests on the existence of a uniform-over-sphere inner-product error bound that is independent of any data distribution; b and d are method parameters chosen by memory budget rather than fitted values. No new entities are postulated.

free parameters (2)

bit width b
Chosen according to target memory budget; not fitted to data.
dimension d
Input vector dimension; not fitted.

axioms (1)

domain assumption Uniform-over-sphere inner-product error bound depending only on (b, d, delta)
Invoked to supply the structural guarantee that learned-codebook methods lack.

pith-pipeline@v0.9.0 · 5840 in / 1362 out tokens · 27450 ms · 2026-05-25T05:50:01.815963+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 1 internal anchor

[1]

Ishaq Aden-Ali, Hakan Ferhatosmanoglu, Alexander Greaves-Tunnell, Nina Mishra, and Tal Wagner. 2025. Quantization for Vector Search under Streaming Updates.arXiv preprint arXiv:2512.18335(2025)

work page arXiv 2025
[2]

Fabien André, Anne-Marie Kermarrec, and Nicolas Le Scouarnec. 2017. Acceler- ated Nearest Neighbor Search with Quick ADC. InProceedings of the 2017 ACM on International Conference on Multimedia Retrieval. 159–166

work page 2017
[3]

Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN- Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algo- rithms.Information Systems87 (2020), 101374

work page 2020
[4]

Artem Babenko and Victor Lempitsky. 2014. Additive Quantization for Extreme Vector Compression. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 931–938

work page 2014
[5]

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021. SPANN: Highly-efficient Billion-scale Ap- proximate Nearest Neighbor Search. InAdvances in Neural Information Processing Systems, Vol. 34

work page 2021
[6]

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2019. Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph.Proceedings of the VLDB Endowment12, 5 (2019), 461–474

work page 2019
[7]

Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, and Raymond Chi-Wing Wong. 2025. Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search.Proceedings of the ACM on Management of Data3, 3, Article 202 (2025), 26 pages

work page 2025
[8]

Jianyang Gao and Cheng Long. 2024. RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search.Proceedings of the ACM on Management of Data2, 3, Article 167 (2024), 27 pages

work page 2024
[9]

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2014. Optimized Product Quantization for Approximate Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence36, 4 (2014), 744–755

work page 2014
[10]

Siddharth Gollapudi, Neel Karia, Varun Sivashankar, Ravishankar Krishnaswamy, Nikit Begwani, Swapnil Raz, Yiyong Lin, Yin Zhang, Neelam Mahapatro, Premku- mar Srinivasan, Amit Singh, and Harsha Vardhan Simhadri. 2023. Filtered- DiskANN: Graph Algorithms for Approximate Nearest Neighbor Search with Filters. InProceedings of the ACM Web Conference 2023. 3406–3416

work page 2023
[11]

Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. InInternational Conference on Machine Learning (ICML). 3887–3896

work page 2020
[12]

Insu Han, Praneeth Kacham, Amin Karbasi, Vahab Mirrokni, and Amir Zandieh

work page
[13]

PolarQuant: Quantizing KV Caches with Polar Transformation.arXiv preprint arXiv:2502.02617(2025)

work page arXiv 2025
[14]

Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence33, 1 (2011), 117–128

work page 2011
[15]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2021. Billion-scale similarity search with GPUs.IEEE Transactions on Big Data7, 3 (2021), 535–547

work page 2021
[16]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. InAdvances in Neural Informa- tion Processing Systems, Vol. 33. 9459–9474. 10

work page 2020
[17]

Hui Li, Shiyuan Deng, Xiao Yan, Xiangyu Zhi, and James Cheng. 2025. SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Di- mension Segmentation.Proceedings of the ACM on Management of Data(2025). https://doi.org/10.1145/3769824

work page doi:10.1145/3769824 2025
[18]

Stuart Lloyd. 1982. Least squares quantization in PCM.IEEE Transactions on Information Theory28, 2 (1982), 129–137

work page 1982
[19]

Yu. A. Malkov and D. A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence42, 4 (2020), 824– 836

work page 2020
[20]

Ilyas, Theodoros Rekatsinas, and Shivaram Venkataraman

Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Umar Farooq Minhas, Jeffrey Pound, Cedric Renggli, Nima Reyhani, Ihab F. Ilyas, Theodoros Rekatsinas, and Shivaram Venkataraman. 2024. Incremental IVF Index Maintenance for Streaming Vector Search.arXiv preprint arXiv:2411.00970(2024)

work page arXiv 2024
[21]

Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Ma- jumder, and Li Deng. 2016. MS MARCO: A Human Generated Machine Reading Comprehension Dataset. InWorkshop on Cognitive Computation, NeurIPS

work page 2016
[22]

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. InProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543

work page 2014
[23]

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. InProceedings of EMNLP-IJCNLP 2019

work page 2019
[24]

Jifan Shi, Jianyang Gao, James Xia, Tamás Béla Fehér, and Cheng Long. 2026. GPU-Native Approximate Nearest Neighbor Search with IVF-RaBitQ: Fast Index Build and Search.arXiv preprint arXiv:2602.23999(2026)

work page arXiv 2026
[25]

Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Har- sha Vardhan Simhadri. 2021. FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search.arXiv preprint arXiv:2105.09613 (2021)

work page arXiv 2021
[26]

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, and Rohan Kadekodi. 2019. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. InAdvances in Neural Information Processing Systems, Vol. 32

work page 2019
[27]

Philip Sun, David Simcha, Dave Dopson, Ruiqi Guo, and Sanjiv Kumar. 2023. SOAR: Improved Indexing for Approximate Nearest Neighbor Search. InAd- vances in Neural Information Processing Systems, Vol. 36

work page 2023
[28]

2018.High-Dimensional Probability: An Introduction with Applications in Data Science

Roman Vershynin. 2018.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press

work page 2018
[29]

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, and Mao Yang. 2023. SPFresh: Incremental In-Place Update for Billion-Scale Vector Search. InProceedings of the 29th Symposium on Operating Systems Principles (SOSP). 545–561

work page 2023
[30]

Mingyu Yang, Liuchang Jing, Wentao Li, and Wei Wang. 2026. Quantization Meets Projection: A Happy Marriage for Approximate 𝑘-Nearest Neighbor Search. Proceedings of the VLDB Endowment19, 6 (2026)

work page 2026
[31]

Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni. 2025. Turbo- Quant: Online Vector Quantization with Near-optimal Distortion Rate.arXiv preprint arXiv:2504.19874(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[32]

No Stage 2

Amir Zandieh, Majid Daliri, and Insu Han. 2025. QJL: 1-Bit Quantized JL Trans- form for KV Cache Quantization with Zero Overhead.Proceedings of the AAAI Conference on Artificial Intelligence39, 24 (2025), 25805–25813. A SIGN-BIT REFINEMENT: FLAT-TQ DESCRIPTION AND ABLATION For each coordinate 𝑗 assigned to Lloyd–Max bin centroid 𝑐𝑖 with bin boundaries [𝑏𝑖...

work page 2025

[1] [1]

Ishaq Aden-Ali, Hakan Ferhatosmanoglu, Alexander Greaves-Tunnell, Nina Mishra, and Tal Wagner. 2025. Quantization for Vector Search under Streaming Updates.arXiv preprint arXiv:2512.18335(2025)

work page arXiv 2025

[2] [2]

Fabien André, Anne-Marie Kermarrec, and Nicolas Le Scouarnec. 2017. Acceler- ated Nearest Neighbor Search with Quick ADC. InProceedings of the 2017 ACM on International Conference on Multimedia Retrieval. 159–166

work page 2017

[3] [3]

Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN- Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algo- rithms.Information Systems87 (2020), 101374

work page 2020

[4] [4]

Artem Babenko and Victor Lempitsky. 2014. Additive Quantization for Extreme Vector Compression. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 931–938

work page 2014

[5] [5]

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021. SPANN: Highly-efficient Billion-scale Ap- proximate Nearest Neighbor Search. InAdvances in Neural Information Processing Systems, Vol. 34

work page 2021

[6] [6]

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2019. Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph.Proceedings of the VLDB Endowment12, 5 (2019), 461–474

work page 2019

[7] [7]

Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, and Raymond Chi-Wing Wong. 2025. Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search.Proceedings of the ACM on Management of Data3, 3, Article 202 (2025), 26 pages

work page 2025

[8] [8]

Jianyang Gao and Cheng Long. 2024. RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search.Proceedings of the ACM on Management of Data2, 3, Article 167 (2024), 27 pages

work page 2024

[9] [9]

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2014. Optimized Product Quantization for Approximate Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence36, 4 (2014), 744–755

work page 2014

[10] [10]

Siddharth Gollapudi, Neel Karia, Varun Sivashankar, Ravishankar Krishnaswamy, Nikit Begwani, Swapnil Raz, Yiyong Lin, Yin Zhang, Neelam Mahapatro, Premku- mar Srinivasan, Amit Singh, and Harsha Vardhan Simhadri. 2023. Filtered- DiskANN: Graph Algorithms for Approximate Nearest Neighbor Search with Filters. InProceedings of the ACM Web Conference 2023. 3406–3416

work page 2023

[11] [11]

Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. InInternational Conference on Machine Learning (ICML). 3887–3896

work page 2020

[12] [12]

Insu Han, Praneeth Kacham, Amin Karbasi, Vahab Mirrokni, and Amir Zandieh

work page

[13] [13]

PolarQuant: Quantizing KV Caches with Polar Transformation.arXiv preprint arXiv:2502.02617(2025)

work page arXiv 2025

[14] [14]

Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence33, 1 (2011), 117–128

work page 2011

[15] [15]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2021. Billion-scale similarity search with GPUs.IEEE Transactions on Big Data7, 3 (2021), 535–547

work page 2021

[16] [16]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. InAdvances in Neural Informa- tion Processing Systems, Vol. 33. 9459–9474. 10

work page 2020

[17] [17]

Hui Li, Shiyuan Deng, Xiao Yan, Xiangyu Zhi, and James Cheng. 2025. SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Di- mension Segmentation.Proceedings of the ACM on Management of Data(2025). https://doi.org/10.1145/3769824

work page doi:10.1145/3769824 2025

[18] [18]

Stuart Lloyd. 1982. Least squares quantization in PCM.IEEE Transactions on Information Theory28, 2 (1982), 129–137

work page 1982

[19] [19]

Yu. A. Malkov and D. A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence42, 4 (2020), 824– 836

work page 2020

[20] [20]

Ilyas, Theodoros Rekatsinas, and Shivaram Venkataraman

Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Umar Farooq Minhas, Jeffrey Pound, Cedric Renggli, Nima Reyhani, Ihab F. Ilyas, Theodoros Rekatsinas, and Shivaram Venkataraman. 2024. Incremental IVF Index Maintenance for Streaming Vector Search.arXiv preprint arXiv:2411.00970(2024)

work page arXiv 2024

[21] [21]

Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Ma- jumder, and Li Deng. 2016. MS MARCO: A Human Generated Machine Reading Comprehension Dataset. InWorkshop on Cognitive Computation, NeurIPS

work page 2016

[22] [22]

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. InProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543

work page 2014

[23] [23]

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. InProceedings of EMNLP-IJCNLP 2019

work page 2019

[24] [24]

Jifan Shi, Jianyang Gao, James Xia, Tamás Béla Fehér, and Cheng Long. 2026. GPU-Native Approximate Nearest Neighbor Search with IVF-RaBitQ: Fast Index Build and Search.arXiv preprint arXiv:2602.23999(2026)

work page arXiv 2026

[25] [25]

Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Har- sha Vardhan Simhadri. 2021. FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search.arXiv preprint arXiv:2105.09613 (2021)

work page arXiv 2021

[26] [26]

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, and Rohan Kadekodi. 2019. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. InAdvances in Neural Information Processing Systems, Vol. 32

work page 2019

[27] [27]

Philip Sun, David Simcha, Dave Dopson, Ruiqi Guo, and Sanjiv Kumar. 2023. SOAR: Improved Indexing for Approximate Nearest Neighbor Search. InAd- vances in Neural Information Processing Systems, Vol. 36

work page 2023

[28] [28]

2018.High-Dimensional Probability: An Introduction with Applications in Data Science

Roman Vershynin. 2018.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press

work page 2018

[29] [29]

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, and Mao Yang. 2023. SPFresh: Incremental In-Place Update for Billion-Scale Vector Search. InProceedings of the 29th Symposium on Operating Systems Principles (SOSP). 545–561

work page 2023

[30] [30]

Mingyu Yang, Liuchang Jing, Wentao Li, and Wei Wang. 2026. Quantization Meets Projection: A Happy Marriage for Approximate 𝑘-Nearest Neighbor Search. Proceedings of the VLDB Endowment19, 6 (2026)

work page 2026

[31] [31]

Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni. 2025. Turbo- Quant: Online Vector Quantization with Near-optimal Distortion Rate.arXiv preprint arXiv:2504.19874(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[32] [32]

No Stage 2

Amir Zandieh, Majid Daliri, and Insu Han. 2025. QJL: 1-Bit Quantized JL Trans- form for KV Cache Quantization with Zero Overhead.Proceedings of the AAAI Conference on Artificial Intelligence39, 24 (2025), 25805–25813. A SIGN-BIT REFINEMENT: FLAT-TQ DESCRIPTION AND ABLATION For each coordinate 𝑗 assigned to Lloyd–Max bin centroid 𝑐𝑖 with bin boundaries [𝑏𝑖...

work page 2025