LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search

Dingheng Mo; Shurui Zhong; Siqiang Luo

arxiv: 2505.17152 · v2 · pith:Y3HEGTFKnew · submitted 2025-05-22 · 💻 cs.DB

LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search

Shurui Zhong , Dingheng Mo , Siqiang Luo This is my paper

Pith reviewed 2026-05-22 02:29 UTC · model grok-4.3

classification 💻 cs.DB

keywords vector searchapproximate nearest neighbordisk-based indexdynamic updatesLSM-treehierarchical graphout-of-place updates

0 comments

The pith

LSM-VEC layers hierarchical graphs over LSM-tree storage to support out-of-place updates for billion-scale vector search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LSM-VEC to solve the tension between scale and mutability in approximate nearest-neighbor search. Memory-resident graphs run out of space at billion-vector sizes, while earlier disk methods either rebuild from scratch on every update or lose recall through coarse clustering. LSM-VEC spreads the proximity graph across LSM-tree levels so that insertions and deletions happen without rewriting the whole index. A sampling-based search with adaptive neighbor choice keeps most queries on fewer disk pages, and local reordering improves locality on the fly. The result is higher recall at lower latency and a memory footprint reduced by more than 66 percent on large dynamic workloads.

Core claim

LSM-VEC integrates hierarchical graph indexing with LSM-tree storage by distributing the proximity graph across multiple levels, which permits out-of-place vector updates. Search proceeds via a sampling-based probabilistic strategy with adaptive neighbor selection, while connectivity-aware graph reordering reduces I/O without global reconstruction. On billion-scale datasets the design delivers higher recall, lower query and update latency, and more than 66.2 percent memory savings compared with prior disk-based ANN systems.

What carries the argument

Hierarchical proximity graph distributed across LSM-tree levels, which carries out-of-place updates and supports probabilistic sampling during search.

If this is right

High-throughput dynamic workloads with frequent inserts and deletes become practical on disk without full index rebuilds.
Memory consumption drops enough to fit billion-scale indexes on hardware that previously required much larger RAM.
Query latency and recall improve simultaneously rather than trading one for the other.
No offline batch construction phase is required when the underlying vector collection changes continuously.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same layering idea could be applied to other graph indexes that currently demand static construction.
Continuous index maintenance might let retrieval-augmented generation pipelines update their knowledge bases in real time instead of periodic refreshes.
Streaming vector arrivals could be tested to see whether the LSM-tree levels absorb new data without periodic compaction pauses.

Load-bearing premise

The sampling-based probabilistic search with adaptive neighbors keeps recall close to full deterministic traversal, and the LSM-tree layout prevents excessive I/O amplification under high-rate mixed updates.

What would settle it

A head-to-head run on a billion-vector dataset that shows either recall falling below a competing disk system at matched latency or update throughput dropping sharply under sustained insert-delete traffic.

Figures

Figures reproduced from arXiv: 2505.17152 by Dingheng Mo, Shurui Zhong, Siqiang Luo.

**Figure 1.** Figure 1: An example of pipeline of approximate nearest neighbor (ANN) search, consisting of index construction, candidate selection, and distance computation. search is to efficiently retrieve the most similar vectors to 𝑞 based on a predefined distance metric. The exact nearest neighbor (NN) search problem is formally defined as: NN(𝑞, 𝑋) = arg min 𝑥𝑖 ∈𝑋 𝐷(𝑞, 𝑥𝑖), (1) where 𝐷(𝑞, 𝑥𝑖) represents a similarity functi… view at source ↗

**Figure 2.** Figure 2: LSM-VEC architecture. in sequentially written disk files through background compaction. This makes them particularly suitable for workloads with frequent insertions and deletions. By combining a hierarchical graph-based vector index with LSM-tree-based storage and layout-aware maintenance, LSM-VEC achieves high recall and robust support for dynamic updates with minimal I/O overhead. Building on this foun… view at source ↗

**Figure 3.** Figure 3: An illustration of vector insertion in LSM-VEC. The new node 𝑣𝑛 is connected to two bottom-layer neighbors 𝑣4 and 𝑣5, and the resulting edges are stored in the LSM-tree. in Algorithm 1, where NN(·) denotes the nearest neighbor search performed over either the in-memory graph or the disk-resident index. Algorithm 1 Insertion in LSM-VEC. Require: 𝑥 ∈ R 𝑑 : vector to insert; G: in-memory graph; D: disk-reside… view at source ↗

**Figure 4.** Figure 4: An example of graph ordering to improve I/O efficiency. In contrast, LSM-VEC introduces a sampling ratio 𝜌 ∈ (0, 1], which controls the fraction of neighbors to be accessed during traversal. A smaller 𝜌 implies more aggressive pruning of neighbor evaluations. The corresponding search cost is reduced to: Costsampling =𝑇 · (𝑡𝑛 + 𝜌 · 𝑑 · 𝑡𝑣 ). (8) Thus, the expected I/O cost saving brought by sampling is: Δ … view at source ↗

**Figure 5.** Figure 5: Evaluation of LSM-VEC under four update scenarios with different insert-delete ratios. We report recall, update latency, and search latency, simulating real-world dynamic workloads where the index continuously evolves. Each batch corresponds to 1% vector updates (1% insertion or 1% deletion). DiskANN2 and SPFresh3 , we use the official open-source implementations and follow the recommended settings from th… view at source ↗

**Figure 6.** Figure 6: Memory usage over time under different batch workloads. 5.0 7.5 10.0 Average Query Latency (ms) 80 90 Recall 10@10 (a) Query Latency vs Recall 5 10 15 Average Update Latency (ms) 80 90 (b) Update Latency vs Recall SPFRESH DISKANN LSMVEC [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

**Figure 7.** Figure 7: Tradeoff between recall and latency. (a) Search latency vs Recall. (b) Update latency vs Recall. shows better update latency than DiskANN (6.1 ms- 9.4 ms), but suffers from limited recall improvements due to its coarsegrained cluster structure. Overall, these results highlight that LSM-VEC provides the most efficient and scalable search-update tradeoff among all methods. It achieves higher recall with sig… view at source ↗

read the original abstract

Vector search underpins modern AI applications by supporting approximate nearest neighbor (ANN) queries over high-dimensional embeddings in tasks like retrieval-augmented generation (RAG), recommendation systems, and multimodal search. Traditional ANN search indices (e.g., HNSW) are limited by memory constraints at large data scale. Disk-based indices such as DiskANN reduce memory overhead but rely on offline graph construction, resulting in costly and inefficient vector updates. The state-of-the-art clustering-based approach SPFresh offers better scalability but suffers from reduced recall due to coarse partitioning. Moreover, SPFresh employs in-place updates to maintain its index structure, limiting its efficiency in handling high-throughput insertions and deletions under dynamic workloads. This paper presents LSM-VEC, a disk-based dynamic vector index that integrates hierarchical graph indexing with LSM-tree storage. By distributing the proximity graph across multiple LSM-tree levels, LSM-VEC supports out-of-place vector updates. It enhances search efficiency via a sampling-based probabilistic search strategy with adaptive neighbor selection, and connectivity-aware graph reordering further reduces I/O without requiring global reconstruction. Experiments on billion-scale datasets demonstrate that LSM-VEC consistently outperforms existing disk-based ANN systems. It achieves higher recall, lower query and update latency, and reduces memory footprint by over 66.2%, making it well-suited for real-world large-scale vector search with dynamic updates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LSM-VEC layers hierarchical graphs on LSM-trees for out-of-place updates in disk vector search, but the billion-scale claims need verification on recall during active merges.

read the letter

The main thing to know is that LSM-VEC layers a hierarchical proximity graph on top of an LSM-tree to allow out-of-place updates for large-scale vector indexes on disk. This setup is new. It moves away from DiskANN's offline graph construction and SPFresh's in-place updates by distributing the graph across LSM levels. The sampling-based probabilistic search with adaptive neighbor selection and the connectivity-aware reordering are meant to keep search efficient and I/O low without global rebuilds. The paper does a solid job explaining how these components work together to reduce memory footprint while aiming for better query and update performance. The reported 66.2% memory reduction and gains in recall and latency on billion-scale data are the kind of practical improvements that matter for real systems. The soft spots center on the experiments. The abstract claims consistent outperformance but does not spell out the baselines, the mix of queries and updates, or error bars. More importantly, it is not clear how much the tests stressed the system with ongoing merges and high-throughput dynamics. The sampling strategy might not keep recall as close to full traversal once the graph is split and merges start happening, which is the risk flagged in the stress-test note. If the full paper shows results from workloads that include realistic merge traffic and still maintains the recall and latency advantages, that would make the contribution stronger. Without that, the central claims rest on assumptions that could be sensitive to the dynamic setting. This paper is for systems people working on vector databases and large AI retrieval pipelines that need to handle updates. Readers interested in disk-based ANN with dynamic support will find the design choices worth studying. It deserves a serious referee. The idea targets a genuine limitation in current approaches, and the work shows honest engagement with the tradeoffs even if the evaluation could be tightened. I recommend sending it for peer review.

Referee Report

2 major / 2 minor

Summary. The paper presents LSM-VEC, a disk-based dynamic vector index that integrates hierarchical graph indexing with LSM-tree storage to enable out-of-place updates for high-throughput insertions and deletions. It introduces a sampling-based probabilistic search strategy with adaptive neighbor selection and connectivity-aware graph reordering to reduce I/O without global reconstruction. Experiments on billion-scale datasets are claimed to show consistent outperformance over existing disk-based ANN systems (e.g., DiskANN, SPFresh) with higher recall, lower query/update latency, and over 66.2% memory footprint reduction.

Significance. If the central experimental claims hold under realistic dynamic workloads, the work would be significant for large-scale vector search in AI applications, as it directly addresses the update inefficiency of offline graph construction in DiskANN and the recall limitations of clustering-based approaches like SPFresh. The LSM-tree integration for dynamic graph maintenance is a novel angle in the disk-based ANN literature.

major comments (2)

[Abstract and Experimental Evaluation] Abstract and § on experimental evaluation: the headline claims of higher recall, lower latency, and 66.2% memory reduction on billion-scale data rest on unexamined experimental design. The manuscript provides no details on exact baselines, workload mixes (update/query ratio), error bars, exclusion criteria, or whether recall was measured after LSM merges rather than on static snapshots. This directly impacts the load-bearing assumption that the sampling strategy and LSM distribution preserve recall under sustained updates.
[Method (search strategy and LSM integration)] Method section on search strategy and LSM distribution: the claim that the sampling-based probabilistic search with adaptive neighbor selection preserves recall close to deterministic traversal is not shown to hold once the proximity graph is distributed across LSM levels and merges occur. No ablation or measurement under active merge traffic is described, leaving the I/O amplification bound unverified for the reported gains.

minor comments (2)

[Abstract] Abstract: the memory reduction figure of 'over 66.2%' should state the precise baseline memory usage and the exact configuration (e.g., which system and index size) for reproducibility.
[Introduction/Method] Notation: the distinction between 'hierarchical graph indexing' and standard HNSW-style graphs should be clarified early, as the LSM distribution is the key differentiator.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review. The comments highlight important aspects of experimental rigor and methodological validation that we address below. We commit to revisions that strengthen the presentation without altering the core contributions.

read point-by-point responses

Referee: [Abstract and Experimental Evaluation] Abstract and § on experimental evaluation: the headline claims of higher recall, lower latency, and 66.2% memory reduction on billion-scale data rest on unexamined experimental design. The manuscript provides no details on exact baselines, workload mixes (update/query ratio), error bars, exclusion criteria, or whether recall was measured after LSM merges rather than on static snapshots. This directly impacts the load-bearing assumption that the sampling strategy and LSM distribution preserve recall under sustained updates.

Authors: We agree that the experimental section would benefit from greater explicitness. In the revised manuscript we will add a dedicated subsection detailing: the exact baseline implementations and parameter settings for DiskANN and SPFresh; the precise update-to-query ratios employed in the dynamic workloads; error bars computed over five independent runs with reported standard deviations; any exclusion criteria applied to queries; and explicit confirmation that all reported recall figures were obtained after LSM merges and compactions on the live index state. These additions will directly substantiate the claims under sustained dynamic conditions. revision: yes
Referee: [Method (search strategy and LSM integration)] Method section on search strategy and LSM distribution: the claim that the sampling-based probabilistic search with adaptive neighbor selection preserves recall close to deterministic traversal is not shown to hold once the proximity graph is distributed across LSM levels and merges occur. No ablation or measurement under active merge traffic is described, leaving the I/O amplification bound unverified for the reported gains.

Authors: The sampling-based search and its recall behavior are evaluated within the complete LSM-VEC system that includes concurrent merges, as reflected in the billion-scale dynamic experiments. Nevertheless, we recognize that an isolated ablation under active merge traffic would provide clearer verification of the I/O bounds. We will therefore include a new ablation in the revision that measures recall and I/O amplification specifically during periods of high merge activity, comparing the probabilistic sampler against deterministic traversal where possible. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external benchmarks, not self-referential derivations

full rationale

The paper describes LSM-VEC as an engineering system that combines hierarchical graphs with LSM-tree storage for dynamic updates, using sampling-based probabilistic search and connectivity-aware reordering. All performance claims (higher recall, lower latency, 66.2% memory reduction on billion-scale data) are presented as outcomes of experiments against DiskANN and SPFresh rather than quantities derived from fitted parameters or prior self-citations within the same work. No equations, uniqueness theorems, or ansatzes are introduced that reduce by construction to the paper's own inputs; the mechanisms are described as design choices whose validity is checked via independent workload measurements. This is a standard self-contained systems paper whose central results do not collapse into tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit free parameters, axioms, or invented entities are stated in the abstract; the design relies on standard LSM-tree and graph-index assumptions from prior literature.

pith-pipeline@v0.9.0 · 5772 in / 1058 out tokens · 46814 ms · 2026-05-22T02:29:55.423590+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

integrates hierarchical graph indexing with LSM-tree storage... sampling-based probabilistic search strategy with adaptive neighbor selection... connectivity-aware graph reordering
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Experiments on billion-scale datasets demonstrate that LSM-VEC consistently outperforms existing disk-based ANN systems

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Opal: Private Memory for Personal AI
cs.CR 2026-04 unverdicted novelty 6.0

Opal enables private long-term memory for personal AI by decoupling reasoning to a trusted enclave with a lightweight knowledge graph and piggybacking reindexing on ORAM accesses.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1]

Amazon. 2025. Opensearch.https://aws.amazon.com/cn/opensearch- service/serverless-vector-database/

work page 2025
[2]

Kazuo Aoyama, Kazumi Saito, Hiroshi Sawada, and Naonori Ueda

work page
[3]

InProceedings of the 17th ACM SIGKDD interna- tional conference on Knowledge discovery and data mining

Fast approximate similarity search based on degree-reduced neighborhood graphs. InProceedings of the 17th ACM SIGKDD interna- tional conference on Knowledge discovery and data mining. 1055–1063

work page
[4]

Akhil Arora, Sakshi Sinha, Piyush Kumar, and Arnab Bhattacharya

work page
[5]

Hd-index: Pushing the scalability-accuracy boundary for ap- proximate knn search in high-dimensional spaces.arXiv preprint arXiv:1804.06829(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[6]

Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms.Information Systems87 (2020), 101374

work page 2020
[7]

Erik Bernhardsson. 2017. Annoy: approximate nearest neighbors in C++/Python optimized for memory usage and loading/saving to disk. GitHub https://github. com/spotify/annoy(2017), 0–5

work page 2017
[8]

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean- Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. InInternational conference on machine learning. PMLR, 2206–2240

work page 2022
[9]

Yuan Cao, Heng Qi, Wenrui Zhou, Jien Kato, Keqiu Li, Xiulong Liu, and Jie Gui. 2017. Binary hashing for approximate nearest neighbor search on big data: A survey.IEEE Access6 (2017), 2039–2054

work page 2017
[10]

Moses S Charikar. 2002. Similarity estimation techniques from round- ing algorithms. InProceedings of the thiry-fourth annual ACM sympo- sium on Theory of computing. 380–388

work page 2002
[11]

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021. Spann: Highly- efficient billion-scale approximate nearest neighborhood search.Ad- vances in Neural Information Processing Systems34 (2021), 5199–5212

work page 2021
[12]

SpaceV Contributors. 2021. SPACEV1B: A billion-Scale vector dataset for text descriptors.Webpage. Retrieved March16 (2021), 2023

work page 2021
[13]

Scott Cost and Steven Salzberg. 1993. A weighted nearest neighbor algorithm for learning with symbolic features.Machine learning10 (1993), 57–78

work page 1993
[14]

Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. InProceedings of the 10th ACM conference on recommender systems. 191–198

work page 2016
[15]

Facebook. 2020. Faiss.https://github.com/facebookresearch/faiss. Accessed: 2025-03-28

work page 2020
[16]

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2017. Fast approximate nearest neighbor search with the navigating spreading- out graph.arXiv preprint arXiv:1707.00143(2017)

work page arXiv 2017
[17]

Long Gong, Huayi Wang, Mitsunori Ogihara, and Jun Xu. 2020. iDEC: indexable distance estimating codes for approximate nearest neighbor search.Proceedings of the VLDB Endowment13, 9 (2020)

work page 2020
[18]

Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. InInternational Conference on Machine Learning. PMLR, 3887–3896

work page 2020
[19]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In International conference on machine learning. PMLR, 3929–3938

work page 2020
[20]

Shayan Hassantabar, Zeyu Wang, and Niraj K Jha. 2021. SCANN: Synthesis of compact and accurate neural networks.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems41, 9 (2021), 3012–3025

work page 2021
[21]

Qiang Huang, Jianlin Feng, Yikai Zhang, Qiong Fang, and Wilfred Ng

work page
[22]

Query-aware locality-sensitive hashing for approximate nearest neighbor search.Proceedings of the VLDB Endowment9, 1 (2015), 1–12

work page 2015
[23]

Masajiro Iwasaki. 2015. Ngt: Neighborhood graph and tree for index- ing

work page 2015
[24]

Masajiro Iwasaki and Daisuke Miyazaki. 2018. Optimization of in- dexing based on k-nearest neighbor graph for proximity search in high-dimensional data.arXiv preprint arXiv:1810.07355(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[25]

Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2023. Atlas: Few-shot learning with re- trieval augmented language models.Journal of Machine Learning Research24, 251 (2023), 1–43

work page 2023
[26]

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node.Ad- vances in neural information processing Systems32 (2019)

work page 2019
[27]

Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search.IEEE transactions on pattern analysis and machine intelligence33, 1 (2010), 117–128

work page 2010
[28]

Saim Khan, Somesh Singh, Harsha Vardhan Simhadri, Jyothi Vedurada, et al. 2024. BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU.arXiv preprint arXiv:2401.11324(2024)

work page arXiv 2024
[29]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al . 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems33 (2020), 9459–9474

work page 2020
[30]

Wuchao Li, Chao Feng, Defu Lian, Yuxin Xie, Haifeng Liu, Yong Ge, and Enhong Chen. 2023. Learning balanced tree indexes for large-scale vector retrieval. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1353–1362

work page 2023
[31]

Kejing Lu, Chuan Xiao, and Yoshiharu Ishikawa. 2024. Probabilistic routing for graph-based approximate nearest neighbor search.arXiv preprint arXiv:2402.11354(2024)

work page arXiv 2024
[32]

Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. Approximate nearest neighbor algorithm based on navigable small world graphs.Information Systems45 (2014), 61–68

work page 2014
[33]

Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE transactions on pattern analysis and machine intelligence42, 4 (2018), 824–836

work page 2018
[34]

Magdalen Dobson Manohar, Zheqi Shen, Guy Blelloch, Laxman Dhuli- pala, Yan Gu, Harsha Vardhan Simhadri, and Yihan Sun. 2024. Par- layann: Scalable and deterministic parallel graph-based approximate nearest neighbor search algorithms. InProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Pro- gramming. 270–285

work page 2024
[35]

Yitong Meng, Xinyan Dai, Xiao Yan, James Cheng, Weiwen Liu, Jun Guo, Benben Liao, and Guangyong Chen. 2020. Pmd: An optimal transportation-based user distance for recommender systems. InAd- vances in Information Retrieval: 42nd European Conference on IR Re- search, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part II 42. Springer, 272–280

work page 2020
[36]

Dingheng Mo, Junfeng Liu, Fan Wang, and Siqiang Luo. 2025. Aster: Enhancing LSM-structures for Scalable Graph Database.Proceedings of the ACM on Management of Data3, 1 (2025), 1–26

work page 2025
[37]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. InInternational conference on machine learning. PmLR, 8748–8763

work page 2021
[38]

Jie Ren, Minjia Zhang, and Dong Li. 2020. Hm-ann: Efficient billion- point nearest neighbor search on heterogeneous memory.Advances 13 SOSP ’25, October 13–16, 2025, Seoul, Republic of Korea Shurui Zhong, Dingheng Mo, and Siqiang Luo in Neural Information Processing Systems33 (2020), 10672–10684

work page 2020
[39]

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. InPro- ceedings of the 10th international conference on World Wide Web. 285– 295

work page 2001
[40]

Chanop Silpa-Anan and Richard Hartley. 2008. Optimised KD-trees for fast image descriptor matching. In2008 IEEE conference on computer vision and pattern recognition. IEEE, 1–8

work page 2008
[41]

Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri. 2021. Freshdiskann: A fast and accurate graph-based ann index for streaming similarity search.arXiv preprint arXiv:2105.09613(2021)

work page arXiv 2021
[42]

Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xuecang Zhang, Junhua Zhu, and Yu Zhang

work page
[43]

In23rd USENIX Conference on File and Storage Technologies (FAST 25)

Towards High-throughput and Low-latency Billion-scale Vector Search via CPU/GPU Collaborative Filtering and Re-ranking. In23rd USENIX Conference on File and Storage Technologies (FAST 25). 171–185

work page
[44]

Noam Touitou and Nissim Halabi. 2023. Approximate Nearest Neigh- bor Search through Modern Error-Correcting Codes. InThe Eleventh International Conference on Learning Representations

work page 2023
[45]

Hao Wei, Jeffrey Xu Yu, Can Lu, and Xuemin Lin. 2016. Speedup graph processing by graph ordering. InProceedings of the 2016 International Conference on Management of Data. 1813–1828

work page 2016
[46]

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, et al. 2023. Spfresh: Incremental in-place update for billion-scale vector search. InProceed- ings of the 29th Symposium on Operating Systems Principles. 545–561

work page 2023
[47]

Jun Yang, Qingsong Wei, Cheng Chen, Chundong Wang, Khai Leong Yong, and Bingsheng He. 2015. NV-Tree: reducing consistency cost for NVM-based single level systems. In13th USENIX Conference on File and Storage Technologies (FAST 15). 167–181

work page 2015
[48]

Shulin Zeng, Zhenhua Zhu, Jun Liu, Haoyu Zhang, Guohao Dai, Zixuan Zhou, Shuangchen Li, Xuefei Ning, Yuan Xie, Huazhong Yang, et al

work page
[49]

In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture

DF-GAS: a distributed FPGA-as-a-service architecture towards billion-scale graph-based approximate nearest neighbor search. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. 283–296

work page
[50]

Minjia Zhang and Yuxiong He. 2018. Zoom: Ssd-based vector search for optimizing accuracy, latency and memory.arXiv preprint arXiv:1809.04067(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[51]

Minjia Zhang, Jie Ren, Zhen Peng, Ruoming Jin, Dong Li, and Bin Ren

work page
[52]

Bull.(2023)

iqan: Fast and accurate vector search with efficient intra-query parallelism on multi-core architectures.IEEE Data Eng. Bull.(2023)

work page 2023
[53]

Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. 2024. Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 23–40

work page 2024
[54]

Wenhui Zhou, Chunfeng Yuan, Rong Gu, and Yihua Huang. 2013. Large scale nearest neighbors search based on neighborhood graph. In 2013 International Conference on Advanced Cloud and Big Data. IEEE, 181–186. 14

work page 2013

[1] [1]

Amazon. 2025. Opensearch.https://aws.amazon.com/cn/opensearch- service/serverless-vector-database/

work page 2025

[2] [2]

Kazuo Aoyama, Kazumi Saito, Hiroshi Sawada, and Naonori Ueda

work page

[3] [3]

InProceedings of the 17th ACM SIGKDD interna- tional conference on Knowledge discovery and data mining

Fast approximate similarity search based on degree-reduced neighborhood graphs. InProceedings of the 17th ACM SIGKDD interna- tional conference on Knowledge discovery and data mining. 1055–1063

work page

[4] [4]

Akhil Arora, Sakshi Sinha, Piyush Kumar, and Arnab Bhattacharya

work page

[5] [5]

Hd-index: Pushing the scalability-accuracy boundary for ap- proximate knn search in high-dimensional spaces.arXiv preprint arXiv:1804.06829(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[6] [6]

Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms.Information Systems87 (2020), 101374

work page 2020

[7] [7]

Erik Bernhardsson. 2017. Annoy: approximate nearest neighbors in C++/Python optimized for memory usage and loading/saving to disk. GitHub https://github. com/spotify/annoy(2017), 0–5

work page 2017

[8] [8]

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean- Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. InInternational conference on machine learning. PMLR, 2206–2240

work page 2022

[9] [9]

Yuan Cao, Heng Qi, Wenrui Zhou, Jien Kato, Keqiu Li, Xiulong Liu, and Jie Gui. 2017. Binary hashing for approximate nearest neighbor search on big data: A survey.IEEE Access6 (2017), 2039–2054

work page 2017

[10] [10]

Moses S Charikar. 2002. Similarity estimation techniques from round- ing algorithms. InProceedings of the thiry-fourth annual ACM sympo- sium on Theory of computing. 380–388

work page 2002

[11] [11]

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021. Spann: Highly- efficient billion-scale approximate nearest neighborhood search.Ad- vances in Neural Information Processing Systems34 (2021), 5199–5212

work page 2021

[12] [12]

SpaceV Contributors. 2021. SPACEV1B: A billion-Scale vector dataset for text descriptors.Webpage. Retrieved March16 (2021), 2023

work page 2021

[13] [13]

Scott Cost and Steven Salzberg. 1993. A weighted nearest neighbor algorithm for learning with symbolic features.Machine learning10 (1993), 57–78

work page 1993

[14] [14]

Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. InProceedings of the 10th ACM conference on recommender systems. 191–198

work page 2016

[15] [15]

Facebook. 2020. Faiss.https://github.com/facebookresearch/faiss. Accessed: 2025-03-28

work page 2020

[16] [16]

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2017. Fast approximate nearest neighbor search with the navigating spreading- out graph.arXiv preprint arXiv:1707.00143(2017)

work page arXiv 2017

[17] [17]

Long Gong, Huayi Wang, Mitsunori Ogihara, and Jun Xu. 2020. iDEC: indexable distance estimating codes for approximate nearest neighbor search.Proceedings of the VLDB Endowment13, 9 (2020)

work page 2020

[18] [18]

Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. InInternational Conference on Machine Learning. PMLR, 3887–3896

work page 2020

[19] [19]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In International conference on machine learning. PMLR, 3929–3938

work page 2020

[20] [20]

Shayan Hassantabar, Zeyu Wang, and Niraj K Jha. 2021. SCANN: Synthesis of compact and accurate neural networks.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems41, 9 (2021), 3012–3025

work page 2021

[21] [21]

Qiang Huang, Jianlin Feng, Yikai Zhang, Qiong Fang, and Wilfred Ng

work page

[22] [22]

Query-aware locality-sensitive hashing for approximate nearest neighbor search.Proceedings of the VLDB Endowment9, 1 (2015), 1–12

work page 2015

[23] [23]

Masajiro Iwasaki. 2015. Ngt: Neighborhood graph and tree for index- ing

work page 2015

[24] [24]

Masajiro Iwasaki and Daisuke Miyazaki. 2018. Optimization of in- dexing based on k-nearest neighbor graph for proximity search in high-dimensional data.arXiv preprint arXiv:1810.07355(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[25] [25]

Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2023. Atlas: Few-shot learning with re- trieval augmented language models.Journal of Machine Learning Research24, 251 (2023), 1–43

work page 2023

[26] [26]

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node.Ad- vances in neural information processing Systems32 (2019)

work page 2019

[27] [27]

Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search.IEEE transactions on pattern analysis and machine intelligence33, 1 (2010), 117–128

work page 2010

[28] [28]

Saim Khan, Somesh Singh, Harsha Vardhan Simhadri, Jyothi Vedurada, et al. 2024. BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU.arXiv preprint arXiv:2401.11324(2024)

work page arXiv 2024

[29] [29]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al . 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems33 (2020), 9459–9474

work page 2020

[30] [30]

Wuchao Li, Chao Feng, Defu Lian, Yuxin Xie, Haifeng Liu, Yong Ge, and Enhong Chen. 2023. Learning balanced tree indexes for large-scale vector retrieval. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1353–1362

work page 2023

[31] [31]

Kejing Lu, Chuan Xiao, and Yoshiharu Ishikawa. 2024. Probabilistic routing for graph-based approximate nearest neighbor search.arXiv preprint arXiv:2402.11354(2024)

work page arXiv 2024

[32] [32]

Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. Approximate nearest neighbor algorithm based on navigable small world graphs.Information Systems45 (2014), 61–68

work page 2014

[33] [33]

Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE transactions on pattern analysis and machine intelligence42, 4 (2018), 824–836

work page 2018

[34] [34]

Magdalen Dobson Manohar, Zheqi Shen, Guy Blelloch, Laxman Dhuli- pala, Yan Gu, Harsha Vardhan Simhadri, and Yihan Sun. 2024. Par- layann: Scalable and deterministic parallel graph-based approximate nearest neighbor search algorithms. InProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Pro- gramming. 270–285

work page 2024

[35] [35]

Yitong Meng, Xinyan Dai, Xiao Yan, James Cheng, Weiwen Liu, Jun Guo, Benben Liao, and Guangyong Chen. 2020. Pmd: An optimal transportation-based user distance for recommender systems. InAd- vances in Information Retrieval: 42nd European Conference on IR Re- search, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part II 42. Springer, 272–280

work page 2020

[36] [36]

Dingheng Mo, Junfeng Liu, Fan Wang, and Siqiang Luo. 2025. Aster: Enhancing LSM-structures for Scalable Graph Database.Proceedings of the ACM on Management of Data3, 1 (2025), 1–26

work page 2025

[37] [37]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. InInternational conference on machine learning. PmLR, 8748–8763

work page 2021

[38] [38]

Jie Ren, Minjia Zhang, and Dong Li. 2020. Hm-ann: Efficient billion- point nearest neighbor search on heterogeneous memory.Advances 13 SOSP ’25, October 13–16, 2025, Seoul, Republic of Korea Shurui Zhong, Dingheng Mo, and Siqiang Luo in Neural Information Processing Systems33 (2020), 10672–10684

work page 2020

[39] [39]

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. InPro- ceedings of the 10th international conference on World Wide Web. 285– 295

work page 2001

[40] [40]

Chanop Silpa-Anan and Richard Hartley. 2008. Optimised KD-trees for fast image descriptor matching. In2008 IEEE conference on computer vision and pattern recognition. IEEE, 1–8

work page 2008

[41] [41]

Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri. 2021. Freshdiskann: A fast and accurate graph-based ann index for streaming similarity search.arXiv preprint arXiv:2105.09613(2021)

work page arXiv 2021

[42] [42]

Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xuecang Zhang, Junhua Zhu, and Yu Zhang

work page

[43] [43]

In23rd USENIX Conference on File and Storage Technologies (FAST 25)

Towards High-throughput and Low-latency Billion-scale Vector Search via CPU/GPU Collaborative Filtering and Re-ranking. In23rd USENIX Conference on File and Storage Technologies (FAST 25). 171–185

work page

[44] [44]

Noam Touitou and Nissim Halabi. 2023. Approximate Nearest Neigh- bor Search through Modern Error-Correcting Codes. InThe Eleventh International Conference on Learning Representations

work page 2023

[45] [45]

Hao Wei, Jeffrey Xu Yu, Can Lu, and Xuemin Lin. 2016. Speedup graph processing by graph ordering. InProceedings of the 2016 International Conference on Management of Data. 1813–1828

work page 2016

[46] [46]

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, et al. 2023. Spfresh: Incremental in-place update for billion-scale vector search. InProceed- ings of the 29th Symposium on Operating Systems Principles. 545–561

work page 2023

[47] [47]

Jun Yang, Qingsong Wei, Cheng Chen, Chundong Wang, Khai Leong Yong, and Bingsheng He. 2015. NV-Tree: reducing consistency cost for NVM-based single level systems. In13th USENIX Conference on File and Storage Technologies (FAST 15). 167–181

work page 2015

[48] [48]

Shulin Zeng, Zhenhua Zhu, Jun Liu, Haoyu Zhang, Guohao Dai, Zixuan Zhou, Shuangchen Li, Xuefei Ning, Yuan Xie, Huazhong Yang, et al

work page

[49] [49]

In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture

DF-GAS: a distributed FPGA-as-a-service architecture towards billion-scale graph-based approximate nearest neighbor search. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. 283–296

work page

[50] [50]

Minjia Zhang and Yuxiong He. 2018. Zoom: Ssd-based vector search for optimizing accuracy, latency and memory.arXiv preprint arXiv:1809.04067(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[51] [51]

Minjia Zhang, Jie Ren, Zhen Peng, Ruoming Jin, Dong Li, and Bin Ren

work page

[52] [52]

Bull.(2023)

iqan: Fast and accurate vector search with efficient intra-query parallelism on multi-core architectures.IEEE Data Eng. Bull.(2023)

work page 2023

[53] [53]

Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. 2024. Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 23–40

work page 2024

[54] [54]

Wenhui Zhou, Chunfeng Yuan, Rong Gu, and Yihua Huang. 2013. Large scale nearest neighbors search based on neighborhood graph. In 2013 International Conference on Advanced Cloud and Big Data. IEEE, 181–186. 14

work page 2013