LAANN: I/O-Aware Look-Ahead Search for Disk-Based Approximate Nearest Neighbor Search

Bingzhe Li; Dingyi Kang; Juncheng Yang

arxiv: 2606.02784 · v1 · pith:252XKMTAnew · submitted 2026-06-01 · 💻 cs.DB

LAANN: I/O-Aware Look-Ahead Search for Disk-Based Approximate Nearest Neighbor Search

Dingyi Kang , Juncheng Yang , Bingzhe Li This is my paper

Pith reviewed 2026-06-28 11:46 UTC · model grok-4.3

classification 💻 cs.DB

keywords disk-based ANNSapproximate nearest neighbor searchI/O optimizationgraph searchlook-ahead searchpriority pipelinevector retrieval

0 comments

The pith

LAANN makes graph search in disk-based ANNS explicitly I/O-aware by processing cached candidates before committing to disk reads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LAANN as a way to reduce the I/O dominance in large-scale disk-based approximate nearest neighbor search. It claims that treating computation and I/O as separate misses the chance to use already-cached candidates to avoid unnecessary disk accesses. LAANN integrates look-ahead search that adapts across stages, a priority pipeline that fills I/O wait time with useful computation on cached items, and a lightweight in-memory index for better starting points. A sympathetic reader would care because vector datasets now reach billions and I/O latency sets the practical limit on retrieval speed. The reported outcome is 1.41x-4.66x higher throughput and 1.59x-6.34x fewer I/O operations at Recall@10 = 0.9.

Core claim

LAANN is a disk-based ANNS system that co-optimizes CPU computation and I/O access during graph search. It achieves this through three techniques: look-ahead search that changes strategy across query stages to balance I/O reduction with timely disk issuance; a priority I/O-CPU pipeline that processes memory-cached candidates during I/O waits according to their expected impact on future decisions; and a fast lightweight in-memory graph index that supplies high-quality initial candidates. On million- and billion-scale datasets this yields 1.41x-4.66x higher throughput, 29%-79% lower latency, and 1.59x-6.34x fewer I/O operations than prior disk-based systems at Recall@10 = 0.9.

What carries the argument

I/O-aware look-ahead graph search that adapts traversal while using a priority pipeline to compute on cached candidates during I/O waits, supported by a lightweight in-memory index for initial candidates.

If this is right

Fewer disk accesses translate directly into lower query latency when storage is the bottleneck.
The priority pipeline allows useful CPU work to overlap with I/O without postponing necessary reads.
High-quality initial candidates from the in-memory index accelerate convergence and cut later disk accesses.
The gains hold on both million-scale and billion-scale vector datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same co-optimization pattern could be applied to other storage-bound graph algorithms beyond ANNS.
Workloads with very small memory caches might see diminished returns unless the pipeline cost is further reduced.
Tighter CPU-I/O coupling might improve other database operations such as joins or index builds that also mix computation and disk access.

Load-bearing premise

Selectively processing cached candidates before I/O decisions will reduce unnecessary disk accesses without the added computation delaying critical I/O operations or providing negligible benefit.

What would settle it

A direct measurement on a workload with low cache-hit rates showing that the priority pipeline's extra computation increases end-to-end query latency relative to a pure I/O-optimized baseline.

Figures

Figures reproduced from arXiv: 2606.02784 by Bingzhe Li, Dingyi Kang, Juncheng Yang.

**Figure 3.** Figure 3: Illustration of processing vectors cached in memory [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Normalized latency and I/O reduction (relative to [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: System overview of LAANN. The DRAM region holds three purpose-built structures; the SSD stores the page-level [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: I/O composition across the approach phase and con [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗

**Figure 8.** Figure 8: I/O composition at each hop during the convergence [PITH_FULL_IMAGE:figures/full_fig_p006_8.png] view at source ↗

**Figure 9.** Figure 9: Priority I/O-CPU pipeline across two consecutive [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗

**Figure 10.** Figure 10: Candidate pool with overflow area showing [PITH_FULL_IMAGE:figures/full_fig_p007_10.png] view at source ↗

**Figure 11.** Figure 11: Lightweight in-memory graph index search and [PITH_FULL_IMAGE:figures/full_fig_p008_11.png] view at source ↗

**Figure 12.** Figure 12: Recall@10 vs. latency (top row) and recall@10 vs. throughput (bottom row) on three 100M-scale datasets under a [PITH_FULL_IMAGE:figures/full_fig_p010_12.png] view at source ↗

**Figure 13.** Figure 13: Recall@10 vs. latency (top row) and recall@10 vs. [PITH_FULL_IMAGE:figures/full_fig_p011_13.png] view at source ↗

**Figure 14.** Figure 14: QPS (K) and latency at Recall@10 = 0.9 under vary [PITH_FULL_IMAGE:figures/full_fig_p011_14.png] view at source ↗

read the original abstract

Approximate nearest neighbor search (ANNS) is a fundamental primitive in large-scale retrieval, recommendation, and AI systems. As vector datasets grow to billions or even trillions of items, disk-based ANNS systems have emerged to handle this scale by storing vector data and index structures on storage systems, but their query performance remains dominated by I/O latency. Existing disk-based ANNS systems primarily optimize I/O efficiency or overlap I/O with computation, but they treat CPU computation and I/O access as largely separate components. This separation misses a critical opportunity: selectively processing candidates already cached in memory before making I/O decisions can reduce unnecessary disk accesses and improve search quality. However, exploiting this opportunity is challenging because excessive computation can delay critical I/O operations, while poorly chosen computation provides little benefit, potentially increasing overall query latency. In this paper, we present LAANN, a disk-based ANNS system that makes graph search explicitly I/O-aware by co-optimizing CPU computation and I/O access. LAANN combines three techniques: look-ahead search, which adapts the search strategy across query stages to balance I/O reduction and timely I/O issuance; a priority I/O-CPU pipeline, which uses I/O waiting time to process candidates cached in memory according to their expected impact on upcoming I/O decisions; and a fast lightweight in-memory graph index, which provides high-quality initial candidates to accelerate convergence and reduce disk accesses. Experiments on million- and billion-scale datasets demonstrate that LAANN substantially outperforms state-of-the-art disk-based ANNS systems. At Recall@10 = 0.9, LAANN achieves 1.41x-4.66x higher throughput, 29%-79% lower latency, and 1.59x-6.34x fewer I/O operations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LAANN combines look-ahead search and a priority pipeline to make disk ANNS more I/O-aware, but the abstract supplies no data to check whether the extra CPU work actually cuts I/O without extending the critical path.

read the letter

The core idea is to stop treating CPU work and disk I/O as separate in graph-based ANNS. LAANN adds look-ahead that changes search behavior by stage, a pipeline that fills I/O wait time by scoring cached candidates for their effect on future I/O choices, and a small in-memory graph for quick starting points. That combination is presented as new; earlier systems either chased I/O efficiency alone or tried simple overlap without using cached data to prune disk reads ahead of time.

The paper does a clean job naming the practical problem: extra CPU can delay I/O issuance, so any benefit has to be selective and cheap. The lightweight index and stage-adaptive look-ahead are reasonable ways to keep the added work small.

The main weakness is that nothing in the description lets us verify the central claim. We get aggregate numbers (1.41x-4.66x throughput, 1.59x-6.34x fewer I/Os at Recall@10=0.9) but no cache-hit rates, per-stage latency breakdown, or timing traces showing that the pipeline overlaps rather than serializes. Without those, it is impossible to tell whether the reported savings come from the new logic or from the in-memory graph alone. The abstract itself flags the risk of extra computation hurting latency, yet supplies no evidence that the risk was avoided.

This is aimed at people who build or tune disk-based vector search for billion-scale data. A systems researcher or engineer working on retrieval would find the techniques worth reading even if the numbers need independent checking.

The work deserves peer review. The problem is real, the approach is concrete, and the missing details are exactly what referees can request.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces LAANN, a disk-based ANNS system that co-optimizes CPU computation and I/O access via three techniques: look-ahead search (adapting strategy across query stages), a priority I/O-CPU pipeline (processing cached candidates during I/O waits according to expected impact), and a fast lightweight in-memory graph index (providing high-quality initial candidates). It claims that at Recall@10 = 0.9, LAANN delivers 1.41x-4.66x higher throughput, 29%-79% lower latency, and 1.59x-6.34x fewer I/O operations than prior disk-based ANNS systems on million- and billion-scale datasets.

Significance. If the experimental claims are reproducible and the pipeline demonstrably overlaps computation with I/O waits without net latency penalty, the work would advance disk-based ANNS by closing the gap between separate CPU and I/O optimizations, with potential impact on large-scale retrieval and AI systems. The reported I/O reductions and latency improvements are substantial enough to matter at billion-scale, but the current manuscript supplies only aggregate numbers without supporting methodology or traces.

major comments (2)

[Abstract] Abstract: the central empirical claims (specific throughput, latency, and I/O multipliers at Recall@10 = 0.9) rest on experimental comparisons, yet the manuscript provides no description of datasets, baseline implementations, hardware, measurement methodology, or verification procedures, rendering the reported speedups and I/O counts impossible to assess or reproduce.
[Technical description of the priority I/O-CPU pipeline] Technical description of the priority I/O-CPU pipeline: the abstract itself notes the risk that added computation can delay critical I/O, but no timing breakdown, per-stage latency traces, cache-hit statistics, or analysis is supplied to show that processing cached candidates actually shortens the critical path or produces the claimed 1.59x-6.34x I/O reduction rather than extending total latency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and outline the revisions we will make to improve clarity and reproducibility.

read point-by-point responses

Referee: [Abstract] Abstract: the central empirical claims (specific throughput, latency, and I/O multipliers at Recall@10 = 0.9) rest on experimental comparisons, yet the manuscript provides no description of datasets, baseline implementations, hardware, measurement methodology, or verification procedures, rendering the reported speedups and I/O counts impossible to assess or reproduce.

Authors: We agree that the abstract does not contain methodological details, as abstracts are typically concise. The full manuscript contains dedicated experimental sections describing the datasets (million- and billion-scale), baselines, hardware, and measurement procedures. To address the concern, we will revise the abstract to include a brief high-level description of the evaluation setup and add an explicit reproducibility subsection in the experimental evaluation. revision: yes
Referee: [Technical description of the priority I/O-CPU pipeline] Technical description of the priority I/O-CPU pipeline: the abstract itself notes the risk that added computation can delay critical I/O, but no timing breakdown, per-stage latency traces, cache-hit statistics, or analysis is supplied to show that processing cached candidates actually shortens the critical path or produces the claimed 1.59x-6.34x I/O reduction rather than extending total latency.

Authors: We acknowledge that the current description of the priority I/O-CPU pipeline lacks supporting timing analysis. The manuscript explains the design rationale but does not provide the requested breakdowns or traces. In the revision, we will add per-stage latency traces, cache-hit statistics, and a timing breakdown demonstrating that candidate processing overlaps with I/O waits without extending the critical path, thereby supporting the reported I/O reductions. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on experimental benchmarks, not equations or self-referential fits.

full rationale

The paper presents LAANN as an engineering system combining look-ahead search, priority I/O-CPU pipeline, and a lightweight in-memory index. All performance claims (throughput, latency, I/O counts at fixed Recall@10) are supported solely by direct measurements on million- and billion-scale datasets against prior systems. No derivation chain, equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central argument is empirical comparison, not a mathematical reduction that collapses to its own inputs. This is the normal, non-circular case for systems papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on the empirical effectiveness of the three named techniques.

pith-pipeline@v0.9.1-grok · 5865 in / 1077 out tokens · 29080 ms · 2026-06-28T11:46:53.436520+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 5 canonical work pages

[1]

Akari Asai, Sewon Min, Zexuan Zhong, and Danqi Chen. 2023. Retrieval-based language models and applications. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 6: Tutorial Abstracts). 41–46

2023
[2]

Artem Babenko and Victor Lempitsky. 2016. Efficient Indexing of Billion-Scale Datasets of Deep Descriptors. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2055–2063

2016
[3]

Vishwanath Bijalwan, Vinay Kumar, Pinki Kumari, and Jordan Pascual. 2014. KNN based machine learning approach for text and document mining.Interna- tional Journal of Database Theory and Application7, 1 (2014), 61–70

2014
[4]

Wout Bittremieux, Pieter Meysman, William Stafford Noble, and Kris Laukens
[5]

Fast open modification spectral library searching through approximate nearest neighbor indexing.Journal of proteome research17, 10 (2018), 3463–3474

2018
[6]

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Ruther- ford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bog- dan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. InInternational conference on machine learning. PMLR, 2206–2240

2022
[7]

Yuan Cao, Heng Qi, Wenrui Zhou, Jien Kato, Keqiu Li, Xiulong Liu, and Jie Gui
[8]

Binary hashing for approximate nearest neighbor search on big data: A survey.IEEE Access6 (2017), 2039–2054

2017
[9]

Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jeffery Zhu, Jason Li, Chuanjie Liu, Lintao Zhang, and Jingdong Wang. 2018. SPTAG: A library for fast approximate nearest neighbor search

2018
[10]

Thomas Cover and Peter Hart. 1967. Nearest neighbor pattern classification. IEEE transactions on information theory13, 1 (1967), 21–27

1967
[11]

Shiyuan Deng, Xiao Yan, KW Ng Kelvin, Chenyu Jiang, and James Cheng. 2019. Pyramid: A general framework for distributed similarity search on large-scale datasets. In2019 IEEE International Conference on Big Data (Big Data). IEEE, 1066–1071

2019
[12]

Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, et al. 1995. Query by image and video content: The QBIC system.computer28, 9 (1995), 23–32. LAANN: I/O-Aware Look-Ahead Search for Disk-Based Approximate Nearest Neighbor Search

1995
[13]

Cong Fu and Deng Cai. 2019. Efficient Approximate Nearest Neighbor Search with the Navigable Small World Graph. InIEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 41. 1102–1116

2019
[14]

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2017. Fast approximate nearest neighbor search with the navigating spreading-out graph.arXiv preprint arXiv:1707.00143(2017)

work page arXiv 2017
[15]

Hao Guo and Youyou Lu. 2025. Achieving Low-Latency Graph-Based Vec- tor Search via Aligning Best-First Search Algorithm with SSD (PipeANN). In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’25). Boston, MA, USA. Available as Open Access

2025
[16]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang
[17]

InInternational confer- ence on machine learning

Retrieval augmented language model pre-training. InInternational confer- ence on machine learning. PMLR, 3929–3938
[18]

Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, and Linjun Yang. 2020. Embedding- based retrieval in facebook search. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2553–2561

2020
[19]

Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. InProceedings of the thirtieth annual ACM symposium on Theory of computing. 604–613

1998
[20]

INRIA. [n.d.]. Datasets for Approximate Nearest Neighbor Search. http://corpus- texmex.irisa.fr/. Accessed: 2025-08-12

2025
[21]

Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2023. Atlas: Few-shot learning with retrieval augmented language models. Journal of Machine Learning Research24, 251 (2023), 1–43

2023
[22]

Dingyi Kang, Dongming Jiang, Hanshen Yang, Hang Liu, and Bingzhe Li. 2025. Scalable Disk-Based Approximate Nearest Neighbor Search with Page-Aligned Graph.arXiv preprint arXiv:2509.25487(2025)

work page arXiv 2025
[23]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rock- täschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems33 (2020), 9459–9474

2020
[24]

Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2019. Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement.IEEE Transactions on Knowledge and Data Engineering32, 8 (2019), 1475–1488

2019
[25]

Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, and Chunyuan Li. 2023. Learning customized visual models with retrieval-augmented knowledge. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15148–15158

2023
[26]

Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov
[27]

Approximate nearest neighbor algorithm based on navigable small world graphs.Information Systems45 (2014), 61–68

2014
[28]

Yu. A. Malkov and D. A. Yashunin. 2016. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. In IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 40. 1249– 1265

2016
[29]

Microsoft Research. [n.d.]. Spacev1b: A Billion-Scale Vector Dataset for Text De- scriptors. https://github.com/microsoft/SPTAG/tree/master/datasets/SPACEV1B. Accessed: 2025-08-12

2025
[30]

Jiongkang Ni, Xiaoliang Xu, Yuxiang Wang, Can Li, Jiajie Yao, Shihai Xiao, and Xuecang Zhang. 2023. Diskann++: Efficient page-based search over isomor- phic mapped graph index using query-sensitivity entry vertex.arXiv preprint arXiv:2310.00402(2023)

work page arXiv 2023
[31]

Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models.Transactions of the Association for Computational Linguistics11 (2023), 1316–1331

2023
[32]

Hao Ren, Feng Liu, Pengfei Zhao, Jianliang Wang, and Rong Jin. 2020. HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory. In Advances in Neural Information Processing Systems (NeurIPS)

2020
[33]

Konstantin Schütze, Michael Heinzinger, Martin Steinegger, and Burkhard Rost
[34]

Nearest neighbor search on embeddings rapidly identifies distant protein relations.Frontiers in Bioinformatics2 (2022), 1033775

2022
[35]

Jayaram Subramanya, Devvrit Lovekar, Harsha Vardhan Simhadri, and An- shumali Shrivastava. 2019. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. InAdvances in Neural Information Process- ing Systems (NeurIPS). 13748–13758. https://papers.nips.cc/paper/2019/file/ 09853c7fb1dc7d9fd451d4980d70d63c-Paper.pdf

2019
[36]

Yukihiro Tagami. 2017. Annexml: Approximate nearest neighbor search for extreme multi-label classification. InProceedings of the 23rd ACM SIGKDD inter- national conference on knowledge discovery and data mining. 455–464

2017
[37]

Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xuecang Zhang, Junhua Zhu, and Yu Zhang. 2025. Towards High- throughput and Low-latency Billion-scale Vector Search via{CPU/GPU } Collab- orative Filtering and Re-ranking. In23rd USENIX Conference on File and Storage Technologies (FAST 25). 171–185

2025
[38]

Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xi- angyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, et al. 2021. Milvus: A purpose-built vector data management system. InProceedings of the 2021 international conference on management of data. 2614–2627

2021
[39]

Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xi- angyu Ke, Yunjun Gao, Xiaoliang Xu, Rentong Guo, and Charles Xie. 2024. Star- ling: An i/o-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment.Proceedings of the ACM on Management of Data2, 1 (2024), 1–27

2024
[40]

Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. Analyticdb-v: A hybrid analytical engine towards query fusion for structured and unstructured data.Proceedings of the VLDB Endowment 13, 12 (2020), 3152–3165

2020
[41]

Ziyang Yue, Bolong Zheng, Ling Xu, Kanru Xu, Shuhao Zhang, Yajuan Du, Yunjun Gao, Xiaofang Zhou, and Christian S Jensen. 2025. Select Edges Wisely: Monotonic Path Aware Graph Layout Optimization for Disk-Based ANN Search. Proceedings of the VLDB Endowment18, 11 (2025), 4337–4349

2025
[42]

Jianjin Zhang, Zheng Liu, Weihao Han, Shitao Xiao, Ruicheng Zheng, Yingxia Shao, Hao Sun, Hanqing Zhu, Premkumar Srinivasan, Weiwei Deng, et al. 2022. Uni-retriever: Towards learning the unified embedding based retriever in bing sponsored search. InProceedings of the 28th ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining. 4493–4501

2022
[43]

Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren, and Rong Jin. 2018. Visual search at alibaba. InProceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 993– 1001

2018
[44]

Pengfei Zhao, Jianliang Wang, Hao Ren, and Rong Jin. 2024. SmartANNS: Near- Data Processing for Billion-scale Approximate Nearest Neighbor Search. In USENIX Annual Technical Conference (ATC)

2024
[45]

Xiangyu Zhi, Meng Chen, Xiao Yan, Baotong Lu, Hui Li, Qianxi Zhang, Qi Chen, and James Cheng. 2025. Towards Efficient and Scalable Distributed Vector Search with RDMA.arXiv preprint arXiv:2507.06653(2025)

work page arXiv 2025
[46]

Yijie Zhou, Shengyuan Lin, Shufeng Gong, Song Yu, Shuhao Fan, Yanfeng Zhang, and Ge Yu. 2025. GoVector: An I/O-Efficient Caching Strategy for High- Dimensional Vector Nearest Neighbor Search.arXiv preprint arXiv:2508.15694 (2025)

work page arXiv 2025
[47]

Chun Jiang Zhu, Tan Zhu, Haining Li, Jinbo Bi, and Minghu Song. 2019. Accel- erating large-scale molecular similarity search through exploiting high perfor- mance computing. In2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 330–333

2019

[1] [1]

Akari Asai, Sewon Min, Zexuan Zhong, and Danqi Chen. 2023. Retrieval-based language models and applications. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 6: Tutorial Abstracts). 41–46

2023

[2] [2]

Artem Babenko and Victor Lempitsky. 2016. Efficient Indexing of Billion-Scale Datasets of Deep Descriptors. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2055–2063

2016

[3] [3]

Vishwanath Bijalwan, Vinay Kumar, Pinki Kumari, and Jordan Pascual. 2014. KNN based machine learning approach for text and document mining.Interna- tional Journal of Database Theory and Application7, 1 (2014), 61–70

2014

[4] [4]

Wout Bittremieux, Pieter Meysman, William Stafford Noble, and Kris Laukens

[5] [5]

Fast open modification spectral library searching through approximate nearest neighbor indexing.Journal of proteome research17, 10 (2018), 3463–3474

2018

[6] [6]

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Ruther- ford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bog- dan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. InInternational conference on machine learning. PMLR, 2206–2240

2022

[7] [7]

Yuan Cao, Heng Qi, Wenrui Zhou, Jien Kato, Keqiu Li, Xiulong Liu, and Jie Gui

[8] [8]

Binary hashing for approximate nearest neighbor search on big data: A survey.IEEE Access6 (2017), 2039–2054

2017

[9] [9]

Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jeffery Zhu, Jason Li, Chuanjie Liu, Lintao Zhang, and Jingdong Wang. 2018. SPTAG: A library for fast approximate nearest neighbor search

2018

[10] [10]

Thomas Cover and Peter Hart. 1967. Nearest neighbor pattern classification. IEEE transactions on information theory13, 1 (1967), 21–27

1967

[11] [11]

Shiyuan Deng, Xiao Yan, KW Ng Kelvin, Chenyu Jiang, and James Cheng. 2019. Pyramid: A general framework for distributed similarity search on large-scale datasets. In2019 IEEE International Conference on Big Data (Big Data). IEEE, 1066–1071

2019

[12] [12]

Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, et al. 1995. Query by image and video content: The QBIC system.computer28, 9 (1995), 23–32. LAANN: I/O-Aware Look-Ahead Search for Disk-Based Approximate Nearest Neighbor Search

1995

[13] [13]

Cong Fu and Deng Cai. 2019. Efficient Approximate Nearest Neighbor Search with the Navigable Small World Graph. InIEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 41. 1102–1116

2019

[14] [14]

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2017. Fast approximate nearest neighbor search with the navigating spreading-out graph.arXiv preprint arXiv:1707.00143(2017)

work page arXiv 2017

[15] [15]

Hao Guo and Youyou Lu. 2025. Achieving Low-Latency Graph-Based Vec- tor Search via Aligning Best-First Search Algorithm with SSD (PipeANN). In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’25). Boston, MA, USA. Available as Open Access

2025

[16] [16]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang

[17] [17]

InInternational confer- ence on machine learning

Retrieval augmented language model pre-training. InInternational confer- ence on machine learning. PMLR, 3929–3938

[18] [18]

Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, and Linjun Yang. 2020. Embedding- based retrieval in facebook search. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2553–2561

2020

[19] [19]

Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. InProceedings of the thirtieth annual ACM symposium on Theory of computing. 604–613

1998

[20] [20]

INRIA. [n.d.]. Datasets for Approximate Nearest Neighbor Search. http://corpus- texmex.irisa.fr/. Accessed: 2025-08-12

2025

[21] [21]

Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2023. Atlas: Few-shot learning with retrieval augmented language models. Journal of Machine Learning Research24, 251 (2023), 1–43

2023

[22] [22]

Dingyi Kang, Dongming Jiang, Hanshen Yang, Hang Liu, and Bingzhe Li. 2025. Scalable Disk-Based Approximate Nearest Neighbor Search with Page-Aligned Graph.arXiv preprint arXiv:2509.25487(2025)

work page arXiv 2025

[23] [23]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rock- täschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems33 (2020), 9459–9474

2020

[24] [24]

Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2019. Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement.IEEE Transactions on Knowledge and Data Engineering32, 8 (2019), 1475–1488

2019

[25] [25]

Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, and Chunyuan Li. 2023. Learning customized visual models with retrieval-augmented knowledge. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15148–15158

2023

[26] [26]

Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov

[27] [27]

Approximate nearest neighbor algorithm based on navigable small world graphs.Information Systems45 (2014), 61–68

2014

[28] [28]

Yu. A. Malkov and D. A. Yashunin. 2016. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. In IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 40. 1249– 1265

2016

[29] [29]

Microsoft Research. [n.d.]. Spacev1b: A Billion-Scale Vector Dataset for Text De- scriptors. https://github.com/microsoft/SPTAG/tree/master/datasets/SPACEV1B. Accessed: 2025-08-12

2025

[30] [30]

Jiongkang Ni, Xiaoliang Xu, Yuxiang Wang, Can Li, Jiajie Yao, Shihai Xiao, and Xuecang Zhang. 2023. Diskann++: Efficient page-based search over isomor- phic mapped graph index using query-sensitivity entry vertex.arXiv preprint arXiv:2310.00402(2023)

work page arXiv 2023

[31] [31]

Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models.Transactions of the Association for Computational Linguistics11 (2023), 1316–1331

2023

[32] [32]

Hao Ren, Feng Liu, Pengfei Zhao, Jianliang Wang, and Rong Jin. 2020. HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory. In Advances in Neural Information Processing Systems (NeurIPS)

2020

[33] [33]

Konstantin Schütze, Michael Heinzinger, Martin Steinegger, and Burkhard Rost

[34] [34]

Nearest neighbor search on embeddings rapidly identifies distant protein relations.Frontiers in Bioinformatics2 (2022), 1033775

2022

[35] [35]

Jayaram Subramanya, Devvrit Lovekar, Harsha Vardhan Simhadri, and An- shumali Shrivastava. 2019. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. InAdvances in Neural Information Process- ing Systems (NeurIPS). 13748–13758. https://papers.nips.cc/paper/2019/file/ 09853c7fb1dc7d9fd451d4980d70d63c-Paper.pdf

2019

[36] [36]

Yukihiro Tagami. 2017. Annexml: Approximate nearest neighbor search for extreme multi-label classification. InProceedings of the 23rd ACM SIGKDD inter- national conference on knowledge discovery and data mining. 455–464

2017

[37] [37]

Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xuecang Zhang, Junhua Zhu, and Yu Zhang. 2025. Towards High- throughput and Low-latency Billion-scale Vector Search via{CPU/GPU } Collab- orative Filtering and Re-ranking. In23rd USENIX Conference on File and Storage Technologies (FAST 25). 171–185

2025

[38] [38]

Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xi- angyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, et al. 2021. Milvus: A purpose-built vector data management system. InProceedings of the 2021 international conference on management of data. 2614–2627

2021

[39] [39]

Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xi- angyu Ke, Yunjun Gao, Xiaoliang Xu, Rentong Guo, and Charles Xie. 2024. Star- ling: An i/o-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment.Proceedings of the ACM on Management of Data2, 1 (2024), 1–27

2024

[40] [40]

Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. Analyticdb-v: A hybrid analytical engine towards query fusion for structured and unstructured data.Proceedings of the VLDB Endowment 13, 12 (2020), 3152–3165

2020

[41] [41]

Ziyang Yue, Bolong Zheng, Ling Xu, Kanru Xu, Shuhao Zhang, Yajuan Du, Yunjun Gao, Xiaofang Zhou, and Christian S Jensen. 2025. Select Edges Wisely: Monotonic Path Aware Graph Layout Optimization for Disk-Based ANN Search. Proceedings of the VLDB Endowment18, 11 (2025), 4337–4349

2025

[42] [42]

Jianjin Zhang, Zheng Liu, Weihao Han, Shitao Xiao, Ruicheng Zheng, Yingxia Shao, Hao Sun, Hanqing Zhu, Premkumar Srinivasan, Weiwei Deng, et al. 2022. Uni-retriever: Towards learning the unified embedding based retriever in bing sponsored search. InProceedings of the 28th ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining. 4493–4501

2022

[43] [43]

Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren, and Rong Jin. 2018. Visual search at alibaba. InProceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 993– 1001

2018

[44] [44]

Pengfei Zhao, Jianliang Wang, Hao Ren, and Rong Jin. 2024. SmartANNS: Near- Data Processing for Billion-scale Approximate Nearest Neighbor Search. In USENIX Annual Technical Conference (ATC)

2024

[45] [45]

Xiangyu Zhi, Meng Chen, Xiao Yan, Baotong Lu, Hui Li, Qianxi Zhang, Qi Chen, and James Cheng. 2025. Towards Efficient and Scalable Distributed Vector Search with RDMA.arXiv preprint arXiv:2507.06653(2025)

work page arXiv 2025

[46] [46]

Yijie Zhou, Shengyuan Lin, Shufeng Gong, Song Yu, Shuhao Fan, Yanfeng Zhang, and Ge Yu. 2025. GoVector: An I/O-Efficient Caching Strategy for High- Dimensional Vector Nearest Neighbor Search.arXiv preprint arXiv:2508.15694 (2025)

work page arXiv 2025

[47] [47]

Chun Jiang Zhu, Tan Zhu, Haining Li, Jinbo Bi, and Minghu Song. 2019. Accel- erating large-scale molecular similarity search through exploiting high perfor- mance computing. In2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 330–333

2019