Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN

Ben Landrum (1); Ken Birman (1) ((1) Cornell University); Nam Anh Dang (1)

arxiv: 2512.09331 · v2 · submitted 2025-12-10 · 💻 cs.DC · cs.IR

Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN

Nam Anh Dang (1) , Ben Landrum (1) , Ken Birman (1) ((1) Cornell University) This is my paper

Pith reviewed 2026-05-16 23:57 UTC · model grok-4.3

classification 💻 cs.DC cs.IR

keywords distributed vector searchapproximate nearest neighbordisk-based ANNhigh throughputglobal graphquery forwardingscalabilitythroughput scaling

0 comments

The pith

BatANN distributes vector search by forwarding full query states to keep one global graph across servers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BatANN as a distributed system for disk-based approximate nearest neighbor search on datasets too large for a single server. It preserves the efficiency of one unified graph by sending the complete current query state to whichever server stores the next graph neighborhood, allowing the search to continue locally there instead of scattering partial work. This yields substantially higher throughput than prior distributed approaches while holding latency low, all over ordinary TCP links. A sympathetic reader would care because vector search powers retrieval in large AI systems, and single-machine limits are already being hit on billion-scale collections.

Core claim

BatANN is a distributed disk-based approximate nearest neighbor system that retains the logarithmic search efficiency of a single global graph. When a query needs data stored on another machine, the full state of the query is forwarded so execution continues locally on that machine, delivering near-linear throughput scaling with the number of servers.

What carries the argument

Full-state query forwarding that sends the entire current search state to the remote server holding the next graph neighborhood, preserving locality during distributed execution.

If this is right

On 1B-point datasets at 0.95 recall using 10 servers, BatANN reaches 3.5-5.59x the throughput of scatter-gather baselines.
It reaches 1.44-2.09x the throughput of DistributedANN on the same workloads.
Mean latency remains below 3 ms.
All results are obtained over standard TCP without specialized networking hardware.
It is the first open-source distributed disk-based vector search system that uses one global graph.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The forwarding approach could support datasets in the tens of billions simply by adding more standard servers without rebuilding the index.
Similar state-passing ideas might apply to other graph-based retrieval methods that currently rely on separate per-server indexes.
Network costs for full-state forwarding appear lower than expected for these workloads, opening room for further scaling experiments on wider clusters.

Load-bearing premise

Partitioning a single global graph across disks on multiple servers preserves recall and search efficiency without meaningful loss.

What would settle it

Running the system on a 1B-vector dataset with 10 servers and observing either recall below 0.95 or throughput scaling below linear with added servers would show the central claim does not hold.

Figures

Figures reproduced from arXiv: 2512.09331 by Ben Landrum (1), Ken Birman (1) ((1) Cornell University), Nam Anh Dang (1).

**Figure 1.** Figure 1: Scatter–Gather vs. Global Index Many distributed vector search systems employ some variant of a scatter–gather paradigm. As seen in [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: System overview exotic today. We target a more common case, where nodes are connected by commodity networking hardware and communicate using TCP, but in doing so must embrace the sweet spot for this protocol, which can sustain very high throughput but is not ideal for round-trip interactions. Accordingly, our design prioritizes reducing the amount of time queries spend waiting on communication, in much t… view at source ↗

**Figure 3.** Figure 3: Number of hops vs. inter-partition hops for BI [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison of inter-partition hops between [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 6.** Figure 6: The system runs a dedicated ZeroMQ receiver thread [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 5.** Figure 5: Comparison of Distance Computations and Disk [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 7.** Figure 7: QPS recall graph of different search methods on [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: QPS recall curve of BIGANN, MSSPACEV on 5 and [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

**Figure 9.** Figure 9: QPS recall curve of BIGANN, MSSPACEV, DEEP on 5 and 10 servers for 100M. BatANN outperforms ScatterGather on [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗

**Figure 10.** Figure 10: Distance comparisons and I/O vs. recall plot of [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗

**Figure 12.** Figure 12: Latency Recall curve of BIGANN 1B on 5, 10 server [PITH_FULL_IMAGE:figures/full_fig_p010_12.png] view at source ↗

**Figure 14.** Figure 14: Throughput and latency comparison for 𝑊 = 1, 8 for BIGANN 1B on 10 servers. 𝑊 = 8 has higher performance across regimes with respect to both metrics. heuristic to detect which phase the search is currently in and adjust the I/O pipeline accordingly. Further optimizations for single-node DiskANN focus on reorganizing vector layout on disk. DiskANN stores vectors contiguously by ID, which provides no local… view at source ↗

read the original abstract

Vector search underpins modern information-retrieval systems, including retrieval-augmented generation (RAG) pipelines and search engines over unstructured text and images. As datasets scale to billions of vectors, disk-based vector search has emerged as a practical solution. However, looking to the future, we must anticipate datasets too large for any single server and throughput demands that exceed the limits of locally attached SSDs. We present BatANN, a distributed disk-based approximate nearest neighbor (ANN) system that retains the logarithmic search efficiency of a single global graph while achieving near-linear throughput scaling in the number of servers. Our core innovation is that when accessing a neighborhood which is stored on another machine, we send the full state of the query to the other machine to continue executing there for improved locality. On 1B-point datasets at 0.95 recall using 10 servers, BatANN achieves 3.5-5.59x of the scatter-gather baseline and 1.44-2.09x the throughput of DistributedANN, respectively, while maintaining mean latency below 3 ms. Moreover, we get these results on standard TCP. To our knowledge, BatANN is the first open-source distributed disk-based vector search system to operate over a single global graph.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BatANN's baton-passing keeps a single global graph across servers for disk-based ANN and reports clear throughput gains on 1B data, but the overhead from hops and state transfers stays unmeasured.

read the letter

This paper from the Cornell group presents BatANN, a system for high-throughput distributed disk-based vector search. The key new piece is the baton-passing method: when a query needs to access a neighborhood on another server, they send the full current state of the search — candidates, visited nodes, distance bounds — over to continue there. This keeps the search on a single global graph rather than breaking it up or using multiple copies. They report solid gains. On 1B-point datasets at 0.95 recall with 10 servers, BatANN gets 3.5-5.59x the throughput of scatter-gather approaches and 1.44-2.09x over DistributedANN, with mean latency under 3 ms using plain TCP. Being the first open-source system to do single-global-graph distributed disk ANN is a plus for the field. The strength here is the practical focus on scaling I/O-bound searches across servers while trying to hold onto the efficiency of a unified graph structure. Forwarding the state instead of just aggregating results at the end is a straightforward way to improve locality. Where it is softer is in the lack of supporting measurements for the assumptions. The stress-test point is on target: we need to know how often queries cross partitions and how much data moves with each baton pass. Without average hop counts, state transfer sizes, or partition quality metrics from the 1B experiments, it's difficult to see how general the scaling will be or whether latency stays low under different network or partitioning conditions. The abstract also skips error bars and full experimental methodology, which makes the numbers harder to evaluate independently. This paper is aimed at researchers and engineers working on large-scale approximate nearest neighbor search for applications like RAG and web search. Anyone building or tuning distributed vector databases would get concrete ideas from the baton mechanism and the comparisons. I would recommend sending it for peer review. The idea addresses a real bottleneck, the results are promising for the scale they tested, and a review process could push for the missing details on overheads to make the contribution clearer.

Referee Report

2 major / 1 minor

Summary. The manuscript presents BatANN, a distributed disk-based approximate nearest neighbor search system that partitions a single global graph across multiple servers. The core mechanism forwards the full query state (candidates, visited set, distance bounds) to the remote server when a neighborhood access crosses a partition boundary, aiming to preserve logarithmic search efficiency while achieving near-linear throughput scaling. On 1B-point datasets at 0.95 recall with 10 servers, it reports 3.5-5.59× throughput over a scatter-gather baseline and 1.44-2.09× over DistributedANN, with mean latency below 3 ms using standard TCP; it claims to be the first open-source distributed disk-based vector search system operating over a single global graph.

Significance. If the reported speedups and latency hold under the baton-passing model, the work would provide a practical path to scaling disk-based ANN beyond single-server SSD limits while retaining the efficiency advantages of a unified graph. The empirical gains on commodity TCP and the open-source positioning are concrete strengths that could influence large-scale retrieval pipelines in RAG and search engines.

major comments (2)

[Abstract] Abstract: The headline throughput claims (3.5-5.59× over scatter-gather and 1.44-2.09× over DistributedANN) and sub-3 ms mean latency rest on the assumption that cross-partition traversals remain infrequent and that full-state forwarding incurs negligible overhead relative to SSD reads; however, the manuscript provides no measurements of average hops per query, serialized state size transferred per hop, or partition-cut sizes for the 1B-point experiments.
[Experimental results] Experimental results: Performance numbers are reported for specific recall targets and server counts without error bars, variance across runs, or detailed methodology for how the global graph is partitioned and assigned to disks, which leaves the near-linear scaling claim difficult to evaluate.

minor comments (1)

[Abstract] Abstract: The claim that BatANN is 'the first open-source' system of its kind should be supported by a more explicit comparison table or discussion in the related-work section rather than left as a knowledge claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional measurements and methodological details.

read point-by-point responses

Referee: [Abstract] Abstract: The headline throughput claims (3.5-5.59× over scatter-gather and 1.44-2.09× over DistributedANN) and sub-3 ms mean latency rest on the assumption that cross-partition traversals remain infrequent and that full-state forwarding incurs negligible overhead relative to SSD reads; however, the manuscript provides no measurements of average hops per query, serialized state size transferred per hop, or partition-cut sizes for the 1B-point experiments.

Authors: We agree that explicit measurements of cross-partition behavior would strengthen the claims. In the revised version we will add a dedicated paragraph and table in the experimental evaluation section reporting, for the 1B-point workloads: (1) average cross-partition hops per query, (2) average serialized state size transferred per hop, and (3) partition-cut statistics (number of edges crossing partitions). These quantities will be collected under the exact same configuration and recall targets used for the throughput results, allowing readers to quantify the overhead of baton passing relative to SSD reads. revision: yes
Referee: [Experimental results] Experimental results: Performance numbers are reported for specific recall targets and server counts without error bars, variance across runs, or detailed methodology for how the global graph is partitioned and assigned to disks, which leaves the near-linear scaling claim difficult to evaluate.

Authors: We acknowledge that the current presentation lacks error bars and a sufficiently detailed partitioning description. In the revision we will (1) add standard-deviation error bars to all throughput and latency plots, computed over at least five independent runs, and (2) expand the experimental setup subsection to describe the graph partitioning algorithm, the objective used to minimize cut size, and the mapping of partitions to disks/servers. These additions will make the near-linear scaling results easier to reproduce and evaluate. revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims are purely empirical benchmarks against external baselines

full rationale

The paper describes a distributed ANN system and reports measured throughput and latency numbers on 1B-point datasets (3.5-5.59x over scatter-gather, 1.44-2.09x over DistributedANN, <3 ms mean latency). These are direct experimental comparisons with no equations, fitted parameters, self-referential definitions, or derivation steps that reduce to the paper's own inputs. The single-global-graph partitioning assumption is stated as a design choice and evaluated empirically; it is not justified by any self-citation chain or ansatz that collapses into the claimed result. No load-bearing mathematical step exists to inspect for circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard distributed-systems assumptions about network latency, disk I/O, and graph partitioning; no free parameters or invented entities are introduced beyond the new system design itself.

axioms (1)

domain assumption Network and disk performance characteristics allow effective full-query-state forwarding without prohibitive overhead
Invoked to support near-linear scaling and sub-3ms latency claims on standard TCP

pith-pipeline@v0.9.0 · 5539 in / 1249 out tokens · 51075 ms · 2026-05-16T23:57:00.918084+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

core innovation is that when accessing a neighborhood which is stored on another machine, we send the full state of the query to the other machine

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

[1]

M., and Simhadri, H

Adams, P., Li, M., Zhang, S., Tan, L., Chen, Q., Li, M., Li, Z., Risvik, K. M., and Simhadri, H. V.DistributedANN: Efficient Scaling of a Single DiskANN Graph Across Thousands of Computers. InThe 1st Workshop on Vector Databases(July 2025)

work page 2025
[2]

C., Krishnamurthy, A

Ahalt, S. C., Krishnamurthy, A. K., Chen, P., and Melton, D. E.Competitive learning algorithms for vector quantization.Neural Networks 3, 3 (Jan. 1990), 277–290. 11

work page 1990
[3]

L.Quicker ADC : Unlocking the hidden potential of Product Quantization with SIMD.IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 5 (May 2021), 1666–1677

André, F., Kermarrec, A.-M., and Scouarnec, N. L.Quicker ADC : Unlocking the hidden potential of Product Quantization with SIMD.IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 5 (May 2021), 1666–1677

work page 2021
[4]

M.Approximate nearest neighbor queries in fixed dimensions

Arya, S., and Mount, D. M.Approximate nearest neighbor queries in fixed dimensions. InProceedings of the fourth annual ACM-SIAM symposium on discrete algorithms(Austin, Texas, USA, 1993), Soda ’93, Society for Industrial and Applied Mathematics, pp. 271–280. Number of pages: 10 tex.address: USA

work page 1993
[5]

2020), 101374

Aumüller, M., Bernhardsson, E., and Faithfull, A.ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms.Information Systems 87(Jan. 2020), 101374

work page 2020
[6]

L.Searching in metric spaces.Acm Computing Surveys 33, 3 (Sept

Chávez, E., Navarro, G., Baeza-Y ates, R., and Marroqín, J. L.Searching in metric spaces.Acm Computing Surveys 33, 3 (Sept. 2001), 273–321

work page 2001
[7]

Deng, S., Yan, X., Ng, K. K. W., Jiang, C., and Cheng, J.Pyramid: A General Framework for Distributed Similarity Search, June 2019. arXiv:1906.10602 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2019
[8]

Desrochers, C.A Fast General Purpose Lock-Free Queue for C++.moody- camel.com(2014)

work page 2014
[9]

S., Rodriguez, A., Kulaw- iak, D

Dilocker, E., van Luijt, B., Voorbach, B., Hasan, M. S., Rodriguez, A., Kulaw- iak, D. A., Antas, M., and Duckworth, P.Weaviate, Nov. 2025. original-date: 2016-03-30T15:03:17Z

work page 2025
[10]

InProceedings of the 20th International Conference on World Wide Web(2011), ACM, pp

Dong, W., Moses, C., and Li, K.Efficient k-nearest neighbor graph construction for generic similarity measures. InProceedings of the 20th International Conference on World Wide Web(2011), ACM, pp. 577–586

work page 2011
[11]

InProceedings of the USENIX annual technical conference (ATC)(July 2019), pp

Duplyakin, D., Ricci, R., Maricq, A., Wong, G., Duerig, J., Eide, E., Stoller, L., Hibler, M., Johnson, D., Webb, K., Akella, A., W ang, K., Ricart, G., Landwe- ber, L., Elliott, C., Zink, M., and Cecchet, E.The design and operation of CloudLab. InProceedings of the USENIX annual technical conference (ATC)(July 2019), pp. 1–14

work page 2019
[12]

Gottesbüren, L., Dhulipala, L., Jayaram, R., and Lacki, J.Unleash- ing Graph Partitioning for Large-Scale Nearest Neighbor Search, Mar. 2024. arXiv:2403.01797

work page arXiv 2024
[13]

In19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)(2025), pp

Guo, H., and Lu, Y.Achieving {Low-Latency} {Graph-Based} Vector Search via Aligning {Best-First} Search Algorithm with {SSD}. In19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)(2025), pp. 171–186

work page 2025
[14]

Guo, H., and Lu, Y.OdinANN: Direct Insert for Consistently Stable Performance in Billion-Scale Graph-Based Vector Search.24th USENIX Conference on File and Storage Technologies (FAST ’26)(2026)

work page 2026
[15]

In Proceedings of the 22nd ACM international conference on Information & Knowledge Management(San Francisco California USA, Oct

Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., and Heck, L.Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management(San Francisco California USA, Oct. 2013), ACM, pp. 2333–2338

work page 2013
[16]

InThe 1st workshop on vector databases (2025)

Ingber, A., and Liberty, E.Accurate and efficient metadata filtering in pinecone’s serverless vector database. InThe 1st workshop on vector databases (2025)

work page 2025
[17]

In2023 USENIX Annual Technical Conference (USENIX ATC 23)(2023), USENIX Association, pp

Jang, J., Choi, H., Bae, H., Lee, S., Kwon, M., and Jung, M.CXL-ANNS: Software- Hardware collaborative memory disaggregation and computation for Billion- Scale approximate nearest neighbor search. In2023 USENIX Annual Technical Conference (USENIX ATC 23)(2023), USENIX Association, pp. 585–600

work page 2023
[18]

V., Krishnawamy, R., and Kadekodi, R.DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Jayaram Subramanya, S., Devvrit, F., Simhadri, H. V., Krishnawamy, R., and Kadekodi, R.DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. InAdvances in Neural Information Processing Systems(2019), vol. 32, Curran Associates, Inc

work page 2019
[19]

V., Zink, S., and Birman, K

Jha, S., Behrens, J., Gkountouvas, T., Milano, M., Song, W., Tremel, E., Re- nesse, R. V., Zink, S., and Birman, K. P.Derecho: Fast state machine replication for cloud services.ACM Trans. Comput. Syst. 36, 2 (Apr. 2019)

work page 2019
[20]

Johnson, J., Douze, M., and Jégou, H.Billion-Scale Similarity Search with GPUs.IEEE Transactions on Big Data 7, 3 (July 2021), 535–547

work page 2021
[21]

2011), 117–128

Jégou, H., Douze, M., and Schmid, C.Product Quantization for Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (Jan. 2011), 117–128

work page 2011
[22]

InAdvances in neural information processing systems(2020), H

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D.Retrieval- augmented generation for knowledge-intensive NLP tasks. InAdvances in neural information processing systems(2020), H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Cur...

work page 2020
[23]

D., Shen, Z., Blelloch, G., Dhulipala, L., Gu, Y., Simhadri, H

Manohar, M. D., Shen, Z., Blelloch, G., Dhulipala, L., Gu, Y., Simhadri, H. V., and Sun, Y.ParlayANN: Scalable and deterministic parallel graph-based approximate nearest neighbor search algorithms. InProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (New York, NY, USA, 2024), PPoPP ’24, Association for...

work page 2024
[24]

Oakley, J., and Ferhatosmanoglu, H.SQUASH: Serverless and Dis- tributed Quantization-based Attributed Vector Similarity Search, Feb. 2025. arXiv:2502.01528 [cs]

work page arXiv 2025
[25]

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I.Learning Transferable Visual Models From Natural Language Supervision. InProceedings of the 38th International Conference on Machine Learning(July 2021), PMLR, pp. 8748–8763. ISSN: 2640-3498

work page 2021
[26]

V., Aumüller, M., Ingber, A., Douze, M., Williams, G., Manohar, M

Simhadri, H. V., Aumüller, M., Ingber, A., Douze, M., Williams, G., Manohar, M. D., Baranchuk, D., Liberty, E., Liu, F., Landrum, B., Karjikar, M., Dhuli- pala, L., Chen, M., Chen, Y., Ma, R., Zhang, K., Cai, Y., Shi, J., Chen, Y., Zheng, W., W an, Z., Yin, J., and Huang, B.Results of the big ANN: NeurIPS’23 compe- tition, 2024. arXiv: 2409.17424 [cs.IR]

work page arXiv 2024
[27]

FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search.arXiv preprint arXiv:2105.09613,

Singh, A., Subramanya, S. J., Krishnaswamy, R., and Simhadri, H. V. FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search, May 2021. arXiv:2105.09613 [cs]

work page arXiv 2021
[28]

Tatsuno, K., Miyashita, D., Ikeda, T., Ishiyama, K., Sumiyoshi, K., and Deguchi, J.AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval, Feb. 2025. arXiv:2404.06004 [cs]. [29]The ZeroMQ Community. ZeroMQ: The intelligent transport layer, 2025

work page arXiv 2025
[29]

V., Sundar, H

Upreti, N., Simhadri, H. V., Sundar, H. S., Sundaram, K., Boshra, S., Perumal- swamy, B., Atri, S., Chisholm, M., Singh, R. R., Y ang, G., Hass, T., Dudhey, N., Pattipaka, S., Hildebrand, M., Manohar, M., Moffitt, J., Xu, H., Datha, N., Gupta, S., Krishnaswamy, R., Gupta, P., Sahu, A., V arada, H., Barthwal, S., Mor, R., Codella, J., Cooper, S., Pilch, K....

work page 2025
[30]

InProceedings of the 2021 International Conference on Management of Data(Virtual Event China, June 2021), ACM, pp

Wang, J., Yi, X., Guo, R., Jin, H., Xu, P., Li, S., Wang, X., Guo, X., Li, C., Xu, X., Yu, K., Yuan, Y., Zou, Y., Long, J., Cai, Y., Li, Z., Zhang, Z., Mo, Y., Gu, J., Jiang, R., Wei, Y., and Xie, C.Milvus: A Purpose-Built Vector Data Management System. InProceedings of the 2021 International Conference on Management of Data(Virtual Event China, June 2021...

work page 2021
[31]

Starling: An i/o- efficient disk-resident graph index framework for high- dimensional vector similarity search on data segment

W ang, M., Xu, W., Yi, X., Wu, S., Peng, Z., Ke, X., Gao, Y., Xu, X., Guo, R., and Xie, C.Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment.Proceedings of the ACM on Management of Data 2, 1 (Mar. 2024), 1–27. arXiv:2401.02116 [cs]

work page arXiv 2024
[32]

In-place updates of a graph in- dex for streaming approximate nearest neighbor search

Xu, H., Manohar, M. D., Bernstein, P. A., Chandramouli, B., Wen, R., and Simhadri, H. V.In-place updates of a graph index for streaming approximate nearest neighbor search.CoRR abs/2502.13826(Feb. 2025)

work page arXiv 2025
[33]

Xu, X., Wang, M., Wang, Y., and Ma, D.Two-stage routing with optimized guided search and greedy algorithm on proximity graph.Knowledge-Based Systems 229(2021), 107305

work page 2021
[34]

SPFresh: Incremental in-place update for billion-scale vector search

Xu, Y., Liang, H., Li, J., Xu, S., Chen, Q., Zhang, Q., Li, C., Yang, Z., Yang, F., Yang, Y., and others. SPFresh: Incremental in-place update for billion-scale vector search. InProceedings of the 29th symposium on operating systems principles (2023), pp. 545–561

work page 2023
[35]

Yu. A. Malkov, Yu. A. Malkov, Malkov, Y. A., D. A. Y ashunin, and Y ashunin, D. A.Efficient and Robust Approximate Nearest Neighbor Search Using Hierar- chical Navigable Small World Graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (Apr. 2020), 824–836

work page 2020
[36]

In17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)(Boston, MA, July 2023), USENIX Association, pp

Zhang, Q., Xu, S., Chen, Q., Sui, G., Xie, J., Cai, Z., Chen, Y., He, Y., Yang, Y., Y ang, F., Y ang, M., and Zhou, L.VBASE: Unifying online vector similarity search and relational queries via relaxed monotonicity. In17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)(Boston, MA, July 2023), USENIX Association, pp. 377–395

work page 2023
[37]

Towards Efficient and Scalable Distributed Vector Search with RDMA, July 2025

Zhi, X., Chen, M., Yan, X., Lu, B., Li, H., Zhang, Q., Chen, Q., and Cheng, J. Towards Efficient and Scalable Distributed Vector Search with RDMA, July 2025. arXiv:2507.06653 [cs]. 12

work page arXiv 2025

[1] [1]

M., and Simhadri, H

Adams, P., Li, M., Zhang, S., Tan, L., Chen, Q., Li, M., Li, Z., Risvik, K. M., and Simhadri, H. V.DistributedANN: Efficient Scaling of a Single DiskANN Graph Across Thousands of Computers. InThe 1st Workshop on Vector Databases(July 2025)

work page 2025

[2] [2]

C., Krishnamurthy, A

Ahalt, S. C., Krishnamurthy, A. K., Chen, P., and Melton, D. E.Competitive learning algorithms for vector quantization.Neural Networks 3, 3 (Jan. 1990), 277–290. 11

work page 1990

[3] [3]

L.Quicker ADC : Unlocking the hidden potential of Product Quantization with SIMD.IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 5 (May 2021), 1666–1677

André, F., Kermarrec, A.-M., and Scouarnec, N. L.Quicker ADC : Unlocking the hidden potential of Product Quantization with SIMD.IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 5 (May 2021), 1666–1677

work page 2021

[4] [4]

M.Approximate nearest neighbor queries in fixed dimensions

Arya, S., and Mount, D. M.Approximate nearest neighbor queries in fixed dimensions. InProceedings of the fourth annual ACM-SIAM symposium on discrete algorithms(Austin, Texas, USA, 1993), Soda ’93, Society for Industrial and Applied Mathematics, pp. 271–280. Number of pages: 10 tex.address: USA

work page 1993

[5] [5]

2020), 101374

Aumüller, M., Bernhardsson, E., and Faithfull, A.ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms.Information Systems 87(Jan. 2020), 101374

work page 2020

[6] [6]

L.Searching in metric spaces.Acm Computing Surveys 33, 3 (Sept

Chávez, E., Navarro, G., Baeza-Y ates, R., and Marroqín, J. L.Searching in metric spaces.Acm Computing Surveys 33, 3 (Sept. 2001), 273–321

work page 2001

[7] [7]

Deng, S., Yan, X., Ng, K. K. W., Jiang, C., and Cheng, J.Pyramid: A General Framework for Distributed Similarity Search, June 2019. arXiv:1906.10602 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2019

[8] [8]

Desrochers, C.A Fast General Purpose Lock-Free Queue for C++.moody- camel.com(2014)

work page 2014

[9] [9]

S., Rodriguez, A., Kulaw- iak, D

Dilocker, E., van Luijt, B., Voorbach, B., Hasan, M. S., Rodriguez, A., Kulaw- iak, D. A., Antas, M., and Duckworth, P.Weaviate, Nov. 2025. original-date: 2016-03-30T15:03:17Z

work page 2025

[10] [10]

InProceedings of the 20th International Conference on World Wide Web(2011), ACM, pp

Dong, W., Moses, C., and Li, K.Efficient k-nearest neighbor graph construction for generic similarity measures. InProceedings of the 20th International Conference on World Wide Web(2011), ACM, pp. 577–586

work page 2011

[11] [11]

InProceedings of the USENIX annual technical conference (ATC)(July 2019), pp

Duplyakin, D., Ricci, R., Maricq, A., Wong, G., Duerig, J., Eide, E., Stoller, L., Hibler, M., Johnson, D., Webb, K., Akella, A., W ang, K., Ricart, G., Landwe- ber, L., Elliott, C., Zink, M., and Cecchet, E.The design and operation of CloudLab. InProceedings of the USENIX annual technical conference (ATC)(July 2019), pp. 1–14

work page 2019

[12] [12]

Gottesbüren, L., Dhulipala, L., Jayaram, R., and Lacki, J.Unleash- ing Graph Partitioning for Large-Scale Nearest Neighbor Search, Mar. 2024. arXiv:2403.01797

work page arXiv 2024

[13] [13]

In19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)(2025), pp

Guo, H., and Lu, Y.Achieving {Low-Latency} {Graph-Based} Vector Search via Aligning {Best-First} Search Algorithm with {SSD}. In19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)(2025), pp. 171–186

work page 2025

[14] [14]

Guo, H., and Lu, Y.OdinANN: Direct Insert for Consistently Stable Performance in Billion-Scale Graph-Based Vector Search.24th USENIX Conference on File and Storage Technologies (FAST ’26)(2026)

work page 2026

[15] [15]

In Proceedings of the 22nd ACM international conference on Information & Knowledge Management(San Francisco California USA, Oct

Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., and Heck, L.Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management(San Francisco California USA, Oct. 2013), ACM, pp. 2333–2338

work page 2013

[16] [16]

InThe 1st workshop on vector databases (2025)

Ingber, A., and Liberty, E.Accurate and efficient metadata filtering in pinecone’s serverless vector database. InThe 1st workshop on vector databases (2025)

work page 2025

[17] [17]

In2023 USENIX Annual Technical Conference (USENIX ATC 23)(2023), USENIX Association, pp

Jang, J., Choi, H., Bae, H., Lee, S., Kwon, M., and Jung, M.CXL-ANNS: Software- Hardware collaborative memory disaggregation and computation for Billion- Scale approximate nearest neighbor search. In2023 USENIX Annual Technical Conference (USENIX ATC 23)(2023), USENIX Association, pp. 585–600

work page 2023

[18] [18]

V., Krishnawamy, R., and Kadekodi, R.DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Jayaram Subramanya, S., Devvrit, F., Simhadri, H. V., Krishnawamy, R., and Kadekodi, R.DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. InAdvances in Neural Information Processing Systems(2019), vol. 32, Curran Associates, Inc

work page 2019

[19] [19]

V., Zink, S., and Birman, K

Jha, S., Behrens, J., Gkountouvas, T., Milano, M., Song, W., Tremel, E., Re- nesse, R. V., Zink, S., and Birman, K. P.Derecho: Fast state machine replication for cloud services.ACM Trans. Comput. Syst. 36, 2 (Apr. 2019)

work page 2019

[20] [20]

Johnson, J., Douze, M., and Jégou, H.Billion-Scale Similarity Search with GPUs.IEEE Transactions on Big Data 7, 3 (July 2021), 535–547

work page 2021

[21] [21]

2011), 117–128

Jégou, H., Douze, M., and Schmid, C.Product Quantization for Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (Jan. 2011), 117–128

work page 2011

[22] [22]

InAdvances in neural information processing systems(2020), H

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D.Retrieval- augmented generation for knowledge-intensive NLP tasks. InAdvances in neural information processing systems(2020), H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Cur...

work page 2020

[23] [23]

D., Shen, Z., Blelloch, G., Dhulipala, L., Gu, Y., Simhadri, H

Manohar, M. D., Shen, Z., Blelloch, G., Dhulipala, L., Gu, Y., Simhadri, H. V., and Sun, Y.ParlayANN: Scalable and deterministic parallel graph-based approximate nearest neighbor search algorithms. InProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (New York, NY, USA, 2024), PPoPP ’24, Association for...

work page 2024

[24] [24]

Oakley, J., and Ferhatosmanoglu, H.SQUASH: Serverless and Dis- tributed Quantization-based Attributed Vector Similarity Search, Feb. 2025. arXiv:2502.01528 [cs]

work page arXiv 2025

[25] [25]

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I.Learning Transferable Visual Models From Natural Language Supervision. InProceedings of the 38th International Conference on Machine Learning(July 2021), PMLR, pp. 8748–8763. ISSN: 2640-3498

work page 2021

[26] [26]

V., Aumüller, M., Ingber, A., Douze, M., Williams, G., Manohar, M

Simhadri, H. V., Aumüller, M., Ingber, A., Douze, M., Williams, G., Manohar, M. D., Baranchuk, D., Liberty, E., Liu, F., Landrum, B., Karjikar, M., Dhuli- pala, L., Chen, M., Chen, Y., Ma, R., Zhang, K., Cai, Y., Shi, J., Chen, Y., Zheng, W., W an, Z., Yin, J., and Huang, B.Results of the big ANN: NeurIPS’23 compe- tition, 2024. arXiv: 2409.17424 [cs.IR]

work page arXiv 2024

[27] [27]

FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search.arXiv preprint arXiv:2105.09613,

Singh, A., Subramanya, S. J., Krishnaswamy, R., and Simhadri, H. V. FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search, May 2021. arXiv:2105.09613 [cs]

work page arXiv 2021

[28] [28]

Tatsuno, K., Miyashita, D., Ikeda, T., Ishiyama, K., Sumiyoshi, K., and Deguchi, J.AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval, Feb. 2025. arXiv:2404.06004 [cs]. [29]The ZeroMQ Community. ZeroMQ: The intelligent transport layer, 2025

work page arXiv 2025

[29] [29]

V., Sundar, H

Upreti, N., Simhadri, H. V., Sundar, H. S., Sundaram, K., Boshra, S., Perumal- swamy, B., Atri, S., Chisholm, M., Singh, R. R., Y ang, G., Hass, T., Dudhey, N., Pattipaka, S., Hildebrand, M., Manohar, M., Moffitt, J., Xu, H., Datha, N., Gupta, S., Krishnaswamy, R., Gupta, P., Sahu, A., V arada, H., Barthwal, S., Mor, R., Codella, J., Cooper, S., Pilch, K....

work page 2025

[30] [30]

InProceedings of the 2021 International Conference on Management of Data(Virtual Event China, June 2021), ACM, pp

Wang, J., Yi, X., Guo, R., Jin, H., Xu, P., Li, S., Wang, X., Guo, X., Li, C., Xu, X., Yu, K., Yuan, Y., Zou, Y., Long, J., Cai, Y., Li, Z., Zhang, Z., Mo, Y., Gu, J., Jiang, R., Wei, Y., and Xie, C.Milvus: A Purpose-Built Vector Data Management System. InProceedings of the 2021 International Conference on Management of Data(Virtual Event China, June 2021...

work page 2021

[31] [31]

Starling: An i/o- efficient disk-resident graph index framework for high- dimensional vector similarity search on data segment

W ang, M., Xu, W., Yi, X., Wu, S., Peng, Z., Ke, X., Gao, Y., Xu, X., Guo, R., and Xie, C.Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment.Proceedings of the ACM on Management of Data 2, 1 (Mar. 2024), 1–27. arXiv:2401.02116 [cs]

work page arXiv 2024

[32] [32]

In-place updates of a graph in- dex for streaming approximate nearest neighbor search

Xu, H., Manohar, M. D., Bernstein, P. A., Chandramouli, B., Wen, R., and Simhadri, H. V.In-place updates of a graph index for streaming approximate nearest neighbor search.CoRR abs/2502.13826(Feb. 2025)

work page arXiv 2025

[33] [33]

Xu, X., Wang, M., Wang, Y., and Ma, D.Two-stage routing with optimized guided search and greedy algorithm on proximity graph.Knowledge-Based Systems 229(2021), 107305

work page 2021

[34] [34]

SPFresh: Incremental in-place update for billion-scale vector search

Xu, Y., Liang, H., Li, J., Xu, S., Chen, Q., Zhang, Q., Li, C., Yang, Z., Yang, F., Yang, Y., and others. SPFresh: Incremental in-place update for billion-scale vector search. InProceedings of the 29th symposium on operating systems principles (2023), pp. 545–561

work page 2023

[35] [35]

Yu. A. Malkov, Yu. A. Malkov, Malkov, Y. A., D. A. Y ashunin, and Y ashunin, D. A.Efficient and Robust Approximate Nearest Neighbor Search Using Hierar- chical Navigable Small World Graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (Apr. 2020), 824–836

work page 2020

[36] [36]

In17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)(Boston, MA, July 2023), USENIX Association, pp

Zhang, Q., Xu, S., Chen, Q., Sui, G., Xie, J., Cai, Z., Chen, Y., He, Y., Yang, Y., Y ang, F., Y ang, M., and Zhou, L.VBASE: Unifying online vector similarity search and relational queries via relaxed monotonicity. In17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)(Boston, MA, July 2023), USENIX Association, pp. 377–395

work page 2023

[37] [37]

Towards Efficient and Scalable Distributed Vector Search with RDMA, July 2025

Zhi, X., Chen, M., Yan, X., Lu, B., Li, H., Zhang, Q., Chen, Q., and Cheng, J. Towards Efficient and Scalable Distributed Vector Search with RDMA, July 2025. arXiv:2507.06653 [cs]. 12

work page arXiv 2025