Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN
Pith reviewed 2026-05-16 23:57 UTC · model grok-4.3
The pith
BatANN distributes vector search by forwarding full query states to keep one global graph across servers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BatANN is a distributed disk-based approximate nearest neighbor system that retains the logarithmic search efficiency of a single global graph. When a query needs data stored on another machine, the full state of the query is forwarded so execution continues locally on that machine, delivering near-linear throughput scaling with the number of servers.
What carries the argument
Full-state query forwarding that sends the entire current search state to the remote server holding the next graph neighborhood, preserving locality during distributed execution.
If this is right
- On 1B-point datasets at 0.95 recall using 10 servers, BatANN reaches 3.5-5.59x the throughput of scatter-gather baselines.
- It reaches 1.44-2.09x the throughput of DistributedANN on the same workloads.
- Mean latency remains below 3 ms.
- All results are obtained over standard TCP without specialized networking hardware.
- It is the first open-source distributed disk-based vector search system that uses one global graph.
Where Pith is reading between the lines
- The forwarding approach could support datasets in the tens of billions simply by adding more standard servers without rebuilding the index.
- Similar state-passing ideas might apply to other graph-based retrieval methods that currently rely on separate per-server indexes.
- Network costs for full-state forwarding appear lower than expected for these workloads, opening room for further scaling experiments on wider clusters.
Load-bearing premise
Partitioning a single global graph across disks on multiple servers preserves recall and search efficiency without meaningful loss.
What would settle it
Running the system on a 1B-vector dataset with 10 servers and observing either recall below 0.95 or throughput scaling below linear with added servers would show the central claim does not hold.
Figures
read the original abstract
Vector search underpins modern information-retrieval systems, including retrieval-augmented generation (RAG) pipelines and search engines over unstructured text and images. As datasets scale to billions of vectors, disk-based vector search has emerged as a practical solution. However, looking to the future, we must anticipate datasets too large for any single server and throughput demands that exceed the limits of locally attached SSDs. We present BatANN, a distributed disk-based approximate nearest neighbor (ANN) system that retains the logarithmic search efficiency of a single global graph while achieving near-linear throughput scaling in the number of servers. Our core innovation is that when accessing a neighborhood which is stored on another machine, we send the full state of the query to the other machine to continue executing there for improved locality. On 1B-point datasets at 0.95 recall using 10 servers, BatANN achieves 3.5-5.59x of the scatter-gather baseline and 1.44-2.09x the throughput of DistributedANN, respectively, while maintaining mean latency below 3 ms. Moreover, we get these results on standard TCP. To our knowledge, BatANN is the first open-source distributed disk-based vector search system to operate over a single global graph.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents BatANN, a distributed disk-based approximate nearest neighbor search system that partitions a single global graph across multiple servers. The core mechanism forwards the full query state (candidates, visited set, distance bounds) to the remote server when a neighborhood access crosses a partition boundary, aiming to preserve logarithmic search efficiency while achieving near-linear throughput scaling. On 1B-point datasets at 0.95 recall with 10 servers, it reports 3.5-5.59× throughput over a scatter-gather baseline and 1.44-2.09× over DistributedANN, with mean latency below 3 ms using standard TCP; it claims to be the first open-source distributed disk-based vector search system operating over a single global graph.
Significance. If the reported speedups and latency hold under the baton-passing model, the work would provide a practical path to scaling disk-based ANN beyond single-server SSD limits while retaining the efficiency advantages of a unified graph. The empirical gains on commodity TCP and the open-source positioning are concrete strengths that could influence large-scale retrieval pipelines in RAG and search engines.
major comments (2)
- [Abstract] Abstract: The headline throughput claims (3.5-5.59× over scatter-gather and 1.44-2.09× over DistributedANN) and sub-3 ms mean latency rest on the assumption that cross-partition traversals remain infrequent and that full-state forwarding incurs negligible overhead relative to SSD reads; however, the manuscript provides no measurements of average hops per query, serialized state size transferred per hop, or partition-cut sizes for the 1B-point experiments.
- [Experimental results] Experimental results: Performance numbers are reported for specific recall targets and server counts without error bars, variance across runs, or detailed methodology for how the global graph is partitioned and assigned to disks, which leaves the near-linear scaling claim difficult to evaluate.
minor comments (1)
- [Abstract] Abstract: The claim that BatANN is 'the first open-source' system of its kind should be supported by a more explicit comparison table or discussion in the related-work section rather than left as a knowledge claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional measurements and methodological details.
read point-by-point responses
-
Referee: [Abstract] Abstract: The headline throughput claims (3.5-5.59× over scatter-gather and 1.44-2.09× over DistributedANN) and sub-3 ms mean latency rest on the assumption that cross-partition traversals remain infrequent and that full-state forwarding incurs negligible overhead relative to SSD reads; however, the manuscript provides no measurements of average hops per query, serialized state size transferred per hop, or partition-cut sizes for the 1B-point experiments.
Authors: We agree that explicit measurements of cross-partition behavior would strengthen the claims. In the revised version we will add a dedicated paragraph and table in the experimental evaluation section reporting, for the 1B-point workloads: (1) average cross-partition hops per query, (2) average serialized state size transferred per hop, and (3) partition-cut statistics (number of edges crossing partitions). These quantities will be collected under the exact same configuration and recall targets used for the throughput results, allowing readers to quantify the overhead of baton passing relative to SSD reads. revision: yes
-
Referee: [Experimental results] Experimental results: Performance numbers are reported for specific recall targets and server counts without error bars, variance across runs, or detailed methodology for how the global graph is partitioned and assigned to disks, which leaves the near-linear scaling claim difficult to evaluate.
Authors: We acknowledge that the current presentation lacks error bars and a sufficiently detailed partitioning description. In the revision we will (1) add standard-deviation error bars to all throughput and latency plots, computed over at least five independent runs, and (2) expand the experimental setup subsection to describe the graph partitioning algorithm, the objective used to minimize cut size, and the mapping of partitions to disks/servers. These additions will make the near-linear scaling results easier to reproduce and evaluate. revision: yes
Circularity Check
No circularity: performance claims are purely empirical benchmarks against external baselines
full rationale
The paper describes a distributed ANN system and reports measured throughput and latency numbers on 1B-point datasets (3.5-5.59x over scatter-gather, 1.44-2.09x over DistributedANN, <3 ms mean latency). These are direct experimental comparisons with no equations, fitted parameters, self-referential definitions, or derivation steps that reduce to the paper's own inputs. The single-global-graph partitioning assumption is stated as a design choice and evaluated empirically; it is not justified by any self-citation chain or ansatz that collapses into the claimed result. No load-bearing mathematical step exists to inspect for circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Network and disk performance characteristics allow effective full-query-state forwarding without prohibitive overhead
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinctionreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
core innovation is that when accessing a neighborhood which is stored on another machine, we send the full state of the query to the other machine
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Adams, P., Li, M., Zhang, S., Tan, L., Chen, Q., Li, M., Li, Z., Risvik, K. M., and Simhadri, H. V.DistributedANN: Efficient Scaling of a Single DiskANN Graph Across Thousands of Computers. InThe 1st Workshop on Vector Databases(July 2025)
work page 2025
-
[2]
Ahalt, S. C., Krishnamurthy, A. K., Chen, P., and Melton, D. E.Competitive learning algorithms for vector quantization.Neural Networks 3, 3 (Jan. 1990), 277–290. 11
work page 1990
-
[3]
André, F., Kermarrec, A.-M., and Scouarnec, N. L.Quicker ADC : Unlocking the hidden potential of Product Quantization with SIMD.IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 5 (May 2021), 1666–1677
work page 2021
-
[4]
M.Approximate nearest neighbor queries in fixed dimensions
Arya, S., and Mount, D. M.Approximate nearest neighbor queries in fixed dimensions. InProceedings of the fourth annual ACM-SIAM symposium on discrete algorithms(Austin, Texas, USA, 1993), Soda ’93, Society for Industrial and Applied Mathematics, pp. 271–280. Number of pages: 10 tex.address: USA
work page 1993
-
[5]
Aumüller, M., Bernhardsson, E., and Faithfull, A.ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms.Information Systems 87(Jan. 2020), 101374
work page 2020
-
[6]
L.Searching in metric spaces.Acm Computing Surveys 33, 3 (Sept
Chávez, E., Navarro, G., Baeza-Y ates, R., and Marroqín, J. L.Searching in metric spaces.Acm Computing Surveys 33, 3 (Sept. 2001), 273–321
work page 2001
-
[7]
Deng, S., Yan, X., Ng, K. K. W., Jiang, C., and Cheng, J.Pyramid: A General Framework for Distributed Similarity Search, June 2019. arXiv:1906.10602 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[8]
Desrochers, C.A Fast General Purpose Lock-Free Queue for C++.moody- camel.com(2014)
work page 2014
-
[9]
S., Rodriguez, A., Kulaw- iak, D
Dilocker, E., van Luijt, B., Voorbach, B., Hasan, M. S., Rodriguez, A., Kulaw- iak, D. A., Antas, M., and Duckworth, P.Weaviate, Nov. 2025. original-date: 2016-03-30T15:03:17Z
work page 2025
-
[10]
InProceedings of the 20th International Conference on World Wide Web(2011), ACM, pp
Dong, W., Moses, C., and Li, K.Efficient k-nearest neighbor graph construction for generic similarity measures. InProceedings of the 20th International Conference on World Wide Web(2011), ACM, pp. 577–586
work page 2011
-
[11]
InProceedings of the USENIX annual technical conference (ATC)(July 2019), pp
Duplyakin, D., Ricci, R., Maricq, A., Wong, G., Duerig, J., Eide, E., Stoller, L., Hibler, M., Johnson, D., Webb, K., Akella, A., W ang, K., Ricart, G., Landwe- ber, L., Elliott, C., Zink, M., and Cecchet, E.The design and operation of CloudLab. InProceedings of the USENIX annual technical conference (ATC)(July 2019), pp. 1–14
work page 2019
- [12]
-
[13]
In19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)(2025), pp
Guo, H., and Lu, Y.Achieving {Low-Latency} {Graph-Based} Vector Search via Aligning {Best-First} Search Algorithm with {SSD}. In19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)(2025), pp. 171–186
work page 2025
-
[14]
Guo, H., and Lu, Y.OdinANN: Direct Insert for Consistently Stable Performance in Billion-Scale Graph-Based Vector Search.24th USENIX Conference on File and Storage Technologies (FAST ’26)(2026)
work page 2026
-
[15]
Huang, P.-S., He, X., Gao, J., Deng, L., Acero, A., and Heck, L.Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management(San Francisco California USA, Oct. 2013), ACM, pp. 2333–2338
work page 2013
-
[16]
InThe 1st workshop on vector databases (2025)
Ingber, A., and Liberty, E.Accurate and efficient metadata filtering in pinecone’s serverless vector database. InThe 1st workshop on vector databases (2025)
work page 2025
-
[17]
In2023 USENIX Annual Technical Conference (USENIX ATC 23)(2023), USENIX Association, pp
Jang, J., Choi, H., Bae, H., Lee, S., Kwon, M., and Jung, M.CXL-ANNS: Software- Hardware collaborative memory disaggregation and computation for Billion- Scale approximate nearest neighbor search. In2023 USENIX Annual Technical Conference (USENIX ATC 23)(2023), USENIX Association, pp. 585–600
work page 2023
-
[18]
Jayaram Subramanya, S., Devvrit, F., Simhadri, H. V., Krishnawamy, R., and Kadekodi, R.DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. InAdvances in Neural Information Processing Systems(2019), vol. 32, Curran Associates, Inc
work page 2019
-
[19]
Jha, S., Behrens, J., Gkountouvas, T., Milano, M., Song, W., Tremel, E., Re- nesse, R. V., Zink, S., and Birman, K. P.Derecho: Fast state machine replication for cloud services.ACM Trans. Comput. Syst. 36, 2 (Apr. 2019)
work page 2019
-
[20]
Johnson, J., Douze, M., and Jégou, H.Billion-Scale Similarity Search with GPUs.IEEE Transactions on Big Data 7, 3 (July 2021), 535–547
work page 2021
-
[21]
Jégou, H., Douze, M., and Schmid, C.Product Quantization for Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (Jan. 2011), 117–128
work page 2011
-
[22]
InAdvances in neural information processing systems(2020), H
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D.Retrieval- augmented generation for knowledge-intensive NLP tasks. InAdvances in neural information processing systems(2020), H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Cur...
work page 2020
-
[23]
D., Shen, Z., Blelloch, G., Dhulipala, L., Gu, Y., Simhadri, H
Manohar, M. D., Shen, Z., Blelloch, G., Dhulipala, L., Gu, Y., Simhadri, H. V., and Sun, Y.ParlayANN: Scalable and deterministic parallel graph-based approximate nearest neighbor search algorithms. InProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (New York, NY, USA, 2024), PPoPP ’24, Association for...
work page 2024
- [24]
-
[25]
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I.Learning Transferable Visual Models From Natural Language Supervision. InProceedings of the 38th International Conference on Machine Learning(July 2021), PMLR, pp. 8748–8763. ISSN: 2640-3498
work page 2021
-
[26]
V., Aumüller, M., Ingber, A., Douze, M., Williams, G., Manohar, M
Simhadri, H. V., Aumüller, M., Ingber, A., Douze, M., Williams, G., Manohar, M. D., Baranchuk, D., Liberty, E., Liu, F., Landrum, B., Karjikar, M., Dhuli- pala, L., Chen, M., Chen, Y., Ma, R., Zhang, K., Cai, Y., Shi, J., Chen, Y., Zheng, W., W an, Z., Yin, J., and Huang, B.Results of the big ANN: NeurIPS’23 compe- tition, 2024. arXiv: 2409.17424 [cs.IR]
-
[27]
Singh, A., Subramanya, S. J., Krishnaswamy, R., and Simhadri, H. V. FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search, May 2021. arXiv:2105.09613 [cs]
- [28]
-
[29]
Upreti, N., Simhadri, H. V., Sundar, H. S., Sundaram, K., Boshra, S., Perumal- swamy, B., Atri, S., Chisholm, M., Singh, R. R., Y ang, G., Hass, T., Dudhey, N., Pattipaka, S., Hildebrand, M., Manohar, M., Moffitt, J., Xu, H., Datha, N., Gupta, S., Krishnaswamy, R., Gupta, P., Sahu, A., V arada, H., Barthwal, S., Mor, R., Codella, J., Cooper, S., Pilch, K....
work page 2025
-
[30]
Wang, J., Yi, X., Guo, R., Jin, H., Xu, P., Li, S., Wang, X., Guo, X., Li, C., Xu, X., Yu, K., Yuan, Y., Zou, Y., Long, J., Cai, Y., Li, Z., Zhang, Z., Mo, Y., Gu, J., Jiang, R., Wei, Y., and Xie, C.Milvus: A Purpose-Built Vector Data Management System. InProceedings of the 2021 International Conference on Management of Data(Virtual Event China, June 2021...
work page 2021
-
[31]
W ang, M., Xu, W., Yi, X., Wu, S., Peng, Z., Ke, X., Gao, Y., Xu, X., Guo, R., and Xie, C.Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment.Proceedings of the ACM on Management of Data 2, 1 (Mar. 2024), 1–27. arXiv:2401.02116 [cs]
-
[32]
In-place updates of a graph in- dex for streaming approximate nearest neighbor search
Xu, H., Manohar, M. D., Bernstein, P. A., Chandramouli, B., Wen, R., and Simhadri, H. V.In-place updates of a graph index for streaming approximate nearest neighbor search.CoRR abs/2502.13826(Feb. 2025)
-
[33]
Xu, X., Wang, M., Wang, Y., and Ma, D.Two-stage routing with optimized guided search and greedy algorithm on proximity graph.Knowledge-Based Systems 229(2021), 107305
work page 2021
-
[34]
SPFresh: Incremental in-place update for billion-scale vector search
Xu, Y., Liang, H., Li, J., Xu, S., Chen, Q., Zhang, Q., Li, C., Yang, Z., Yang, F., Yang, Y., and others. SPFresh: Incremental in-place update for billion-scale vector search. InProceedings of the 29th symposium on operating systems principles (2023), pp. 545–561
work page 2023
-
[35]
Yu. A. Malkov, Yu. A. Malkov, Malkov, Y. A., D. A. Y ashunin, and Y ashunin, D. A.Efficient and Robust Approximate Nearest Neighbor Search Using Hierar- chical Navigable Small World Graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (Apr. 2020), 824–836
work page 2020
-
[36]
Zhang, Q., Xu, S., Chen, Q., Sui, G., Xie, J., Cai, Z., Chen, Y., He, Y., Yang, Y., Y ang, F., Y ang, M., and Zhou, L.VBASE: Unifying online vector similarity search and relational queries via relaxed monotonicity. In17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)(Boston, MA, July 2023), USENIX Association, pp. 377–395
work page 2023
-
[37]
Towards Efficient and Scalable Distributed Vector Search with RDMA, July 2025
Zhi, X., Chen, M., Yan, X., Lu, B., Li, H., Zhang, Q., Chen, Q., and Cheng, J. Towards Efficient and Scalable Distributed Vector Search with RDMA, July 2025. arXiv:2507.06653 [cs]. 12
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.