Pyramid: A General Framework for Distributed Similarity Search
Pith reviewed 2026-05-25 15:57 UTC · model grok-4.3
The pith
Pyramid partitions datasets into similar-item subsets, builds HNSW indexes on each, and routes queries to only a few subsets for distributed similarity search.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pyramid partitions a dataset into sub-datasets containing similar items for index building and assigns a query to only some of the sub-datasets for query processing while supporting failure recovery and straggler mitigation, all on top of HNSW graphs.
What carries the argument
Partitioning of the dataset into similarity-based sub-datasets combined with selective query routing to a subset of those sub-datasets.
If this is right
- Query throughput rises because each query touches far fewer machines than the total number of partitions.
- The system continues to answer queries after individual machine failures without restarting the whole index.
- Slow machines during a query can be ignored so that response time is determined by the faster machines.
- The same partitioning and routing logic works for Euclidean, angular, and inner-product similarity without code changes.
- Users interact only through a small set of APIs and do not manage the distributed execution themselves.
Where Pith is reading between the lines
- The same partitioning idea could be tested with other single-machine indexes that support early termination or pruning.
- Load balance across partitions may vary with data distribution, so experiments that measure per-partition query counts would be useful.
- If data arrives continuously, a mechanism to move items between partitions over time could keep routing accuracy from degrading.
- For very high-dimensional vectors the cost of deciding which partitions a query should visit may itself become a bottleneck worth measuring.
Load-bearing premise
That items grouped together by similarity in the partitions will allow most queries to reach acceptable recall by examining only a small fraction of the partitions.
What would settle it
Run the same set of queries once with full search across all partitions and once with Pyramid routing; if recall drops by more than a few percent while throughput stays flat, the routing premise does not hold.
Figures
read the original abstract
Similarity search is a core component in various applications such as image matching, product recommendation and low-shot classification. However, single machine solutions are usually insufficient due to the large cardinality of modern datasets and stringent latency requirement of on-line query processing. We present Pyramid, a general and efficient framework for distributed similarity search. Pyramid supports search with popular similarity functions including Euclidean distance, angular distance and inner product. Different from existing distributed solutions that are based on KD-tree or locality sensitive hashing (LSH), Pyramid is based on Hierarchical Navigable Small World graph (HNSW), which is the state of the art similarity search algorithm on a single machine. To achieve high query processing throughput, Pyramid partitions a dataset into sub-datasets containing similar items for index building and assigns a query to only some of the sub-datasets for query processing. To provide the robustness required by production deployment, Pyramid also supports failure recovery and straggler mitigation. Pyramid offers a set of concise API such that users can easily use Pyramid without knowing the details of distributed execution. Experiments on large-scale datasets show that Pyramid produces quality results for similarity search, achieves high query processing throughput and is robust under node failure and straggler.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Pyramid, a distributed framework for similarity search supporting Euclidean, angular, and inner-product distances. It builds HNSW indexes on a dataset partitioned into sub-datasets of similar items, routes each query to only a subset of those partitions, and adds failure recovery plus straggler mitigation. The authors claim that the approach yields high query throughput while preserving result quality on large-scale datasets and that a concise API hides the distributed execution details from users.
Significance. If the partitioning and routing preserve recall at the levels claimed, Pyramid would be a useful systems contribution: it leverages the current single-machine state-of-the-art (HNSW) rather than older KD-tree or LSH techniques, adds production-oriented robustness features, and exposes a simple API. The combination of data locality, selective query routing, and fault tolerance could improve throughput in distributed settings where single-machine HNSW is already memory- or latency-bound.
major comments (2)
- [§3] §3 (Partitioning and query routing): The central efficiency claim rests on the assertion that partitioning into similar-item sub-datasets allows a query to be sent to only a subset of partitions while HNSW indexes on those partitions still return high-recall results. No analysis, probability bound, or even empirical measurement is supplied showing that nearest neighbors remain inside the routed subset with high probability; without this, the distributed search reduces to an uncontrolled approximation whose recall cannot be guaranteed relative to single-machine HNSW.
- [§5] §5 (Experimental evaluation): Throughput and quality claims are made for large-scale datasets, yet the reported results do not include (a) recall@K curves comparing routed-subset search against full-dataset HNSW, (b) the fraction of partitions actually contacted per query, or (c) head-to-head numbers against existing distributed baselines (e.g., LSH- or KD-tree-based systems). These omissions make it impossible to verify that the partitioning strategy delivers the advertised quality-throughput trade-off.
minor comments (2)
- [Abstract] Abstract: 'a set of concise API' should read 'a set of concise APIs'.
- [§3] The paper would benefit from an explicit statement of the clustering or routing rule used to decide which partitions receive a given query; the current description is too high-level for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.
read point-by-point responses
-
Referee: [§3] §3 (Partitioning and query routing): The central efficiency claim rests on the assertion that partitioning into similar-item sub-datasets allows a query to be sent to only a subset of partitions while HNSW indexes on those partitions still return high-recall results. No analysis, probability bound, or even empirical measurement is supplied showing that nearest neighbors remain inside the routed subset with high probability; without this, the distributed search reduces to an uncontrolled approximation whose recall cannot be guaranteed relative to single-machine HNSW.
Authors: We agree that the manuscript would benefit from explicit validation of recall preservation under the routing scheme. Section 3 describes a partitioning approach that groups similar items via clustering to promote locality. In the revision we will add empirical measurements (e.g., the fraction of true nearest neighbors retained within routed partitions) and direct recall@K comparisons between routed-subset search and full-dataset HNSW. revision: yes
-
Referee: [§5] §5 (Experimental evaluation): Throughput and quality claims are made for large-scale datasets, yet the reported results do not include (a) recall@K curves comparing routed-subset search against full-dataset HNSW, (b) the fraction of partitions actually contacted per query, or (c) head-to-head numbers against existing distributed baselines (e.g., LSH- or KD-tree-based systems). These omissions make it impossible to verify that the partitioning strategy delivers the advertised quality-throughput trade-off.
Authors: We acknowledge these omissions. The revised evaluation section will include (a) recall@K curves for routed versus full search, (b) the average fraction of partitions contacted per query, and (c) head-to-head comparisons against representative distributed LSH- and KD-tree-based systems (either via new experiments or reference to published results where direct re-implementation is impractical). revision: yes
Circularity Check
No circularity: system architecture with no derivations or self-referential predictions
full rationale
The paper presents Pyramid as a distributed framework that partitions data into similar-item sub-datasets, builds per-partition HNSW indexes, and routes queries to subsets of partitions. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described content. The partitioning and routing strategy is introduced as an engineering design choice rather than a result derived from prior results by the same authors. Claims rest on system implementation and experimental throughput/robustness rather than any reduction of outputs to inputs by construction. This is the expected non-finding for a systems paper without mathematical modeling.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 1 Pith paper
-
Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN
BatANN delivers near-linear throughput scaling for distributed disk-based approximate nearest neighbor search on a single global graph, with 3.5-5.59x gains over scatter-gather baselines on 1B-point datasets at 0.95 recall.
Reference graph
Works this paper leans on
-
[1]
Jingdong Wang, Ting Zhang, Nicu Sebe, Heng Tao Shen, et al. A survey on learning to hash. IEEE transactions on pattern analysis and machine intelligence, 40(4):769–790, 2017
work page 2017
-
[2]
Locality-sensitive hashing scheme based on p-stable distributions
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry, pages 253–262. ACM, 2004
work page 2004
-
[3]
Similarity estimation techniques from rounding algorithms
Moses S Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing , pages 380–388. ACM, 2002
work page 2002
-
[4]
Asymmetric lsh (alsh) for sublinear time maximum inner product search (mips)
Anshumali Shrivastava and Ping Li. Asymmetric lsh (alsh) for sublinear time maximum inner product search (mips). In Advances in Neural Information Processing Systems , pages 2321–2329, 2014
work page 2014
-
[5]
Object retrieval with large vocabularies and fast spatial matching
James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. Object retrieval with large vocabularies and fast spatial matching. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007
work page 2007
-
[6]
Low-shot learning with large-scale diffusion
Matthijs Douze, Arthur Szlam, Bharath Hariharan, and Herv ´e J ´egou. Low-shot learning with large-scale diffusion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 3349–3358, 2018
work page 2018
-
[7]
Matrix factorization techniques for recommender systems
Yehuda Koren, Robert Bell, and Chris V olinsky. Matrix factorization techniques for recommender systems. Computer, (8):30–37, 2009
work page 2009
-
[8]
Assembling large genomes with single-molecule sequencing and locality-sensitive hashing
Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James P Drake, Jane M Landolin, and Adam M Phillippy. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nature biotechnology, 33(6):623, 2015
work page 2015
-
[9]
Kore: keyphrase overlap relatedness for entity disambiguation
Johannes Hoffart, Stephan Seufert, Dat Ba Nguyen, Martin Theobald, and Gerhard Weikum. Kore: keyphrase overlap relatedness for entity disambiguation. In Proceedings of the 21st ACM international confer- ence on Information and knowledge management, pages 545–554. ACM, 2012
work page 2012
-
[10]
Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, and Yoshua Bengio. Hierarchical memory networks. arXiv preprint arXiv:1605.07427, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[11]
Scalable generalized linear bandits: Online computation and hashing
Kwang-Sung Jun, Aniruddha Bhargava, Robert Nowak, and Rebecca Willett. Scalable generalized linear bandits: Online computation and hashing. In Advances in Neural Information Processing Systems , pages 99–109, 2017
work page 2017
- [12]
-
[13]
idistance: An adaptive b+-tree based indexing method for nearest neighbor search
Hosagrahar V Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, and Rui Zhang. idistance: An adaptive b+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS) , 30(2):364–397, 2005
work page 2005
-
[14]
Maximum inner-product search using cone trees
Parikshit Ram and Alexander G Gray. Maximum inner-product search using cone trees. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining , pages 931–939. ACM, 2012
work page 2012
-
[15]
Approximate nearest neighbors: towards removing the curse of dimensionality
Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing , pages 604–
-
[16]
On Symmetric and Asymmetric LSHs for Inner Product Search
Behnam Neyshabur and Nathan Srebro. On symmetric and asymmetric lshs for inner product search. arXiv preprint arXiv:1410.5518 , 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[17]
Product quantiza- tion for nearest neighbor search
Herve Jegou, Matthijs Douze, and Cordelia Schmid. Product quantiza- tion for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence , 33(1):117–128, 2010
work page 2010
-
[18]
Optimized product quantization for approximate nearest neighbor search
Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Optimized product quantization for approximate nearest neighbor search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 2946–2953, 2013
work page 2013
-
[19]
Artem Babenko and Victor Lempitsky. The inverted multi-index. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(6):1247– 1260, 2014
work page 2014
-
[20]
Fast approximate nearest-neighbor search with k-nearest neigh- bor graph
Kiana Hajebi, Yasin Abbasi-Yadkori, Hossein Shahbazi, and Hong Zhang. Fast approximate nearest-neighbor search with k-nearest neigh- bor graph. In Twenty-Second International Joint Conference on Artificial Intelligence, 2011
work page 2011
-
[21]
Fanng: Fast approximate nearest neighbour graphs
Ben Harwood and Tom Drummond. Fanng: Fast approximate nearest neighbour graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 5713–5722, 2016
work page 2016
-
[22]
Yury A Malkov and Dmitry A Yashunin. Efficient and robust approxi- mate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence, 2018
work page 2018
-
[23]
Fast neighborhood graph search using cartesian concatena- tion
Jing Wang, Jingdong Wang, Gang Zeng, Rui Gan, Shipeng Li, and Baining Guo. Fast neighborhood graph search using cartesian concatena- tion. In Proceedings of the IEEE International Conference on Computer Vision, pages 2128–2135, 2013
work page 2013
-
[24]
Fast approxi- mate nearest neighbor search with the navigating spreading-out graph
Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approxi- mate nearest neighbor search with the navigating spreading-out graph. Proceedings of the VLDB Endowment , 12(5):461–474, 2019
work page 2019
-
[25]
Scalable nearest neighbor algorithms for high dimensional data
Marius Muja and David G Lowe. Scalable nearest neighbor algorithms for high dimensional data. IEEE transactions on pattern analysis and machine intelligence, 36(11):2227–2240, 2014
work page 2014
-
[26]
Streaming similarity search over one billion tweets using parallel locality-sensitive hashing
Narayanan Sundaram, Aizana Turmukhametova, Nadathur Satish, Todd Mostak, Piotr Indyk, Samuel Madden, and Pradeep Dubey. Streaming similarity search over one billion tweets using parallel locality-sensitive hashing. Proceedings of the VLDB Endowment, 6(14):1930–1941, 2013
work page 1930
-
[27]
Shuffle- efficient distributed locality sensitive hashing on spark
Wanxin Zhang, Dongsheng Li, Ying Xu, and Yiming Zhang. Shuffle- efficient distributed locality sensitive hashing on spark. In 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WK- SHPS), pages 766–767. IEEE, 2016
work page 2016
-
[29]
Billion-scale similarity search with GPUs
Jeff Johnson, Matthijs Douze, and Herv ´e J´egou. Billion-scale similarity search with gpus. arXiv preprint arXiv:1702.08734 , 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[30]
Link and code: Fast indexing with graphs and compact regression codes
Matthijs Douze, Alexandre Sablayrolles, and Herv ´e J ´egou. Link and code: Fast indexing with graphs and compact regression codes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3646–3654, 2018
work page 2018
-
[31]
To index or not to index: Optimizing exact maximum inner product search
Firas Abuzaid, Geet Sethi, Peter Bailis, and Matei Zaharia. To index or not to index: Optimizing exact maximum inner product search
-
[32]
Revisiting the inverted indices for billion-scale approximate nearest neighbors
Dmitry Baranchuk, Artem Babenko, and Yury Malkov. Revisiting the inverted indices for billion-scale approximate nearest neighbors. In Proceedings of the European Conference on Computer Vision (ECCV) , pages 202–216, 2018
work page 2018
-
[33]
Non-metric similarity graphs for maximum inner product search
Stanislav Morozov and Artem Babenko. Non-metric similarity graphs for maximum inner product search. In Advances in Neural Information Processing Systems, pages 4721–4730, 2018
work page 2018
-
[34]
Think Locally, Act Globally: Perfectly Balanced Graph Partitioning
Peter Sanders and Christian Schulz. Think locally, act globally: Perfectly balanced graph partitioning. arXiv preprint arXiv:1210.0477 , 2012
work page internal anchor Pith review Pith/arXiv arXiv 2012
-
[35]
Clustering is Efficient for Approximate Maximum Inner Product Search
Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, and Yoshua Bengio. Clustering is efficient for approximate maximum inner product search. arXiv preprint arXiv:1507.05910 , 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[36]
Architectural styles and the design of network-based software architectures , volume 7
Roy T Fielding and Richard N Taylor. Architectural styles and the design of network-based software architectures , volume 7. University of California, Irvine Doctoral dissertation, 2000
work page 2000
-
[37]
Efficient indexing of billion-scale datasets of deep descriptors
Artem Babenko and Victor Lempitsky. Efficient indexing of billion-scale datasets of deep descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 2055–2063, 2016
work page 2055
-
[38]
Searching in one billion vectors: re-rank with source coding
Herv ´e J ´egou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. Searching in one billion vectors: re-rank with source coding. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 861–864. IEEE, 2011
work page 2011
-
[39]
Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jeffery Zhu, Jason Li, Chuanjie Liu, Lintao Zhang, and Jingdong Wang.SPTAG: A library for fast approximate nearest neighbor search , 2018
work page 2018
-
[40]
Roger Weber, Hans-J ¨org Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high- dimensional spaces. In VLDB, volume 98, pages 194–205, 1998
work page 1998
-
[41]
A general and efficient querying method for learning to hash
Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, Kelvin KW Ng, and Ti-chung Cheng. A general and efficient querying method for learning to hash. In Proceedings of the 2018 International Conference on Management of Data , pages 1333–1347. ACM, 2018
work page 2018
-
[42]
Composite quantization for approximate nearest neighbor search
Ting Zhang, Chao Du, and Jingdong Wang. Composite quantization for approximate nearest neighbor search. In ICML, volume 2, page 3, 2014
work page 2014
-
[43]
Norm-ranging lsh for maximum inner product search
Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, and James Cheng. Norm-ranging lsh for maximum inner product search. In Advances in Neural Information Processing Systems , pages 2952–2961, 2018
work page 2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.