pith. sign in

arxiv: 2509.15531 · v5 · submitted 2025-09-19 · 💻 cs.DS

Sparse Neighborhood Graph-Based Approximate Nearest Neighbor Search Revisited: Theoretical Analysis and Optimization

Pith reviewed 2026-05-18 16:36 UTC · model grok-4.3

classification 💻 cs.DS
keywords approximate nearest neighbor searchsparse neighborhood graphmartingale modelgraph pruningtruncation parameterindex constructiondegree boundsearch path length
0
0 comments X

The pith

A martingale model of pruning proves sparse neighborhood graphs have O(n^{2/3+ε}) maximum out-degree and O(log n) expected search paths, which together supply a closed-form rule for the optimal truncation parameter R.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a theoretical model for the Sparse Neighborhood Graph in approximate nearest neighbor search by treating the pruning step as a martingale stochastic process that tracks how candidate sets evolve. From this model the authors derive an upper bound on maximum out-degree and a bound on search path length, then convert those bounds into an explicit formula for choosing the sparsification parameter R. A sympathetic reader cares because the formula removes the need to run repeated construction trials to find a good R, which is expensive on large vector collections. If the model is accurate, index build time drops sharply while search quality stays the same or improves.

Core claim

We introduce a martingale-based model of the pruning process that characterizes the stochastic evolution of candidate sets during graph construction. Using this framework, we prove that SNG has a maximum out-degree of O(n^{2/3+ε}), where ε>0 is an arbitrarily small constant, and an expected search path length of O(log n). Building on these insights, we derive a closed-form rule for selecting the optimal truncation parameter R, thereby eliminating the need for costly parameter sweeping.

What carries the argument

Martingale-based stochastic model of the pruning process that tracks the evolution of candidate sets during SNG construction.

If this is right

  • The truncation parameter R can be computed in closed form from dataset size alone, removing all parameter sweeps during index construction.
  • Average index build time drops by a factor of 5.9, with observed peaks of 15.4 times faster construction.
  • Search performance is preserved or improved when the theoretically derived R is used.
  • Maximum out-degree remains O(n^{2/3+ε}) and expected search paths remain O(log n) under the model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same martingale modeling approach could be applied to pruning rules in other graph-based ANNS methods such as HNSW or NSG.
  • The sub-linear degree bound implies that SNG memory usage grows more slowly than fully connected or denser graphs when n becomes very large.
  • The closed-form rule can be tested directly on datasets orders of magnitude larger than those in the current experiments to check whether the asymptotic predictions continue to hold.
  • Similar stochastic analysis might be extended to dynamic or streaming settings where points arrive incrementally and the graph must be updated.

Load-bearing premise

The pruning process during SNG construction can be accurately characterized by a martingale-based stochastic model that tracks the evolution of candidate sets.

What would settle it

Build SNG indexes for successively larger values of n on both synthetic and real data, measure the realized maximum out-degree and the empirically best R, and check whether they remain consistent with the O(n^{2/3+ε}) bound and the closed-form prediction; systematic deviation at large n would falsify the claims.

Figures

Figures reproduced from arXiv: 2509.15531 by Chuan Zhou, Guoliang Li, Xinran Ma, Zaijiu Shang, Zhaoqi Zhou, Zhiming Ma.

Figure 1
Figure 1. Figure 1: Comparison of total graph construction time [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Query performance across varying α values on SIFT1M [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Query performance across varying α values on GIST1M [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Query performance across varying α values on DEEP1M points) as the base set to construct an SNG under parameter α = 1.2. Results under other values of α are also provided in Appendix H.4 [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Degree distribution of GMM dataset . The degree distribution of all nodes is visual￾ized in [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Given that p ∗ is the nearest point to p in t-th iteration and ∥p − p ∗∥ = ρt. The light grey region represents all p ′ satisfied pruning condition ∥p − p ′∥ > ∥p ∗ − p ′∥. Proof of Lemma 6. Assuming a uniform distribution over the search space, the probability that a randomly sampled point falls within a region is proportional to the region’s volume. Without loss of generality, we place the center point p… view at source ↗
Figure 7
Figure 7. Figure 7: Path and neighbor distribution diagram: Blue lines represent edges from [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Left Figure (General Pruning Process): The pruning process for a sample point from SIFT1M is illustrated in the left figure. The horizontal axis represents the number of iteration steps, while the vertical axis indicates the number of processed points. Notably, only 4 steps are required to exceed the 80% threshold. However, the remaining pruning iterations are significantly prolonged, with the final 20% of… view at source ↗
Figure 9
Figure 9. Figure 9: Degree distribution of GMM when α = 1.00 [PITH_FULL_IMAGE:figures/full_fig_p025_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Degree distribution of GMM when α = 1.25 25 [PITH_FULL_IMAGE:figures/full_fig_p025_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Degree distribution of GMM when α = 1.50 26 [PITH_FULL_IMAGE:figures/full_fig_p026_11.png] view at source ↗
read the original abstract

Graph-based approaches to approximate nearest neighbor search (ANNS) enable fast, high-recall retrieval on billion-scale vector datasets. Among them, the Sparse Neighborhood Graph (SNG) is widely used due to its strong search performance. However, the lack of theoretical understanding of SNG leads to expensive tuning of the truncation parameter that controls graph sparsification. In this work, we present OPT-SNG, a principled framework for analyzing and optimizing SNG construction. We introduce a martingale-based model of the pruning process that characterizes the stochastic evolution of candidate sets during graph construction. Using this framework, we prove that SNG has a maximum out-degree of \(O(n^{2/3+\epsilon})\), where \(\epsilon>0\) is an arbitrarily small constant, and an expected search path length of \(O(\log n)\). Building on these insights, we derive a closed-form rule for selecting the optimal truncation parameter \(R\), thereby eliminating the need for costly parameter sweeping. Extensive experiments on real-world datasets demonstrate that OPT-SNG achieves an average \(5.9\times\) speedup in index construction time, with peak improvements reaching \(15.4\times\), while consistently maintaining or improving search performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces OPT-SNG, a framework for analyzing and optimizing Sparse Neighborhood Graphs (SNG) for approximate nearest neighbor search. It proposes a martingale-based stochastic model of the pruning process during SNG construction, uses this model to prove a maximum out-degree bound of O(n^{2/3+ε}) and an expected search path length of O(log n), derives a closed-form rule for the optimal truncation parameter R, and reports experimental results showing average 5.9× (peak 15.4×) speedup in index construction time while maintaining or improving search performance on real-world datasets.

Significance. If the martingale model is valid and the proofs rigorous, the work supplies the first theoretical characterization of SNG degree and path length, replacing heuristic tuning of R with a closed-form rule. This would be a useful contribution to the theory of graph-based ANNS, with direct practical impact on construction efficiency. The reported speedups provide concrete evidence of utility when the parameter rule is applied.

major comments (1)
  1. The O(n^{2/3+ε}) out-degree bound, O(log n) path-length claim, and closed-form rule for R all rest on the martingale model of candidate-set evolution during pruning. The model assumes a filtration under which the process satisfies the martingale property and admits the required concentration bounds. However, pruning decisions for different vertices share distance comparisons and global candidate orderings, introducing dependencies that may violate conditional-expectation equality or inflate variance. The paper must explicitly define the filtration, prove the martingale property holds despite these correlations, and verify that the tail bounds remain valid on the actual data distributions used in experiments.
minor comments (1)
  1. The abstract states that the closed-form rule is 'derived from the martingale model' but does not clarify whether any model parameters were fitted to the target datasets; an explicit statement of independence would strengthen the claim of a parameter-free rule.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the thorough review and valuable feedback on the martingale model underlying OPT-SNG. The concern about dependencies and the need for explicit definitions is well-taken; we address it directly below and will strengthen the presentation in the revision.

read point-by-point responses
  1. Referee: The O(n^{2/3+ε}) out-degree bound, O(log n) path-length claim, and closed-form rule for R all rest on the martingale model of candidate-set evolution during pruning. The model assumes a filtration under which the process satisfies the martingale property and admits the required concentration bounds. However, pruning decisions for different vertices share distance comparisons and global candidate orderings, introducing dependencies that may violate conditional-expectation equality or inflate variance. The paper must explicitly define the filtration, prove the martingale property holds despite these correlations, and verify that the tail bounds remain valid on the actual data distributions used in experiments.

    Authors: We appreciate the referee highlighting the importance of rigorously justifying the martingale construction. In the revised manuscript we will add an explicit definition of the filtration: let F_t be the sigma-algebra generated by all pairwise distance comparisons and the induced global ordering of candidates revealed up to pruning step t for the vertex under construction. The process is a Doob martingale with respect to this filtration because the next candidate-set size is a function of the remaining unqueried distances, whose conditional expectation given F_t equals the current size under the uniform random permutation model of the ordering. Shared comparisons across vertices are accounted for by conditioning on the common revealed distances; the resulting martingale differences remain bounded, allowing the Azuma-Hoeffding inequality to yield the stated tail bounds. We will also include a short empirical verification subsection that replays the pruning process on the experimental datasets and confirms that the observed concentration matches the theoretical predictions. These additions clarify the argument without changing any stated theorems or experimental results. revision: yes

Circularity Check

0 steps flagged

No circularity: martingale model provides independent analytical foundation for bounds and closed-form rule

full rationale

The paper introduces a martingale-based stochastic model of candidate-set evolution during SNG pruning as a modeling assumption. From this framework it derives the O(n^{2/3+ε}) out-degree bound via concentration arguments, the O(log n) expected search path length, and a closed-form expression for the truncation parameter R. These steps are presented as consequences of the model rather than tautological re-statements or statistical fits to the target quantities. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain. The analysis remains self-contained because the model parameters and martingale filtration are chosen independently of the final performance metrics on specific datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central contribution rests on introducing a martingale model for the stochastic pruning process; no free parameters are introduced because the truncation rule is derived in closed form, and no new entities are postulated.

axioms (1)
  • domain assumption The pruning process in SNG construction can be modeled as a martingale that characterizes the stochastic evolution of candidate sets.
    This modeling assumption is the foundation for both the degree bound and the closed-form derivation of R.

pith-pipeline@v0.9.0 · 5757 in / 1451 out tokens · 54515 ms · 2026-05-18T16:36:54.152650+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

  1. [1]

    Sunil Arya and David M. Mount. Approximate nearest neighbor queries in fixed dimensions. In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 271–280, Philadelphia, PA, USA, 1993. SIAM

  2. [2]

    ROBERT B. ASH. 2 - further results in measure and integration theory. InReal Analysis and Probability, Probability and Mathematical Statistics: A Series of Monographs and Textbooks, pages 66–68. Academic Press, 1972

  3. [3]

    Elpis: Graph-based similarity search for scalable data science.Proc

    Ilias Azizi, Karima Echihabi, and Themis Palpanas. Elpis: Graph-based similarity search for scalable data science.Proc. VLDB Endow., 16(6):1548–1559, 2023

  4. [4]

    Lempitsky

    Artem Babenko and Victor S. Lempitsky. Efficient indexing of billion-scale datasets of deep descriptors. In2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV , USA, June 27-30, 2016, pages 2055–2063. IEEE Computer Society, 2016

  5. [5]

    Multidimensional binary search trees used for associative searching.Com- mun

    Jon Louis Bentley. Multidimensional binary search trees used for associative searching.Com- mun. ACM, 18(9):509–517, 1975

  6. [6]

    A revisit of hashing algorithms for approximate nearest neighbor search.IEEE Trans

    Deng Cai. A revisit of hashing algorithms for approximate nearest neighbor search.IEEE Trans. Knowl. Data Eng., 33(6):2337–2348, 2021

  7. [7]

    Sean Wang

    Meng Chen, Kai Zhang, Zhenying He, Yinan Jing, and X. Sean Wang. Roargraph: A projected bipartite graph for efficient cross-modal approximate nearest neighbor search.Proc. VLDB Endow., 17(11):2735–2749, 2024

  8. [8]

    Mirrokni

    Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. InProceedings of the 20th ACM Symposium on Computational Geometry, Brooklyn, New York, USA, June 8-11, 2004, pages 253–262. ACM, 2004

  9. [9]

    Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits

    Haya Diwan, Jinrui Gou, Cameron Musco, Christopher Musco, and Torsten Suel. Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits. InAdvances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, 2024

  10. [10]

    Flexible product quantization for fast approximate nearest neighbor search.Multim

    Jingya Fan, Yang Wang, Wenwen Song, and Zhibin Pan. Flexible product quantization for fast approximate nearest neighbor search.Multim. Tools Appl., 83(18):53243–53261, 2024

  11. [11]

    High dimensional similarity search with satellite system graph: Efficiency, scalability, and unindexed query compatibility.IEEE Trans

    Cong Fu, Changxu Wang, and Deng Cai. High dimensional similarity search with satellite system graph: Efficiency, scalability, and unindexed query compatibility.IEEE Trans. Pattern Anal. Mach. Intell., 44(8):4139–4150, 2022

  12. [12]

    Fast approximate nearest neighbor search with the navigating spreading-out graph.Proc

    Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neighbor search with the navigating spreading-out graph.Proc. VLDB Endow., 12(5):461–474, 2019

  13. [13]

    Optimizing the number of clusters for billion-scale quantization-based nearest neighbor search.IEEE Trans

    Yujian Fu, Cheng Chen, Xiaohui Chen, Weng-Fai Wong, and Bingsheng He. Optimizing the number of clusters for billion-scale quantization-based nearest neighbor search.IEEE Trans. Knowl. Data Eng., 36(11):6786–6800, 2024

  14. [14]

    Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proc

    Jianyang Gao and Cheng Long. Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proc. ACM Manag. Data, 2(3):167, 2024

  15. [15]

    MIT press Cambridge, 2016

    Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio.Deep learning, volume 1. MIT press Cambridge, 2016

  16. [16]

    Lightweight-yet-efficient: Revitalizing ball-tree for point-to-hyperplane nearest neighbor search

    Qiang Huang and Anthony Kum Hoe Tung. Lightweight-yet-efficient: Revitalizing ball-tree for point-to-hyperplane nearest neighbor search. In39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023, pages 436–449. IEEE, 2023. 10

  17. [17]

    Approximate nearest neighbors: towards removing the curse of dimensionality

    Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. InProceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604–613, 1998

  18. [18]

    Piotr Indyk and Haike Xu. Worst-case performance of popular approximate nearest neigh- bor search implementations: Guarantees and limitations.Advances in Neural Information Processing Systems, 36:66239–66256, 2023

  19. [19]

    Ood-diskann: Efficient and scalable graph anns for out-of-distribution queries.arXiv preprint arXiv:2211.12850, 2022

    Shikhar Jaiswal, Ravishankar Krishnaswamy, Ankit Garg, Harsha Vardhan Simhadri, and Sheshansh Agrawal. Ood-diskann: Efficient and scalable graph anns for out-of-distribution queries.arXiv preprint arXiv:2211.12850, 2022

  20. [20]

    Product quantization for nearest neighbor search.IEEE Trans

    Hervé Jégou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search.IEEE Trans. Pattern Anal. Mach. Intell., 33(1):117–128, 2011

  21. [21]

    Kankanhalli, and Anthony K

    Yifan Lei, Qiang Huang, Mohan S. Kankanhalli, and Anthony K. H. Tung. Locality-sensitive hashing scheme based on longest circular co-substring. InProceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, pages 2589–2599. ACM, 2020

  22. [22]

    Retrieval-augmented generation for knowledge-intensive NLP tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. InAdvances in Neural Information Processing Systems 33: Annual Conference on Neural Information Proce...

  23. [23]

    Concise formulas for the area and volume of a hyperspherical cap.Asian Journal of Mathematics & Statistics, 4(1):66–70, 2010

    Shengqiao Li. Concise formulas for the area and volume of a hyperspherical cap.Asian Journal of Mathematics & Statistics, 4(1):66–70, 2010

  24. [24]

    Malkov and Dmitry A

    Yury A. Malkov and Dmitry A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE Trans. Pattern Anal. Mach. Intell., 42(4):824–836, 2020

  25. [25]

    Fast and exact nearest neighbor search in hamming space on full-text search engines

    Cun Matthew Mu, Jun Raymond Zhao, Guang Yang, Binwei Yang, and Zheng John Yan. Fast and exact nearest neighbor search in hamming space on full-text search engines. InSimilarity Search and Applications - 12th International Conference, SISAP 2019, Newark, NJ, USA, October 2-4, 2019, Proceedings, volume 11807 ofLecture Notes in Computer Science, pages 49–56....

  26. [26]

    Graph-based nearest neighbor search: From practice to theory

    Liudmila Prokhorenkova and Aleksandr Shekhovtsov. Graph-based nearest neighbor search: From practice to theory. InProceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 ofProceedings of Machine Learning Research, pages 7803–7813. PMLR, 2020

  27. [27]

    M. M. Mahabubur Rahman and Jelena Tesic. Evaluating hybrid approximate nearest neighbor indexing and search (HANNIS) for high-dimensional image feature search. InIEEE Interna- tional Conference on Big Data, Big Data 2022, Osaka, Japan, December 17-20, 2022, pages 6802–6804. IEEE, 2022

  28. [28]

    ESPN: memory-efficient multi-vector information retrieval

    Susav Shrestha, Narasimha Reddy, and Zongwang Li. ESPN: memory-efficient multi-vector information retrieval. InProceedings of the 2024 ACM SIGPLAN International Symposium on Memory Management, ISMM 2024, Copenhagen, Denmark, 25 June 2024, pages 95–107. ACM, 2024

  29. [29]

    Distance distributions in finite uniformly random networks: Theory and applications.IEEE Trans

    Sunil Srinivasa and Martin Haenggi. Distance distributions in finite uniformly random networks: Theory and applications.IEEE Trans. Veh. Technol., 59(2):940–949, 2010

  30. [30]

    Diskann: Fast accurate billion-point nearest neighbor search on a single node

    Suhas Jayaram Subramanya, Devvrit Fnu, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. Diskann: Fast accurate billion-point nearest neighbor search on a single node. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associ...

  31. [31]

    Annexml: Approximate nearest neighbor search for extreme multi-label clas- sification

    Yukihiro Tagami. Annexml: Approximate nearest neighbor search for extreme multi-label clas- sification. InProceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017, pages 455–464. ACM, 2017

  32. [32]

    Quality and efficiency in high dimensional nearest neighbor search

    Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. Quality and efficiency in high dimensional nearest neighbor search. InProceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009, pages 563–576. ACM, 2009

  33. [33]

    An efficient and robust framework for approximate nearest neighbor search with attribute constraint

    Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue, and Jiongkang Ni. An efficient and robust framework for approximate nearest neighbor search with attribute constraint. InAdvances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 1...

  34. [34]

    DET-LSH: A locality-sensitive hashing scheme with dynamic encoding tree for approximate nearest neighbor search.Proc

    Jiuqi Wei, Botao Peng, Xiaodong Lee, and Themis Palpanas. DET-LSH: A locality-sensitive hashing scheme with dynamic encoding tree for approximate nearest neighbor search.Proc. VLDB Endow., 17(9):2241–2254, 2024

  35. [35]

    Differential equations for random processes and random graphs.The annals of applied probability, pages 1217–1235, 1995

    Nicholas Wormald. Differential equations for random processes and random graphs.The annals of applied probability, pages 1217–1235, 1995

  36. [36]

    Routing-guided learned product quantization for graph-based approximate nearest neighbor search

    Qiang Yue, Xiaoliang Xu, Yuxiang Wang, Yikun Tao, and Xuliyuan Luo. Routing-guided learned product quantization for graph-based approximate nearest neighbor search. In40th IEEE International Conference on Data Engineering, ICDE 2024, Utrecht, The Netherlands, May 13-16, 2024, pages 4870–4883. IEEE, 2024

  37. [37]

    Serf: Segment graph for range-filtering approximate nearest neighbor search.Proc

    Chaoji Zuo, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. Serf: Segment graph for range-filtering approximate nearest neighbor search.Proc. ACM Manag. Data, 2(1):69:1–69:26, 2024. 12 A Supplementary Lemmas We first introduce a commonly used lemma in probability theory, which determines the probability of the limsup of events by analyzing the converge...

  38. [38]

    as a powerful tool for analyzing the discrete-time randomized graph processes and algorithms. Given a discrete-time stochastic process, in particular, consider a submartingale, if the one-step differences are bounded and the expected differences are well-approximated by a Lipschitz function, it follows that the process closely aligns with the correspondin...

  39. [39]

    Thus, the pruning probability is bounded below by 1 3 in the two-dimensional setting. C.2 Conservative Probability Estimation Applicability of Pruning Probability to General Points.Our probability calculations are specifically designed for the scenarios where the reference point is located at the center of the sphere, ensuring uniform density in all direc...