MCGI: Manifold-Consistent Graph Indexing for Billion-Scale Disk-Resident Vector Search

Dongfang Zhao

arxiv: 2601.01930 · v3 · submitted 2026-01-05 · 💻 cs.IR · cs.AI

MCGI: Manifold-Consistent Graph Indexing for Billion-Scale Disk-Resident Vector Search

Dongfang Zhao This is my paper

Pith reviewed 2026-05-16 18:15 UTC · model grok-4.3

classification 💻 cs.IR cs.AI

keywords manifold-consistent graph indexinglocal intrinsic dimensionalityapproximate nearest neighbor searchdisk-resident vector searchbeam search adaptationbillion-scale indexinggraph-based ANN

0 comments

The pith

MCGI adapts beam budgets using local intrinsic dimensionality to preserve manifold connectivity in graph-based vector search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Manifold-Consistent Graph Indexing to fix the mismatch between Euclidean distances and geodesic paths on data manifolds during approximate nearest neighbor search. Standard graph methods use fixed beam widths that cause routing errors in high dimensions because they ignore local geometry. MCGI computes in-situ local intrinsic dimensionality estimates to adjust the beam search range dynamically, replacing a single hyperparameter with a stable geometry-informed interval. This adjustment is claimed to maintain topological connectivity that aligns with the underlying manifold structure. The method targets disk-resident indexes at billion-scale without increasing sensitivity to dataset-specific tuning.

Core claim

MCGI provides robust approximation by preserving manifold-consistent topological connectivity through dynamic modulation of beam search budgets informed by in-situ local intrinsic dimensionality estimates, which reduces divergence from the data manifold compared to uniform treatment of dimensions.

What carries the argument

Local intrinsic dimensionality modulated beam search, which replaces a fixed scalar budget with a geometry-aware range computed during search to adapt to local manifold properties.

If this is right

Search performance becomes more stable across datasets with different dimensionalities and scales.
Manifold connectivity preservation leads to fewer routing failures in high-dimensional spaces.
Disk-resident graphs can support billion-scale queries with lower hyperparameter tuning effort.
The approach generalizes beyond specific datasets by using local geometry rather than global assumptions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same LID modulation principle could be tested in memory-resident graph indexes to isolate disk effects.
Combining MCGI with curvature-aware sampling might improve connectivity preservation on manifolds with varying local properties.
Direct measurement of geodesic versus Euclidean path lengths on synthetic manifolds would provide a targeted test of the connectivity claim.

Load-bearing premise

In-situ local intrinsic dimensionality estimates can be computed accurately enough during search to modulate beam budgets without introducing new errors or excessive overhead.

What would settle it

Measure whether approximation error rises when the LID-based modulation is disabled and replaced by fixed beam widths in controlled tests on datasets with known manifold structure.

Figures

Figures reproduced from arXiv: 2601.01930 by Dongfang Zhao.

**Figure 2.** Figure 2: Billion scale scalability and resource efficiency evaluations. [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: Performance on T2I-1B. during query execution to capture the Recall-QPS trade-off. Threads specifies the number of parallel threads utilized during search to maximize hardware concurrency [PITH_FULL_IMAGE:figures/full_fig_p017_3.png] view at source ↗

read the original abstract

Graph-based Approximate Nearest Neighbor (ANN) search often suffers from performance degradation in high-dimensional spaces due to the Euclidean-Geodesic mismatch, where greedy routing diverges from the underlying data manifold. To address this challenge, this paper presents Manifold-Consistent Graph Indexing (MCGI), a geometry-aware and disk-resident indexing method that leverages Local Intrinsic Dimensionality (LID) to dynamically adapt search strategies to the intrinsic geometry of data. Unlike conventional algorithms that treat dimensions uniformly, MCGI modulates its beam search budget based on in-situ geometric analysis, which reduces sensitivity to data-specific hyperparameters by replacing a single scalar with a geometry-informed range that remains stable across datasets of varying dimensionality. Theoretical analysis demonstrates that MCGI provides robust approximation by preserving manifold-consistent topological connectivity. Extensive evaluations against five industry-standard baselines across five datasets up to billion scales confirm the advantages of the proposed approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MCGI adds LID-based dynamic beam modulation to disk-resident graph indexes to better respect data manifolds, but the theoretical connectivity claims rest on unverified assumptions about runtime LID accuracy.

read the letter

The main takeaway is that MCGI adapts beam search ranges in graph-based ANN using on-the-fly local intrinsic dimensionality estimates, aiming to reduce the Euclidean-geodesic mismatch for billion-scale disk data instead of relying on a single fixed hyperparameter. This replaces uniform dimension treatment with a geometry-informed range that the authors say stays stable across datasets. The approach targets a practical pain point in high-dimensional retrieval where greedy routing drifts from the underlying manifold. They report theoretical analysis showing preserved manifold-consistent topological connectivity plus tests against five baselines on five datasets up to billion scale. The shift to in-situ LID modulation is presented as the concrete novelty, not just a restatement of prior graph methods. That part is useful for anyone tuning large indexes, since it could cut down on dataset-specific hyperparameter fiddling. The disk-resident focus also aligns with real constraints in AI retrieval systems where data exceeds memory. The evaluations claim advantages, which is worth checking for practitioners. The soft spots center on the missing details around the LID estimates themselves. The guarantee of robust approximation depends on those estimates being accurate enough during search to set beam budgets without adding distortion or noticeable overhead, yet the abstract gives no error bounds, derivation steps, or overhead measurements. If the in-situ LID varies or costs too much, the manifold consistency could weaken. The experimental section would need to show error bars, baseline tuning protocols, and controls to confirm the gains come from the adaptation rather than implementation artifacts. This work suits engineers and researchers focused on scalable vector search and disk-based ANN. A reader building production retrieval systems could extract the adaptation idea and test it, even if the theory needs more grounding. I would send it to peer review because the problem is concrete and the method is specific enough to generate useful referee comments on the LID assumptions and experimental controls.

Referee Report

3 major / 2 minor

Summary. The paper proposes Manifold-Consistent Graph Indexing (MCGI), a disk-resident graph index for billion-scale vector search that uses in-situ Local Intrinsic Dimensionality (LID) estimates to modulate per-query beam budgets. This is intended to mitigate the Euclidean-geodesic mismatch by preserving manifold-consistent topological connectivity during greedy routing, replacing a single scalar hyperparameter with a geometry-informed range claimed to be more stable across datasets. Theoretical analysis is asserted to demonstrate robust approximation guarantees, and empirical results are reported against five baselines on five datasets reaching billion scale.

Significance. If the unstated error bounds on in-situ LID estimation hold and the modulation preserves connectivity without prohibitive overhead, the approach could reduce hyperparameter sensitivity in high-dimensional graph ANN and improve robustness for disk-resident indexes. The scale of the claimed evaluations is a positive factor, but significance is limited by the absence of explicit derivations or overhead analysis for the LID component.

major comments (3)

[Theoretical Analysis] Theoretical Analysis section: the assertion that MCGI 'provides robust approximation by preserving manifold-consistent topological connectivity' is load-bearing for the central claim, yet no derivation, error bound, or distortion analysis is supplied for the in-situ LID estimates used to set beam budgets. Without these, the guarantee reduces to an unverified assumption that LID computation introduces negligible error relative to manifold curvature.
[Method] Method description (beam-modulation paragraph): the replacement of a single scalar hyperparameter by a 'geometry-informed range' is presented as reducing sensitivity, but the paper does not demonstrate that determining or applying this range is itself parameter-free or that its stability across dimensionalities has been proven rather than observed post-hoc.
[Experimental Evaluation] Experimental Evaluation section: claims of advantage on billion-scale datasets rest on comparisons whose statistical reliability cannot be assessed because error bars, number of runs, and exact tuning protocols for both MCGI and the five baselines are not reported.

minor comments (2)

[Abstract] Abstract and introduction: the phrase 'replaces a single scalar with a geometry-informed range' is repeated without a concrete definition or pseudocode showing how the range is computed from LID values.
[Figures] Figure captions and tables: axis labels and legend entries for the billion-scale timing and recall plots should explicitly state the beam-budget modulation formula used in each curve.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the theoretical foundations, clarify the method, and improve the experimental reporting.

read point-by-point responses

Referee: [Theoretical Analysis] Theoretical Analysis section: the assertion that MCGI 'provides robust approximation by preserving manifold-consistent topological connectivity' is load-bearing for the central claim, yet no derivation, error bound, or distortion analysis is supplied for the in-situ LID estimates used to set beam budgets. Without these, the guarantee reduces to an unverified assumption that LID computation introduces negligible error relative to manifold curvature.

Authors: We agree that the current presentation of the theoretical analysis is high-level and would benefit from explicit derivations. In the revised manuscript, we will expand the Theoretical Analysis section to include: (i) a derivation of the approximation guarantee based on the preservation of manifold-consistent edges under LID-modulated beam search, (ii) error bounds on in-situ LID estimates derived from concentration results for local dimensionality estimators, and (iii) a distortion analysis quantifying the maximum deviation from geodesic distances introduced by the modulation. These additions will replace the implicit assumption of negligible error with quantitative bounds. revision: yes
Referee: [Method] Method description (beam-modulation paragraph): the replacement of a single scalar hyperparameter by a 'geometry-informed range' is presented as reducing sensitivity, but the paper does not demonstrate that determining or applying this range is itself parameter-free or that its stability across dimensionalities has been proven rather than observed post-hoc.

Authors: The geometry-informed range is obtained directly from the per-point LID values computed during index construction and query processing, without introducing additional tunable scalars beyond the base graph parameters. While the cross-dataset stability was demonstrated empirically, we acknowledge that a formal proof of invariance across dimensionalities would be stronger. In the revision we will add a dedicated paragraph clarifying the parameter-free computation of the range and include a brief theoretical argument (leveraging the scale-invariance properties of LID) together with expanded empirical validation across a wider range of dimensionalities. revision: partial
Referee: [Experimental Evaluation] Experimental Evaluation section: claims of advantage on billion-scale datasets rest on comparisons whose statistical reliability cannot be assessed because error bars, number of runs, and exact tuning protocols for both MCGI and the five baselines are not reported.

Authors: We fully agree that statistical details are required for assessing reliability. In the revised manuscript we will report: (i) results averaged over 10 independent runs with error bars showing standard deviation, (ii) the exact number of runs performed, and (iii) the complete tuning protocols, including the hyperparameter search grids and selection criteria used for MCGI and all five baselines. These additions will be placed in the Experimental Evaluation section and the supplementary material. revision: yes

Circularity Check

0 steps flagged

No circularity: theoretical claim rests on proposed LID modulation without reducing to self-defined inputs

full rationale

The paper's strongest claim is that 'Theoretical analysis demonstrates that MCGI provides robust approximation by preserving manifold-consistent topological connectivity.' No equations, derivations, or self-citations are exhibited in the abstract or described text that reduce this preservation to a fitted parameter, renamed input, or self-citation chain. The approach modulates beam budgets via in-situ LID estimates as a new geometry-aware strategy, supported by external baselines rather than internal redefinition. No self-definitional, fitted-prediction, or uniqueness-imported patterns appear. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are introduced beyond the standard concept of Local Intrinsic Dimensionality from prior literature.

pith-pipeline@v0.9.0 · 5440 in / 971 out tokens · 36937 ms · 2026-05-16T18:15:19.419814+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Proposition 4.3(Connectivity Preservation)... EEMST ⊆ ERNG ⊆ EMCGI... α(u)≥1.0
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

α(u) ≜ Φ(dLID(u))... logistic... z(u)=(dLID(u)−μ)/σ

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages

[1]

Houle, Ken-ichi Kawarabayashi, and Michael Nett

Laurent Amsaleg, Oussama Chelly, Teddy Furon, Stéphane Girard, Michael E. Houle, Ken-ichi Kawarabayashi, and Michael Nett. Estimating local intrinsic dimensionality. InProceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, page 29–38, New York, NY , USA, 2015. Association for Computing Machinery

work page 2015
[2]

Princeton University Press, 2010

RICHARD BELLMAN and Stuart Dreyfus.Dynamic Programming, volume 33. Princeton University Press, 2010

work page 2010
[3]

Multidimensional binary search trees used for associative searching.Com- mun

Jon Louis Bentley. Multidimensional binary search trees used for associative searching.Com- mun. ACM, 18(9):509–517, September 1975

work page 1975
[4]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwi...

work page 2020
[5]

Spann: highly-efficient billion-scale approximate nearest neighbor search

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. Spann: highly-efficient billion-scale approximate nearest neighbor search. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Red Hook, NY , USA, 2021. Curran Associates Inc

work page 2021
[6]

Random projection trees and low dimensional manifolds

Sanjoy Dasgupta and Yoav Freund. Random projection trees and low dimensional manifolds. InProceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC ’08, page 537–546, New York, NY , USA, 2008. Association for Computing Machinery

work page 2008
[7]

Mirrokni

Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. InProceedings of the Twentieth Annual Symposium on Computational Geometry, SCG ’04, page 253–262, New York, NY , USA, 2004. Association for Computing Machinery

work page 2004
[8]

Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits

Haya Diwan, Jinrui Gou, Cameron Musco, Christopher Musco, and Torsten Suel. Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 59513–59531. Curran Associates...

work page 2024
[9]

Fast approximate nearest neighbor search with the navigating spreading-out graph.Proc

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neighbor search with the navigating spreading-out graph.Proc. VLDB Endow., 12(5):461–474, January 2019

work page 2019
[10]

Optimized product quantization

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Optimized product quantization. volume 36, page 744–755, USA, April 2014. IEEE Computer Society

work page 2014
[11]

Gupta, R

A. Gupta, R. Krauthgamer, and J.R. Lee. Bounded geometries, fractals, and low-distortion embeddings. In44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 534–543, 2003

work page 2003
[12]

R-trees: a dynamic index structure for spatial searching

Antonin Guttman. R-trees: a dynamic index structure for spatial searching. InProceedings of the 1984 ACM SIGMOD International Conference on Management of Data, SIGMOD ’84, page 47–57, New York, NY , USA, 1984. Association for Computing Machinery

work page 1984
[13]

On the difficulty of nearest neighbor search

Junfeng He, Sanjiv Kumar, and Shih-Fu Chang. On the difficulty of nearest neighbor search. InProceedings of the 29th International Coference on International Conference on Machine Learning, ICML’12, page 41–48, Madison, WI, USA, 2012. Omnipress

work page 2012
[14]

Michael E. Houle. Local intrinsic dimensionality I: an extreme-value-theoretic foundation for similarity applications. In Christian Beecks, Felix Borutta, Peer Kröger, and Thomas Seidl, editors,Similarity Search and Applications - 10th International Conference, SISAP 2017, Munich, Germany, October 4-6, 2017, Proceedings, volume 10609 ofLecture Notes in Co...

work page 2017
[15]

Approximate nearest neighbors: towards removing the curse of dimensionality

Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. InProceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC ’98, page 604–613, New York, NY , USA, 1998. Association for Computing Machinery

work page 1998
[16]

Worst-case performance of popular approximate nearest neighbor search implementations: Guarantees and limitations

Piotr Indyk and Haike Xu. Worst-case performance of popular approximate nearest neighbor search implementations: Guarantees and limitations. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 66239–66256. Curran Associates, Inc., 2023

work page 2023
[17]

Survey of hallucination in natural language generation

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation. ACM Comput. Surv., 55(12), March 2023

work page 2023
[18]

Billion-scale similarity search with gpus.IEEE Transactions on Big Data, 7(3):535–547, 2021

Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with gpus.IEEE Transactions on Big Data, 7(3):535–547, 2021

work page 2021
[19]

Product quantization for nearest neighbor search.IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011

Herve Jégou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search.IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011

work page 2011
[20]

Searching in one billion vectors: Re-rank with source coding

Hervé Jégou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. Searching in one billion vectors: Re-rank with source coding. In2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 861–864, 2011

work page 2011
[21]

Karger and Matthias Ruhl

David R. Karger and Matthias Ruhl. Finding nearest neighbors in growth-restricted metrics. In Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, STOC ’02, page 741–750, New York, NY , USA, 2002. Association for Computing Machinery

work page 2002
[22]

Dense passage retrieval for open-domain question answering

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu, editors,Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online,...

work page 2020
[23]

Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, François Halbach, Alex Rocha, and Joe Stubbs

Kate Keahey, Jason Anderson, Zhuo Zhen, Pierre Riteau, Paul Ruth, Dan Stanzione, Mert Cevik, Jacob Colleran, Haryadi S. Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, François Halbach, Alex Rocha, and Joe Stubbs. Lessons learned from the chameleon testbed. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC ’20). USENIX Assoc...

work page 2020
[24]

Maximum likelihood estimation of intrinsic dimension

Elizaveta Levina and Peter Bickel. Maximum likelihood estimation of intrinsic dimension. In L. Saul, Y . Weiss, and L. Bottou, editors,Advances in Neural Information Processing Systems, volume 17. MIT Press, 2004

work page 2004
[25]

Retrieval-augmented generation for knowledge-intensive nlp tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks. InProceed- ings of the 34th International Conference on Neural Information Processing Systems, NIPS ’...

work page 2020
[26]

Erfani, Sudanthi N

Xingjun Ma, Bo Li, Yisen Wang, Sarah M. Erfani, Sudanthi N. R. Wijewickrema, Michael E. Houle, Grant Schoenebeck, Dawn Song, and James Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. InInternational Conference on Learning Representations, 2018

work page 2018
[27]

Malkov and D

Yu A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. volume 42, page 824–836, USA, April 2020. IEEE Computer Society

work page 2020
[28]

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In Alessandro Moschitti, Bo Pang, and Walter Daelemans, editors, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group o...

work page 2014
[29]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors,Proceedings of the 38th International Conference on Machin...

work page 2021
[30]

The probabilistic relevance framework: Bm25 and beyond

Stephen Robertson and Hugo Zaragoza. The probabilistic relevance framework: Bm25 and beyond. InFoundations and Trends in Information Retrieval, volume 3, pages 333–389. Now Publishers, Inc., 2009

work page 2009
[31]

Roweis and Lawrence K

Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding.Science, 290(5500):2323–2326, 2000

work page 2000
[32]

Results of the big ANN: NeurIPS’23 competition

Harsha Vardhan simhadri, Martin Aumüller, Matthijs Douze, Dmitry Baranchuk, Amir Ingber, Edo Liberty, George Williams, Ben Landrum, Magdalen Dobson Manohar, Mazin Karjikar, Laxman Dhulipala, Meng Chen, Yue Chen, Rui Ma, Kai Zhang, Yuzheng Cai, Jiayang Shi, Weiguo Zheng, Yizhuo Chen, Jie Yin, and Ben Huang. Results of the big ANN: NeurIPS’23 competition. I...

work page 2025
[33]

Results of the neurips’21 challenge on billion- scale approximate nearest neighbor search

Harsha Vardhan Simhadri, George Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, Suhas Jayaram Subramanya, and Jingdong Wang. Results of the neurips’21 challenge on billion- scale approximate nearest neighbor search. InNeurIPS 2021 Competitions and Demonstration...

work page 2021
[34]

Diskann: fast accurate billion-point nearest neighbor search on a single node

Suhas Jayaram Subramanya, Devvrit, Rohan Kadekodi, Ravishankar Krishaswamy, and Har- sha Vardhan Simhadri. Diskann: fast accurate billion-point nearest neighbor search on a single node. InProceedings of the 33rd International Conference on Neural Information Processing Systems, 2019

work page 2019
[35]

Tenenbaum, Vin de Silva, and John C

Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction.Science, 290(5500):2319–2323, 2000

work page 2000
[36]

Toussaint

Godfried T. Toussaint. The relative neighbourhood graph of a finite planar set.Pattern Recognition, 12(4):261–268, 1980

work page 1980
[37]

Llama: Open and efficient foundation language models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo- thée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models. 2023

work page 2023
[38]

Steiner-hardness: A query hardness measure for graph-based ann indexes.Proc

Zeyu Wang, Qitong Wang, Xiaoxing Cheng, Peng Wang, Themis Palpanas, and Wei Wang. Steiner-hardness: A query hardness measure for graph-based ann indexes.Proc. VLDB Endow., 17(13):4668–4682, September 2024

work page 2024
[39]

Cspg: Crossing sparse proximity graphs for approximate nearest neighbor search

Ming Yang, Yuzheng Cai, and Weiguo Zheng. Cspg: Crossing sparse proximity graphs for approximate nearest neighbor search. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 103076–103100. Curran Associates, Inc., 2024

work page 2024
[40]

Incremental isometric embedding of high-dimensional data using connected neighborhood graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):86–98, 2009

Dongfang Zhao and Li Yang. Incremental isometric embedding of high-dimensional data using connected neighborhood graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):86–98, 2009

work page 2009
[41]

Towards efficient index construction and approximate nearest neighbor search in high-dimensional spaces.Proc

Xi Zhao, Yao Tian, Kai Huang, Bolong Zheng, and Xiaofang Zhou. Towards efficient index construction and approximate nearest neighbor search in high-dimensional spaces.Proc. VLDB Endow., 16(8):1979–1991, April 2023. 12 A Complete Proofs Proposition A.1(Monotonicity).The mapping function Φ is strictly decreasing with respect to the estimated local intrinsic...

work page 1979
[42]

Let s=∥u−v∥ be the distance between the current node and the candidate neighbor (the step size)

Geometric Condition for Improvement.To make progress, the algorithm must identify a neighbor v such that the distance to the query is reduced by a factor ϵ (where 0< ϵ <1 ), i.e., ∥v−q∥ ≤(1−ϵ)r . Let s=∥u−v∥ be the distance between the current node and the candidate neighbor (the step size). Applying the Law of Cosines to triangle△uvq, we have: ∥v−q∥ 2 =r...

work page
[43]

The probability Psuccess of finding a neighbor in the improving cone corresponds to the ratio of the surface area of the spherical cap C(θmax) to the total surface area of Sd−1

Volume Concentration of the Improving Region.In the tangent space at u, the directions of neighbors are uniformly distributed on the unit hypersphere Sd−1. The probability Psuccess of finding a neighbor in the improving cone corresponds to the ratio of the surface area of the spherical cap C(θmax) to the total surface area of Sd−1. This ratio is given by ...

work page
[44]

The denominator relates to the surface area of the hypersphere and can be evaluated using the standard integral identity for powers of sine

Asymptotic Analysis.We analyze the asymptotic behavior of this ratio as d→ ∞ . The denominator relates to the surface area of the hypersphere and can be evaluated using the standard integral identity for powers of sine. Exploiting the symmetry ofsinϕaroundπ/2, we have: Z π 0 sind−2 ϕ dϕ= 2 Z π 2 0 sind−2 ϕ dϕ = 2· √π 2 Γ( d−1 2 ) Γ( d

work page
[45]

= √π Γ( d−1 2 ) Γ( d

work page
[46]

Settingx=d/2anda=−1/2, we obtain the approximation: √π Γ( d 2 − 1 2) Γ( d

(40) For high-dimensional spaces (d≫1 ), we apply the asymptotic property of the Gamma function ratio, Γ(x+a) Γ(x) ≈x a. Settingx=d/2anda=−1/2, we obtain the approximation: √π Γ( d 2 − 1 2) Γ( d

work page
[47]

Since the integrand sind−2 ϕ decays exponentially fast away from θmax, the integral is dominated by the contribution near the upper limit

≈ √π d 2 − 1 2 = r 2π d .(41) For the numerator, we exploit the concentration of measure near the integration boundary θmax. Since the integrand sind−2 ϕ decays exponentially fast away from θmax, the integral is dominated by the contribution near the upper limit. We apply integration by parts to extract the leading order term. Recall that d dϕ(sind−1 ϕ) =...

work page
[48]

Thus: E[Ndist]≥Ω √ d·exp(λ·d) .(47) This confirms that the routing cost is dominated by the exponential termexp(λ·LID(q)) , necessitating the adaptive beam width design in MCGI

Complexity Bound.The expected number of distance evaluations follows E[Ndist] = 1/Psuccess. Thus: E[Ndist]≥Ω √ d·exp(λ·d) .(47) This confirms that the routing cost is dominated by the exponential termexp(λ·LID(q)) , necessitating the adaptive beam width design in MCGI. B Experimental Details Datasets.We evaluate our method on four standard benchmarks, ran...

work page 2001

[1] [1]

Houle, Ken-ichi Kawarabayashi, and Michael Nett

Laurent Amsaleg, Oussama Chelly, Teddy Furon, Stéphane Girard, Michael E. Houle, Ken-ichi Kawarabayashi, and Michael Nett. Estimating local intrinsic dimensionality. InProceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, page 29–38, New York, NY , USA, 2015. Association for Computing Machinery

work page 2015

[2] [2]

Princeton University Press, 2010

RICHARD BELLMAN and Stuart Dreyfus.Dynamic Programming, volume 33. Princeton University Press, 2010

work page 2010

[3] [3]

Multidimensional binary search trees used for associative searching.Com- mun

Jon Louis Bentley. Multidimensional binary search trees used for associative searching.Com- mun. ACM, 18(9):509–517, September 1975

work page 1975

[4] [4]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwi...

work page 2020

[5] [5]

Spann: highly-efficient billion-scale approximate nearest neighbor search

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. Spann: highly-efficient billion-scale approximate nearest neighbor search. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21, Red Hook, NY , USA, 2021. Curran Associates Inc

work page 2021

[6] [6]

Random projection trees and low dimensional manifolds

Sanjoy Dasgupta and Yoav Freund. Random projection trees and low dimensional manifolds. InProceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC ’08, page 537–546, New York, NY , USA, 2008. Association for Computing Machinery

work page 2008

[7] [7]

Mirrokni

Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. InProceedings of the Twentieth Annual Symposium on Computational Geometry, SCG ’04, page 253–262, New York, NY , USA, 2004. Association for Computing Machinery

work page 2004

[8] [8]

Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits

Haya Diwan, Jinrui Gou, Cameron Musco, Christopher Musco, and Torsten Suel. Navigable graphs for high-dimensional nearest neighbor search: Constructions and limits. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 59513–59531. Curran Associates...

work page 2024

[9] [9]

Fast approximate nearest neighbor search with the navigating spreading-out graph.Proc

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neighbor search with the navigating spreading-out graph.Proc. VLDB Endow., 12(5):461–474, January 2019

work page 2019

[10] [10]

Optimized product quantization

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Optimized product quantization. volume 36, page 744–755, USA, April 2014. IEEE Computer Society

work page 2014

[11] [11]

Gupta, R

A. Gupta, R. Krauthgamer, and J.R. Lee. Bounded geometries, fractals, and low-distortion embeddings. In44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 534–543, 2003

work page 2003

[12] [12]

R-trees: a dynamic index structure for spatial searching

Antonin Guttman. R-trees: a dynamic index structure for spatial searching. InProceedings of the 1984 ACM SIGMOD International Conference on Management of Data, SIGMOD ’84, page 47–57, New York, NY , USA, 1984. Association for Computing Machinery

work page 1984

[13] [13]

On the difficulty of nearest neighbor search

Junfeng He, Sanjiv Kumar, and Shih-Fu Chang. On the difficulty of nearest neighbor search. InProceedings of the 29th International Coference on International Conference on Machine Learning, ICML’12, page 41–48, Madison, WI, USA, 2012. Omnipress

work page 2012

[14] [14]

Michael E. Houle. Local intrinsic dimensionality I: an extreme-value-theoretic foundation for similarity applications. In Christian Beecks, Felix Borutta, Peer Kröger, and Thomas Seidl, editors,Similarity Search and Applications - 10th International Conference, SISAP 2017, Munich, Germany, October 4-6, 2017, Proceedings, volume 10609 ofLecture Notes in Co...

work page 2017

[15] [15]

Approximate nearest neighbors: towards removing the curse of dimensionality

Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. InProceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC ’98, page 604–613, New York, NY , USA, 1998. Association for Computing Machinery

work page 1998

[16] [16]

Worst-case performance of popular approximate nearest neighbor search implementations: Guarantees and limitations

Piotr Indyk and Haike Xu. Worst-case performance of popular approximate nearest neighbor search implementations: Guarantees and limitations. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 66239–66256. Curran Associates, Inc., 2023

work page 2023

[17] [17]

Survey of hallucination in natural language generation

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation. ACM Comput. Surv., 55(12), March 2023

work page 2023

[18] [18]

Billion-scale similarity search with gpus.IEEE Transactions on Big Data, 7(3):535–547, 2021

Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with gpus.IEEE Transactions on Big Data, 7(3):535–547, 2021

work page 2021

[19] [19]

Product quantization for nearest neighbor search.IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011

Herve Jégou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search.IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011

work page 2011

[20] [20]

Searching in one billion vectors: Re-rank with source coding

Hervé Jégou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. Searching in one billion vectors: Re-rank with source coding. In2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 861–864, 2011

work page 2011

[21] [21]

Karger and Matthias Ruhl

David R. Karger and Matthias Ruhl. Finding nearest neighbors in growth-restricted metrics. In Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, STOC ’02, page 741–750, New York, NY , USA, 2002. Association for Computing Machinery

work page 2002

[22] [22]

Dense passage retrieval for open-domain question answering

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu, editors,Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online,...

work page 2020

[23] [23]

Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, François Halbach, Alex Rocha, and Joe Stubbs

Kate Keahey, Jason Anderson, Zhuo Zhen, Pierre Riteau, Paul Ruth, Dan Stanzione, Mert Cevik, Jacob Colleran, Haryadi S. Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, François Halbach, Alex Rocha, and Joe Stubbs. Lessons learned from the chameleon testbed. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC ’20). USENIX Assoc...

work page 2020

[24] [24]

Maximum likelihood estimation of intrinsic dimension

Elizaveta Levina and Peter Bickel. Maximum likelihood estimation of intrinsic dimension. In L. Saul, Y . Weiss, and L. Bottou, editors,Advances in Neural Information Processing Systems, volume 17. MIT Press, 2004

work page 2004

[25] [25]

Retrieval-augmented generation for knowledge-intensive nlp tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks. InProceed- ings of the 34th International Conference on Neural Information Processing Systems, NIPS ’...

work page 2020

[26] [26]

Erfani, Sudanthi N

Xingjun Ma, Bo Li, Yisen Wang, Sarah M. Erfani, Sudanthi N. R. Wijewickrema, Michael E. Houle, Grant Schoenebeck, Dawn Song, and James Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. InInternational Conference on Learning Representations, 2018

work page 2018

[27] [27]

Malkov and D

Yu A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. volume 42, page 824–836, USA, April 2020. IEEE Computer Society

work page 2020

[28] [28]

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In Alessandro Moschitti, Bo Pang, and Walter Daelemans, editors, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group o...

work page 2014

[29] [29]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors,Proceedings of the 38th International Conference on Machin...

work page 2021

[30] [30]

The probabilistic relevance framework: Bm25 and beyond

Stephen Robertson and Hugo Zaragoza. The probabilistic relevance framework: Bm25 and beyond. InFoundations and Trends in Information Retrieval, volume 3, pages 333–389. Now Publishers, Inc., 2009

work page 2009

[31] [31]

Roweis and Lawrence K

Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding.Science, 290(5500):2323–2326, 2000

work page 2000

[32] [32]

Results of the big ANN: NeurIPS’23 competition

Harsha Vardhan simhadri, Martin Aumüller, Matthijs Douze, Dmitry Baranchuk, Amir Ingber, Edo Liberty, George Williams, Ben Landrum, Magdalen Dobson Manohar, Mazin Karjikar, Laxman Dhulipala, Meng Chen, Yue Chen, Rui Ma, Kai Zhang, Yuzheng Cai, Jiayang Shi, Weiguo Zheng, Yizhuo Chen, Jie Yin, and Ben Huang. Results of the big ANN: NeurIPS’23 competition. I...

work page 2025

[33] [33]

Results of the neurips’21 challenge on billion- scale approximate nearest neighbor search

Harsha Vardhan Simhadri, George Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, Suhas Jayaram Subramanya, and Jingdong Wang. Results of the neurips’21 challenge on billion- scale approximate nearest neighbor search. InNeurIPS 2021 Competitions and Demonstration...

work page 2021

[34] [34]

Diskann: fast accurate billion-point nearest neighbor search on a single node

Suhas Jayaram Subramanya, Devvrit, Rohan Kadekodi, Ravishankar Krishaswamy, and Har- sha Vardhan Simhadri. Diskann: fast accurate billion-point nearest neighbor search on a single node. InProceedings of the 33rd International Conference on Neural Information Processing Systems, 2019

work page 2019

[35] [35]

Tenenbaum, Vin de Silva, and John C

Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction.Science, 290(5500):2319–2323, 2000

work page 2000

[36] [36]

Toussaint

Godfried T. Toussaint. The relative neighbourhood graph of a finite planar set.Pattern Recognition, 12(4):261–268, 1980

work page 1980

[37] [37]

Llama: Open and efficient foundation language models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo- thée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models. 2023

work page 2023

[38] [38]

Steiner-hardness: A query hardness measure for graph-based ann indexes.Proc

Zeyu Wang, Qitong Wang, Xiaoxing Cheng, Peng Wang, Themis Palpanas, and Wei Wang. Steiner-hardness: A query hardness measure for graph-based ann indexes.Proc. VLDB Endow., 17(13):4668–4682, September 2024

work page 2024

[39] [39]

Cspg: Crossing sparse proximity graphs for approximate nearest neighbor search

Ming Yang, Yuzheng Cai, and Weiguo Zheng. Cspg: Crossing sparse proximity graphs for approximate nearest neighbor search. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 103076–103100. Curran Associates, Inc., 2024

work page 2024

[40] [40]

Incremental isometric embedding of high-dimensional data using connected neighborhood graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):86–98, 2009

Dongfang Zhao and Li Yang. Incremental isometric embedding of high-dimensional data using connected neighborhood graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):86–98, 2009

work page 2009

[41] [41]

Towards efficient index construction and approximate nearest neighbor search in high-dimensional spaces.Proc

Xi Zhao, Yao Tian, Kai Huang, Bolong Zheng, and Xiaofang Zhou. Towards efficient index construction and approximate nearest neighbor search in high-dimensional spaces.Proc. VLDB Endow., 16(8):1979–1991, April 2023. 12 A Complete Proofs Proposition A.1(Monotonicity).The mapping function Φ is strictly decreasing with respect to the estimated local intrinsic...

work page 1979

[42] [42]

Let s=∥u−v∥ be the distance between the current node and the candidate neighbor (the step size)

Geometric Condition for Improvement.To make progress, the algorithm must identify a neighbor v such that the distance to the query is reduced by a factor ϵ (where 0< ϵ <1 ), i.e., ∥v−q∥ ≤(1−ϵ)r . Let s=∥u−v∥ be the distance between the current node and the candidate neighbor (the step size). Applying the Law of Cosines to triangle△uvq, we have: ∥v−q∥ 2 =r...

work page

[43] [43]

The probability Psuccess of finding a neighbor in the improving cone corresponds to the ratio of the surface area of the spherical cap C(θmax) to the total surface area of Sd−1

Volume Concentration of the Improving Region.In the tangent space at u, the directions of neighbors are uniformly distributed on the unit hypersphere Sd−1. The probability Psuccess of finding a neighbor in the improving cone corresponds to the ratio of the surface area of the spherical cap C(θmax) to the total surface area of Sd−1. This ratio is given by ...

work page

[44] [44]

The denominator relates to the surface area of the hypersphere and can be evaluated using the standard integral identity for powers of sine

Asymptotic Analysis.We analyze the asymptotic behavior of this ratio as d→ ∞ . The denominator relates to the surface area of the hypersphere and can be evaluated using the standard integral identity for powers of sine. Exploiting the symmetry ofsinϕaroundπ/2, we have: Z π 0 sind−2 ϕ dϕ= 2 Z π 2 0 sind−2 ϕ dϕ = 2· √π 2 Γ( d−1 2 ) Γ( d

work page

[45] [45]

= √π Γ( d−1 2 ) Γ( d

work page

[46] [46]

Settingx=d/2anda=−1/2, we obtain the approximation: √π Γ( d 2 − 1 2) Γ( d

(40) For high-dimensional spaces (d≫1 ), we apply the asymptotic property of the Gamma function ratio, Γ(x+a) Γ(x) ≈x a. Settingx=d/2anda=−1/2, we obtain the approximation: √π Γ( d 2 − 1 2) Γ( d

work page

[47] [47]

Since the integrand sind−2 ϕ decays exponentially fast away from θmax, the integral is dominated by the contribution near the upper limit

≈ √π d 2 − 1 2 = r 2π d .(41) For the numerator, we exploit the concentration of measure near the integration boundary θmax. Since the integrand sind−2 ϕ decays exponentially fast away from θmax, the integral is dominated by the contribution near the upper limit. We apply integration by parts to extract the leading order term. Recall that d dϕ(sind−1 ϕ) =...

work page

[48] [48]

Thus: E[Ndist]≥Ω √ d·exp(λ·d) .(47) This confirms that the routing cost is dominated by the exponential termexp(λ·LID(q)) , necessitating the adaptive beam width design in MCGI

Complexity Bound.The expected number of distance evaluations follows E[Ndist] = 1/Psuccess. Thus: E[Ndist]≥Ω √ d·exp(λ·d) .(47) This confirms that the routing cost is dominated by the exponential termexp(λ·LID(q)) , necessitating the adaptive beam width design in MCGI. B Experimental Details Datasets.We evaluate our method on four standard benchmarks, ran...

work page 2001