Generalized Range Filtering Approximate Nearest Neighbor Search: Containment and Overlap [Technical Report]

Jeffrey Xu Yu; Jiadong Xie; Jiangtao Cui; Tong Wu; Yang Zhao; Yingfan Liu

arxiv: 2605.26474 · v1 · pith:GO7EC5ZAnew · submitted 2026-05-26 · 💻 cs.DB · cs.IR

Generalized Range Filtering Approximate Nearest Neighbor Search: Containment and Overlap [Technical Report]

Yingfan Liu , Tong Wu , Jiadong Xie , Yang Zhao , Jeffrey Xu Yu , Jiangtao Cui This is my paper

Pith reviewed 2026-07-01 16:38 UTC · model grok-4.3

classification 💻 cs.DB cs.IR

keywords range filteringapproximate nearest neighborRRANNcontainment predicateoverlap predicatemulti-segment tree graphpredicate pruningvector search

0 comments

The pith

A multi-segment tree graph supports approximate nearest neighbor search under arbitrary range-range predicates by pruning non-matching paths at internal nodes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method for approximate nearest neighbor search where each vector has an associated numeric range, and queries specify both a query vector and a query range that must relate to object ranges via user-chosen predicates such as containment or overlap. Existing approaches handle only fixed predicate types and cannot accommodate arbitrary ones without new indexes. The new structure evaluates predicates during traversal to skip invalid branches while using the same overall index size and build time as prior methods for restricted cases. This matters for applications involving price ranges, time intervals, or similar attributes because it allows flexible query conditions without sacrificing search speed. Experiments on real data report up to 12.5 times faster queries for the generalized case and better or equal performance on the restricted variants.

Core claim

The multi-segment tree graph enables ANN search with arbitrary RR predicates on range-valued attributes by evaluating the predicate at internal nodes to avoid traversing non-satisfying paths, while keeping index size and construction time equivalent to state-of-the-art methods designed only for more limited RFANN queries.

What carries the argument

The multi-segment tree graph, which augments a graph-based ANN index with structures at each node that evaluate range-range predicates to prune non-matching subpaths during search.

If this is right

Supports user-defined RR predicates including containment, overlap, and their disjunctions without requiring separate indexes.
Delivers up to 12.5x faster RRANN queries than baselines at the same accuracy level.
Matches state-of-the-art speed on RFANN queries and exceeds it on IFANN and TSANN queries.
Maintains identical index size and construction time to existing methods for restricted range-filtered ANN.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same node-level predicate evaluation could be adapted to other attribute types such as sets or strings if the evaluation cost stays low.
In very high-dimensional vector spaces the pruning may also reduce the number of distance computations that dominate runtime.
If predicates change frequently between queries, the method would benefit from lazy or cached predicate results at nodes.

Load-bearing premise

Evaluating the range predicate at each internal node during traversal adds little enough time that the savings from skipping invalid paths outweigh the added cost.

What would settle it

A workload where nearly all objects satisfy the predicate, so pruning rarely triggers, combined with measurements showing whether total search time remains equal to or better than a baseline that traverses without predicate checks.

Figures

Figures reproduced from arXiv: 2605.26474 by Jeffrey Xu Yu, Jiadong Xie, Jiangtao Cui, Tong Wu, Yang Zhao, Yingfan Liu.

**Figure 2.** Figure 2: Next, we insert each 𝑜𝑖 ∈ 𝑂 into MSTG in ascending order of its 𝑙𝑖 value, employing its 𝑟𝑖 value as the key for segment tree insertion. This process constructs indexes T 1 , · · · , T |𝐴| , where T 𝑥 corresponds to the graph index 𝐺𝑥 and manages 𝑂𝑥 . When inserting an object 𝑜𝑖 = (𝑣𝑖 ,𝑙𝑖 , 𝑟𝑖) into the current segment tree T 𝑥 , we represent each tree node in T 𝑥 by T 𝑥 𝑙,𝑟 , where [𝑙, 𝑟] indicates its ran… view at source ↗

**Figure 2.** Figure 2: The illustration of the merged MSTG Algorithm 2: ConstructMSTG(𝑂, 𝐴,𝑚) Input : 𝑂: the object set, 𝐴 = {𝑎1, · · · , 𝑎|𝐴| }: the attribute set (𝑎1 < · · · , 𝑎|𝐴| ), and 𝑚: the out-degree limit Output : The merged MSTG T 1 , · · · , T |𝐴| build a segment tree T 0 1 based on 𝐴 without objects; 2 𝑥 ← 1; for each 𝑜𝑖 = (𝑣𝑖 ,𝑙𝑖 , 𝑟𝑖 ) ∈ 𝑂 in ascending order of 𝑙 3 𝑖 do T ←InsertMSTG (𝑜𝑖 4 , T, 𝐴,𝑚); if 𝑙𝑖 ≠ 𝑙 5 𝑖+… view at source ↗

**Figure 3.** Figure 3: Overall query performance of RRANN (Exp. 1) [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 5.** Figure 5: Indexing costs of RRANN queries (Exp. 2) [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Query performance on RFANN queries (Exp. 3) [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Indexing costs (Exp. 3&4&5) 0.70 0.80 0.90 1.00 Recall@10 10 3 10 4 QPS Sift (sel=10%) 0.70 0.80 0.90 1.00 Recall@10 10 2 10 3 Gist (sel=10%) 0.70 0.80 0.90 1.00 Recall@10 10 2 10 3 Redcaps (sel=10%) MSTG Hi-PNG [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

**Figure 10.** Figure 10: Scalability evaluation of MSTG (Exp. 6) Other Experiments: Due to the space limit, we put other important experiments in Appendix F. Specifically, we investigate the impact of query selectivity, attribute distribution, and the cardinality of 𝐴 on the performance of RRANN queries in Exp.s 7, 8, and 10, respectively. We also explore the effects of parameter 𝑘, 𝑒 𝑓𝑐𝑜𝑛, and 𝑀 on the performance of our metho… view at source ↗

**Figure 15.** Figure 15: Impact of the Cardinality of 𝐴 (Exp. 10) Exp. 10: Impact of the Cardinality of 𝐴. We vary the size of the attribute 𝐴 from 103 and 105 to show its effect on the RRANN performance. As shown in [PITH_FULL_IMAGE:figures/full_fig_p013_15.png] view at source ↗

**Figure 16.** Figure 16: Impact of 𝑘 values (Exp. 11) Exp. 11: Impact of 𝑘 [PITH_FULL_IMAGE:figures/full_fig_p013_16.png] view at source ↗

**Figure 14.** Figure 14: MSTG vs. Oracle-HNSW (Exp. 9) Exp. 9: MSTG vs. Oracle-HNSW. As aforementioned, our primary goal is to achieve search performance comparable to 𝑘-ANN search on a PG. Here, we compare MSTG with Oracle-HNSW, where a specific HNSW is constructed for each query to manage the vectors satisfying the query filter. Note that Oracle-HNSW is not a practical solution, because building such an HNSW for each query is u… view at source ↗

**Figure 11.** Figure 11: Overall query performance of RRANN with RDE as the accuracy measure (Exp. 1) [PITH_FULL_IMAGE:figures/full_fig_p014_11.png] view at source ↗

**Figure 13.** Figure 13: The impact of attribute distribution on RRANN search performance (Exp. 8). Two lines represent the results of Sift [PITH_FULL_IMAGE:figures/full_fig_p014_13.png] view at source ↗

**Figure 17.** Figure 17: Impact of 𝑒 𝑓𝑐𝑜𝑛 values (Exp. 12) 8 16 32 64 0.90 0.95 1.00 Recall@10 10 QPS 3 Sift (sel=10%) 0.90 0.95 1.00 Recall@10 10 2 10 3 Gist (sel=10%) 0.90 0.95 1.00 Recall@10 10 2 10 3 WIT-Image (sel=10%) 0.90 0.95 1.00 Recall@10 10 3 10 4 Paper (sel=10%) 0.90 0.95 1.00 Recall@10 10 2 Redcaps (sel=10%) [PITH_FULL_IMAGE:figures/full_fig_p014_17.png] view at source ↗

**Figure 18.** Figure 18: Impact of 𝑀 values (Exp. 13) [PITH_FULL_IMAGE:figures/full_fig_p014_18.png] view at source ↗

read the original abstract

Approximate nearest neighbor (ANN) search with range filters has recently garnered significant attention. This paper delves into a generalized form of this problem, i.e., ANN search with exact range-range (RR) predicates on a range-valued attribute, named RR filtering ANN (RRANN). Specifically, given $n$ vectors in $\mathbb{R}^d$, each vector $v_i$ is associated with a numeric range $[l_i, r_i]$, symbolizing aspects like a price range or time interval. An RRANN query $(v_q, l_q, r_q)$ aims at finding $k$ vectors closest to $v_q$ within the vectors satisfying an arbitrary RR predicate defined between the query range $[l_q, r_q]$ and the object range $[l_i, r_i]$. The RR predicate remains unspecified, enabling user-defined conditions. It may encompass containment ($[l_i, r_i] \subseteq [l_q, r_q]$ or $[l_q, r_q] \subseteq [l_i, r_i]$), overlap ($l_i \le l_q \le r_i \le r_q$ or $l_q \le l_i \le r_q \le r_i$), or a disjunction of them. RRANN has broad applications in queries related to price ranges or time intervals, and it generalizes existing variants of ANN search with range filters. However, existing dedicated approaches for these problems lack the capacity to support queries with arbitrary RR predicates. Hence, we introduce a new approach, labeled multi-segment tree graph. It efficiently handles arbitrary RR predicates by avoiding traversal through non-predicate-satisfied nodes, and keeps equivalent index size and construction time to state-of-the-art methods for RFANN. Extensive experiments on real-world data demonstrate the efficacy of our approach in RRANN queries, achieving up to 12.5x speedups with the same accuracy as the baselines. Moreover, our approach attains comparable RFANN search performance and notably superior IFANN and TSANN search performance compared to the respective state-of-the-art approaches. Our code is available at https://github.com/FanEDG/MSTG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The multi-segment tree graph extends prior RFANN/IFANN/TSANN indexes to user-defined RR predicates with claimed speedups, but the arbitrary-predicate pruning claim rests on unverified assumptions about what aggregates suffice.

read the letter

The core advance is a multi-segment tree graph index for RRANN queries that supports arbitrary range-range predicates instead of hard-coding containment or overlap. It keeps index size and build time in line with the specialized baselines while reporting up to 12.5x faster queries at the same accuracy on real data, plus competitive or better numbers when run on the older RFANN, IFANN, and TSANN tasks.

The design evaluates the predicate at internal nodes to skip non-matching subtrees. That is the practical payoff if the evaluation is cheap and the predicate admits early rejection.

The soft spot is exactly the generality claim. The abstract lists containment, overlap, and their disjunctions, but does not spell out the node aggregates used for pruning. For a predicate that is non-monotonic or depends on external state, no fixed min/max per node will decide rejection without visiting leaves. If the experiments only exercise the standard cases, the 12.5x figure does not yet demonstrate the arbitrary case. The public code helps here; a referee can check the actual pruning logic and run additional predicate tests.

Experimental reporting in the abstract is thin—no error bars, dataset sizes, or statistical tests—so the full paper needs to show those details are solid. No circularity or self-citation problems appear.

This is for database and IR researchers who need filtered ANN on range attributes. It is worth a serious referee because the generalization is concrete, the code is released, and the performance numbers on the tested predicates are worth verifying even if the broadest claim needs tightening.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a multi-segment tree graph (MSTG) index structure for the generalized problem of approximate nearest neighbor search under range-range (RR) predicates (RRANN). Given vectors each associated with a numeric range [l_i, r_i], an RRANN query (v_q, [l_q, r_q]) seeks the k nearest vectors satisfying an arbitrary user-defined RR predicate between the query and object ranges. The predicate is left unspecified but illustrated with containment, overlap, and disjunctions thereof. The approach claims to prune non-satisfying subtrees at internal nodes while preserving index size and build time comparable to prior RFANN methods. Experiments on real-world data are reported to yield up to 12.5× speedups at accuracy parity for RRANN, with comparable RFANN performance and superior IFANN/TSANN results versus respective baselines. Public code is provided.

Significance. If the pruning mechanism for arbitrary predicates holds without unstated monotonicity assumptions, the work would generalize existing specialized range-filtered ANN methods and enable broader applications involving price or temporal ranges. The public availability of code is a clear strength that supports reproducibility and verification of the claimed speedups.

major comments (1)

[Abstract] Abstract: The central claim that the multi-segment tree graph 'efficiently handles arbitrary RR predicates by avoiding traversal through non-predicate-satisfied nodes' is load-bearing for the reported 12.5× speedups. No mechanism is described for evaluating an arbitrary predicate (as opposed to the listed containment/overlap cases) at internal nodes using fixed aggregates such as min-l/max-r; for non-monotonic or externally dependent predicates this would require leaf enumeration, undermining the pruning benefit.

minor comments (1)

[Abstract] Abstract: Experimental claims of speedups and accuracy parity provide no information on dataset characteristics, sizes, number of runs, error bars, or statistical significance testing, limiting assessment of the results.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback on the abstract and pruning claims. We address the major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that the multi-segment tree graph 'efficiently handles arbitrary RR predicates by avoiding traversal through non-predicate-satisfied nodes' is load-bearing for the reported 12.5× speedups. No mechanism is described for evaluating an arbitrary predicate (as opposed to the listed containment/overlap cases) at internal nodes using fixed aggregates such as min-l/max-r; for non-monotonic or externally dependent predicates this would require leaf enumeration, undermining the pruning benefit.

Authors: The paper targets RR predicates that can be decided using only the min-l and max-r aggregates stored at internal nodes of the multi-segment tree, specifically containment, overlap, and their disjunctions. These predicates are monotonic in the range bounds, allowing a node to determine whether its entire subtree can be pruned (none satisfy), must be traversed (all may satisfy), or requires further descent. For example, under containment [l_i, r_i] ⊆ [l_q, r_q], a subtree is pruned if its max-r < l_q or min-l > r_q. The full manuscript details this aggregate-based decision procedure for the illustrated predicates. We agree the abstract overstates generality by using 'arbitrary' without qualification. We will revise the abstract to state that the approach supports user-defined RR predicates evaluable via the stored aggregates (with containment/overlap as primary examples) and will add an explicit note on the limitation for non-monotonic or externally dependent predicates. The 12.5× speedups are measured on the supported containment and overlap cases. revision: yes

Circularity Check

0 steps flagged

No circularity; method and claims are independent of inputs

full rationale

The abstract and description introduce a multi-segment tree graph for RRANN queries supporting arbitrary predicates via node pruning, with performance claims backed by experiments against external baselines (RFANN, IFANN, TSANN SOTA) and public code release. No equations, self-citations, or fitted parameters are shown reducing results to inputs by construction; index size equivalence and speedups are presented as empirical outcomes rather than definitional. The approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond the algorithmic construction itself; standard assumptions of vector spaces and range predicates are implicit but not detailed.

pith-pipeline@v0.9.1-grok · 5943 in / 1157 out tokens · 37784 ms · 2026-07-01T16:38:22.744529+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Unified Dominance Graph for Interval-Predicate Approximate Nearest Neighbor Search
cs.DB 2026-06 unverdicted novelty 7.0

Proposes Unified Dominance Graph (UDG) for interval-predicate ANNS by mapping to dominance space and building a predicate-specific graph index with patch edges for better search under filters.

Reference graph

Works this paper leans on

42 extracted references · 5 canonical work pages · cited by 1 Pith paper

[1]

Datasets for approximate nearest neighbor search

2010. Datasets for approximate nearest neighbor search. http://corpus-texmex. irisa.fr/

2010
[2]

James F. Allen. 1983. Maintaining Knowledge about Temporal Intervals.Commun. ACM26, 11 (1983), 832–843

1983
[3]

Ilias Azizi, Karima Echihabi, and Themis Palpanas. 2023. Elpis: Graph-Based Similarity Search for Scalable Data Science.Proc. VLDB Endow.16, 6 (2023), 1548–1559

2023
[4]

Ilias Azizi, Karima Echihabi, and Themis Palpanas. 2025. Graph-Based Vector Search: An Experimental Evaluation of the State-of-the-Art.Proc. ACM Manag. Data3, 1 (2025), 43:1–43:31

2025
[5]

Yuzheng Cai, Jiayang Shi, Yizhuo Chen, and Weiguo Zheng. 2024. Navigating La- bels and Vectors: A Unified Approach to Filtered Approximate Nearest Neighbor Search.Proceedings of the ACM on Management of Data2, 6 (2024), 1–27

2024
[6]

Yangshen Deng, Zhengxin You, Long Xiang, Qilong Li, Peiqi Yuan, Zhaoyang Hong, Yitao Zheng, Wanting Li, Runzhong Li, Haotian Liu, Kyriakos Mouratidis, Man Lung Yiu, Huan Li, Qiaomu Shen, Rui Mao, and Bo Tang. 2025. AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference. CoRRabs/2504.10326 (2025)

work page arXiv 2025
[7]

Karan Desai, Gaurav Kaul, Zubin Aysola, and Justin Johnson. 2021. Redcaps: Web-curated image-text data created by the people, for the people.arXiv preprint arXiv:2111.11431(2021)

work page arXiv 2021
[8]

Josh Engels, Ben Landrum, Shangdi Yu, Laxman Dhulipala, and Julian Shun. 2024. Approximate Nearest Neighbor Search with Window Filters. InICML. 12469 – 12490

2024
[9]

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2019. Fast approximate nearest neighbor search with the navigating spreading-out graph.PVLDB12, 5 (2019), 461–474

2019
[10]

Siddharth Gollapudi, Neel Karia, Varun Sivashankar, Ravishankar Krishnaswamy, Nikit Begwani, Swapnil Raz, Yiyong Lin, Yin Zhang, Neelam Mahapatro, Premku- mar Srinivasan, et al. 2023. Filtered-diskann: Graph algorithms for approximate nearest neighbor search with filters. InProceedings of the ACM Web Conference

2023
[11]

Mengxu Jiang, Zhi Yang, Fangyuan Zhang, Guanhao Hou, Jieming Shi, Wenchao Zhou, Feifei Li, and Sibo Wang. 2025. DIGRA: A Dynamic Graph Indexing for Approximate Nearest Neighbor Search with Range Filter.Proc. ACM Manag. Data 3, 3 (2025), 148:1–148:26

2025
[12]

Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2019. Approximate nearest neighbor search on high dimensional data – experiments, analyses, and improvement.IEEE TKDE32, 8 (2019), 1475– 1488

2019
[13]

Anqi Liang, Pengcheng Zhang, Bin Yao, Zhongpu Chen, Yitong Song, and Guangxu Cheng. 2025. UNIFY: Unified Index for Range Filtered Approximate Nearest Neighbors Search.Proc. VLDB Endow.18, 4 (May 2025), 1118–1130

2025
[14]

Shige Liu, Zhifang Zeng, Li Chen, Adil Ainihaer, Arun Ramasami, Songting Chen, Yu Xu, Mingxi Wu, and Jianguo Wang. 2025. TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs.CoRRabs/2501.11216 (2025)

work page arXiv 2025
[15]

Yury Malkov and Dmitry Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE TPAMI42, 4 (2018), 824–836

2018
[16]

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. NeurIPS26 (2013)

2013
[17]

Nasser M Nasrabadi and Robert A King. 1988. Image coding using vector quanti- zation: A review.IEEE Transactions on communications36, 8 (1988), 957–971

1988
[18]

James Jie Pan, Jianguo Wang, and Guoliang Li. 2024. Survey of vector database management systems.VLDB J.33, 5 (2024), 1591–1615

2024
[19]

Liana Patel, Peter Kraft, Carlos Guestrin, and Matei Zaharia. 2024. ACORN: Per- formant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data.Proceedings of the ACM on Management of Data2, 3 (2024), 1 – 27

2024
[20]

Yun Peng, Byron Choi, Tsz Nam Chan, Jianye Yang, and Jianliang Xu. 2023. Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases. Proc. ACM Manag. Data1, 1 (2023), 54:1–54:27

2023
[21]

Hanan Samet. 1984. The quadtree and related hierarchical data structures.Com- put. Surveys16, 2 (1984), 187–260

1984
[22]

Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Har- sha Vardhan Simhadri. 2021. FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search.CoRRabs/2105.09613 (2021)

work page arXiv 2021
[23]

Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, and Marc Najork. 2021. Wit: Wikipedia-based image text dataset for multimodal multi- lingual machine learning. InProceedings of the 44th international ACM SIGIR conference on research and development in information retrieval. 2443–2449

2021
[24]

Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xi- angyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, Kun Yu, Yuxing Yuan, Yinghao Zou, Jiquan Long, Yudong Cai, Zhenxiang Li, Zhifeng Zhang, Yihua Mo, Jun Gu, Ruiyi Jiang, Yi Wei, and Charles Xie. 2021. Milvus: A Purpose-Built Vector Data Management System. InSIGMOD ’21: Internatio...

2021
[25]

Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue, and Jiongkang Ni. 2023. An efficient and robust framework for approximate near- est neighbor search with attribute constraint.Advances in Neural Information Processing Systems36 (2023), 15738–15751

2023
[26]

Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. 2021. A Com- prehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search.PVLDB14, 11 (2021), 1964–1978

2021
[27]

Yuxiang Wang, Ziyuan He, Yongxin Tong, Zimu Zhou, and Yiman Zhong. 2025. Timestamp Approximate Nearest Neighbor Search over High-Dimensional Vector Data. InICDE. IEEE Computer Society, 3043–3055

2025
[28]

Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. AnalyticDB-V: A Hybrid Analytical Engine Towards Query Fusion for Structured and Unstructured Data.PVLDB13, 12 (2020), 3152–3165

2020
[29]

Jiadong Xie, Jeffrey Xu Yu, and Yingfan Liu. 2025. Fast Approximate Similarity Join in Vector Databases.Proc. ACM Manag. Data3, 3 (2025), 158:1–158:26

2025
[30]

Jiadong Xie, Jeffrey Xu Yu, and Yingfan Liu. 2025. Graph Based K-Nearest Neighbor Search Revisited.ACM Trans. Database Syst.(May 2025)

2025
[31]

Jiadong Xie, Jeffrey Xu Yu, Siyi Teng, and Yingfan Liu. 2025. Beyond Vector Search: Querying With and Without Predicates.Proc. ACM Manag. Data3, 6 (2025), 1–26

2025
[32]

Yuexuan Xu, Jianyang Gao, Yutong Gou, Cheng Long, and Christian S Jensen
[33]

iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search.Proceedings of the ACM on Management of Data2, 6 (2025), 1–26

2025
[34]

Ming Yang, Yuzheng Cai, and Weiguo Zheng. 2024. CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search. InNeurIPS 2024

2024
[35]

Ming Yang, Yuzheng Cai, and Weiguo Zheng. 2025. Hi-PNG: Efficient Interval- Filtering ANNS via Hierarchical Interval Partition Navigating Graph. InSIGKDD. 3518–3529

2025
[36]

Shuo Yang, Jiadong Xie, Yingfan Liu, Jeffrey Xu Yu, Xiyue Gao, Qianru Wang, Yanguo Peng, and Jiangtao Cui. 2025. Revisiting the Index Construction of Proximity Graph-Based Approximate Nearest Neighbor Search.PVLDB18, 6 (2025), 1825–1838

2025
[37]

Song Yu, Shengyuan Lin, Shufeng Gong, Yongqing Xie, Ruicheng Liu, Yijie Zhou, Ji Sun, Yanfeng Zhang, Guoliang Li, and Ge Yu. 2025. A Topology-Aware Localized Update Strategy for Graph-Based ANN Index.CoRRabs/2503.00402 (2025)

work page arXiv 2025
[38]

Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, Mao Yang, and Lidong Zhou. 2023. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. In17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023. USENIX Association, 377–395

2023
[39]

Yifan Zhu, Lu Chen, Yunjun Gao, Ruiyao Ma, Baihua Zheng, and Jingwen Zhao
[40]

HJG: An Effective Hierarchical Joint Graph for ANNS in Multi-Metric Spaces. InICDE. IEEE, 4275–4287
[41]

Chaoji Zuo, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2024. SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search.Pro- ceedings of the ACM on Management of Data2, 1 (2024), 1 – 26. A Atomic RR Predicates vs Base Relations of Allen’s Interval Algebra Allen’s Interval Algebra defines a total of 13 base relations between two in...

2024
[42]

Hence, the labeled range of inserted edges included 𝑥+ 1and the removed edges excluded 𝑥+ 1, which leads to 𝐺𝑥+1 being identical to the PG T 𝑥+1 𝑙,𝑟 .𝐺

and the edges pruned with labeled with [𝑏, 𝑥] (line 10). Hence, the labeled range of inserted edges included 𝑥+ 1and the removed edges excluded 𝑥+ 1, which leads to 𝐺𝑥+1 being identical to the PG T 𝑥+1 𝑙,𝑟 .𝐺. This completes the induction.□ Proof Sketch of Theorem 4.1:We first discuss the cases of dis- junction between two conditions. 𝑅𝑞 is 1○∨ 2○: Since ...

2026

[1] [1]

Datasets for approximate nearest neighbor search

2010. Datasets for approximate nearest neighbor search. http://corpus-texmex. irisa.fr/

2010

[2] [2]

James F. Allen. 1983. Maintaining Knowledge about Temporal Intervals.Commun. ACM26, 11 (1983), 832–843

1983

[3] [3]

Ilias Azizi, Karima Echihabi, and Themis Palpanas. 2023. Elpis: Graph-Based Similarity Search for Scalable Data Science.Proc. VLDB Endow.16, 6 (2023), 1548–1559

2023

[4] [4]

Ilias Azizi, Karima Echihabi, and Themis Palpanas. 2025. Graph-Based Vector Search: An Experimental Evaluation of the State-of-the-Art.Proc. ACM Manag. Data3, 1 (2025), 43:1–43:31

2025

[5] [5]

Yuzheng Cai, Jiayang Shi, Yizhuo Chen, and Weiguo Zheng. 2024. Navigating La- bels and Vectors: A Unified Approach to Filtered Approximate Nearest Neighbor Search.Proceedings of the ACM on Management of Data2, 6 (2024), 1–27

2024

[6] [6]

Yangshen Deng, Zhengxin You, Long Xiang, Qilong Li, Peiqi Yuan, Zhaoyang Hong, Yitao Zheng, Wanting Li, Runzhong Li, Haotian Liu, Kyriakos Mouratidis, Man Lung Yiu, Huan Li, Qiaomu Shen, Rui Mao, and Bo Tang. 2025. AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference. CoRRabs/2504.10326 (2025)

work page arXiv 2025

[7] [7]

Karan Desai, Gaurav Kaul, Zubin Aysola, and Justin Johnson. 2021. Redcaps: Web-curated image-text data created by the people, for the people.arXiv preprint arXiv:2111.11431(2021)

work page arXiv 2021

[8] [8]

Josh Engels, Ben Landrum, Shangdi Yu, Laxman Dhulipala, and Julian Shun. 2024. Approximate Nearest Neighbor Search with Window Filters. InICML. 12469 – 12490

2024

[9] [9]

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2019. Fast approximate nearest neighbor search with the navigating spreading-out graph.PVLDB12, 5 (2019), 461–474

2019

[10] [10]

Siddharth Gollapudi, Neel Karia, Varun Sivashankar, Ravishankar Krishnaswamy, Nikit Begwani, Swapnil Raz, Yiyong Lin, Yin Zhang, Neelam Mahapatro, Premku- mar Srinivasan, et al. 2023. Filtered-diskann: Graph algorithms for approximate nearest neighbor search with filters. InProceedings of the ACM Web Conference

2023

[11] [11]

Mengxu Jiang, Zhi Yang, Fangyuan Zhang, Guanhao Hou, Jieming Shi, Wenchao Zhou, Feifei Li, and Sibo Wang. 2025. DIGRA: A Dynamic Graph Indexing for Approximate Nearest Neighbor Search with Range Filter.Proc. ACM Manag. Data 3, 3 (2025), 148:1–148:26

2025

[12] [12]

Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2019. Approximate nearest neighbor search on high dimensional data – experiments, analyses, and improvement.IEEE TKDE32, 8 (2019), 1475– 1488

2019

[13] [13]

Anqi Liang, Pengcheng Zhang, Bin Yao, Zhongpu Chen, Yitong Song, and Guangxu Cheng. 2025. UNIFY: Unified Index for Range Filtered Approximate Nearest Neighbors Search.Proc. VLDB Endow.18, 4 (May 2025), 1118–1130

2025

[14] [14]

Shige Liu, Zhifang Zeng, Li Chen, Adil Ainihaer, Arun Ramasami, Songting Chen, Yu Xu, Mingxi Wu, and Jianguo Wang. 2025. TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs.CoRRabs/2501.11216 (2025)

work page arXiv 2025

[15] [15]

Yury Malkov and Dmitry Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE TPAMI42, 4 (2018), 824–836

2018

[16] [16]

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. NeurIPS26 (2013)

2013

[17] [17]

Nasser M Nasrabadi and Robert A King. 1988. Image coding using vector quanti- zation: A review.IEEE Transactions on communications36, 8 (1988), 957–971

1988

[18] [18]

James Jie Pan, Jianguo Wang, and Guoliang Li. 2024. Survey of vector database management systems.VLDB J.33, 5 (2024), 1591–1615

2024

[19] [19]

Liana Patel, Peter Kraft, Carlos Guestrin, and Matei Zaharia. 2024. ACORN: Per- formant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data.Proceedings of the ACM on Management of Data2, 3 (2024), 1 – 27

2024

[20] [20]

Yun Peng, Byron Choi, Tsz Nam Chan, Jianye Yang, and Jianliang Xu. 2023. Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases. Proc. ACM Manag. Data1, 1 (2023), 54:1–54:27

2023

[21] [21]

Hanan Samet. 1984. The quadtree and related hierarchical data structures.Com- put. Surveys16, 2 (1984), 187–260

1984

[22] [22]

Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Har- sha Vardhan Simhadri. 2021. FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search.CoRRabs/2105.09613 (2021)

work page arXiv 2021

[23] [23]

Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, and Marc Najork. 2021. Wit: Wikipedia-based image text dataset for multimodal multi- lingual machine learning. InProceedings of the 44th international ACM SIGIR conference on research and development in information retrieval. 2443–2449

2021

[24] [24]

Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xi- angyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, Kun Yu, Yuxing Yuan, Yinghao Zou, Jiquan Long, Yudong Cai, Zhenxiang Li, Zhifeng Zhang, Yihua Mo, Jun Gu, Ruiyi Jiang, Yi Wei, and Charles Xie. 2021. Milvus: A Purpose-Built Vector Data Management System. InSIGMOD ’21: Internatio...

2021

[25] [25]

Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue, and Jiongkang Ni. 2023. An efficient and robust framework for approximate near- est neighbor search with attribute constraint.Advances in Neural Information Processing Systems36 (2023), 15738–15751

2023

[26] [26]

Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. 2021. A Com- prehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search.PVLDB14, 11 (2021), 1964–1978

2021

[27] [27]

Yuxiang Wang, Ziyuan He, Yongxin Tong, Zimu Zhou, and Yiman Zhong. 2025. Timestamp Approximate Nearest Neighbor Search over High-Dimensional Vector Data. InICDE. IEEE Computer Society, 3043–3055

2025

[28] [28]

Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. AnalyticDB-V: A Hybrid Analytical Engine Towards Query Fusion for Structured and Unstructured Data.PVLDB13, 12 (2020), 3152–3165

2020

[29] [29]

Jiadong Xie, Jeffrey Xu Yu, and Yingfan Liu. 2025. Fast Approximate Similarity Join in Vector Databases.Proc. ACM Manag. Data3, 3 (2025), 158:1–158:26

2025

[30] [30]

Jiadong Xie, Jeffrey Xu Yu, and Yingfan Liu. 2025. Graph Based K-Nearest Neighbor Search Revisited.ACM Trans. Database Syst.(May 2025)

2025

[31] [31]

Jiadong Xie, Jeffrey Xu Yu, Siyi Teng, and Yingfan Liu. 2025. Beyond Vector Search: Querying With and Without Predicates.Proc. ACM Manag. Data3, 6 (2025), 1–26

2025

[32] [32]

Yuexuan Xu, Jianyang Gao, Yutong Gou, Cheng Long, and Christian S Jensen

[33] [33]

iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search.Proceedings of the ACM on Management of Data2, 6 (2025), 1–26

2025

[34] [34]

Ming Yang, Yuzheng Cai, and Weiguo Zheng. 2024. CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search. InNeurIPS 2024

2024

[35] [35]

Ming Yang, Yuzheng Cai, and Weiguo Zheng. 2025. Hi-PNG: Efficient Interval- Filtering ANNS via Hierarchical Interval Partition Navigating Graph. InSIGKDD. 3518–3529

2025

[36] [36]

Shuo Yang, Jiadong Xie, Yingfan Liu, Jeffrey Xu Yu, Xiyue Gao, Qianru Wang, Yanguo Peng, and Jiangtao Cui. 2025. Revisiting the Index Construction of Proximity Graph-Based Approximate Nearest Neighbor Search.PVLDB18, 6 (2025), 1825–1838

2025

[37] [37]

Song Yu, Shengyuan Lin, Shufeng Gong, Yongqing Xie, Ruicheng Liu, Yijie Zhou, Ji Sun, Yanfeng Zhang, Guoliang Li, and Ge Yu. 2025. A Topology-Aware Localized Update Strategy for Graph-Based ANN Index.CoRRabs/2503.00402 (2025)

work page arXiv 2025

[38] [38]

Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, Mao Yang, and Lidong Zhou. 2023. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. In17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023. USENIX Association, 377–395

2023

[39] [39]

Yifan Zhu, Lu Chen, Yunjun Gao, Ruiyao Ma, Baihua Zheng, and Jingwen Zhao

[40] [40]

HJG: An Effective Hierarchical Joint Graph for ANNS in Multi-Metric Spaces. InICDE. IEEE, 4275–4287

[41] [41]

Chaoji Zuo, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2024. SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search.Pro- ceedings of the ACM on Management of Data2, 1 (2024), 1 – 26. A Atomic RR Predicates vs Base Relations of Allen’s Interval Algebra Allen’s Interval Algebra defines a total of 13 base relations between two in...

2024

[42] [42]

Hence, the labeled range of inserted edges included 𝑥+ 1and the removed edges excluded 𝑥+ 1, which leads to 𝐺𝑥+1 being identical to the PG T 𝑥+1 𝑙,𝑟 .𝐺

and the edges pruned with labeled with [𝑏, 𝑥] (line 10). Hence, the labeled range of inserted edges included 𝑥+ 1and the removed edges excluded 𝑥+ 1, which leads to 𝐺𝑥+1 being identical to the PG T 𝑥+1 𝑙,𝑟 .𝐺. This completes the induction.□ Proof Sketch of Theorem 4.1:We first discuss the cases of dis- junction between two conditions. 𝑅𝑞 is 1○∨ 2○: Since ...

2026