To GPU or Not to GPU: Vector Search in Relational Engines

Bowen Wu; Gustavo Alonso; Joel André; Marko Kabić; Vasilis Mageirakos; Yannis Chronis

arXiv: 2605.15957 · v1 · pith:FCAPCZUO · submitted 2026-05-15 · 💻 cs.DB

Vasilis Mageirakos, Joel André, Marko Kabić, Bowen Wu, Yannis Chronis, Gustavo Alonso

Pith reviewed 2026-05-19 19:01 UTC · model grok-4.3

classification 💻 cs.DB

keywords: vector search, GPU acceleration, relational databases, TPC-H benchmark, SQL queries, embeddings, index optimization, database engines

The pith

An alternative organization of vector indexes and embeddings lets GPUs accelerate both relational queries and vector search in database engines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether vector search should move to GPUs inside relational database engines, since GPUs dominate AI workloads but databases remain CPU-based. It extends the TPC-H benchmark with vector data from text and images, creates representative SQL-plus-vector queries, and builds a modular execution engine that can dispatch work to CPU or GPU. Experiments across memory locations, index types, GPUs, and interconnects show that relational operations gain more from the GPU than vector search does, and that moving full indexes and embeddings to the GPU is often slower. By reorganizing the vector index and embeddings to shrink their footprint, the design reverses this outcome so that both relational and vector-search components run faster on the GPU, especially over fast interconnects such as NVLink.
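Why moving a full index can erase the GPU's advantage is easiest to see with a back-of-envelope estimate: an index that owns its embeddings has to ship every vector across the interconnect. The sketch below is illustrative only and is not taken from the paper. The link names and the bandwidths in `ASSUMED_PEAK_GB_S` are assumed nominal per-direction peaks (about 32 GB/s for PCIe 4.0 x16 and 450 GB/s for GH200 NVLink-C2C). Sustained rates in practice are lower, so these figures are lower bounds on transfer time.

```python
# Back-of-envelope transfer-time estimate for moving vector data to the GPU.
# ASSUMPTION: bandwidths are nominal per-direction peaks, not measured values.
ASSUMED_PEAK_GB_S = {
    "pcie4_x16": 32.0,    # assumed PCIe 4.0 x16 peak, one direction
    "nvlink_c2c": 450.0,  # assumed GH200 NVLink-C2C peak, one direction
}


def embedding_bytes(num_vectors: int, dim: int, bytes_per_elem: int = 4) -> int:
    """Raw size of the embeddings (float32 by default)."""
    return num_vectors * dim * bytes_per_elem


def transfer_seconds(num_bytes: int, link: str) -> float:
    """Lower bound on the time to copy num_bytes over the given link."""
    return num_bytes / (ASSUMED_PEAK_GB_S[link] * 1e9)


if __name__ == "__main__":
    size = embedding_bytes(num_vectors=10_000_000, dim=1024)  # hypothetical dataset
    for link in ASSUMED_PEAK_GB_S:
        print(f"{link}: {size / 1e9:.2f} GB -> {transfer_seconds(size, link):.3f} s")
```

Under these assumptions, 40.96 GB of float32 embeddings takes about 1.28 s over the PCIe link and about 0.09 s over NVLink-C2C. A search that itself finishes in milliseconds can therefore be dominated by the copy on either link. This is why shrinking the bytes that must cross the link matters.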

Core claim

With an alternative organization of vector index and embeddings that reduces index size, both the relational and vector search components are faster on the GPU, particularly on fast interconnects, in contrast with the architecture used in existing engines.

What carries the argument

Alternative organization of vector index and embeddings that reduces the size of the index, allowing GPU execution of SQL+VS queries without the data-movement penalty of conventional designs.
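The paper contrasts its design with data-owning indexes. One plausible reading of "reducing the size of the index" is therefore an index that stores only row IDs and points into the table's existing embedding column, rather than holding its own copy of every vector. The sketch below compares the footprint of a hypothetical IVF-style layout in both forms. `IvfLayout` and its fields are illustrative assumptions, not the paper's data structure.

```python
from dataclasses import dataclass


@dataclass
class IvfLayout:
    """Hypothetical IVF-style index layout, used only to compare footprints."""
    num_vectors: int
    dim: int
    num_lists: int           # number of coarse clusters (centroids)
    bytes_per_elem: int = 4  # float32 embeddings
    bytes_per_id: int = 8    # 64-bit row IDs

    def centroid_bytes(self) -> int:
        return self.num_lists * self.dim * self.bytes_per_elem

    def owning_bytes(self) -> int:
        # Posting lists store a row ID plus a full copy of each embedding.
        per_entry = self.bytes_per_id + self.dim * self.bytes_per_elem
        return self.centroid_bytes() + self.num_vectors * per_entry

    def non_owning_bytes(self) -> int:
        # Posting lists store only row IDs; embeddings stay in the table column.
        return self.centroid_bytes() + self.num_vectors * self.bytes_per_id

    def reduction_pct(self) -> float:
        return 100.0 * (1 - self.non_owning_bytes() / self.owning_bytes())


layout = IvfLayout(num_vectors=1_000_000, dim=1024, num_lists=1024)
print(layout.owning_bytes(), layout.non_owning_bytes(), f"{layout.reduction_pct():.2f}%")
```

For these made-up parameters, the index itself shrinks from about 4.1 GB to about 12 MB. However, search still has to read the embedding column. The saving comes from not duplicating that column, and it does not remove the need to access it.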

If this is right

Relational components of SQL+VS queries benefit more from GPU execution than the vector-search component itself.
Moving existing vector indexes and embeddings to the GPU is not the best option even with fast interconnects.
Reducing index size through reorganization makes GPU-based vector search competitive with CPU versions.
Both relational and vector-search parts become faster on GPU than on CPU when the smaller index is used, especially over fast interconnects.
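The pattern in the list above can be illustrated with a toy per-operator placement rule. Relational operators go to the GPU readily, while vector search goes there only once its data footprint is small. This is a hypothetical greedy heuristic, not the engine's scheduler. `OpEstimate`, `choose_placement`, and every number in the example are made-up assumptions. The rule also ignores transfers of intermediate results between operators.

```python
from typing import Dict, NamedTuple


class OpEstimate(NamedTuple):
    cpu_s: float        # estimated CPU run time (seconds)
    gpu_s: float        # estimated GPU kernel time with inputs already resident
    bytes_to_move: int  # input bytes that must cross the interconnect for GPU execution


def choose_placement(ops: Dict[str, OpEstimate], link_gb_s: float) -> Dict[str, str]:
    """Greedy rule: run an operator on the GPU only if kernel + transfer time beats the CPU."""
    plan = {}
    for name, est in ops.items():
        gpu_total = est.gpu_s + est.bytes_to_move / (link_gb_s * 1e9)
        plan[name] = "gpu" if gpu_total < est.cpu_s else "cpu"
    return plan


# Illustrative, made-up estimates (not measurements from the paper).
ops = {
    "relational": OpEstimate(cpu_s=2.0, gpu_s=0.2, bytes_to_move=1_000_000_000),
    "vs_owning_index": OpEstimate(cpu_s=0.05, gpu_s=0.01, bytes_to_move=40_000_000_000),
    "vs_compact_index": OpEstimate(cpu_s=0.05, gpu_s=0.01, bytes_to_move=500_000_000),
}
print(choose_placement(ops, link_gb_s=450.0))
# {'relational': 'gpu', 'vs_owning_index': 'cpu', 'vs_compact_index': 'gpu'}
```

Even at an assumed 450 GB/s, the 40 GB transfer keeps the owning-index search on the CPU in this toy model. The compact variant moves to the GPU. A real engine would also have to account for where intermediate results live.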

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Database architects may need to treat vector indexes as first-class GPU-resident structures rather than CPU-first objects that are occasionally copied.
The same reorganization technique could be tested on other vector-search workloads outside TPC-H to check whether the size reduction generalizes.
Future engines might expose the choice of index layout as a tunable parameter so users can trade index size for GPU acceleration.

Load-bearing premise

The modular execution engine accurately models the overheads and integration costs that would appear in a production relational database engine when adding GPU vector search support.

What would settle it

A production implementation of the optimized index inside an actual database engine that still shows higher end-to-end latency on GPU than on CPU even with NVLink.

Figures

Figures reproduced from arXiv: 2605.15957 by Bowen Wu, Gustavo Alonso, Joel André, Marko Kabić, Vasilis Mageirakos, Yannis Chronis.

**Figure 3.** Vector search operators in MaxVec. Exhaustive (left) …

**Figure 4.** Vec-H per-query runtime with owning indexes (…

**Figure 5.** Share of total cpu to gpu wall-time savings attributable to relational operators.

**Figure 6.** Vec-H per-query runtime under hybrid execution (VS on CPU, Rel on GPU). …

**Figure 7.** Vec-H per-query runtime on GH200-NVLink under the four optimized execution strategies (…

**Figure 8.** Vector search operator runtime on the reviews …

**Figure 9.** Per-query Vec-H runtime on DGX-Spark and GH200 for the …
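The Figure 5 caption refers to the fraction of total CPU-to-GPU wall-time savings that comes from relational operators. The sketch below assumes each configuration's runtime is simply relational time plus vector-search time, and that savings are the per-component differences. The paper's exact accounting may differ. The function and argument names are hypothetical.

```python
def relational_savings_share(rel_cpu: float, rel_gpu: float,
                             vs_cpu: float, vs_gpu: float) -> float:
    """Fraction of total CPU-to-GPU wall-time savings due to relational operators.

    Assumes total runtime = relational time + vector-search time in each configuration.
    """
    total_saving = (rel_cpu + vs_cpu) - (rel_gpu + vs_gpu)
    if total_saving <= 0:
        raise ValueError("GPU configuration is not faster overall; share is undefined")
    return (rel_cpu - rel_gpu) / total_saving
```

For example, `relational_savings_share(3.0, 1.0, 2.0, 1.5)` gives 0.8: 2 s of the 2.5 s saved comes from relational work. The share can exceed 1 when vector search gets slower on the GPU, because the relational gain then also offsets that loss.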

Original abstract

Vector search (VS) is now available in most database engines. However, while vector search is a common feature in AI/ML/LLMs where the dominant computing platforms are GPUs, existing database engines operate on CPUs even when implementing vector search. This raises the question of whether integrating vector processing on GPUs as part of the engine would be a better design. In this paper, we explore this question in detail. First, we extend the TPC-H benchmark with vector data (from text and images) and propose a number of representative SQL+VS queries. Second, we develop a modular execution engine that can run SQL+VS queries across CPU and GPU. Third, we perform extensive experiments on a number of deployments: running the SQL+VS queries across CPU and/or GPU, with data residing in CPU or GPU memory, with existing indices and novel, optimized versions, as well as across different GPUs and interconnects (PCIe, NVLink). The results provide actionable and counter-intuitive insights on how to run such queries over CPUs and GPUs. For instance, the relational components benefit much more from running on the GPU than the vector search part. In addition, when the vector search involves moving data and indexes, using the GPU is not the best option, even with fast interconnects. Thus, we develop an alternative organization of vector index and embeddings that reduces the size of the index, making GPU-based vector search more competitive. With these improvements, the final result is that both the relational and vector search components are faster on the GPU, particularly on fast interconnects, in contrast with the architecture used in existing engines.
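A minimal SQL+VS query combines a relational predicate with a top-k similarity search. The sketch below shows one simple plan shape in plain Python: filter first, then run an exhaustive k-nearest-neighbour scan over the surviving rows. The schema `(row_id, brand, embedding)`, the Euclidean metric, and the function names are assumptions for illustration. They are not the Vec-H query definitions.

```python
import heapq
import math
from typing import List, Sequence, Tuple

# Hypothetical row layout: (row_id, brand, embedding).
Row = Tuple[int, str, List[float]]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_topk(rows: List[Row], brand: str, query: Sequence[float], k: int) -> List[int]:
    """Roughly: SELECT row_id FROM t WHERE brand = :brand
    ORDER BY distance(embedding, :query) LIMIT :k   (generic SQL; syntax varies by engine).

    Relational filter first, then an exhaustive scan that keeps the k closest rows.
    """
    scored = ((l2_distance(emb, query), row_id)
              for row_id, row_brand, emb in rows if row_brand == brand)
    return [row_id for _, row_id in heapq.nsmallest(k, scored)]


rows = [
    (1, "A", [0.0, 0.0]),
    (2, "B", [0.0, 0.1]),
    (3, "A", [1.0, 1.0]),
    (4, "A", [0.2, 0.0]),
]
print(filtered_topk(rows, "A", [0.0, 0.0], k=2))  # [1, 4]
```

Row 2 is close to the query but fails the predicate, so it never reaches the similarity step. An index-based operator would replace the linear scan, but the filter-then-search shape stays the same.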

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper investigates whether GPU-based vector search should be integrated into relational database engines, which currently rely on CPUs. It extends TPC-H with vector data from text and images, defines representative SQL+VS queries, builds a modular execution engine supporting CPU/GPU execution with varying data placements and interconnects (PCIe, NVLink), and evaluates existing and novel index organizations. The central claim is that an alternative vector index/embedding layout reducing index size makes both relational and vector-search components faster on GPU than CPU, especially on fast interconnects, in contrast to architectures in existing engines.

Significance. If the results hold, the work provides actionable guidance for hybrid SQL+vector workloads in AI/ML contexts by quantifying when GPU acceleration benefits relational components more than vector search itself and by demonstrating a size-reduced index organization that improves GPU competitiveness. Strengths include the broad experimental matrix across hardware, data locations, and index variants, plus direct measurements rather than model-derived claims.

major comments (2)

[Modular execution engine description and experimental setup] The central performance claims rest on a custom modular execution engine whose fidelity to production relational engine costs is not demonstrated. Query optimizer extensions, cost-model integration, buffer-pool interactions, transaction/concurrency semantics, and data-movement consistency checks are omitted; if these costs are material, the reported GPU advantages with the reduced-size index may not hold in a real deployment such as PostgreSQL.
[Results and index organization sections] The paper should quantify the index-size reduction achieved by the alternative organization and show its effect on data-movement volume and query plans; without these measurements it is difficult to isolate whether the reported speedups are due to the new layout or to other experimental factors.

minor comments (2)

[Benchmark and query definitions] Clarify the exact set of SQL+VS queries used and whether they are representative of production vector workloads beyond TPC-H extensions.
[Experimental results] Add statistical significance tests or confidence intervals for the reported performance differences across configurations.
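The referee asks for confidence intervals. One standard, distribution-free option is a percentile bootstrap over repeated runtime measurements. The sketch below computes an interval for the difference in median runtime between two configurations, using only the Python standard library. The function name and its defaults (2,000 resamples, 95% level, fixed seed) are illustrative choices, not anything the paper reports.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_median_diff_ci(a: List[float], b: List[float], iters: int = 2000,
                             alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for median(a) - median(b).

    a, b: repeated wall-time measurements (seconds) of two configurations.
    """
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    diffs = []
    for _ in range(iters):
        resample_a = rng.choices(a, k=len(a))  # resample with replacement
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.median(resample_a) - statistics.median(resample_b))
    diffs.sort()
    lo = diffs[int(alpha / 2 * iters)]
    hi = diffs[min(iters - 1, int((1 - alpha / 2) * iters))]
    return lo, hi
```

Pass the repeated CPU runtimes as `a` and the GPU runtimes as `b`. An interval that excludes zero suggests the speedup is not just run-to-run noise, at the chosen level.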

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's comments. We address each major comment below with explanations and indicate where revisions will be made to improve the manuscript.

Referee: The central performance claims rest on a custom modular execution engine whose fidelity to production relational engine costs is not demonstrated. Query optimizer extensions, cost-model integration, buffer-pool interactions, transaction/concurrency semantics, and data-movement consistency checks are omitted; if these costs are material, the reported GPU advantages with the reduced-size index may not hold in a real deployment such as PostgreSQL.

Authors: We thank the referee for this observation. Our modular execution engine is a research prototype constructed specifically to isolate and directly measure the execution costs of relational operators and vector search on CPU versus GPU across controlled data placements and interconnects. This design choice enables precise attribution of performance differences to hardware and layout factors without the overheads of a full production stack. We acknowledge that a complete integration into a system such as PostgreSQL would introduce additional costs from query optimization, buffer-pool management, concurrency control, and consistency mechanisms that are outside the current scope. In the revised manuscript we will expand the experimental-setup section to explicitly discuss these limitations and their possible influence on generalizability, thereby clarifying the boundaries of our claims while retaining the value of the measured trade-offs. revision: partial
Referee: The paper should quantify the index-size reduction achieved by the alternative organization and show its effect on data-movement volume and query plans; without these measurements it is difficult to isolate whether the reported speedups are due to the new layout or to other experimental factors.

Authors: We agree that explicit quantification of the index-size reduction is needed to strengthen attribution of the observed speedups. The alternative organization reduces index size by co-locating compact embeddings with a pruned index structure, which directly lowers the volume of data transferred over the interconnect. In the revised version we will report concrete index sizes (in absolute terms and as percentage reduction) for both the baseline and proposed organizations, present measured data-movement volumes for representative queries, and describe how the smaller footprint alters execution plans within our modular engine. These additions will make it clearer that the performance gains stem from the reduced data movement enabled by the new layout. revision: yes

Circularity Check

0 steps flagged

No circularity: results are direct experimental measurements

full rationale

The paper conducts an empirical study: it extends TPC-H with vector data, builds a modular execution engine, and reports measured runtimes for SQL+VS queries across CPU/GPU, memory placements, indices, and interconnects. The central claim (alternative index/embedding organization improves GPU performance) follows from these measurements rather than any equation, fitted parameter, or self-citation that reduces the outcome to its own inputs by construction. No load-bearing derivation step collapses to a prior result or definition; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the custom modular engine faithfully represents production database behavior and that the chosen TPC-H vector extensions are representative of real workloads. No free parameters are fitted in the reported results; the work is purely empirical.

axioms (1)

Domain assumption: the modular execution engine accurately captures the integration and data-movement costs of a full relational database engine.
Invoked when interpreting all CPU/GPU performance differences as representative of what a production system would experience.

pith-pipeline@v0.9.0 · 5839 in / 1290 out tokens · 34630 ms · 2026-05-19T19:01:07.638223+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

We develop a modular execution engine that can run SQL+VS queries across CPU and GPU... alternative organization of vector index and embeddings that reduces the size of the index

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

Key Insight. With current data-owning vector indexes, executing vector search on a GPU does not pay off, even with fast interconnects

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 3 internal anchors

[1] 2024. DuckDB Vector Similarity Search (VSS) Extension. https://github.com/duckdb/duckdb-vss. Accessed: 2026-05-15.
[2] Apache Software Foundation. 2026. Apache Arrow: A Cross-Language Development Platform for In-Memory Data. https://arrow.apache.org/. Accessed: 2026-04-29.
[3] Felipe Aramburú, William Malpica, Kaouther Abrougui, Amin Aramoon, Romulo Auccapuclla, Claude Brisson, Matthijs Brobbel, Colby Farrell, Pradeep Garigipati, Joost Hoozemans, et al. 2025. Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement. arXiv preprint arXiv:2508.05029 (2025).
[4] Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems 87 (2020), 101374. https://doi.org/10.1016/j.is.2019.02.006
[5] David Boehme, Todd Gamblin, David Beckingsale, Peer-Timo Bremer, Alfredo Gimenez, Matthew LeGendre, Olga Pearce, and Martin Schulz. 2016. Caliper: performance introspection for HPC software stacks. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Salt Lake City, Utah) (SC ’16). IEEE Press, Art...
[6] Cheng Chen, Chenzhe Jin, Yunan Zhang, Sasha Podolsky, Chun Wu, Szu-Po Wang, Eric Hanson, Zhou Sun, Robert Walzer, and Jianguo Wang. 2024. SingleStore-V: An Integrated Vector Database System in SingleStore. Proc. VLDB Endow. 17, 12 (Aug. 2024), 3772–3785. https://doi.org/10.14778/3685800.3685805
[7] Yannis Chronis, Helena Caminal, Yannis Papakonstantinou, Fatma Özcan, and Anastasia Ailamaki. 2025. Filtered Vector Search: State-of-the-Art and Research Opportunities. Proc. VLDB Endow. 18, 12 (Aug. 2025), 5488–5492. https://doi.org/10.14778/3750601.3750700
[8] Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2026. The Faiss Library. IEEE Transactions on Big Data 12, 2 (2026), 346–361. https://doi.org/10.1109/TBDATA.2025.3618474
[9] Luigi Fusco, Mikhail Khalilov, Marcin Chrapek, Giridhar Chukkapalli, Thomas Schulthess, and Torsten Hoefler. 2024. Understanding Data Movement in Tightly Coupled Heterogeneous Systems: A Case Study with the Grace Hopper Superchip. arXiv preprint arXiv:2408.11556 (2024).
[10] Google Cloud. 2025. ScaNN for AlloyDB. https://services.google.com/fh/files/misc/scann_for_alloydb_whitepaper.pdf. Accessed: 2026-05-15.
[11] Mark Harris. 2012. How to Optimize Data Transfers in CUDA C/C++. NVIDIA Technical Blog. https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/. Accessed: 2026-04-30.
[12] Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, and Julian McAuley. 2024. Bridging language and items for retrieval and recommendation. arXiv preprint arXiv:2403.03952 (2024).
[14] Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2021. Billion-Scale Similarity Search with GPUs. IEEE Transactions on Big Data 7, 3 (2021), 535–547. https://doi.org/10.1109/TBDATA.2019.2921572
[15] Marko Kabić, Shriram Chandran, and Gustavo Alonso. 2025. Maximus: A Modular Accelerated Query Engine for Data Analytics on Heterogeneous Systems. Proc. ACM Manag. Data 3, 3, Article 187 (June 2025), 25 pages. https://doi.org/10.1145/3725324
[16] Marko Kabić, Bowen Wu, Jonas Dann, and Gustavo Alonso. 2025. Powerful GPUs or Fast Interconnects: Analyzing Relational Workloads on Modern GPUs. Proc. VLDB Endow. 18, 11 (July 2025), 4350–4363. https://doi.org/10.14778/3749646.3749698
[17] Andrew Kane et al. 2025. pgvector: Open-Source Vector Similarity Search for Postgres. https://github.com/pgvector/pgvector. Accessed: 2026-05-15.
[18] Guoxin Kang, Zhongxin Ge, Jingpei Hu, Xueya Zhang, Lei Wang, and Jianfeng Zhan. 2025. BigVectorBench: Heterogeneous Data Embedding and Compound Queries are Essential in Evaluating Vector Databases. Proc. VLDB Endow. 18, 5 (Jan. 2025), 1536–1550. https://doi.org/10.14778/3718057.3718078
[19] Hyunjoon Kim, Chaerim Lim, Hyeonjun An, Rathijit Sen, and Kwanghyun Park. 2025. Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries. arXiv preprint arXiv:2512.09695 (2025).
[21] Jiale Lao, Andreas Zimmerer, Olga Ovcharenko, Tianji Cong, Matthew Russo, Gerardo Vitagliano, Michael Cochez, Fatma Özcan, Gautam Gupta, Thibaud Hottelier, et al. 2025. SemBench: A Benchmark for Semantic Query Processing Engines. arXiv preprint arXiv:2511.01716 (2025).
[22] Yaowen Liu, Xuejia Chen, Anxin Tian, Haoyang Li, Qinbin Li, Xin Zhang, Alexander Zhou, Chen Jason Zhang, Qing Li, and Lei Chen. 2026. GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions. arXiv preprint arXiv:2602.16719 (2026).
[23] Clemens Lutz, Sebastian Breß, Steffen Zeuch, Tilmann Rabl, and Volker Markl. 2020. Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 1633–1649. https://doi.org/10.1145/3318464.3389705
[25] Vasilis Mageirakos, Bowen Wu, and Gustavo Alonso. 2025. Cracking Vector Search Indexes. Proc. VLDB Endow. 18, 11 (July 2025), 3951–3964. https://doi.org/10.14778/3749646.3749666
[26] Yu A. Malkov and D. A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2020), 824–…. https://doi.org/10.1109/TPAMI.2018.2889473
[28] Meta AI Research. 2025. FAISS v1.13.0, gpu/impl/IndexUtils.cu: getMaxKSelection. https://github.com/facebookresearch/faiss/blob/v1.13.0/faiss/gpu/impl/IndexUtils.cu. Accessed: 2026-04-13.
[29] Chenghao Mo, Ben Karsin, Philip Adams, and Minjia Zhang. 2026. VecFlow-Chamfer: A GPU-based Data Management System for High-Performance Multi-Vector Search on Superchips. Proc. ACM Manag. Data 4, 1, Article 92 (April 2026), 26 pages. https://doi.org/10.1145/3786706
[30] Hubert Mohr-Daurat, Xuan Sun, and Holger Pirk. 2023. BOSS - An Architecture for Database Kernel Composition. Proc. VLDB Endow. 17, 4 (Dec. 2023), 877–890. https://doi.org/10.14778/3636218.3636239
[31] NVIDIA. 2026. CUDA C++ Programming Guide: Full Unified Memory with Hardware Coherency. https://docs.nvidia.com/cuda/cuda-programming-guide/02-basics/understanding-memory.html#full-unified-memory-with-hardware-coherency. Accessed: 2026-04-29.
[32] NVIDIA Corporation. 2023. Matrix Multiplication Background User’s Guide. https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/. NVIDIA Deep Learning Performance Documentation. Accessed: 2026-04-29.
[33] NVIDIA Corporation. 2024. NVIDIA Grace Hopper Superchip. https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/. Accessed: 2026-04-29.
[34] NVIDIA Corporation. 2025. NVIDIA DGX Spark Datasheet. https://nvdam.widen.net/s/tlzm8smqjx/workstation-datasheet-dgx-spark-gtc25-spring-nvidia-us-3716899-web. GTC 2025 Spring. Accessed: 2026-05-01.
[35] NVIDIA Corporation. 2026. NVIDIA Nsight Systems. https://developer.nvidia.com/nsight-systems. Accessed: 2026-04-29.
[36] Hiroyuki Ootomo, Akira Naruse, Corey Nolet, Ray Wang, Tamas Feher, and Yong Wang. 2024. CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). 4236–4247. https://doi.org/10.1109/ICDE60146.2024.00323
[37] Oracle Corporation. 2025. Oracle AI Vector Search User’s Guide. https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/ai-vector-search-users-guide.pdf. Accessed: 2026-05-15.
[38] Pinecone. 2025. Pinecone: The Vector Database for AI Search and Retrieval. https://www.pinecone.io/. Accessed: 2026-05-15.
[39] Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: an Embeddable Analytical Database. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD ’19). Association for Computing Machinery, New York, NY, USA, 1981–1984. https://doi.org/10.1145/3299869.3320212
[40] RAPIDS Development Team. 2026. cuDF: A GPU DataFrame Library. https://github.com/rapidsai/cudf. NVIDIA RAPIDS.
[41] RAPIDS Development Team. 2026. cuVS: Vector Search and Clustering on the GPU. https://github.com/rapidsai/cuvs. NVIDIA RAPIDS.
[42] RAPIDS Development Team. 2026. RMM: RAPIDS Memory Manager. https://github.com/rapidsai/rmm. NVIDIA RAPIDS.
[43] Yasin N. Silva, Walid G. Aref, and Mohamed H. Ali. 2010. The similarity join database operator. In 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010). 892–903. https://doi.org/10.1109/ICDE.2010.5447873
[44] Josef Sivic and Andrew Zisserman. 2003. Video Google: A Text Retrieval Approach to Object Matching in Videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2 (ICCV ’03). IEEE Computer Society, USA, 1470…
[45] Michael Stonebraker and Andrew Pavlo. 2024. What Goes Around Comes Around... And Around... SIGMOD Rec. 53, 2 (July 2024), 21–37. https://doi.org/10.1145/3685980.3685984
[46] Ji Sun, Guoliang Li, James Pan, Jiang Wang, Yongqing Xie, Ruicheng Liu, and Wen Nie. 2025. GaussDB-Vector: A Large-Scale Persistent Real-Time Vector Database for LLM Applications. Proc. VLDB Endow. 18, 12 (Aug. 2025), 4951–4963. https://doi.org/10.14778/3750601.3750619
[47] Transaction Processing Performance Council. 2022. TPC Benchmark H (Decision Support) Standard Specification. Technical Report. Transaction Processing Performance Council (TPC). https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf. Version 3.0.1. Accessed: 2026-05-15.
[48] Transaction Processing Performance Council. 2024. TPC Benchmark DS (TPC-DS) Standard Specification. https://www.tpc.org/tpcds/. Version 4.0.0. Accessed: 2026-04-29.
[49] Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. 2025. SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features. arXiv preprint arXiv:2502.14786 (2025).
[50] Nitish Upreti, Harsha Vardhan Simhadri, Hari Sudan Sundar, Krishnan Sundaram, Samer Boshra, Balachandar Perumalswamy, Shivam Atri, Martin Chisholm, Revti Raman Singh, Greg Yang, Tamara Hass, Nitesh Dudhey, Subramanyam Pattipaka, Mark Hildebrand, Magdalen Manohar, Jack Moffitt, Haiyang Xu, Naren Datha, Suryansh Gupta, Ravishankar Krishnaswamy, Prashant ... https://doi.org/10.14778/3750601.3750635
[51] Karthik Venkatasubba, Saim Khan, Somesh Singh, Harsha Vardhan Simhadri, and Jyothi Vedurada. 2025. BANG: Billion-Scale Approximate Nearest Neighbour Search Using a Single GPU. IEEE Transactions on Big Data 11, 6 (2025), 3142–3157. https://doi.org/10.1109/TBDATA.2025.3581085
[52] Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xiangyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, Kun Yu, Yuxing Yuan, Yinghao Zou, Jiquan Long, Yudong Cai, Zhenxiang Li, Zhifeng Zhang, Yihua Mo, Jun Gu, Ruiyi Jiang, Yi Wei, and Charles Xie. 2021. Milvus: A Purpose-Built Vector Data Management System. In Proceedings of the 2021 ... https://doi.org/10.1145/3448016.3457550
[53] Weaviate. 2025. Weaviate Vector Database. https://weaviate.io/. Accessed: 2026-05-15.
[54] Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. AnalyticDB-V: a hybrid analytical engine towards query fusion for structured and unstructured data. Proc. VLDB Endow. 13, 12 (Aug. 2020), 3152–3165. https://doi.org/10.14778/3415478.3415541
[55] Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, and Rathijit Sen. 2025. Terabyte-Scale Analytics in the Blink of an Eye. arXiv:2506.09226 [cs.DB]. https://arxiv.org/abs/2506.09226
[56] Bowen Wu, Dimitrios Koutsoukos, and Gustavo Alonso. 2025. Efficiently Processing Joins and Grouped Aggregations on GPUs. Proc. ACM Manag. Data 3, 1, Article 39 (Feb. 2025), 27 pages. https://doi.org/10.1145/3709689
[57] Jingyi Xi, Chenghao Mo, Ben Karsin, Artem Chirkin, Mingqin Li, and Minjia Zhang. 2025. VecFlow: A High-Performance Vector Data Management System for Filtered-Search on GPUs. Proc. ACM Manag. Data 3, 4, Article 271 (Sept. 2025), 27 pages. https://doi.org/10.1145/3749189
[58] Jiadong Xie, Jeffrey Xu Yu, and Yingfan Liu. 2025. Fast Approximate Similarity Join in Vector Databases. Proc. ACM Manag. Data 3, 3, Article 158 (June 2025), 26 pages. https://doi.org/10.1145/3725403
[59] Bobbi Yogatama, Yifei Yang, Kevin Kristensen, Devesh Sarda, Abigale Kim, Adrian Cockcroft, Yu Teng, Joshua Patterson, Gregory Kimball, Wes McKinney, et al. 2025. Rethinking Analytical Processing in the GPU Era. arXiv preprint arXiv:2508.04701 (2025).
[60] Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, Mao Yang, and Lidong Zhou. 2023. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23). USENIX Association, Boston, MA, 377–395. https://www.usenix.org/conference/osdi23/presentation/zhang-qianxi
[62] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. arXiv preprint arXiv:2506.05176 (2025).
[64] Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. 2024. Fast vector query processing for large datasets beyond GPU memory with reordered pipelining. In Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (Santa Clara, CA, USA) (NSDI ’24). USENIX Association, USA, Article 2, 18 pages.
[65] Jiaxu Zhu, Jiayu Yuan, Kaiwen Yang, Xiaobao Chen, Shihuan Yu, Hongchang Lv, Yan Li, and Bolong Zheng. 2025. An Experimental Evaluation of Hybrid Querying on Vectors. Proc. VLDB Endow. 19, 2 (Oct. 2025), 183–195. https://doi.org/10.14778/3773749.3773757

[1] [1]

DuckDB Vector Similarity Search (VSS) Extension

2024. DuckDB Vector Similarity Search (VSS) Extension. https://github.com/ duckdb/duckdb-vss. Accessed: 2026-05-15

work page 2024

[2] [2]

Apache Software Foundation. 2026. Apache Arrow: A Cross-Language Devel- opment Platform for In-Memory Data. https://arrow.apache.org/. Accessed: 12 2026-04-29

work page 2026

[3]

Felipe Aramburú, William Malpica, Kaouther Abrougui, Amin Aramoon, Romulo Auccapuclla, Claude Brisson, Matthijs Brobbel, Colby Farrell, Pradeep Garigipati, Joost Hoozemans, et al. 2025. Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement. arXiv preprint arXiv:2508.05029 (2025)

[4]

Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems 87 (2020), 101374. https://doi.org/10.1016/j.is.2019.02.006

[5]

David Boehme, Todd Gamblin, David Beckingsale, Peer-Timo Bremer, Alfredo Gimenez, Matthew LeGendre, Olga Pearce, and Martin Schulz. 2016. Caliper: performance introspection for HPC software stacks. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Salt Lake City, Utah) (SC ’16). IEEE Press, Art...

[6]

Cheng Chen, Chenzhe Jin, Yunan Zhang, Sasha Podolsky, Chun Wu, Szu-Po Wang, Eric Hanson, Zhou Sun, Robert Walzer, and Jianguo Wang. 2024. SingleStore-V: An Integrated Vector Database System in SingleStore. Proc. VLDB Endow. 17, 12 (Aug. 2024), 3772–3785. https://doi.org/10.14778/3685800.3685805

[7]

Yannis Chronis, Helena Caminal, Yannis Papakonstantinou, Fatma Özcan, and Anastasia Ailamaki. 2025. Filtered Vector Search: State-of-the-Art and Research Opportunities. Proc. VLDB Endow. 18, 12 (Aug. 2025), 5488–5492. https://doi.org/10.14778/3750601.3750700

[8]

Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2026. The Faiss Library. IEEE Transactions on Big Data 12, 2 (2026), 346–361. https://doi.org/10.1109/TBDATA.2025.3618474

[9]

Luigi Fusco, Mikhail Khalilov, Marcin Chrapek, Giridhar Chukkapalli, Thomas Schulthess, and Torsten Hoefler. 2024. Understanding Data Movement in Tightly Coupled Heterogeneous Systems: A Case Study with the Grace Hopper Superchip. arXiv preprint arXiv:2408.11556 (2024)

[10]

Google Cloud. 2025. ScaNN for AlloyDB. https://services.google.com/fh/files/misc/scann_for_alloydb_whitepaper.pdf. Accessed: 2026-05-15

[11]

Mark Harris. 2012. How to Optimize Data Transfers in CUDA C/C++. NVIDIA Technical Blog. https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/. Accessed: 2026-04-30

[12]

Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, and Julian McAuley. 2024. Bridging language and items for retrieval and recommendation. arXiv preprint arXiv:2403.03952 (2024)

[14]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2021. Billion-Scale Similarity Search with GPUs. IEEE Transactions on Big Data 7, 3 (2021), 535–547. https://doi.org/10.1109/TBDATA.2019.2921572

[15]

Marko Kabić, Shriram Chandran, and Gustavo Alonso. 2025. Maximus: A Modular Accelerated Query Engine for Data Analytics on Heterogeneous Systems. Proc. ACM Manag. Data 3, 3, Article 187 (June 2025), 25 pages. https://doi.org/10.1145/3725324

[16]

Marko Kabić, Bowen Wu, Jonas Dann, and Gustavo Alonso. 2025. Powerful GPUs or Fast Interconnects: Analyzing Relational Workloads on Modern GPUs. Proc. VLDB Endow. 18, 11 (July 2025), 4350–4363. https://doi.org/10.14778/3749646.3749698

[17]

Andrew Kane et al. 2025. pgvector: Open-Source Vector Similarity Search for Postgres. https://github.com/pgvector/pgvector. Accessed: 2026-05-15

[18]

Guoxin Kang, Zhongxin Ge, Jingpei Hu, Xueya Zhang, Lei Wang, and Jianfeng Zhan. 2025. BigVectorBench: Heterogeneous Data Embedding and Compound Queries are Essential in Evaluating Vector Databases. Proc. VLDB Endow. 18, 5 (Jan. 2025), 1536–1550. https://doi.org/10.14778/3718057.3718078

[19]

Hyunjoon Kim, Chaerim Lim, Hyeonjun An, Rathijit Sen, and Kwanghyun Park. 2025. Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries. arXiv preprint arXiv:2512.09695 (2025)

[21]

Jiale Lao, Andreas Zimmerer, Olga Ovcharenko, Tianji Cong, Matthew Russo, Gerardo Vitagliano, Michael Cochez, Fatma Özcan, Gautam Gupta, Thibaud Hottelier, et al. 2025. SemBench: A Benchmark for Semantic Query Processing Engines. arXiv preprint arXiv:2511.01716 (2025)

[22]

Yaowen Liu, Xuejia Chen, Anxin Tian, Haoyang Li, Qinbin Li, Xin Zhang, Alexander Zhou, Chen Jason Zhang, Qing Li, and Lei Chen. 2026. GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions. arXiv preprint arXiv:2602.16719 (2026)

[23]

Clemens Lutz, Sebastian Breß, Steffen Zeuch, Tilmann Rabl, and Volker Markl. 2020. Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 1633–1649. https://doi.org/10.1145/3318464.3389705

[25]

Vasilis Mageirakos, Bowen Wu, and Gustavo Alonso. 2025. Cracking Vector Search Indexes. Proc. VLDB Endow. 18, 11 (July 2025), 3951–3964. https://doi.org/10.14778/3749646.3749666

[26]

Yu A. Malkov and D. A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2020), 824–. https://doi.org/10.1109/TPAMI.2018.2889473

[28]

Meta AI Research. 2025. FAISS v1.13.0, gpu/impl/IndexUtils.cu: getMaxKSelection. https://github.com/facebookresearch/faiss/blob/v1.13.0/faiss/gpu/impl/IndexUtils.cu. Accessed: 2026-04-13

[29]

Chenghao Mo, Ben Karsin, Philip Adams, and Minjia Zhang. 2026. VecFlow-Chamfer: A GPU-based Data Management System for High-Performance Multi-Vector Search on Superchips. Proc. ACM Manag. Data 4, 1, Article 92 (April 2026), 26 pages. https://doi.org/10.1145/3786706

[30]

Hubert Mohr-Daurat, Xuan Sun, and Holger Pirk. 2023. BOSS - An Architecture for Database Kernel Composition. Proc. VLDB Endow. 17, 4 (Dec. 2023), 877–890. https://doi.org/10.14778/3636218.3636239

[31]

NVIDIA. 2026. CUDA C++ Programming Guide: Full Unified Memory with Hardware Coherency. https://docs.nvidia.com/cuda/cuda-programming-guide/02-basics/understanding-memory.html#full-unified-memory-with-hardware-coherency. Accessed: 2026-04-29

[32]

NVIDIA Corporation. 2023. Matrix Multiplication Background User’s Guide. https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/. NVIDIA Deep Learning Performance Documentation. Accessed: 2026-04-29

[33]

NVIDIA Corporation. 2024. NVIDIA Grace Hopper Superchip. https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/. Accessed: 2026-04-29

[34]

NVIDIA Corporation. 2025. NVIDIA DGX Spark Datasheet. https://nvdam.widen.net/s/tlzm8smqjx/workstation-datasheet-dgx-spark-gtc25-spring-nvidia-us-3716899-web. GTC 2025 Spring. Accessed: 2026-05-01

[35]

NVIDIA Corporation. 2026. NVIDIA Nsight Systems. https://developer.nvidia.com/nsight-systems. Accessed: 2026-04-29

[36]

Hiroyuki Ootomo, Akira Naruse, Corey Nolet, Ray Wang, Tamas Feher, and Yong Wang. 2024. CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). 4236–4247. https://doi.org/10.1109/ICDE60146.2024.00323

[37]

Oracle Corporation. 2025. Oracle AI Vector Search User’s Guide. https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/ai-vector-search-users-guide.pdf. Accessed: 2026-05-15

[38]

Pinecone. 2025. Pinecone: The Vector Database for AI Search and Retrieval. https://www.pinecone.io/. Accessed: 2026-05-15

[39]

Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: an Embeddable Analytical Database. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD ’19). Association for Computing Machinery, New York, NY, USA, 1981–1984. https://doi.org/10.1145/3299869.3320212

[40]

RAPIDS Development Team. 2026. cuDF: A GPU DataFrame Library. https://github.com/rapidsai/cudf. NVIDIA RAPIDS

[41]

RAPIDS Development Team. 2026. cuVS: Vector Search and Clustering on the GPU. https://github.com/rapidsai/cuvs. NVIDIA RAPIDS

[42]

RAPIDS Development Team. 2026. RMM: RAPIDS Memory Manager. https://github.com/rapidsai/rmm. NVIDIA RAPIDS

[43]

Yasin N. Silva, Walid G. Aref, and Mohamed H. Ali. 2010. The similarity join database operator. In 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010). 892–903. https://doi.org/10.1109/ICDE.2010.5447873

[44]

Josef Sivic and Andrew Zisserman. 2003. Video Google: A Text Retrieval Approach to Object Matching in Videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2 (ICCV ’03). IEEE Computer Society, USA, 1470

[45]

Michael Stonebraker and Andrew Pavlo. 2024. What Goes Around Comes Around... And Around... SIGMOD Rec. 53, 2 (July 2024), 21–37. https://doi.org/10.1145/3685980.3685984

[46]

Ji Sun, Guoliang Li, James Pan, Jiang Wang, Yongqing Xie, Ruicheng Liu, and Wen Nie. 2025. GaussDB-Vector: A Large-Scale Persistent Real-Time Vector Database for LLM Applications. Proc. VLDB Endow. 18, 12 (Aug. 2025), 4951–4963. https://doi.org/10.14778/3750601.3750619

[47]

Transaction Processing Performance Council. 2022. TPC Benchmark H (Decision Support) Standard Specification. Technical Report. Transaction Processing Performance Council (TPC). https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf. Version 3.0.1, Accessed: 2026-05-15

[48]

Transaction Processing Performance Council. 2024. TPC Benchmark DS (TPC-DS) Standard Specification. https://www.tpc.org/tpcds/. Version 4.0.0, Accessed: 2026-04-29

[49]

Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. 2025. SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features. arXiv preprint arXiv:2502.14786 (2025)

[50]

Nitish Upreti, Harsha Vardhan Simhadri, Hari Sudan Sundar, Krishnan Sundaram, Samer Boshra, Balachandar Perumalswamy, Shivam Atri, Martin Chisholm, Revti Raman Singh, Greg Yang, Tamara Hass, Nitesh Dudhey, Subramanyam Pattipaka, Mark Hildebrand, Magdalen Manohar, Jack Moffitt, Haiyang Xu, Naren Datha, Suryansh Gupta, Ravishankar Krishnaswamy, Prashant ... https://doi.org/10.14778/3750601.3750635

[51]

Karthik Venkatasubba, Saim Khan, Somesh Singh, Harsha Vardhan Simhadri, and Jyothi Vedurada. 2025. BANG: Billion-Scale Approximate Nearest Neighbour Search Using a Single GPU. IEEE Transactions on Big Data 11, 6 (2025), 3142–3157. https://doi.org/10.1109/TBDATA.2025.3581085

[52]

Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xiangyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, Kun Yu, Yuxing Yuan, Yinghao Zou, Jiquan Long, Yudong Cai, Zhenxiang Li, Zhifeng Zhang, Yihua Mo, Jun Gu, Ruiyi Jiang, Yi Wei, and Charles Xie. 2021. Milvus: A Purpose-Built Vector Data Management System. In Proceedings of the 2021 ... https://doi.org/10.1145/3448016.3457550

[53]

Weaviate. 2025. Weaviate Vector Database. https://weaviate.io/. Accessed: 2026-05-15

[54]

Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. AnalyticDB-V: a hybrid analytical engine towards query fusion for structured and unstructured data. Proc. VLDB Endow. 13, 12 (Aug. 2020), 3152–3165. https://doi.org/10.14778/3415478.3415541

[55]

Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, and Rathijit Sen. 2025. Terabyte-Scale Analytics in the Blink of an Eye. arXiv:2506.09226 [cs.DB] https://arxiv.org/abs/2506.09226

[56]

Bowen Wu, Dimitrios Koutsoukos, and Gustavo Alonso. 2025. Efficiently Processing Joins and Grouped Aggregations on GPUs. Proc. ACM Manag. Data 3, 1, Article 39 (Feb. 2025), 27 pages. https://doi.org/10.1145/3709689

[57]

Jingyi Xi, Chenghao Mo, Ben Karsin, Artem Chirkin, Mingqin Li, and Minjia Zhang. 2025. VecFlow: A High-Performance Vector Data Management System for Filtered-Search on GPUs. Proc. ACM Manag. Data 3, 4, Article 271 (Sept. 2025), 27 pages. https://doi.org/10.1145/3749189

[58]

Jiadong Xie, Jeffrey Xu Yu, and Yingfan Liu. 2025. Fast Approximate Similarity Join in Vector Databases. Proc. ACM Manag. Data 3, 3, Article 158 (June 2025), 26 pages. https://doi.org/10.1145/3725403

[59]

Bobbi Yogatama, Yifei Yang, Kevin Kristensen, Devesh Sarda, Abigale Kim, Adrian Cockcroft, Yu Teng, Joshua Patterson, Gregory Kimball, Wes McKinney, et al. 2025. Rethinking Analytical Processing in the GPU Era. arXiv preprint arXiv:2508.04701 (2025)

[60]

Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, Mao Yang, and Lidong Zhou. 2023. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23). USENIX Association, Boston, MA, 377–395. https://www.usenix.org/conference/osdi23/presentation/zhang-qianxi

[62]

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. arXiv preprint arXiv:2506.05176 (2025)

[64]

Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. 2024. Fast vector query processing for large datasets beyond GPU memory with reordered pipelining. In Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (Santa Clara, CA, USA) (NSDI’24). USENIX Association, USA, Article 2, 18 pages

[65]

Jiaxu Zhu, Jiayu Yuan, Kaiwen Yang, Xiaobao Chen, Shihuan Yu, Hongchang Lv, Yan Li, and Bolong Zheng. 2025. An Experimental Evaluation of Hybrid Querying on Vectors. Proc. VLDB Endow. 19, 2 (Oct. 2025), 183–195. https://doi.org/10.14778/3773749.3773757