pith. sign in

arxiv: 2604.09173 · v2 · pith:EBPXMTFHnew · submitted 2026-04-10 · 💻 cs.DB · cs.OS

Decoupling Vector Data and Index Storage for Space Efficiency

Pith reviewed 2026-05-19 18:19 UTC · model grok-4.3

classification 💻 cs.DB cs.OS
keywords vector searchANNSgraph indexstorage compressiondisk-residentdata decouplingcomponent-aware compressionnearest neighbor search
0
0 comments X

The pith

COMPASS decouples vector data from index metadata to compress each separately, cutting storage by up to 58.7% while keeping search and update performance competitive.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces COMPASS, a storage framework for disk-resident graph approximate nearest neighbor search systems that manage large vector datasets. It separates the raw vector values from the auxiliary index metadata that guides the search. Because these two components have different patterns of redundancy, each can be losslessly compressed with methods matched to its own characteristics. The framework then adjusts the paths for queries and updates so the compression does not introduce unacceptable slowdowns. This matters for practical deployment because co-located storage has hidden the opportunity for space reduction on billion-scale datasets.

Core claim

COMPASS decouples vector data and auxiliary index metadata in disk-resident graph ANNS systems and applies lossless compression to each component according to its distinct compressibility characteristics. The search and update paths are adapted to the resulting compressed layouts. On real-world public and proprietary billion-scale datasets this yields up to 58.7% storage reduction while delivering search and update performance that is improved or competitive with state-of-the-art disk-resident graph ANNS systems.

What carries the argument

Data-index decoupling, which separates vector data from auxiliary index metadata so each can be compressed independently according to its own compressibility traits.

Load-bearing premise

Vector data and index metadata possess sufficiently distinct compressibility characteristics that separating them for independent compression produces net space savings without unacceptable overhead in search or update operations.

What would settle it

A head-to-head test on the same hardware and datasets showing that query latency or update throughput under the decoupled compressed layout exceeds that of the original co-located storage.

Figures

Figures reproduced from arXiv: 2604.09173 by Di Wu, Juncheng Zhang, Patrick P. C. Lee, Rui Yang, Yanjing Ren, Yuanming Ren.

Figure 1
Figure 1. Figure 1: Basic workflow of disk-based ANNS system, using a graph-based auxiliary index as an example. Third, decoupling destroys the sequential I/O locality from the co-location strategy, so retrieving vector data and aux￾iliary index metadata requires separate I/O operations and increases access overhead. Finally, decoupling complicates consistency maintenance between vector data and auxiliary index metadata durin… view at source ↗
Figure 2
Figure 2. Figure 2: Search latency breakdowns of DiskANN and PipeANN, including CPU time and I/O wait time, normalized to the search la￾tency of DiskANN. The number above each bar represents the search latency in microseconds (µs). We use single-threaded execution to eliminate interference from multi-threading and limit variance. graph traversal. Since vertices are fixed-size and page-aligned, DiskANN can directly compute any… view at source ↗
Figure 4
Figure 4. Figure 4: Normalized value dispersion and information entropy of different datasets. eral novel techniques to address the storage and I/O ineffi￾ciencies described in §2.3. 3.1 Design Overview [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Delta compression for multi-dimensional vectors. Segment file Block-level metadata Block Block Block Full-precision vectors Chunk Chunk Segment file Segment files Chunk-level metadata cached in memory First-block offset # blocks Boundary vector IDs Base vector Become immutable when filled Symbol frequency table Metadata file for a single segment Mutable space Chunk-level metadata Chunk-level metadata [PIT… view at source ↗
Figure 6
Figure 6. Figure 6: Hierarchical storage layout for vector data. gorithm [10, 11] to compactly store the sorted neighbor IDs using a two-level representation: each ID is split into lower and higher bits, where the lower bits have the same width across all IDs and record the exact representation, while the higher bits of all IDs are stored in a single bitmap for compact representation. 3.3 Hierarchical Storage Layouts To suppo… view at source ↗
Figure 7
Figure 7. Figure 7: Exp#1 (Space efficiency). For 100M-scale datasets, sizes are normalized to SPANN. For billion-scale datasets, sizes are nor￾malized to DiskANN. The numbers above the bars indicate the absolute storage sizes in GiB. 5.2 Macrobenchmarks 5.2.1 Space Efficiency We compare the storage sizes of DecoupleVS against DiskANN and SPANN. We omit the results for PipeANN since it shares the identical storage layout and … view at source ↗
Figure 8
Figure 8. Figure 8: Exp#2 (Search throughput). We show the throughput (queries per second) against Recall@10 (%); higher is better. The points from left to right in each curve correspond to increasing candidate list sizes, which generally lead to higher recall. SPANN DiskANN PipeANN DecoupleVS 66 76 87 Recall@10 10 0 10 1 10 2 10 3 P99 Latency (ms) 45 72 100 Recall@10 10 0 10 1 10 2 10 3 45 72 100 Recall@10 10 0 10 1 10 2 10 … view at source ↗
Figure 9
Figure 9. Figure 9: Exp#3 (Search latency). We show the P99 latency (ms) against Recall@10 (%); lower is better. best-first search algorithm, which overlaps disk I/Os with distance computations to enhance search performance. Figures 8(d)-8(e) show the search throughput on billion￾scale datasets. Consistent with the 100 M-scale results, De￾coupleVS achieves higher throughput than DiskANN on both SIFT1B and SPACEV1B. However, D… view at source ↗
Figure 10
Figure 10. Figure 10: Exp#4 (Concurrent search and updates). We show the overall update performance on SIFT100M. This modest increase results from three factors. First, Decou￾pleVS caches compression metadata, with minimal overhead (28 MiB on SIFT100M). Second, DecoupleVS requires addi￾tional memory buffers for compression and decompression during updates. Third, since DecoupleVS processes the auxil￾iary index in batches with … view at source ↗
Figure 11
Figure 11. Figure 11: Exp#6 (Update performance breakdown). We show the average computation and disk I/O times per update operation, nor￾malized to DiskANN. The numbers atop bars indicate the absolute time (in seconds). points into the on-disk index. We decompose each compo￾nent into computation and disk I/O times. We present average results over 10 iterations, with error bars showing 95% confi￾dence intervals based on the Stu… view at source ↗
read the original abstract

Managing large-scale vector datasets with disk-resident graph approximate nearest neighbor search (ANNS) systems incurs substantial storage overhead due to the co-location of vector data and auxiliary index metadata, which prevents the storage layer from exploiting their distinct compressibility. We present COMPASS, a component-aware compressed storage framework for disk-resident graph vector search. Leveraging data-index decoupling as a foundation, COMPASS losslessly compresses each component according to its distinct compressibility characteristics, thereby significantly reducing storage space. It further adapts the search and update paths to preserve their performance under compressed storage layouts. Evaluation on real-world public and proprietary billion-scale datasets shows that COMPASS reduces storage space by up to 58.7%, while delivering improved or competitive search and update performance compared to state-of-the-art disk-resident graph ANNS systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces COMPASS, a component-aware compressed storage framework for disk-resident graph ANNS systems. It decouples vector data from auxiliary index metadata to permit independent lossless compression of each component according to its distinct compressibility characteristics, yielding up to 58.7% storage reduction while adapting search and update paths to maintain or improve performance on billion-scale datasets.

Significance. If the performance claims survive a full accounting of decompression costs, the decoupling approach could meaningfully improve storage efficiency for large-scale disk-resident vector search without sacrificing query throughput. The core idea of exploiting differential compressibility between data and metadata is a practical contribution to vector database storage design.

major comments (2)
  1. [§5 Evaluation] §5 Evaluation: the reported search and update latencies are presented only as end-to-end wall-clock times with no breakdown isolating decompression CPU or extra I/O costs introduced by the compressed layouts. Because every vector fetch and neighbor traversal now crosses the decoupled compressed storage, it is impossible to verify whether the 'improved or competitive' result holds once these costs are charged against the baselines.
  2. [Table 2] Table 2 (storage results): the 58.7% reduction figure is given without per-component compression ratios for vectors versus graph metadata or without reporting the raw index sizes before and after decoupling. This makes it difficult to assess how much of the savings is actually attributable to the decoupling rather than to the choice of compressor.
minor comments (2)
  1. [§3.1] §3.1: the notation distinguishing the compressed vector block layout from the compressed metadata block layout is introduced without a small diagram or example, which would help readers follow the subsequent path adaptations.
  2. [Related work] Related work: several recent papers on learned compression for vectors and on separate storage of graph edges are not cited; adding them would better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to incorporate additional analysis and data as requested.

read point-by-point responses
  1. Referee: [§5 Evaluation] §5 Evaluation: the reported search and update latencies are presented only as end-to-end wall-clock times with no breakdown isolating decompression CPU or extra I/O costs introduced by the compressed layouts. Because every vector fetch and neighbor traversal now crosses the decoupled compressed storage, it is impossible to verify whether the 'improved or competitive' result holds once these costs are charged against the baselines.

    Authors: We agree that an explicit cost breakdown would strengthen the evaluation. In the revised manuscript we have added a new subsection (5.4) that reports per-operation breakdowns of decompression CPU time and any additional I/O for both search and update paths on the billion-scale datasets. The data show that decompression overhead remains below 8% of total latency and is more than offset by the reduction in I/O volume, preserving the reported performance advantage. revision: yes

  2. Referee: Table 2 (storage results): the 58.7% reduction figure is given without per-component compression ratios for vectors versus graph metadata or without reporting the raw index sizes before and after decoupling. This makes it difficult to assess how much of the savings is actually attributable to the decoupling rather than to the choice of compressor.

    Authors: We acknowledge the value of this granularity. Table 2 has been expanded to include (i) raw storage sizes before and after decoupling, (ii) per-component compression ratios for vector data and for graph metadata separately, and (iii) the contribution of each component to the overall 58.7% reduction. The updated table makes clear that the majority of the savings stems from the higher compressibility of the decoupled metadata, which could not be exploited in the original co-located layout. revision: yes

Circularity Check

0 steps flagged

No significant circularity in COMPASS decoupling and compression claims

full rationale

The paper introduces a storage framework that decouples vector data from graph index metadata to allow independent lossless compression based on their distinct characteristics, then adapts search/update paths accordingly. All load-bearing claims (58.7% space reduction, competitive or improved performance) rest on empirical measurements against external state-of-the-art baselines on public and proprietary billion-scale datasets. No equations or steps reduce by construction to fitted parameters, self-citations, or renamed inputs; the derivation chain consists of standard systems engineering choices followed by external evaluation rather than self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that data and index components have exploitable compressibility differences and that performance can be preserved after layout changes. No free parameters or invented physical entities are described.

axioms (1)
  • domain assumption Vector data and index metadata have distinct compressibility characteristics that can be leveraged independently after decoupling.
    Explicitly stated as the foundation for the compression approach in the abstract.
invented entities (1)
  • COMPASS framework no independent evidence
    purpose: Component-aware compressed storage with adapted search and update paths for disk-resident graph ANNS.
    New system introduced to solve the co-location storage overhead problem.

pith-pipeline@v0.9.0 · 5674 in / 1200 out tokens · 36268 ms · 2026-05-19T18:19:24.413198+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 2 internal anchors

  1. [1]

    Cassandra

    Apache. Cassandra. https://cassandra.apache. org/, 2025

  2. [2]

    Language models are few-shot learners.Proc

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Sub- biah, Jared D Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Proc. of NeurIPS, 2020

  3. [3]

    Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook. InProc. of USENIX FAST, 2020

  4. [4]

    Hsieh, Deborah A

    Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. Bigtable: A distributed storage system for structured data. InProc. of USENIX OSDI, 2006

  5. [5]

    Sptag: A li- brary for fast approximate nearest neighbor search

    Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jeffery Zhu, Jason Li, Chuanjie Liu, Lin- tao Zhang, and Jingdong Wang. Sptag: A li- brary for fast approximate nearest neighbor search. https://github.com/Microsoft/SPTAG, 2018

  6. [6]

    SPANN: Highly-efficient billion-scale approx- imate nearest neighborhood search.Proc

    Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. SPANN: Highly-efficient billion-scale approx- imate nearest neighborhood search.Proc. of NeurIPS, 2021

  7. [7]

    Yann Collet. LZ4. https://github.com/lz4/lz4, 2011

  8. [8]

    Deep neural networks for YouTube recommendations

    Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In Proc. of ACM RecSys, 2016

  9. [9]

    Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding

    Jarek Duda. Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with com- pression rate of arithmetic coding.arXiv preprint arXiv:1311.2540, 2013

  10. [10]

    Efficient storage and retrieval by content and address of static files.Journal of the ACM (JACM), 21(2):246--260, 1974

    Peter Elias. Efficient storage and retrieval by content and address of static files.Journal of the ACM (JACM), 21(2):246--260, 1974

  11. [11]

    Massachusetts Institute of Technology, Project MAC, 1971

    Robert Mario Fano.On the number of bits required to implement an associative memory. Massachusetts Institute of Technology, Project MAC, 1971

  12. [12]

    Fast approximate nearest neighbor search with the navi- gating spreading-out graph.Proceedings of the VLDB Endowment, 12(5):461--474, 2019

    Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neighbor search with the navi- gating spreading-out graph.Proceedings of the VLDB Endowment, 12(5):461--474, 2019

  13. [13]

    RabitQ: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proc

    Jianyang Gao and Cheng Long. RabitQ: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proc. of ACM SIGMOD, 2(3):1--27, 2024

  14. [14]

    Retrieval-Augmented Generation for Large Language Models: A Survey

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.10997, 2023

  15. [15]

    Op- timized product quantization.IEEE Trans

    Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Op- timized product quantization.IEEE Trans. on Pat- tern Analysis and Machine Intelligence, 36(4):744--755, 2013

  16. [16]

    Succinct

    Giuseppe Ottaviano. Succinct. https://github. com/ot/succinct, 2017

  17. [17]

    Real-time person- alization using embeddings for search ranking at airbnb

    Mihajlo Grbovic and Haibin Cheng. Real-time person- alization using embeddings for search ranking at airbnb. InProc. of the 24th ACM SIGKDD, 2018

  18. [18]

    Achieving low-latency graph- based vector search via aligning best-first search algo- rithm with SSD

    Hao Guo and Youyou Lu. Achieving low-latency graph- based vector search via aligning best-first search algo- rithm with SSD. InProc. of USENIX OSDI, 2025

  19. [19]

    OdinANN: Direct insert for consistently stable performance in billion-scale graph- based vector search

    Hao Guo and Youyou Lu. OdinANN: Direct insert for consistently stable performance in billion-scale graph- based vector search. InProc. of USENIX FAST, 2026

  20. [20]

    Research talk: Approximate nearest neighbor search systems at scale

    Harsha Simhadri. Research talk: Approximate nearest neighbor search systems at scale. https://youtu.be/ BnYNdSIKibQ?t=179, 2022

  21. [21]

    ZipNN: Lossless compression for AI models

    Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Girmonsky, Roy Leibovitz, Or Ozeri, Ilias Enn- mouri, Michal Malka, Peter Chin, Swaminathan Sun- 13 dararaman, et al. ZipNN: Lossless compression for AI models. InProc. of IEEE CLOUD, 2025

  22. [22]

    AI and the future of unstructured data

    IBM. AI and the future of unstructured data. https: //www.ibm.com/think/insights/unstructured- data-trends, 2025

  23. [23]

    AI data centers are swallowing the world’s memory and storage supply, setting the stage for a pricing apocalypse that could last a decade

    Luke James. AI data centers are swallowing the world’s memory and storage supply, setting the stage for a pricing apocalypse that could last a decade. https://www.tomshardware.com/pc- components/storage/perfect-storm-of- demand-and-supply-driving-up-storage- costs?ref=aisecret.us, 2025

  24. [24]

    DiskANN: Fast accurate billion-point near- est neighbor search on a single node.Proc

    Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vard- han Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. DiskANN: Fast accurate billion-point near- est neighbor search on a single node.Proc. of NeurIPS, 2019

  25. [25]

    Product quantization for nearest neighbor search.IEEE Trans

    Herve Jegou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search.IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(1):117--128, 2010

  26. [26]

    Searching in one billion vectors: re- rank with source coding

    Herv´e J ´egou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. Searching in one billion vectors: re- rank with source coding. InProc. of IEEE ICASSP, 2011

  27. [27]

    Introducing the Mil- vus Sizing Tool: Calculating and Optimizing Your Milvus Deployment Resources

    Ken Zhang, Fendy Feng. Introducing the Mil- vus Sizing Tool: Calculating and Optimizing Your Milvus Deployment Resources . https: //milvus.io/blog/introducing-the-milvus- sizing-tool-calculating-and-optimizing- your-milvus-deployment-resources.md, 2025

  28. [28]

    Dynamic Huffman coding.Journal of Algorithms, 6(2):163--180, 1985

    Donald E Knuth. Dynamic Huffman coding.Journal of Algorithms, 6(2):163--180, 1985

  29. [29]

    Datasets for ap- proximate nearest neighbor search

    Laurent Amsaleg and Herv ´e J ´egou. Datasets for ap- proximate nearest neighbor search. http://corpus- texmex.irisa.fr/, 2010

  30. [30]

    VStore: in-storage graph based vector search accelerator

    Shengwen Liang, Ying Wang, Ziming Yuan, Cheng Liu, Huawei Li, and Xiaowei Li. VStore: in-storage graph based vector search accelerator. InProc. of ACM/IEEE DAC, 2022

  31. [31]

    Efficient and robust approximate nearest neighbor search using hier- archical navigable small world graphs.IEEE Trans

    Yu A Malkov and Dmitry A Yashunin. Efficient and robust approximate nearest neighbor search using hier- archical navigable small world graphs.IEEE Trans. on Pattern Analysis and Machine Intelligence, 42(4):824-- 836, 2018

  32. [32]

    SPTAG issue #416: Segmentation fault when building SIFT1B

    Microsoft. SPTAG issue #416: Segmentation fault when building SIFT1B. https://github.com/ microsoft/SPTAG/issues/416, 2024

  33. [33]

    Flann-fast library for approximate nearest neighbors user manual.Computer Science Department, University of British Columbia, Vancouver, BC, Canada, 5(6), 2009

    Marius Muja and David Lowe. Flann-fast library for approximate nearest neighbors user manual.Computer Science Department, University of British Columbia, Vancouver, BC, Canada, 5(6), 2009

  34. [34]

    BIG ANN-Benchmarks

    NeurIPS. BIG ANN-Benchmarks. https://big-ann- benchmarks.com/neurips21.html, 2021

  35. [35]

    Fm- delta: Lossless compression for storing massive fine- tuned foundation models.Proc

    Wanyi Ning, Jingyu Wang, Qi Qi, Mengde Zhu, Haifeng Sun, Daixuan Cheng, Jianxin Liao, and Ce Zhang. Fm- delta: Lossless compression for storing massive fine- tuned foundation models.Proc. of NeurIPS, 2024

  36. [36]

    The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by ask- ing questions in everyday language

    OpenAI. The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by ask- ing questions in everyday language. https://github. com/openai/chatgpt-retrieval-plugin, 2023

  37. [37]

    Partitioned elias-fano indexes

    Giuseppe Ottaviano and Rossano Venturini. Partitioned elias-fano indexes. InProceedings of the 37th Interna- tional ACM SIGIR Conference on Research & Develop- ment in Information Retrieval, 2014

  38. [38]

    TiKV.https://tikv.org, 2025

    PinCap. TiKV.https://tikv.org, 2025

  39. [39]

    Sentence-bert: Sen- tence embeddings using siamese bert-networks

    Nils Reimers and Iryna Gurevych. Sentence-bert: Sen- tence embeddings using siamese bert-networks. InProc. of the EMNLP, 2019

  40. [40]

    FPGA-based lossless data compression using Huffman and LZ77 algorithms

    Suzanne Rigler, William Bishop, and Andrew Kennings. FPGA-based lossless data compression using Huffman and LZ77 algorithms. In2007 Canadian conference on electrical and computer engineering, pages 1235--1238. IEEE, 2007

  41. [41]

    A mathematical theory of commu- nication.The Bell system technical journal, 27(3):379-- 423, 1948

    Claude E Shannon. A mathematical theory of commu- nication.The Bell system technical journal, 27(3):379-- 423, 1948

  42. [42]

    FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search.arXiv preprint arXiv:2105.09613,

    Aditi Singh, Suhas Jayaram Subramanya, Ravis- hankar Krishnaswamy, and Harsha Vardhan Simhadri. FreshDiskANN: A fast and accurate graph-based ann index for streaming similarity search.arXiv preprint arXiv:2105.09613, 2021

  43. [43]

    Scalable billion-point approximate nearest neighbor search using SmartSSDs

    Bing Tian, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Hai Jin, and Yu Zhang. Scalable billion-point approximate nearest neighbor search using SmartSSDs. InProc. of USENIX ATC, 2024

  44. [44]

    Towards high- throughput and low-latency billion-scale vector search via CPU/GPU collaborative filtering and re-ranking

    Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xuecang Zhang, Junhua Zhu, and Yu Zhang. Towards high- throughput and low-latency billion-scale vector search via CPU/GPU collaborative filtering and re-ranking. In Proc. of USENIX FAST, 2025

  45. [45]

    The relative neighbourhood graph of a finite planar set.Pattern recognition, 12(4):261--268, 1980

    Godfried T Toussaint. The relative neighbourhood graph of a finite planar set.Pattern recognition, 12(4):261--268, 1980

  46. [46]

    DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search

    Simhadri Harsha Vardhan, Krishnaswamy Ravishankar, Srinivasa Gopal, Subramanya Suhas Jayaram, Antoni- jevic Andrija, Pryce Dax, Kaczynski David, Williams Shane, Gollapudi Siddarth, Sivashankar Varun, Karia 14 Neel, Singh Aditi, Jaiswal Shikhar, Mahapatro Nee- lam, Adams Philip, Tower Bryan, and Patel Yash. DiskANN: Graph-structured Indices for Scalable, F...

  47. [47]

    MILC: Inverted list compression in memory.Proceedings of the VLDB Endowment, 10(8):853--864, 2017

    Jianguo Wang, Chunbin Lin, Ruining He, Moojin Chae, Yannis Papakonstantinou, and Steven Swanson. MILC: Inverted list compression in memory.Proceedings of the VLDB Endowment, 10(8):853--864, 2017

  48. [48]

    Starling: An I/O-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment.Proc

    Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xi- aoliang Xu, Rentong Guo, and Charles Xie. Starling: An I/O-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment.Proc. of ACM SIGMOD, 2024

  49. [49]

    ZipLLM:towards efficient LLM storage reduction via tensor deduplication and delta com- pression

    Zirui Wang, Tingfeng Lan, Zhaoyuan Su, Juncheng Yang, and Yue Cheng. ZipLLM:towards efficient LLM storage reduction via tensor deduplication and delta com- pression. InProc. of USENIX NSDI, 2026

  50. [50]

    Configure replication in Weaviate ANN service

    Weaviate. Configure replication in Weaviate ANN service. https://docs.weaviate.io/deploy/ configuration/replication, 2025

  51. [51]

    In-place updates of a graph in- dex for streaming approximate nearest neighbor search

    Haike Xu, Magdalen Dobson Manohar, Philip A Bern- stein, Badrish Chandramouli, Richard Wen, and Har- sha Vardhan Simhadri. In-place updates of a graph in- dex for streaming approximate nearest neighbor search. arXiv preprint arXiv:2502.13826, 2025

  52. [52]

    SPFresh: Incremental in- place update for billion-scale vector search

    Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, et al. SPFresh: Incremental in- place update for billion-scale vector search. InProc. of ACM SOSP, 2023

  53. [53]

    New Generation Entropy coders

    Yann Collet. New Generation Entropy coders. https: //github.com/Cyan4973/FiniteStateEntropy, 2019

  54. [54]

    Finesse: Fine-grained fea- ture locality based fast resemblance detection for post- deduplication delta compression

    Yucheng Zhang, Wen Xia, Dan Feng, Hong Jiang, Yu Hua, and Qiang Wang. Finesse: Fine-grained fea- ture locality based fast resemblance detection for post- deduplication delta compression. InProc. of USENIX FAST, 2019

  55. [55]

    Fast vector query processing for large datasets beyond GPU memory with reordered pipelining

    Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. Fast vector query processing for large datasets beyond GPU memory with reordered pipelining. InProc. of USENIX NSDI, 2024. 15