Decoupling Vector Data and Index Storage for Space Efficiency

Di Wu; Juncheng Zhang; Patrick P. C. Lee; Rui Yang; Yanjing Ren; Yuanming Ren

arxiv: 2604.09173 · v2 · pith:EBPXMTFHnew · submitted 2026-04-10 · 💻 cs.DB · cs.OS

Decoupling Vector Data and Index Storage for Space Efficiency

Yuanming Ren , Juncheng Zhang , Yanjing Ren , Rui Yang , Di Wu , Patrick P. C. Lee This is my paper

Pith reviewed 2026-05-19 18:19 UTC · model grok-4.3

classification 💻 cs.DB cs.OS

keywords vector searchANNSgraph indexstorage compressiondisk-residentdata decouplingcomponent-aware compressionnearest neighbor search

0 comments

The pith

COMPASS decouples vector data from index metadata to compress each separately, cutting storage by up to 58.7% while keeping search and update performance competitive.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces COMPASS, a storage framework for disk-resident graph approximate nearest neighbor search systems that manage large vector datasets. It separates the raw vector values from the auxiliary index metadata that guides the search. Because these two components have different patterns of redundancy, each can be losslessly compressed with methods matched to its own characteristics. The framework then adjusts the paths for queries and updates so the compression does not introduce unacceptable slowdowns. This matters for practical deployment because co-located storage has hidden the opportunity for space reduction on billion-scale datasets.

Core claim

COMPASS decouples vector data and auxiliary index metadata in disk-resident graph ANNS systems and applies lossless compression to each component according to its distinct compressibility characteristics. The search and update paths are adapted to the resulting compressed layouts. On real-world public and proprietary billion-scale datasets this yields up to 58.7% storage reduction while delivering search and update performance that is improved or competitive with state-of-the-art disk-resident graph ANNS systems.

What carries the argument

Data-index decoupling, which separates vector data from auxiliary index metadata so each can be compressed independently according to its own compressibility traits.

Load-bearing premise

Vector data and index metadata possess sufficiently distinct compressibility characteristics that separating them for independent compression produces net space savings without unacceptable overhead in search or update operations.

What would settle it

A head-to-head test on the same hardware and datasets showing that query latency or update throughput under the decoupled compressed layout exceeds that of the original co-located storage.

Figures

Figures reproduced from arXiv: 2604.09173 by Di Wu, Juncheng Zhang, Patrick P. C. Lee, Rui Yang, Yanjing Ren, Yuanming Ren.

**Figure 1.** Figure 1: Basic workflow of disk-based ANNS system, using a graph-based auxiliary index as an example. Third, decoupling destroys the sequential I/O locality from the co-location strategy, so retrieving vector data and auxiliary index metadata requires separate I/O operations and increases access overhead. Finally, decoupling complicates consistency maintenance between vector data and auxiliary index metadata durin… view at source ↗

**Figure 2.** Figure 2: Search latency breakdowns of DiskANN and PipeANN, including CPU time and I/O wait time, normalized to the search latency of DiskANN. The number above each bar represents the search latency in microseconds (µs). We use single-threaded execution to eliminate interference from multi-threading and limit variance. graph traversal. Since vertices are fixed-size and page-aligned, DiskANN can directly compute any… view at source ↗

**Figure 4.** Figure 4: Normalized value dispersion and information entropy of different datasets. eral novel techniques to address the storage and I/O inefficiencies described in §2.3. 3.1 Design Overview [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Delta compression for multi-dimensional vectors. Segment file Block-level metadata Block Block Block Full-precision vectors Chunk Chunk Segment file Segment files Chunk-level metadata cached in memory First-block offset # blocks Boundary vector IDs Base vector Become immutable when filled Symbol frequency table Metadata file for a single segment Mutable space Chunk-level metadata Chunk-level metadata [PIT… view at source ↗

**Figure 6.** Figure 6: Hierarchical storage layout for vector data. gorithm [10, 11] to compactly store the sorted neighbor IDs using a two-level representation: each ID is split into lower and higher bits, where the lower bits have the same width across all IDs and record the exact representation, while the higher bits of all IDs are stored in a single bitmap for compact representation. 3.3 Hierarchical Storage Layouts To suppo… view at source ↗

**Figure 7.** Figure 7: Exp#1 (Space efficiency). For 100M-scale datasets, sizes are normalized to SPANN. For billion-scale datasets, sizes are normalized to DiskANN. The numbers above the bars indicate the absolute storage sizes in GiB. 5.2 Macrobenchmarks 5.2.1 Space Efficiency We compare the storage sizes of DecoupleVS against DiskANN and SPANN. We omit the results for PipeANN since it shares the identical storage layout and … view at source ↗

**Figure 8.** Figure 8: Exp#2 (Search throughput). We show the throughput (queries per second) against Recall@10 (%); higher is better. The points from left to right in each curve correspond to increasing candidate list sizes, which generally lead to higher recall. SPANN DiskANN PipeANN DecoupleVS 66 76 87 Recall@10 10 0 10 1 10 2 10 3 P99 Latency (ms) 45 72 100 Recall@10 10 0 10 1 10 2 10 3 45 72 100 Recall@10 10 0 10 1 10 2 10 … view at source ↗

**Figure 9.** Figure 9: Exp#3 (Search latency). We show the P99 latency (ms) against Recall@10 (%); lower is better. best-first search algorithm, which overlaps disk I/Os with distance computations to enhance search performance. Figures 8(d)-8(e) show the search throughput on billionscale datasets. Consistent with the 100 M-scale results, DecoupleVS achieves higher throughput than DiskANN on both SIFT1B and SPACEV1B. However, D… view at source ↗

**Figure 10.** Figure 10: Exp#4 (Concurrent search and updates). We show the overall update performance on SIFT100M. This modest increase results from three factors. First, DecoupleVS caches compression metadata, with minimal overhead (28 MiB on SIFT100M). Second, DecoupleVS requires additional memory buffers for compression and decompression during updates. Third, since DecoupleVS processes the auxiliary index in batches with … view at source ↗

**Figure 11.** Figure 11: Exp#6 (Update performance breakdown). We show the average computation and disk I/O times per update operation, normalized to DiskANN. The numbers atop bars indicate the absolute time (in seconds). points into the on-disk index. We decompose each component into computation and disk I/O times. We present average results over 10 iterations, with error bars showing 95% confidence intervals based on the Stu… view at source ↗

read the original abstract

Managing large-scale vector datasets with disk-resident graph approximate nearest neighbor search (ANNS) systems incurs substantial storage overhead due to the co-location of vector data and auxiliary index metadata, which prevents the storage layer from exploiting their distinct compressibility. We present COMPASS, a component-aware compressed storage framework for disk-resident graph vector search. Leveraging data-index decoupling as a foundation, COMPASS losslessly compresses each component according to its distinct compressibility characteristics, thereby significantly reducing storage space. It further adapts the search and update paths to preserve their performance under compressed storage layouts. Evaluation on real-world public and proprietary billion-scale datasets shows that COMPASS reduces storage space by up to 58.7%, while delivering improved or competitive search and update performance compared to state-of-the-art disk-resident graph ANNS systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces COMPASS, a component-aware compressed storage framework for disk-resident graph ANNS systems. It decouples vector data from auxiliary index metadata to permit independent lossless compression of each component according to its distinct compressibility characteristics, yielding up to 58.7% storage reduction while adapting search and update paths to maintain or improve performance on billion-scale datasets.

Significance. If the performance claims survive a full accounting of decompression costs, the decoupling approach could meaningfully improve storage efficiency for large-scale disk-resident vector search without sacrificing query throughput. The core idea of exploiting differential compressibility between data and metadata is a practical contribution to vector database storage design.

major comments (2)

[§5 Evaluation] §5 Evaluation: the reported search and update latencies are presented only as end-to-end wall-clock times with no breakdown isolating decompression CPU or extra I/O costs introduced by the compressed layouts. Because every vector fetch and neighbor traversal now crosses the decoupled compressed storage, it is impossible to verify whether the 'improved or competitive' result holds once these costs are charged against the baselines.
[Table 2] Table 2 (storage results): the 58.7% reduction figure is given without per-component compression ratios for vectors versus graph metadata or without reporting the raw index sizes before and after decoupling. This makes it difficult to assess how much of the savings is actually attributable to the decoupling rather than to the choice of compressor.

minor comments (2)

[§3.1] §3.1: the notation distinguishing the compressed vector block layout from the compressed metadata block layout is introduced without a small diagram or example, which would help readers follow the subsequent path adaptations.
[Related work] Related work: several recent papers on learned compression for vectors and on separate storage of graph edges are not cited; adding them would better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to incorporate additional analysis and data as requested.

read point-by-point responses

Referee: [§5 Evaluation] §5 Evaluation: the reported search and update latencies are presented only as end-to-end wall-clock times with no breakdown isolating decompression CPU or extra I/O costs introduced by the compressed layouts. Because every vector fetch and neighbor traversal now crosses the decoupled compressed storage, it is impossible to verify whether the 'improved or competitive' result holds once these costs are charged against the baselines.

Authors: We agree that an explicit cost breakdown would strengthen the evaluation. In the revised manuscript we have added a new subsection (5.4) that reports per-operation breakdowns of decompression CPU time and any additional I/O for both search and update paths on the billion-scale datasets. The data show that decompression overhead remains below 8% of total latency and is more than offset by the reduction in I/O volume, preserving the reported performance advantage. revision: yes
Referee: Table 2 (storage results): the 58.7% reduction figure is given without per-component compression ratios for vectors versus graph metadata or without reporting the raw index sizes before and after decoupling. This makes it difficult to assess how much of the savings is actually attributable to the decoupling rather than to the choice of compressor.

Authors: We acknowledge the value of this granularity. Table 2 has been expanded to include (i) raw storage sizes before and after decoupling, (ii) per-component compression ratios for vector data and for graph metadata separately, and (iii) the contribution of each component to the overall 58.7% reduction. The updated table makes clear that the majority of the savings stems from the higher compressibility of the decoupled metadata, which could not be exploited in the original co-located layout. revision: yes

Circularity Check

0 steps flagged

No significant circularity in COMPASS decoupling and compression claims

full rationale

The paper introduces a storage framework that decouples vector data from graph index metadata to allow independent lossless compression based on their distinct characteristics, then adapts search/update paths accordingly. All load-bearing claims (58.7% space reduction, competitive or improved performance) rest on empirical measurements against external state-of-the-art baselines on public and proprietary billion-scale datasets. No equations or steps reduce by construction to fitted parameters, self-citations, or renamed inputs; the derivation chain consists of standard systems engineering choices followed by external evaluation rather than self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that data and index components have exploitable compressibility differences and that performance can be preserved after layout changes. No free parameters or invented physical entities are described.

axioms (1)

domain assumption Vector data and index metadata have distinct compressibility characteristics that can be leveraged independently after decoupling.
Explicitly stated as the foundation for the compression approach in the abstract.

invented entities (1)

COMPASS framework no independent evidence
purpose: Component-aware compressed storage with adapted search and update paths for disk-resident graph ANNS.
New system introduced to solve the co-location storage overhead problem.

pith-pipeline@v0.9.0 · 5674 in / 1200 out tokens · 36268 ms · 2026-05-19T18:19:24.413198+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

reduces storage space by up to 58.7% ... competitive or improved search and update performance

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 2 internal anchors

[1]

Cassandra

Apache. Cassandra. https://cassandra.apache. org/, 2025

work page 2025
[2]

Language models are few-shot learners.Proc

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Sub- biah, Jared D Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Proc. of NeurIPS, 2020

work page 2020
[3]

Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook. InProc. of USENIX FAST, 2020

work page 2020
[4]

Hsieh, Deborah A

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. Bigtable: A distributed storage system for structured data. InProc. of USENIX OSDI, 2006

work page 2006
[5]

Sptag: A li- brary for fast approximate nearest neighbor search

Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jeffery Zhu, Jason Li, Chuanjie Liu, Lin- tao Zhang, and Jingdong Wang. Sptag: A li- brary for fast approximate nearest neighbor search. https://github.com/Microsoft/SPTAG, 2018

work page 2018
[6]

SPANN: Highly-efficient billion-scale approx- imate nearest neighborhood search.Proc

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. SPANN: Highly-efficient billion-scale approx- imate nearest neighborhood search.Proc. of NeurIPS, 2021

work page 2021
[7]

Yann Collet. LZ4. https://github.com/lz4/lz4, 2011

work page 2011
[8]

Deep neural networks for YouTube recommendations

Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In Proc. of ACM RecSys, 2016

work page 2016
[9]

Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding

Jarek Duda. Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with com- pression rate of arithmetic coding.arXiv preprint arXiv:1311.2540, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[10]

Efficient storage and retrieval by content and address of static files.Journal of the ACM (JACM), 21(2):246--260, 1974

Peter Elias. Efficient storage and retrieval by content and address of static files.Journal of the ACM (JACM), 21(2):246--260, 1974

work page 1974
[11]

Massachusetts Institute of Technology, Project MAC, 1971

Robert Mario Fano.On the number of bits required to implement an associative memory. Massachusetts Institute of Technology, Project MAC, 1971

work page 1971
[12]

Fast approximate nearest neighbor search with the navi- gating spreading-out graph.Proceedings of the VLDB Endowment, 12(5):461--474, 2019

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neighbor search with the navi- gating spreading-out graph.Proceedings of the VLDB Endowment, 12(5):461--474, 2019

work page 2019
[13]

RabitQ: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proc

Jianyang Gao and Cheng Long. RabitQ: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proc. of ACM SIGMOD, 2(3):1--27, 2024

work page 2024
[14]

Retrieval-Augmented Generation for Large Language Models: A Survey

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.10997, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[15]

Op- timized product quantization.IEEE Trans

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Op- timized product quantization.IEEE Trans. on Pat- tern Analysis and Machine Intelligence, 36(4):744--755, 2013

work page 2013
[16]

Succinct

Giuseppe Ottaviano. Succinct. https://github. com/ot/succinct, 2017

work page 2017
[17]

Real-time person- alization using embeddings for search ranking at airbnb

Mihajlo Grbovic and Haibin Cheng. Real-time person- alization using embeddings for search ranking at airbnb. InProc. of the 24th ACM SIGKDD, 2018

work page 2018
[18]

Achieving low-latency graph- based vector search via aligning best-first search algo- rithm with SSD

Hao Guo and Youyou Lu. Achieving low-latency graph- based vector search via aligning best-first search algo- rithm with SSD. InProc. of USENIX OSDI, 2025

work page 2025
[19]

OdinANN: Direct insert for consistently stable performance in billion-scale graph- based vector search

Hao Guo and Youyou Lu. OdinANN: Direct insert for consistently stable performance in billion-scale graph- based vector search. InProc. of USENIX FAST, 2026

work page 2026
[20]

Research talk: Approximate nearest neighbor search systems at scale

Harsha Simhadri. Research talk: Approximate nearest neighbor search systems at scale. https://youtu.be/ BnYNdSIKibQ?t=179, 2022

work page 2022
[21]

ZipNN: Lossless compression for AI models

Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Girmonsky, Roy Leibovitz, Or Ozeri, Ilias Enn- mouri, Michal Malka, Peter Chin, Swaminathan Sun- 13 dararaman, et al. ZipNN: Lossless compression for AI models. InProc. of IEEE CLOUD, 2025

work page 2025
[22]

AI and the future of unstructured data

IBM. AI and the future of unstructured data. https: //www.ibm.com/think/insights/unstructured- data-trends, 2025

work page 2025
[23]

AI data centers are swallowing the world’s memory and storage supply, setting the stage for a pricing apocalypse that could last a decade

Luke James. AI data centers are swallowing the world’s memory and storage supply, setting the stage for a pricing apocalypse that could last a decade. https://www.tomshardware.com/pc- components/storage/perfect-storm-of- demand-and-supply-driving-up-storage- costs?ref=aisecret.us, 2025

work page 2025
[24]

DiskANN: Fast accurate billion-point near- est neighbor search on a single node.Proc

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vard- han Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. DiskANN: Fast accurate billion-point near- est neighbor search on a single node.Proc. of NeurIPS, 2019

work page 2019
[25]

Product quantization for nearest neighbor search.IEEE Trans

Herve Jegou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search.IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(1):117--128, 2010

work page 2010
[26]

Searching in one billion vectors: re- rank with source coding

Herv´e J ´egou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. Searching in one billion vectors: re- rank with source coding. InProc. of IEEE ICASSP, 2011

work page 2011
[27]

Introducing the Mil- vus Sizing Tool: Calculating and Optimizing Your Milvus Deployment Resources

Ken Zhang, Fendy Feng. Introducing the Mil- vus Sizing Tool: Calculating and Optimizing Your Milvus Deployment Resources . https: //milvus.io/blog/introducing-the-milvus- sizing-tool-calculating-and-optimizing- your-milvus-deployment-resources.md, 2025

work page 2025
[28]

Dynamic Huffman coding.Journal of Algorithms, 6(2):163--180, 1985

Donald E Knuth. Dynamic Huffman coding.Journal of Algorithms, 6(2):163--180, 1985

work page 1985
[29]

Datasets for ap- proximate nearest neighbor search

Laurent Amsaleg and Herv ´e J ´egou. Datasets for ap- proximate nearest neighbor search. http://corpus- texmex.irisa.fr/, 2010

work page 2010
[30]

VStore: in-storage graph based vector search accelerator

Shengwen Liang, Ying Wang, Ziming Yuan, Cheng Liu, Huawei Li, and Xiaowei Li. VStore: in-storage graph based vector search accelerator. InProc. of ACM/IEEE DAC, 2022

work page 2022
[31]

Efficient and robust approximate nearest neighbor search using hier- archical navigable small world graphs.IEEE Trans

Yu A Malkov and Dmitry A Yashunin. Efficient and robust approximate nearest neighbor search using hier- archical navigable small world graphs.IEEE Trans. on Pattern Analysis and Machine Intelligence, 42(4):824-- 836, 2018

work page 2018
[32]

SPTAG issue #416: Segmentation fault when building SIFT1B

Microsoft. SPTAG issue #416: Segmentation fault when building SIFT1B. https://github.com/ microsoft/SPTAG/issues/416, 2024

work page 2024
[33]

Flann-fast library for approximate nearest neighbors user manual.Computer Science Department, University of British Columbia, Vancouver, BC, Canada, 5(6), 2009

Marius Muja and David Lowe. Flann-fast library for approximate nearest neighbors user manual.Computer Science Department, University of British Columbia, Vancouver, BC, Canada, 5(6), 2009

work page 2009
[34]

BIG ANN-Benchmarks

NeurIPS. BIG ANN-Benchmarks. https://big-ann- benchmarks.com/neurips21.html, 2021

work page 2021
[35]

Fm- delta: Lossless compression for storing massive fine- tuned foundation models.Proc

Wanyi Ning, Jingyu Wang, Qi Qi, Mengde Zhu, Haifeng Sun, Daixuan Cheng, Jianxin Liao, and Ce Zhang. Fm- delta: Lossless compression for storing massive fine- tuned foundation models.Proc. of NeurIPS, 2024

work page 2024
[36]

The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by ask- ing questions in everyday language

OpenAI. The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by ask- ing questions in everyday language. https://github. com/openai/chatgpt-retrieval-plugin, 2023

work page 2023
[37]

Partitioned elias-fano indexes

Giuseppe Ottaviano and Rossano Venturini. Partitioned elias-fano indexes. InProceedings of the 37th Interna- tional ACM SIGIR Conference on Research & Develop- ment in Information Retrieval, 2014

work page 2014
[38]

TiKV.https://tikv.org, 2025

PinCap. TiKV.https://tikv.org, 2025

work page 2025
[39]

Sentence-bert: Sen- tence embeddings using siamese bert-networks

Nils Reimers and Iryna Gurevych. Sentence-bert: Sen- tence embeddings using siamese bert-networks. InProc. of the EMNLP, 2019

work page 2019
[40]

FPGA-based lossless data compression using Huffman and LZ77 algorithms

Suzanne Rigler, William Bishop, and Andrew Kennings. FPGA-based lossless data compression using Huffman and LZ77 algorithms. In2007 Canadian conference on electrical and computer engineering, pages 1235--1238. IEEE, 2007

work page 2007
[41]

A mathematical theory of commu- nication.The Bell system technical journal, 27(3):379-- 423, 1948

Claude E Shannon. A mathematical theory of commu- nication.The Bell system technical journal, 27(3):379-- 423, 1948

work page 1948
[42]

FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search.arXiv preprint arXiv:2105.09613,

Aditi Singh, Suhas Jayaram Subramanya, Ravis- hankar Krishnaswamy, and Harsha Vardhan Simhadri. FreshDiskANN: A fast and accurate graph-based ann index for streaming similarity search.arXiv preprint arXiv:2105.09613, 2021

work page arXiv 2021
[43]

Scalable billion-point approximate nearest neighbor search using SmartSSDs

Bing Tian, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Hai Jin, and Yu Zhang. Scalable billion-point approximate nearest neighbor search using SmartSSDs. InProc. of USENIX ATC, 2024

work page 2024
[44]

Towards high- throughput and low-latency billion-scale vector search via CPU/GPU collaborative filtering and re-ranking

Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xuecang Zhang, Junhua Zhu, and Yu Zhang. Towards high- throughput and low-latency billion-scale vector search via CPU/GPU collaborative filtering and re-ranking. In Proc. of USENIX FAST, 2025

work page 2025
[45]

The relative neighbourhood graph of a finite planar set.Pattern recognition, 12(4):261--268, 1980

Godfried T Toussaint. The relative neighbourhood graph of a finite planar set.Pattern recognition, 12(4):261--268, 1980

work page 1980
[46]

DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search

Simhadri Harsha Vardhan, Krishnaswamy Ravishankar, Srinivasa Gopal, Subramanya Suhas Jayaram, Antoni- jevic Andrija, Pryce Dax, Kaczynski David, Williams Shane, Gollapudi Siddarth, Sivashankar Varun, Karia 14 Neel, Singh Aditi, Jaiswal Shikhar, Mahapatro Nee- lam, Adams Philip, Tower Bryan, and Patel Yash. DiskANN: Graph-structured Indices for Scalable, F...

work page 2023
[47]

MILC: Inverted list compression in memory.Proceedings of the VLDB Endowment, 10(8):853--864, 2017

Jianguo Wang, Chunbin Lin, Ruining He, Moojin Chae, Yannis Papakonstantinou, and Steven Swanson. MILC: Inverted list compression in memory.Proceedings of the VLDB Endowment, 10(8):853--864, 2017

work page 2017
[48]

Starling: An I/O-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment.Proc

Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xi- aoliang Xu, Rentong Guo, and Charles Xie. Starling: An I/O-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment.Proc. of ACM SIGMOD, 2024

work page 2024
[49]

ZipLLM:towards efficient LLM storage reduction via tensor deduplication and delta com- pression

Zirui Wang, Tingfeng Lan, Zhaoyuan Su, Juncheng Yang, and Yue Cheng. ZipLLM:towards efficient LLM storage reduction via tensor deduplication and delta com- pression. InProc. of USENIX NSDI, 2026

work page 2026
[50]

Configure replication in Weaviate ANN service

Weaviate. Configure replication in Weaviate ANN service. https://docs.weaviate.io/deploy/ configuration/replication, 2025

work page 2025
[51]

In-place updates of a graph in- dex for streaming approximate nearest neighbor search

Haike Xu, Magdalen Dobson Manohar, Philip A Bern- stein, Badrish Chandramouli, Richard Wen, and Har- sha Vardhan Simhadri. In-place updates of a graph in- dex for streaming approximate nearest neighbor search. arXiv preprint arXiv:2502.13826, 2025

work page arXiv 2025
[52]

SPFresh: Incremental in- place update for billion-scale vector search

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, et al. SPFresh: Incremental in- place update for billion-scale vector search. InProc. of ACM SOSP, 2023

work page 2023
[53]

New Generation Entropy coders

Yann Collet. New Generation Entropy coders. https: //github.com/Cyan4973/FiniteStateEntropy, 2019

work page 2019
[54]

Finesse: Fine-grained fea- ture locality based fast resemblance detection for post- deduplication delta compression

Yucheng Zhang, Wen Xia, Dan Feng, Hong Jiang, Yu Hua, and Qiang Wang. Finesse: Fine-grained fea- ture locality based fast resemblance detection for post- deduplication delta compression. InProc. of USENIX FAST, 2019

work page 2019
[55]

Fast vector query processing for large datasets beyond GPU memory with reordered pipelining

Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. Fast vector query processing for large datasets beyond GPU memory with reordered pipelining. InProc. of USENIX NSDI, 2024. 15

work page 2024

[1] [1]

Cassandra

Apache. Cassandra. https://cassandra.apache. org/, 2025

work page 2025

[2] [2]

Language models are few-shot learners.Proc

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Sub- biah, Jared D Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Proc. of NeurIPS, 2020

work page 2020

[3] [3]

Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook. InProc. of USENIX FAST, 2020

work page 2020

[4] [4]

Hsieh, Deborah A

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. Bigtable: A distributed storage system for structured data. InProc. of USENIX OSDI, 2006

work page 2006

[5] [5]

Sptag: A li- brary for fast approximate nearest neighbor search

Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jeffery Zhu, Jason Li, Chuanjie Liu, Lin- tao Zhang, and Jingdong Wang. Sptag: A li- brary for fast approximate nearest neighbor search. https://github.com/Microsoft/SPTAG, 2018

work page 2018

[6] [6]

SPANN: Highly-efficient billion-scale approx- imate nearest neighborhood search.Proc

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. SPANN: Highly-efficient billion-scale approx- imate nearest neighborhood search.Proc. of NeurIPS, 2021

work page 2021

[7] [7]

Yann Collet. LZ4. https://github.com/lz4/lz4, 2011

work page 2011

[8] [8]

Deep neural networks for YouTube recommendations

Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In Proc. of ACM RecSys, 2016

work page 2016

[9] [9]

Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding

Jarek Duda. Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with com- pression rate of arithmetic coding.arXiv preprint arXiv:1311.2540, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[10] [10]

Efficient storage and retrieval by content and address of static files.Journal of the ACM (JACM), 21(2):246--260, 1974

Peter Elias. Efficient storage and retrieval by content and address of static files.Journal of the ACM (JACM), 21(2):246--260, 1974

work page 1974

[11] [11]

Massachusetts Institute of Technology, Project MAC, 1971

Robert Mario Fano.On the number of bits required to implement an associative memory. Massachusetts Institute of Technology, Project MAC, 1971

work page 1971

[12] [12]

Fast approximate nearest neighbor search with the navi- gating spreading-out graph.Proceedings of the VLDB Endowment, 12(5):461--474, 2019

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neighbor search with the navi- gating spreading-out graph.Proceedings of the VLDB Endowment, 12(5):461--474, 2019

work page 2019

[13] [13]

RabitQ: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proc

Jianyang Gao and Cheng Long. RabitQ: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proc. of ACM SIGMOD, 2(3):1--27, 2024

work page 2024

[14] [14]

Retrieval-Augmented Generation for Large Language Models: A Survey

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.10997, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[15] [15]

Op- timized product quantization.IEEE Trans

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Op- timized product quantization.IEEE Trans. on Pat- tern Analysis and Machine Intelligence, 36(4):744--755, 2013

work page 2013

[16] [16]

Succinct

Giuseppe Ottaviano. Succinct. https://github. com/ot/succinct, 2017

work page 2017

[17] [17]

Real-time person- alization using embeddings for search ranking at airbnb

Mihajlo Grbovic and Haibin Cheng. Real-time person- alization using embeddings for search ranking at airbnb. InProc. of the 24th ACM SIGKDD, 2018

work page 2018

[18] [18]

Achieving low-latency graph- based vector search via aligning best-first search algo- rithm with SSD

Hao Guo and Youyou Lu. Achieving low-latency graph- based vector search via aligning best-first search algo- rithm with SSD. InProc. of USENIX OSDI, 2025

work page 2025

[19] [19]

OdinANN: Direct insert for consistently stable performance in billion-scale graph- based vector search

Hao Guo and Youyou Lu. OdinANN: Direct insert for consistently stable performance in billion-scale graph- based vector search. InProc. of USENIX FAST, 2026

work page 2026

[20] [20]

Research talk: Approximate nearest neighbor search systems at scale

Harsha Simhadri. Research talk: Approximate nearest neighbor search systems at scale. https://youtu.be/ BnYNdSIKibQ?t=179, 2022

work page 2022

[21] [21]

ZipNN: Lossless compression for AI models

Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Girmonsky, Roy Leibovitz, Or Ozeri, Ilias Enn- mouri, Michal Malka, Peter Chin, Swaminathan Sun- 13 dararaman, et al. ZipNN: Lossless compression for AI models. InProc. of IEEE CLOUD, 2025

work page 2025

[22] [22]

AI and the future of unstructured data

IBM. AI and the future of unstructured data. https: //www.ibm.com/think/insights/unstructured- data-trends, 2025

work page 2025

[23] [23]

AI data centers are swallowing the world’s memory and storage supply, setting the stage for a pricing apocalypse that could last a decade

Luke James. AI data centers are swallowing the world’s memory and storage supply, setting the stage for a pricing apocalypse that could last a decade. https://www.tomshardware.com/pc- components/storage/perfect-storm-of- demand-and-supply-driving-up-storage- costs?ref=aisecret.us, 2025

work page 2025

[24] [24]

DiskANN: Fast accurate billion-point near- est neighbor search on a single node.Proc

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vard- han Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. DiskANN: Fast accurate billion-point near- est neighbor search on a single node.Proc. of NeurIPS, 2019

work page 2019

[25] [25]

Product quantization for nearest neighbor search.IEEE Trans

Herve Jegou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search.IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(1):117--128, 2010

work page 2010

[26] [26]

Searching in one billion vectors: re- rank with source coding

Herv´e J ´egou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. Searching in one billion vectors: re- rank with source coding. InProc. of IEEE ICASSP, 2011

work page 2011

[27] [27]

Introducing the Mil- vus Sizing Tool: Calculating and Optimizing Your Milvus Deployment Resources

Ken Zhang, Fendy Feng. Introducing the Mil- vus Sizing Tool: Calculating and Optimizing Your Milvus Deployment Resources . https: //milvus.io/blog/introducing-the-milvus- sizing-tool-calculating-and-optimizing- your-milvus-deployment-resources.md, 2025

work page 2025

[28] [28]

Dynamic Huffman coding.Journal of Algorithms, 6(2):163--180, 1985

Donald E Knuth. Dynamic Huffman coding.Journal of Algorithms, 6(2):163--180, 1985

work page 1985

[29] [29]

Datasets for ap- proximate nearest neighbor search

Laurent Amsaleg and Herv ´e J ´egou. Datasets for ap- proximate nearest neighbor search. http://corpus- texmex.irisa.fr/, 2010

work page 2010

[30] [30]

VStore: in-storage graph based vector search accelerator

Shengwen Liang, Ying Wang, Ziming Yuan, Cheng Liu, Huawei Li, and Xiaowei Li. VStore: in-storage graph based vector search accelerator. InProc. of ACM/IEEE DAC, 2022

work page 2022

[31] [31]

Efficient and robust approximate nearest neighbor search using hier- archical navigable small world graphs.IEEE Trans

Yu A Malkov and Dmitry A Yashunin. Efficient and robust approximate nearest neighbor search using hier- archical navigable small world graphs.IEEE Trans. on Pattern Analysis and Machine Intelligence, 42(4):824-- 836, 2018

work page 2018

[32] [32]

SPTAG issue #416: Segmentation fault when building SIFT1B

Microsoft. SPTAG issue #416: Segmentation fault when building SIFT1B. https://github.com/ microsoft/SPTAG/issues/416, 2024

work page 2024

[33] [33]

Flann-fast library for approximate nearest neighbors user manual.Computer Science Department, University of British Columbia, Vancouver, BC, Canada, 5(6), 2009

Marius Muja and David Lowe. Flann-fast library for approximate nearest neighbors user manual.Computer Science Department, University of British Columbia, Vancouver, BC, Canada, 5(6), 2009

work page 2009

[34] [34]

BIG ANN-Benchmarks

NeurIPS. BIG ANN-Benchmarks. https://big-ann- benchmarks.com/neurips21.html, 2021

work page 2021

[35] [35]

Fm- delta: Lossless compression for storing massive fine- tuned foundation models.Proc

Wanyi Ning, Jingyu Wang, Qi Qi, Mengde Zhu, Haifeng Sun, Daixuan Cheng, Jianxin Liao, and Ce Zhang. Fm- delta: Lossless compression for storing massive fine- tuned foundation models.Proc. of NeurIPS, 2024

work page 2024

[36] [36]

The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by ask- ing questions in everyday language

OpenAI. The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by ask- ing questions in everyday language. https://github. com/openai/chatgpt-retrieval-plugin, 2023

work page 2023

[37] [37]

Partitioned elias-fano indexes

Giuseppe Ottaviano and Rossano Venturini. Partitioned elias-fano indexes. InProceedings of the 37th Interna- tional ACM SIGIR Conference on Research & Develop- ment in Information Retrieval, 2014

work page 2014

[38] [38]

TiKV.https://tikv.org, 2025

PinCap. TiKV.https://tikv.org, 2025

work page 2025

[39] [39]

Sentence-bert: Sen- tence embeddings using siamese bert-networks

Nils Reimers and Iryna Gurevych. Sentence-bert: Sen- tence embeddings using siamese bert-networks. InProc. of the EMNLP, 2019

work page 2019

[40] [40]

FPGA-based lossless data compression using Huffman and LZ77 algorithms

Suzanne Rigler, William Bishop, and Andrew Kennings. FPGA-based lossless data compression using Huffman and LZ77 algorithms. In2007 Canadian conference on electrical and computer engineering, pages 1235--1238. IEEE, 2007

work page 2007

[41] [41]

A mathematical theory of commu- nication.The Bell system technical journal, 27(3):379-- 423, 1948

Claude E Shannon. A mathematical theory of commu- nication.The Bell system technical journal, 27(3):379-- 423, 1948

work page 1948

[42] [42]

FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search.arXiv preprint arXiv:2105.09613,

Aditi Singh, Suhas Jayaram Subramanya, Ravis- hankar Krishnaswamy, and Harsha Vardhan Simhadri. FreshDiskANN: A fast and accurate graph-based ann index for streaming similarity search.arXiv preprint arXiv:2105.09613, 2021

work page arXiv 2021

[43] [43]

Scalable billion-point approximate nearest neighbor search using SmartSSDs

Bing Tian, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Hai Jin, and Yu Zhang. Scalable billion-point approximate nearest neighbor search using SmartSSDs. InProc. of USENIX ATC, 2024

work page 2024

[44] [44]

Towards high- throughput and low-latency billion-scale vector search via CPU/GPU collaborative filtering and re-ranking

Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xuecang Zhang, Junhua Zhu, and Yu Zhang. Towards high- throughput and low-latency billion-scale vector search via CPU/GPU collaborative filtering and re-ranking. In Proc. of USENIX FAST, 2025

work page 2025

[45] [45]

The relative neighbourhood graph of a finite planar set.Pattern recognition, 12(4):261--268, 1980

Godfried T Toussaint. The relative neighbourhood graph of a finite planar set.Pattern recognition, 12(4):261--268, 1980

work page 1980

[46] [46]

DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search

Simhadri Harsha Vardhan, Krishnaswamy Ravishankar, Srinivasa Gopal, Subramanya Suhas Jayaram, Antoni- jevic Andrija, Pryce Dax, Kaczynski David, Williams Shane, Gollapudi Siddarth, Sivashankar Varun, Karia 14 Neel, Singh Aditi, Jaiswal Shikhar, Mahapatro Nee- lam, Adams Philip, Tower Bryan, and Patel Yash. DiskANN: Graph-structured Indices for Scalable, F...

work page 2023

[47] [47]

MILC: Inverted list compression in memory.Proceedings of the VLDB Endowment, 10(8):853--864, 2017

Jianguo Wang, Chunbin Lin, Ruining He, Moojin Chae, Yannis Papakonstantinou, and Steven Swanson. MILC: Inverted list compression in memory.Proceedings of the VLDB Endowment, 10(8):853--864, 2017

work page 2017

[48] [48]

Starling: An I/O-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment.Proc

Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xi- aoliang Xu, Rentong Guo, and Charles Xie. Starling: An I/O-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment.Proc. of ACM SIGMOD, 2024

work page 2024

[49] [49]

ZipLLM:towards efficient LLM storage reduction via tensor deduplication and delta com- pression

Zirui Wang, Tingfeng Lan, Zhaoyuan Su, Juncheng Yang, and Yue Cheng. ZipLLM:towards efficient LLM storage reduction via tensor deduplication and delta com- pression. InProc. of USENIX NSDI, 2026

work page 2026

[50] [50]

Configure replication in Weaviate ANN service

Weaviate. Configure replication in Weaviate ANN service. https://docs.weaviate.io/deploy/ configuration/replication, 2025

work page 2025

[51] [51]

In-place updates of a graph in- dex for streaming approximate nearest neighbor search

Haike Xu, Magdalen Dobson Manohar, Philip A Bern- stein, Badrish Chandramouli, Richard Wen, and Har- sha Vardhan Simhadri. In-place updates of a graph in- dex for streaming approximate nearest neighbor search. arXiv preprint arXiv:2502.13826, 2025

work page arXiv 2025

[52] [52]

SPFresh: Incremental in- place update for billion-scale vector search

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, et al. SPFresh: Incremental in- place update for billion-scale vector search. InProc. of ACM SOSP, 2023

work page 2023

[53] [53]

New Generation Entropy coders

Yann Collet. New Generation Entropy coders. https: //github.com/Cyan4973/FiniteStateEntropy, 2019

work page 2019

[54] [54]

Finesse: Fine-grained fea- ture locality based fast resemblance detection for post- deduplication delta compression

Yucheng Zhang, Wen Xia, Dan Feng, Hong Jiang, Yu Hua, and Qiang Wang. Finesse: Fine-grained fea- ture locality based fast resemblance detection for post- deduplication delta compression. InProc. of USENIX FAST, 2019

work page 2019

[55] [55]

Fast vector query processing for large datasets beyond GPU memory with reordered pipelining

Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. Fast vector query processing for large datasets beyond GPU memory with reordered pipelining. InProc. of USENIX NSDI, 2024. 15

work page 2024