Decoupling Vector Data and Index Storage for Space Efficiency
Pith reviewed 2026-05-19 18:19 UTC · model grok-4.3
The pith
COMPASS decouples vector data from index metadata to compress each separately, cutting storage by up to 58.7% while keeping search and update performance competitive.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
COMPASS decouples vector data and auxiliary index metadata in disk-resident graph ANNS systems and applies lossless compression to each component according to its distinct compressibility characteristics. The search and update paths are adapted to the resulting compressed layouts. On real-world public and proprietary billion-scale datasets this yields up to 58.7% storage reduction while delivering search and update performance that is improved or competitive with state-of-the-art disk-resident graph ANNS systems.
What carries the argument
Data-index decoupling, which separates vector data from auxiliary index metadata so each can be compressed independently according to its own compressibility traits.
Load-bearing premise
Vector data and index metadata possess sufficiently distinct compressibility characteristics that separating them for independent compression produces net space savings without unacceptable overhead in search or update operations.
What would settle it
A head-to-head test on the same hardware and datasets showing that query latency or update throughput under the decoupled compressed layout exceeds that of the original co-located storage.
Figures
read the original abstract
Managing large-scale vector datasets with disk-resident graph approximate nearest neighbor search (ANNS) systems incurs substantial storage overhead due to the co-location of vector data and auxiliary index metadata, which prevents the storage layer from exploiting their distinct compressibility. We present COMPASS, a component-aware compressed storage framework for disk-resident graph vector search. Leveraging data-index decoupling as a foundation, COMPASS losslessly compresses each component according to its distinct compressibility characteristics, thereby significantly reducing storage space. It further adapts the search and update paths to preserve their performance under compressed storage layouts. Evaluation on real-world public and proprietary billion-scale datasets shows that COMPASS reduces storage space by up to 58.7%, while delivering improved or competitive search and update performance compared to state-of-the-art disk-resident graph ANNS systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces COMPASS, a component-aware compressed storage framework for disk-resident graph ANNS systems. It decouples vector data from auxiliary index metadata to permit independent lossless compression of each component according to its distinct compressibility characteristics, yielding up to 58.7% storage reduction while adapting search and update paths to maintain or improve performance on billion-scale datasets.
Significance. If the performance claims survive a full accounting of decompression costs, the decoupling approach could meaningfully improve storage efficiency for large-scale disk-resident vector search without sacrificing query throughput. The core idea of exploiting differential compressibility between data and metadata is a practical contribution to vector database storage design.
major comments (2)
- [§5 Evaluation] §5 Evaluation: the reported search and update latencies are presented only as end-to-end wall-clock times with no breakdown isolating decompression CPU or extra I/O costs introduced by the compressed layouts. Because every vector fetch and neighbor traversal now crosses the decoupled compressed storage, it is impossible to verify whether the 'improved or competitive' result holds once these costs are charged against the baselines.
- [Table 2] Table 2 (storage results): the 58.7% reduction figure is given without per-component compression ratios for vectors versus graph metadata or without reporting the raw index sizes before and after decoupling. This makes it difficult to assess how much of the savings is actually attributable to the decoupling rather than to the choice of compressor.
minor comments (2)
- [§3.1] §3.1: the notation distinguishing the compressed vector block layout from the compressed metadata block layout is introduced without a small diagram or example, which would help readers follow the subsequent path adaptations.
- [Related work] Related work: several recent papers on learned compression for vectors and on separate storage of graph edges are not cited; adding them would better situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to incorporate additional analysis and data as requested.
read point-by-point responses
-
Referee: [§5 Evaluation] §5 Evaluation: the reported search and update latencies are presented only as end-to-end wall-clock times with no breakdown isolating decompression CPU or extra I/O costs introduced by the compressed layouts. Because every vector fetch and neighbor traversal now crosses the decoupled compressed storage, it is impossible to verify whether the 'improved or competitive' result holds once these costs are charged against the baselines.
Authors: We agree that an explicit cost breakdown would strengthen the evaluation. In the revised manuscript we have added a new subsection (5.4) that reports per-operation breakdowns of decompression CPU time and any additional I/O for both search and update paths on the billion-scale datasets. The data show that decompression overhead remains below 8% of total latency and is more than offset by the reduction in I/O volume, preserving the reported performance advantage. revision: yes
-
Referee: Table 2 (storage results): the 58.7% reduction figure is given without per-component compression ratios for vectors versus graph metadata or without reporting the raw index sizes before and after decoupling. This makes it difficult to assess how much of the savings is actually attributable to the decoupling rather than to the choice of compressor.
Authors: We acknowledge the value of this granularity. Table 2 has been expanded to include (i) raw storage sizes before and after decoupling, (ii) per-component compression ratios for vector data and for graph metadata separately, and (iii) the contribution of each component to the overall 58.7% reduction. The updated table makes clear that the majority of the savings stems from the higher compressibility of the decoupled metadata, which could not be exploited in the original co-located layout. revision: yes
Circularity Check
No significant circularity in COMPASS decoupling and compression claims
full rationale
The paper introduces a storage framework that decouples vector data from graph index metadata to allow independent lossless compression based on their distinct characteristics, then adapts search/update paths accordingly. All load-bearing claims (58.7% space reduction, competitive or improved performance) rest on empirical measurements against external state-of-the-art baselines on public and proprietary billion-scale datasets. No equations or steps reduce by construction to fitted parameters, self-citations, or renamed inputs; the derivation chain consists of standard systems engineering choices followed by external evaluation rather than self-referential definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Vector data and index metadata have distinct compressibility characteristics that can be leveraged independently after decoupling.
invented entities (1)
-
COMPASS framework
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinctionreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
reduces storage space by up to 58.7% ... competitive or improved search and update performance
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
-
[2]
Language models are few-shot learners.Proc
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Sub- biah, Jared D Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Proc. of NeurIPS, 2020
work page 2020
-
[3]
Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook. InProc. of USENIX FAST, 2020
work page 2020
-
[4]
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. Bigtable: A distributed storage system for structured data. InProc. of USENIX OSDI, 2006
work page 2006
-
[5]
Sptag: A li- brary for fast approximate nearest neighbor search
Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jeffery Zhu, Jason Li, Chuanjie Liu, Lin- tao Zhang, and Jingdong Wang. Sptag: A li- brary for fast approximate nearest neighbor search. https://github.com/Microsoft/SPTAG, 2018
work page 2018
-
[6]
SPANN: Highly-efficient billion-scale approx- imate nearest neighborhood search.Proc
Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. SPANN: Highly-efficient billion-scale approx- imate nearest neighborhood search.Proc. of NeurIPS, 2021
work page 2021
-
[7]
Yann Collet. LZ4. https://github.com/lz4/lz4, 2011
work page 2011
-
[8]
Deep neural networks for YouTube recommendations
Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In Proc. of ACM RecSys, 2016
work page 2016
-
[9]
Jarek Duda. Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with com- pression rate of arithmetic coding.arXiv preprint arXiv:1311.2540, 2013
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[10]
Peter Elias. Efficient storage and retrieval by content and address of static files.Journal of the ACM (JACM), 21(2):246--260, 1974
work page 1974
-
[11]
Massachusetts Institute of Technology, Project MAC, 1971
Robert Mario Fano.On the number of bits required to implement an associative memory. Massachusetts Institute of Technology, Project MAC, 1971
work page 1971
-
[12]
Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neighbor search with the navi- gating spreading-out graph.Proceedings of the VLDB Endowment, 12(5):461--474, 2019
work page 2019
-
[13]
Jianyang Gao and Cheng Long. RabitQ: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search.Proc. of ACM SIGMOD, 2(3):1--27, 2024
work page 2024
-
[14]
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.10997, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[15]
Op- timized product quantization.IEEE Trans
Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Op- timized product quantization.IEEE Trans. on Pat- tern Analysis and Machine Intelligence, 36(4):744--755, 2013
work page 2013
- [16]
-
[17]
Real-time person- alization using embeddings for search ranking at airbnb
Mihajlo Grbovic and Haibin Cheng. Real-time person- alization using embeddings for search ranking at airbnb. InProc. of the 24th ACM SIGKDD, 2018
work page 2018
-
[18]
Achieving low-latency graph- based vector search via aligning best-first search algo- rithm with SSD
Hao Guo and Youyou Lu. Achieving low-latency graph- based vector search via aligning best-first search algo- rithm with SSD. InProc. of USENIX OSDI, 2025
work page 2025
-
[19]
Hao Guo and Youyou Lu. OdinANN: Direct insert for consistently stable performance in billion-scale graph- based vector search. InProc. of USENIX FAST, 2026
work page 2026
-
[20]
Research talk: Approximate nearest neighbor search systems at scale
Harsha Simhadri. Research talk: Approximate nearest neighbor search systems at scale. https://youtu.be/ BnYNdSIKibQ?t=179, 2022
work page 2022
-
[21]
ZipNN: Lossless compression for AI models
Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Girmonsky, Roy Leibovitz, Or Ozeri, Ilias Enn- mouri, Michal Malka, Peter Chin, Swaminathan Sun- 13 dararaman, et al. ZipNN: Lossless compression for AI models. InProc. of IEEE CLOUD, 2025
work page 2025
-
[22]
AI and the future of unstructured data
IBM. AI and the future of unstructured data. https: //www.ibm.com/think/insights/unstructured- data-trends, 2025
work page 2025
-
[23]
Luke James. AI data centers are swallowing the world’s memory and storage supply, setting the stage for a pricing apocalypse that could last a decade. https://www.tomshardware.com/pc- components/storage/perfect-storm-of- demand-and-supply-driving-up-storage- costs?ref=aisecret.us, 2025
work page 2025
-
[24]
DiskANN: Fast accurate billion-point near- est neighbor search on a single node.Proc
Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vard- han Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. DiskANN: Fast accurate billion-point near- est neighbor search on a single node.Proc. of NeurIPS, 2019
work page 2019
-
[25]
Product quantization for nearest neighbor search.IEEE Trans
Herve Jegou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search.IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(1):117--128, 2010
work page 2010
-
[26]
Searching in one billion vectors: re- rank with source coding
Herv´e J ´egou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. Searching in one billion vectors: re- rank with source coding. InProc. of IEEE ICASSP, 2011
work page 2011
-
[27]
Introducing the Mil- vus Sizing Tool: Calculating and Optimizing Your Milvus Deployment Resources
Ken Zhang, Fendy Feng. Introducing the Mil- vus Sizing Tool: Calculating and Optimizing Your Milvus Deployment Resources . https: //milvus.io/blog/introducing-the-milvus- sizing-tool-calculating-and-optimizing- your-milvus-deployment-resources.md, 2025
work page 2025
-
[28]
Dynamic Huffman coding.Journal of Algorithms, 6(2):163--180, 1985
Donald E Knuth. Dynamic Huffman coding.Journal of Algorithms, 6(2):163--180, 1985
work page 1985
-
[29]
Datasets for ap- proximate nearest neighbor search
Laurent Amsaleg and Herv ´e J ´egou. Datasets for ap- proximate nearest neighbor search. http://corpus- texmex.irisa.fr/, 2010
work page 2010
-
[30]
VStore: in-storage graph based vector search accelerator
Shengwen Liang, Ying Wang, Ziming Yuan, Cheng Liu, Huawei Li, and Xiaowei Li. VStore: in-storage graph based vector search accelerator. InProc. of ACM/IEEE DAC, 2022
work page 2022
-
[31]
Yu A Malkov and Dmitry A Yashunin. Efficient and robust approximate nearest neighbor search using hier- archical navigable small world graphs.IEEE Trans. on Pattern Analysis and Machine Intelligence, 42(4):824-- 836, 2018
work page 2018
-
[32]
SPTAG issue #416: Segmentation fault when building SIFT1B
Microsoft. SPTAG issue #416: Segmentation fault when building SIFT1B. https://github.com/ microsoft/SPTAG/issues/416, 2024
work page 2024
-
[33]
Marius Muja and David Lowe. Flann-fast library for approximate nearest neighbors user manual.Computer Science Department, University of British Columbia, Vancouver, BC, Canada, 5(6), 2009
work page 2009
-
[34]
NeurIPS. BIG ANN-Benchmarks. https://big-ann- benchmarks.com/neurips21.html, 2021
work page 2021
-
[35]
Fm- delta: Lossless compression for storing massive fine- tuned foundation models.Proc
Wanyi Ning, Jingyu Wang, Qi Qi, Mengde Zhu, Haifeng Sun, Daixuan Cheng, Jianxin Liao, and Ce Zhang. Fm- delta: Lossless compression for storing massive fine- tuned foundation models.Proc. of NeurIPS, 2024
work page 2024
-
[36]
OpenAI. The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by ask- ing questions in everyday language. https://github. com/openai/chatgpt-retrieval-plugin, 2023
work page 2023
-
[37]
Partitioned elias-fano indexes
Giuseppe Ottaviano and Rossano Venturini. Partitioned elias-fano indexes. InProceedings of the 37th Interna- tional ACM SIGIR Conference on Research & Develop- ment in Information Retrieval, 2014
work page 2014
- [38]
-
[39]
Sentence-bert: Sen- tence embeddings using siamese bert-networks
Nils Reimers and Iryna Gurevych. Sentence-bert: Sen- tence embeddings using siamese bert-networks. InProc. of the EMNLP, 2019
work page 2019
-
[40]
FPGA-based lossless data compression using Huffman and LZ77 algorithms
Suzanne Rigler, William Bishop, and Andrew Kennings. FPGA-based lossless data compression using Huffman and LZ77 algorithms. In2007 Canadian conference on electrical and computer engineering, pages 1235--1238. IEEE, 2007
work page 2007
-
[41]
A mathematical theory of commu- nication.The Bell system technical journal, 27(3):379-- 423, 1948
Claude E Shannon. A mathematical theory of commu- nication.The Bell system technical journal, 27(3):379-- 423, 1948
work page 1948
-
[42]
Aditi Singh, Suhas Jayaram Subramanya, Ravis- hankar Krishnaswamy, and Harsha Vardhan Simhadri. FreshDiskANN: A fast and accurate graph-based ann index for streaming similarity search.arXiv preprint arXiv:2105.09613, 2021
-
[43]
Scalable billion-point approximate nearest neighbor search using SmartSSDs
Bing Tian, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Hai Jin, and Yu Zhang. Scalable billion-point approximate nearest neighbor search using SmartSSDs. InProc. of USENIX ATC, 2024
work page 2024
-
[44]
Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xuecang Zhang, Junhua Zhu, and Yu Zhang. Towards high- throughput and low-latency billion-scale vector search via CPU/GPU collaborative filtering and re-ranking. In Proc. of USENIX FAST, 2025
work page 2025
-
[45]
The relative neighbourhood graph of a finite planar set.Pattern recognition, 12(4):261--268, 1980
Godfried T Toussaint. The relative neighbourhood graph of a finite planar set.Pattern recognition, 12(4):261--268, 1980
work page 1980
-
[46]
Simhadri Harsha Vardhan, Krishnaswamy Ravishankar, Srinivasa Gopal, Subramanya Suhas Jayaram, Antoni- jevic Andrija, Pryce Dax, Kaczynski David, Williams Shane, Gollapudi Siddarth, Sivashankar Varun, Karia 14 Neel, Singh Aditi, Jaiswal Shikhar, Mahapatro Nee- lam, Adams Philip, Tower Bryan, and Patel Yash. DiskANN: Graph-structured Indices for Scalable, F...
work page 2023
-
[47]
MILC: Inverted list compression in memory.Proceedings of the VLDB Endowment, 10(8):853--864, 2017
Jianguo Wang, Chunbin Lin, Ruining He, Moojin Chae, Yannis Papakonstantinou, and Steven Swanson. MILC: Inverted list compression in memory.Proceedings of the VLDB Endowment, 10(8):853--864, 2017
work page 2017
-
[48]
Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xi- aoliang Xu, Rentong Guo, and Charles Xie. Starling: An I/O-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment.Proc. of ACM SIGMOD, 2024
work page 2024
-
[49]
ZipLLM:towards efficient LLM storage reduction via tensor deduplication and delta com- pression
Zirui Wang, Tingfeng Lan, Zhaoyuan Su, Juncheng Yang, and Yue Cheng. ZipLLM:towards efficient LLM storage reduction via tensor deduplication and delta com- pression. InProc. of USENIX NSDI, 2026
work page 2026
-
[50]
Configure replication in Weaviate ANN service
Weaviate. Configure replication in Weaviate ANN service. https://docs.weaviate.io/deploy/ configuration/replication, 2025
work page 2025
-
[51]
In-place updates of a graph in- dex for streaming approximate nearest neighbor search
Haike Xu, Magdalen Dobson Manohar, Philip A Bern- stein, Badrish Chandramouli, Richard Wen, and Har- sha Vardhan Simhadri. In-place updates of a graph in- dex for streaming approximate nearest neighbor search. arXiv preprint arXiv:2502.13826, 2025
-
[52]
SPFresh: Incremental in- place update for billion-scale vector search
Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, et al. SPFresh: Incremental in- place update for billion-scale vector search. InProc. of ACM SOSP, 2023
work page 2023
-
[53]
Yann Collet. New Generation Entropy coders. https: //github.com/Cyan4973/FiniteStateEntropy, 2019
work page 2019
-
[54]
Yucheng Zhang, Wen Xia, Dan Feng, Hong Jiang, Yu Hua, and Qiang Wang. Finesse: Fine-grained fea- ture locality based fast resemblance detection for post- deduplication delta compression. InProc. of USENIX FAST, 2019
work page 2019
-
[55]
Fast vector query processing for large datasets beyond GPU memory with reordered pipelining
Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. Fast vector query processing for large datasets beyond GPU memory with reordered pipelining. InProc. of USENIX NSDI, 2024. 15
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.