pith. machine review for the scientific record.

arxiv: 2604.09173 · v1 · submitted 2026-04-10 · 💻 cs.DB · cs.OS

Recognition: unknown

Decoupling Vector Data and Index Storage for Space Efficiency

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:41 UTC · model grok-4.3

classification 💻 cs.DB cs.OS
keywords vector storage · decoupled storage · ANNS · approximate nearest neighbor · disk-based search · storage optimization · index metadata · vector datasets

The pith

Decoupling vector data from index metadata in disk-based approximate nearest neighbor search systems can reduce storage space by up to 58.7% while maintaining competitive or improved query and update performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that current disk-based ANNS systems pay a high price for storing vector data and index metadata in the same place, which wastes space, forces extra disk reads on every search, and creates heavy write amplification on updates. It introduces DecoupleVS as a framework that stores the two components separately and then applies targeted compression, custom data layouts, and separate handling for queries and updates. This approach cuts overall storage needs on real billion-scale datasets while keeping search accuracy and speed high. A reader would care because vector search now powers many AI and recommendation systems, and the cost of storing and accessing these massive datasets on disk is a growing practical limit.
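As a back-of-the-envelope illustration of the write-amplification argument (the record sizes below are hypothetical, not taken from the paper): in a co-located graph layout each on-disk node record holds the vector next to its neighbor list, so any edge update rewrites the vector bytes too, while decoupling lets an update touch only the adjacency metadata.

```python
# Hypothetical record sizes for a graph-based disk index (illustrative only).
DIM = 128        # vector dimensionality
FLOAT_BYTES = 4  # float32 component
MAX_DEGREE = 32  # neighbor slots per node
ID_BYTES = 4     # uint32 neighbor ID

# Co-located: rewriting a node's neighbor list also rewrites its vector.
colocated_update = DIM * FLOAT_BYTES + MAX_DEGREE * ID_BYTES   # 640 bytes

# Decoupled: the adjacency list lives in its own file, so a graph edit
# touches only the metadata bytes.
decoupled_update = MAX_DEGREE * ID_BYTES                       # 128 bytes

print(colocated_update, decoupled_update)  # 640 128
```

With these toy numbers a graph edit writes 5x fewer bytes; the paper's measured gains will of course depend on its actual layouts and batching.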

Core claim

DecoupleVS is a storage management framework that decouples vector data from auxiliary index metadata in disk-based ANNS systems. It applies specialized techniques for compression, data layouts, query processing, and update handling on the separated components, achieving substantial storage reduction while preserving high search and update performance and accuracy on billion-scale datasets.

What carries the argument

DecoupleVS, the framework that separates vector data storage from index metadata storage to allow independent optimizations for compression, layout, and access patterns.

Load-bearing premise

The primary storage, read, and write problems in existing ANNS systems come from co-locating vector data with index metadata rather than from other design choices.

What would settle it

Measuring storage size and query/update latency on the same billion-scale dataset with both DecoupleVS and a monolithic system; if storage savings fall below 20% or performance degrades noticeably, the claimed benefit would not hold.
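That check is simple to operationalize. The byte counts below are hypothetical, chosen only to reproduce the abstract's headline figure:

```python
def storage_reduction_pct(monolithic_bytes, decoupled_bytes):
    """Percent storage saved by the decoupled layout vs. a monolithic baseline."""
    return 100.0 * (1.0 - decoupled_bytes / monolithic_bytes)

# A 1000 GiB monolithic index shrunk to 413 GiB (hypothetical numbers)
# reproduces the paper's reported maximum of 58.7%.
saving = storage_reduction_pct(1000, 413)
assert round(saving, 1) == 58.7
assert saving >= 20.0  # the threshold proposed above for the claim to hold
```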

Figures

Figures reproduced from arXiv: 2604.09173 by Di Wu, Juncheng Zhang, Patrick P. C. Lee, Rui Yang, Yanjing Ren, Yuanming Ren.

Figure 1. Basic workflow of a disk-based ANNS system, using a graph-based auxiliary index as an example.
Figure 2. Search latency breakdowns of DiskANN and PipeANN, including CPU time and I/O wait time, normalized to the search latency of DiskANN. The number above each bar gives the search latency in microseconds (µs); single-threaded execution eliminates multi-threading interference and limits variance.
Figure 4. Normalized value dispersion and information entropy of different datasets.
Figure 5. Delta compression for multi-dimensional vectors. (Panels label segment files, blocks and chunks, a per-chunk base vector, block- and chunk-level metadata cached in memory, a symbol frequency table, and a mutable space whose blocks become immutable when filled.)
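The chunk-plus-base-vector scheme of Figure 5 can be sketched minimally; the toy values and the choice of the first vector as base are assumptions for illustration, not the paper's exact method:

```python
import numpy as np

# A chunk of similar quantized vectors; take the first as the base vector.
chunk = np.array([[100, 200,  50],
                  [101, 202,  48],
                  [ 99, 197,  53]], dtype=np.int16)

base = chunk[0]
deltas = chunk - base       # residuals cluster near zero: cheap to entropy-code
restored = base + deltas    # exact round trip for integer deltas

assert (restored == chunk).all()
assert deltas.tolist() == [[0, 0, 0], [1, 2, -2], [-1, -3, 3]]
```

The small-magnitude residuals are what make a downstream entropy coder (e.g. the symbol frequency table in the figure) effective.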
Figure 6. Hierarchical storage layout for vector data. (The surrounding text notes that sorted neighbor IDs are stored compactly with an Elias-Fano two-level representation [10, 11]: lower bits of fixed width record each ID exactly, while the higher bits of all IDs are packed into a single bitmap.)
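The two-level neighbor-ID encoding the paper attributes to Elias and Fano [10, 11] can be sketched as a generic Elias-Fano coder; this is a textbook formulation, not the paper's exact on-disk layout:

```python
def ef_encode(sorted_ids, low_width):
    """Split each ID into low bits (stored exactly, fixed width) and a high
    part, unary-coded into one shared bitmap at position high_i + i."""
    mask = (1 << low_width) - 1
    lows = [x & mask for x in sorted_ids]
    bitmap = 0
    for i, x in enumerate(sorted_ids):
        bitmap |= 1 << ((x >> low_width) + i)
    return lows, bitmap

def ef_decode(lows, bitmap, low_width):
    """Walk the bitmap; the i-th set bit at position p encodes high = p - i."""
    ids, i, pos = [], 0, 0
    while i < len(lows):
        if (bitmap >> pos) & 1:
            ids.append(((pos - i) << low_width) | lows[i])
            i += 1
        pos += 1
    return ids

neighbor_ids = [3, 4, 7, 13, 14, 15, 21, 43]
lows, bitmap = ef_encode(neighbor_ids, low_width=2)
assert ef_decode(lows, bitmap, low_width=2) == neighbor_ids
```

Because the IDs are sorted, the high parts are non-decreasing, so the bitmap has exactly one set bit per ID and the whole structure stays close to the information-theoretic minimum.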
Figure 7. Exp#1 (Space efficiency). For 100M-scale datasets, sizes are normalized to SPANN; for billion-scale datasets, to DiskANN. The numbers above the bars indicate absolute storage sizes in GiB.
Figure 8. Exp#2 (Search throughput). Throughput (queries per second) against Recall@10 (%); higher is better. The points from left to right in each curve correspond to increasing candidate list sizes, which generally lead to higher recall.
Figure 9. Exp#3 (Search latency). P99 latency (ms) against Recall@10 (%); lower is better.
Figure 10. Exp#4 (Concurrent search and updates). Overall update performance on SIFT100M.
Figure 11. Exp#6 (Update performance breakdown). Average computation and disk I/O times per update operation, normalized to DiskANN; the numbers atop bars indicate absolute times in seconds. Results are averaged over 10 iterations, with error bars showing 95% confidence intervals.
read the original abstract

Managing large-scale vector datasets with disk-based approximate nearest neighbor search (ANNS) systems faces critical efficiency challenges stemming from the co-location of vector data and auxiliary index metadata. Our analysis of state-of-the-art ANNS systems reveals that such co-location incurs substantial storage overhead, generates excessive reads during search queries, and causes severe write amplification during updates. We present DecoupleVS, a decoupled vector storage management framework that enables specialized optimizations for vector data and auxiliary index metadata. DecoupleVS incorporates various design techniques for effective compression, data layouts, search queries, and updates, so as to significantly reduce storage space, while maintaining high search and update performance and high search accuracy. Evaluation on real-world public and proprietary billion-scale datasets shows that DecoupleVS reduces storage space by up to 58.7%, while delivering competitive or improved search query and update performance, compared to state-of-the-art monolithic disk-based ANNS systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper proposes DecoupleVS, a decoupled vector storage management framework for disk-based approximate nearest neighbor search (ANNS) systems. It identifies storage overhead, read amplification during queries, and write amplification during updates as consequences of co-locating vector data with auxiliary index metadata in monolithic designs. DecoupleVS applies specialized compression, data layouts, and separate query/update paths to reduce space while preserving search accuracy and performance. Evaluation on real-world public and proprietary billion-scale datasets reports up to 58.7% storage reduction with competitive or improved query and update performance relative to state-of-the-art monolithic disk-based ANNS systems.

Significance. If the empirical claims hold, the work addresses a practical bottleneck in large-scale vector databases where storage costs dominate. The decoupling strategy enables independent optimization of data and metadata, which is a direct systems contribution. Credit is due for the evaluation on both public and proprietary billion-scale datasets and for reporting non-degraded query/update performance alongside the space savings.

minor comments (3)
  1. The abstract states that DecoupleVS 'incorporates various design techniques for effective compression, data layouts, search queries, and updates'; a concise summary table or diagram early in the paper (e.g., in the system overview section) would clarify which techniques apply to which component and how they interact.
  2. The 58.7% space reduction is presented as the maximum observed; specifying the exact dataset, index type, and compression configuration that achieves this figure would strengthen the central empirical claim.
  3. The paper compares against 'state-of-the-art monolithic disk-based ANNS systems'; an explicit list of the baselines (with version numbers or citations) and a summary table of storage, latency, throughput, and recall metrics would improve readability and reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of DecoupleVS and the recommendation for minor revision. The review correctly identifies the core challenges of co-located vector and index storage in disk-based ANNS and acknowledges the practical value of our decoupling approach, including the evaluation on billion-scale datasets. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is a systems/empirical contribution that identifies storage co-location overheads in existing disk-based ANNS systems, proposes a decoupled framework (DecoupleVS) with compression, layout, query, and update techniques, and validates the approach via direct evaluation on public and proprietary billion-scale datasets. No derivation chain, fitted parameters, self-referential predictions, or load-bearing self-citations appear in the abstract or described structure. Central claims (up to 58.7% space reduction with competitive performance) rest on external experimental comparison rather than reduction to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no specific free parameters, axioms, or invented entities detailed.

pith-pipeline@v0.9.0 · 5467 in / 951 out tokens · 44202 ms · 2026-05-10T16:41:01.154166+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith reviews without signing in.

Reference graph

Works this paper leans on

55 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1] Apache. Cassandra. https://cassandra.apache.org/, 2025.
  2. [2] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Proc. of NeurIPS, 2020.
  3. [3] Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook. In Proc. of USENIX FAST, 2020.
  4. [4] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A distributed storage system for structured data. In Proc. of USENIX OSDI, 2006.
  5. [5] Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jeffery Zhu, Jason Li, Chuanjie Liu, Lintao Zhang, and Jingdong Wang. SPTAG: A library for fast approximate nearest neighbor search. https://github.com/Microsoft/SPTAG, 2018.
  6. [6] Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. SPANN: Highly-efficient billion-scale approximate nearest neighborhood search. Proc. of NeurIPS, 2021.
  7. [7] Yann Collet. LZ4. https://github.com/lz4/lz4, 2011.
  8. [8] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In Proc. of ACM RecSys, 2016.
  9. [9] Jarek Duda. Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with compression rate of arithmetic coding. arXiv preprint arXiv:1311.2540, 2013.
  10. [10] Peter Elias. Efficient storage and retrieval by content and address of static files. Journal of the ACM, 21(2):246–260, 1974.
  11. [11] Robert Mario Fano. On the number of bits required to implement an associative memory. Massachusetts Institute of Technology, Project MAC, 1971.
  12. [12] Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neighbor search with the navigating spreading-out graph. Proceedings of the VLDB Endowment, 12(5):461–474, 2019.
  13. [13] Jianyang Gao and Cheng Long. RaBitQ: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search. Proc. of ACM SIGMOD, 2(3):1–27, 2024.
  14. [14] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2023.
  15. [15] Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Optimized product quantization. IEEE Trans. on Pattern Analysis and Machine Intelligence, 36(4):744–755, 2013.
  16. [16] Giuseppe Ottaviano. Succinct. https://github.com/ot/succinct, 2017.
  17. [17] Mihajlo Grbovic and Haibin Cheng. Real-time personalization using embeddings for search ranking at Airbnb. In Proc. of ACM SIGKDD, 2018.
  18. [18] Hao Guo and Youyou Lu. Achieving low-latency graph-based vector search via aligning best-first search algorithm with SSD. In Proc. of USENIX OSDI, 2025.
  19. [19] Hao Guo and Youyou Lu. OdinANN: Direct insert for consistently stable performance in billion-scale graph-based vector search. In Proc. of USENIX FAST, 2026.
  20. [20] Harsha Simhadri. Research talk: Approximate nearest neighbor search systems at scale. https://youtu.be/BnYNdSIKibQ?t=179, 2022.
  21. [21] Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Girmonsky, Roy Leibovitz, Or Ozeri, Ilias Ennmouri, Michal Malka, Peter Chin, Swaminathan Sundararaman, et al. ZipNN: Lossless compression for AI models. In Proc. of IEEE CLOUD, 2025.
  22. [22] IBM. AI and the future of unstructured data. https://www.ibm.com/think/insights/unstructured-data-trends, 2025.
  23. [23] Luke James. AI data centers are swallowing the world's memory and storage supply, setting the stage for a pricing apocalypse that could last a decade. https://www.tomshardware.com/pc-components/storage/perfect-storm-of-demand-and-supply-driving-up-storage-costs?ref=aisecret.us, 2025.
  24. [24] Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, and Rohan Kadekodi. DiskANN: Fast accurate billion-point nearest neighbor search on a single node. Proc. of NeurIPS, 2019.
  25. [25] Herve Jegou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search. IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2010.
  26. [26] Hervé Jégou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. Searching in one billion vectors: re-rank with source coding. In Proc. of IEEE ICASSP, 2011.
  27. [27] Ken Zhang and Fendy Feng. Introducing the Milvus Sizing Tool: Calculating and optimizing your Milvus deployment resources. https://milvus.io/blog/introducing-the-milvus-sizing-tool-calculating-and-optimizing-your-milvus-deployment-resources.md, 2025.
  28. [28] Donald E. Knuth. Dynamic Huffman coding. Journal of Algorithms, 6(2):163–180, 1985.
  29. [29] Laurent Amsaleg and Hervé Jégou. Datasets for approximate nearest neighbor search. http://corpus-texmex.irisa.fr/, 2010.
  30. [30] Shengwen Liang, Ying Wang, Ziming Yuan, Cheng Liu, Huawei Li, and Xiaowei Li. VStore: In-storage graph based vector search accelerator. In Proc. of ACM/IEEE DAC, 2022.
  31. [31] Yu A. Malkov and Dmitry A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. on Pattern Analysis and Machine Intelligence, 42(4):824–836, 2018.
  32. [32] Microsoft. SPTAG issue #416: Segmentation fault when building SIFT1B. https://github.com/microsoft/SPTAG/issues/416, 2024.
  33. [33] Marius Muja and David Lowe. FLANN: Fast library for approximate nearest neighbors user manual. Computer Science Department, University of British Columbia, Vancouver, BC, Canada, 5(6), 2009.
  34. [34] NeurIPS. BIG ANN-Benchmarks. https://big-ann-benchmarks.com/neurips21.html, 2021.
  35. [35] Wanyi Ning, Jingyu Wang, Qi Qi, Mengde Zhu, Haifeng Sun, Daixuan Cheng, Jianxin Liao, and Ce Zhang. FM-Delta: Lossless compression for storing massive fine-tuned foundation models. Proc. of NeurIPS, 2024.
  36. [36] OpenAI. The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by asking questions in everyday language. https://github.com/openai/chatgpt-retrieval-plugin, 2023.
  37. [37] Giuseppe Ottaviano and Rossano Venturini. Partitioned Elias-Fano indexes. In Proc. of ACM SIGIR, 2014.
  38. [38] PingCAP. TiKV. https://tikv.org, 2025.
  39. [39] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proc. of EMNLP, 2019.
  40. [40] Suzanne Rigler, William Bishop, and Andrew Kennings. FPGA-based lossless data compression using Huffman and LZ77 algorithms. In 2007 Canadian Conference on Electrical and Computer Engineering, pages 1235–1238. IEEE, 2007.
  41. [41] Claude E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948.
  42. [42] Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri. FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search. arXiv preprint arXiv:2105.09613, 2021.
  43. [43] Bing Tian, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Hai Jin, and Yu Zhang. Scalable billion-point approximate nearest neighbor search using SmartSSDs. In Proc. of USENIX ATC, 2024.
  44. [44] Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xuecang Zhang, Junhua Zhu, and Yu Zhang. Towards high-throughput and low-latency billion-scale vector search via CPU/GPU collaborative filtering and re-ranking. In Proc. of USENIX FAST, 2025.
  45. [45] Godfried T. Toussaint. The relative neighbourhood graph of a finite planar set. Pattern Recognition, 12(4):261–268, 1980.
  46. [46] Simhadri Harsha Vardhan, Krishnaswamy Ravishankar, Srinivasa Gopal, Subramanya Suhas Jayaram, Antonijevic Andrija, Pryce Dax, Kaczynski David, Williams Shane, Gollapudi Siddarth, Sivashankar Varun, Karia Neel, Singh Aditi, Jaiswal Shikhar, Mahapatro Neelam, Adams Philip, Tower Bryan, and Patel Yash. DiskANN: Graph-structured indices for scalable, fast, fresh and filtered approximate nearest neighbor search.
  47. [47] Jianguo Wang, Chunbin Lin, Ruining He, Moojin Chae, Yannis Papakonstantinou, and Steven Swanson. MILC: Inverted list compression in memory. Proceedings of the VLDB Endowment, 10(8):853–864, 2017.
  48. [48] Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xiaoliang Xu, Rentong Guo, and Charles Xie. Starling: An I/O-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment. Proc. of ACM SIGMOD, 2024.
  49. [49] Zirui Wang, Tingfeng Lan, Zhaoyuan Su, Juncheng Yang, and Yue Cheng. ZipLLM: Towards efficient LLM storage reduction via tensor deduplication and delta compression. In Proc. of USENIX NSDI, 2026.
  50. [50] Weaviate. Configure replication in Weaviate ANN service. https://docs.weaviate.io/deploy/configuration/replication, 2025.
  51. [51] Haike Xu, Magdalen Dobson Manohar, Philip A. Bernstein, Badrish Chandramouli, Richard Wen, and Harsha Vardhan Simhadri. In-place updates of a graph index for streaming approximate nearest neighbor search. arXiv preprint arXiv:2502.13826, 2025.
  52. [52] Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, et al. SPFresh: Incremental in-place update for billion-scale vector search. In Proc. of ACM SOSP, 2023.
  53. [53] Yann Collet. New Generation Entropy coders. https://github.com/Cyan4973/FiniteStateEntropy, 2019.
  54. [54] Yucheng Zhang, Wen Xia, Dan Feng, Hong Jiang, Yu Hua, and Qiang Wang. Finesse: Fine-grained feature locality based fast resemblance detection for post-deduplication delta compression. In Proc. of USENIX FAST, 2019.
  55. [55] Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. Fast vector query processing for large datasets beyond GPU memory with reordered pipelining. In Proc. of USENIX NSDI, 2024.