LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search

Shurui Zhong, Dingheng Mo, Siqiang Luo · 2025 · cs.DB · arXiv 2505.17152

4 Pith papers cite this work. Polarity classification is still indexing.


abstract

Vector search underpins modern AI applications by supporting approximate nearest neighbor (ANN) queries over high-dimensional embeddings in tasks like retrieval-augmented generation (RAG), recommendation systems, and multimodal search. Traditional ANN search indices (e.g., HNSW) are limited by memory constraints at large data scale. Disk-based indices such as DiskANN reduce memory overhead but rely on offline graph construction, resulting in costly and inefficient vector updates. The state-of-the-art clustering-based approach SPFresh offers better scalability but suffers from reduced recall due to coarse partitioning. Moreover, SPFresh employs in-place updates to maintain its index structure, limiting its efficiency in handling high-throughput insertions and deletions under dynamic workloads. This paper presents LSM-VEC, a disk-based dynamic vector index that integrates hierarchical graph indexing with LSM-tree storage. By distributing the proximity graph across multiple LSM-tree levels, LSM-VEC supports out-of-place vector updates. It enhances search efficiency via a sampling-based probabilistic search strategy with adaptive neighbor selection, and connectivity-aware graph reordering further reduces I/O without requiring global reconstruction. Experiments on billion-scale datasets demonstrate that LSM-VEC consistently outperforms existing disk-based ANN systems. It achieves higher recall, lower query and update latency, and reduces memory footprint by over 66.2%, making it well-suited for real-world large-scale vector search with dynamic updates.
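The out-of-place update idea in the abstract can be illustrated with a small sketch. This is not the LSM-VEC implementation: the class name `ToyLSMGraph`, the `memtable_limit` parameter, and the single-step compaction are assumptions made for illustration. It only shows how a node's neighbor list can be updated or deleted by writing a new version into an in-memory buffer, while older on-disk runs stay untouched until compaction.

```python
from typing import Dict, List, Optional, Tuple

Neighbors = Optional[Tuple[int, ...]]
TOMBSTONE: Neighbors = None  # marker meaning "this node was deleted"


class ToyLSMGraph:
    """Toy LSM-style store: node id -> neighbor list (hypothetical, not LSM-VEC)."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[int, Neighbors] = {}
        # Immutable "runs", newest first; stands in for on-disk LSM levels.
        self.runs: List[Dict[int, Neighbors]] = []

    def put(self, node: int, neighbors: List[int]) -> None:
        # Out-of-place: write a new version, never edit an older run.
        self.memtable[node] = tuple(neighbors)
        self._maybe_flush()

    def delete(self, node: int) -> None:
        # Deletion is also a write: a tombstone shadows older versions.
        self.memtable[node] = TOMBSTONE
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if len(self.memtable) >= self.memtable_limit:
            self.runs.insert(0, self.memtable)
            self.memtable = {}

    def get(self, node: int) -> Neighbors:
        # Search newest to oldest; the first hit is the current version.
        for table in [self.memtable, *self.runs]:
            if node in table:
                return table[node]
        return None

    def compact(self) -> None:
        # Merge all runs, letting newer versions win, then drop tombstones.
        merged: Dict[int, Neighbors] = {}
        for run in reversed(self.runs):  # oldest first
            merged.update(run)
        self.runs = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]


graph = ToyLSMGraph(memtable_limit=2)
graph.put(1, [2, 3])
graph.put(2, [1])      # memtable is full, so it is flushed as a run
graph.put(1, [3])      # new version of node 1; the old run is unchanged
graph.delete(2)        # tombstone for node 2, then a second flush
print(graph.get(1), graph.get(2), len(graph.runs))
```

A real system would keep runs sorted on disk and compact level by level; the sketch merges everything at once to keep the example short.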

representative citing papers

CLIP: Lightweight Cosine-Law-Based Inverted-List Pruning for IVF-Based Vector Search

cs.DB · 2026-06-29 · unverdicted · novelty 6.0

CLIP proposes a cosine-law-based pruning method for IVF vector search enabling O(1) cluster and log-time vector pruning with guarantees, plus variants for hierarchical and dynamic settings, showing up to 78% pruning and 69% efficiency gains.

Slipstream: Locality-Aware Graph Index Construction for Streaming Approximate Nearest Neighbor Search

cs.IR · 2026-06-02 · unverdicted · novelty 6.0

Slipstream exploits continuity in vector streams to reduce insertion costs in graph ANNS indexes via prior-insertion candidates and an adaptive controller, delivering up to 30.8x higher throughput at >=0.95 recall@10 on five datasets.

Opal: Private Memory for Personal AI

cs.CR · 2026-04-02 · unverdicted · novelty 6.0

Opal enables private long-term memory for personal AI by decoupling reasoning to a trusted enclave with a lightweight knowledge graph and piggybacking reindexing on ORAM accesses.

ACRONYM: Accelerated Approximate Nearest Neighbor Search in Memory for Dynamic Vector Databases

cs.AR · 2026-06-02 · unverdicted · novelty 5.0

ACRONYM claims a CAM-accelerated platform for dynamic vector databases that delivers over 90% recall at 8 million queries per second using 32 MB memory and 2.56 uJ per query while supporting updates without stalling.

citing papers explorer

Showing 4 of 4 citing papers.

CLIP: Lightweight Cosine-Law-Based Inverted-List Pruning for IVF-Based Vector Search · cs.DB · 2026-06-29 · unverdicted · none · ref 58 · internal anchor
Slipstream: Locality-Aware Graph Index Construction for Streaming Approximate Nearest Neighbor Search · cs.IR · 2026-06-02 · unverdicted · none · ref 79 · internal anchor
Opal: Private Memory for Personal AI · cs.CR · 2026-04-02 · unverdicted · none · ref 281 · internal anchor
ACRONYM: Accelerated Approximate Nearest Neighbor Search in Memory for Dynamic Vector Databases · cs.AR · 2026-06-02 · unverdicted · none · ref 60 · internal anchor
