LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search
4 Pith papers cite this work. Polarity classification is still indexing.
abstract
Vector search underpins modern AI applications by supporting approximate nearest neighbor (ANN) queries over high-dimensional embeddings in tasks like retrieval-augmented generation (RAG), recommendation systems, and multimodal search. Traditional ANN search indices (e.g., HNSW) are limited by memory constraints at large data scale. Disk-based indices such as DiskANN reduce memory overhead but rely on offline graph construction, resulting in costly and inefficient vector updates. The state-of-the-art clustering-based approach SPFresh offers better scalability but suffers from reduced recall due to coarse partitioning. Moreover, SPFresh employs in-place updates to maintain its index structure, limiting its efficiency in handling high-throughput insertions and deletions under dynamic workloads. This paper presents LSM-VEC, a disk-based dynamic vector index that integrates hierarchical graph indexing with LSM-tree storage. By distributing the proximity graph across multiple LSM-tree levels, LSM-VEC supports out-of-place vector updates. It enhances search efficiency via a sampling-based probabilistic search strategy with adaptive neighbor selection, and connectivity-aware graph reordering further reduces I/O without requiring global reconstruction. Experiments on billion-scale datasets demonstrate that LSM-VEC consistently outperforms existing disk-based ANN systems. It achieves higher recall, lower query and update latency, and reduces memory footprint by over 66.2%, making it well-suited for real-world large-scale vector search with dynamic updates.
years
2026: 4
verdicts
UNVERDICTED: 4
citing papers explorer
- CLIP: Lightweight Cosine-Law-Based Inverted-List Pruning for IVF-Based Vector Search
  CLIP proposes a cosine-law-based pruning method for IVF vector search enabling O(1) cluster and log-time vector pruning with guarantees, plus variants for hierarchical and dynamic settings, showing up to 78% pruning and 69% efficiency gains.
- Slipstream: Locality-Aware Graph Index Construction for Streaming Approximate Nearest Neighbor Search
  Slipstream exploits continuity in vector streams to reduce insertion costs in graph ANNS indexes via prior-insertion candidates and an adaptive controller, delivering up to 30.8x higher throughput at >=0.95 recall@10 on five datasets.
- Opal: Private Memory for Personal AI
  Opal enables private long-term memory for personal AI by decoupling reasoning to a trusted enclave with a lightweight knowledge graph and piggybacking reindexing on ORAM accesses.
- ACRONYM: Accelerated Approximate Nearest Neighbor Search in Memory for Dynamic Vector Databases
  ACRONYM claims a CAM-accelerated platform for dynamic vector databases that delivers over 90% recall at 8 million queries per second using 32 MB memory and 2.56 uJ per query while supporting updates without stalling.