The Clustering Strikes Back: Building Cost-Effective and High-Performance ANNS at Scale with Helmsman

Baiteng Ma; Chuliang Weng; Erci Xu; Xiao Chen; Xiaocheng Zhong; Yang Shi; Yao Hu; Yiping Sun; Yuchen Huang; Zhiyong Wang

arXiv: 2606.13145 · v1 · pith:M4CB5QLM · submitted 2026-06-11 · 💻 cs.IR


Yuchen Huang, Baiteng Ma, Yiping Sun, Yang Shi, Xiao Chen, Xiaocheng Zhong, Zhiyong Wang, Yao Hu, Erci Xu, Chuliang Weng


Pith reviewed 2026-06-27 05:53 UTC · model grok-4.3

classification 💻 cs.IR

keywords approximate nearest neighbor search · ANNS · clustering-based index · all-flash storage · userspace I/O stack · learned pruning · GPU-accelerated construction · cost reduction


The pith

A clustering-based ANNS on all-flash servers matches in-memory HNSW performance while cutting hardware costs by over 90 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that a production social platform can replace its memory-intensive HNSW graph index with a clustering-based index running on flash storage. It does so by pairing the index with a custom userspace I/O stack, a learned pruning module that adapts to query patterns, and GPU pipelines for fast index construction. The authors report that the result sustains the low-latency, high-throughput service level agreements (SLAs) required for search, recommendation, and advertising. In the live deployment, 40 machines now host ANNS workloads that previously required roughly 35,000 cores and 0.35 PB of DRAM.
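As a reference point for the rest of this page, here is a minimal sketch of how a generic clustering-based (IVF-style) index answers a query: rank the cluster centroids by distance, read only the posting lists of the `nprobe` closest clusters, and scan those exhaustively. The names `ivf_search`, `posting_lists`, and `nprobe` are illustrative assumptions, not HELMSMAN's interface, and the posting lists are held in memory only to keep the example self-contained; in the deployment described above they would be read from flash.

```python
import heapq
import math


def ivf_search(query, centroids, posting_lists, nprobe, k):
    """Return the k nearest (distance, vector_id) pairs found by probing
    the nprobe clusters whose centroids are closest to the query.

    posting_lists[c] is a list of (vector_id, vector) pairs assigned to
    cluster c. In a flash-resident index each probed list would cost one
    or more SSD reads; here it is just a Python list.
    """
    # Step 1: rank clusters by centroid distance (the small centroid table stays in DRAM).
    ranked = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    # Step 2: scan only the posting lists of the nprobe closest clusters.
    candidates = []
    for c in ranked[:nprobe]:
        for vec_id, vec in posting_lists[c]:
            candidates.append((math.dist(query, vec), vec_id))
    # Step 3: keep the k best candidates.
    return heapq.nsmallest(k, candidates)
```

The design point is that, in this style of index, only the centroid table needs to live in DRAM; the bulk of the vectors can sit on SSD and each query touches just `nprobe` posting lists. That is what makes an all-flash deployment plausible, and also why I/O overhead and the choice of `nprobe` dominate latency.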

Core claim

HELMSMAN builds a high-performance clustering-based ANNS by layering an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated construction pipelines on top of all-flash servers. This combination removes kernel I/O overhead, replaces fixed pruning with adaptive learned decisions, and accelerates index rebuilds enough to keep billion-scale indexes current. The system therefore delivers the latency and throughput previously available only from in-memory HNSW while reducing hardware requirements by more than 90 percent.
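The contrast between fixed and adaptive pruning can be sketched briefly. A fixed strategy probes the same number of clusters for every query; an adaptive one decides per query. The abstract does not describe how HELMSMAN's leveling-learned module makes that decision, so the hand-set distance-ratio rule below is only a stand-in for a learned model, and `ratio`, `min_probe`, and `max_probe` are hypothetical parameters.

```python
def fixed_nprobe(centroid_dists, nprobe):
    """Fixed pruning: every query probes the same number of clusters."""
    order = sorted(range(len(centroid_dists)), key=centroid_dists.__getitem__)
    return order[:nprobe]


def adaptive_nprobe(centroid_dists, ratio=1.5, min_probe=1, max_probe=8):
    """Adaptive pruning (hypothetical heuristic, not HELMSMAN's model).

    Probe clusters in order of centroid distance while the next centroid
    is within `ratio` times the nearest centroid distance, always taking
    at least min_probe and at most max_probe clusters. A learned module
    would replace this hand-tuned rule with a per-query prediction.
    """
    order = sorted(range(len(centroid_dists)), key=centroid_dists.__getitem__)
    nearest = centroid_dists[order[0]]
    chosen = []
    for c in order[:max_probe]:
        if len(chosen) >= min_probe and centroid_dists[c] > ratio * nearest:
            break  # remaining clusters are judged too far to be worth a read
        chosen.append(c)
    return chosen
```

For centroid distances [1.0, 1.2, 5.0, 6.0] the rule stops after two clusters, while for [1.0, 1.1, 1.2, 1.3] it reads all four; a fixed `nprobe=2` reads two in both cases, over-reading the easy query's neighbourhood relative to its need and under-reading the hard one.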

What carries the argument

The ANNS-oriented userspace storage stack together with leveling-learned pruning and GPU-accelerated construction pipelines on a clustering-based index.
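One reason an ANNS-oriented userspace storage stack can help is that a clustering index's I/O pattern is predictable: each probe reads one cluster's posting list. The sketch below shows a hypothetical on-disk layout in which every posting list occupies a contiguous, block-aligned extent, so a probe becomes a single aligned read of the kind that direct-I/O paths (`O_DIRECT`, or a userspace NVMe driver such as SPDK) generally require. The 4 KiB block size, fixed-size records, and function names are assumptions; the abstract does not describe HELMSMAN's actual format.

```python
BLOCK = 4096  # assumed SSD logical block size in bytes


def align_up(n, block=BLOCK):
    """Round n up to the next multiple of block."""
    return (n + block - 1) // block * block


def layout_posting_lists(cluster_sizes, record_bytes):
    """Assign each cluster a block-aligned (offset, length) extent.

    cluster_sizes[c] is the number of vectors in cluster c, and
    record_bytes is the fixed on-disk size of one vector record.
    Padding each extent to a block boundary lets every probe be issued
    as one aligned read that bypasses the kernel page cache.
    """
    extents = []
    offset = 0
    for size in cluster_sizes:
        length = align_up(size * record_bytes)
        extents.append((offset, length))
        offset += length
    return extents
```

For 520-byte records (for example, 128 float32 dimensions plus an 8-byte ID), clusters of 10, 0, and 100 vectors map to extents of 8192, 0, and 53248 bytes. The padding costs some capacity but keeps every read aligned.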

If this is right

Billion-scale indexes can be rebuilt from scratch in hours rather than days.
ANNS deployments can expand data volume without a proportional rise in DRAM capacity.
Hardware spend for search, recommendation, and advertising services drops by more than 90 percent.
Clustering methods regain practicality for high-SLA workloads once I/O and pruning overheads are removed.
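Rebuild time for a clustering-based index is dominated by the partitioning step. Assuming k-means-style partitioning, as in common IVF builds (the abstract does not say which clustering HELMSMAN uses), the sketch below shows Lloyd's algorithm in plain Python. The assignment step, which compares every vector with every centroid, is the data-parallel kernel a GPU construction pipeline would batch; here it is an ordinary loop.

```python
def kmeans(points, init_centroids, iters=10):
    """Lloyd's k-means on small in-memory data; returns (centroids, assignment)."""
    centroids = [list(c) for c in init_centroids]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        # This all-pairs comparison is the part a GPU would parallelize.
        for i, p in enumerate(points):
            assign[i] = min(
                range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign
```

At billion scale the same two steps run over mini-batches or samples, which is why offloading the assignment kernel to GPUs is the natural lever for cutting (re)build time.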

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Platforms facing similar memory growth in recommendation or retrieval systems could test whether the same three optimizations transfer to their indexes.
The cost advantage may widen further if flash bandwidth continues to improve relative to DRAM prices.
The learned pruning technique might be portable to other index families if the leveling logic generalizes beyond the current clustering structure.

Load-bearing premise

The flash-based clustering index with its userspace stack and learned pruning will deliver the same latency and throughput SLAs as the prior in-memory HNSW under real production query traffic.

What would settle it

A side-by-side replay of the production query log that shows HELMSMAN violating the existing latency or throughput targets on more than a negligible fraction of requests.
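The replay test described above reduces to two numbers per latency SLA: a tail quantile and the fraction of requests over the target. A minimal sketch, assuming per-request latencies in milliseconds collected from the replay and a nearest-rank definition of percentile; the function names are illustrative, and what counts as a "negligible" violation rate is left to the caller because neither the abstract nor this review fixes a threshold.

```python
import math


def percentile_nearest_rank(latencies_ms, q):
    """Nearest-rank quantile of a latency sample, e.g. q=0.99 for p99."""
    if not latencies_ms:
        raise ValueError("empty latency sample")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def sla_violation_rate(latencies_ms, sla_ms):
    """Fraction of replayed requests whose latency exceeds sla_ms."""
    return sum(1 for t in latencies_ms if t > sla_ms) / len(latencies_ms)


def replay_report(latencies_ms, target_ms):
    """Summarise one replay: observed p99 and share of requests over target."""
    return {
        "p99_ms": percentile_nearest_rank(latencies_ms, 0.99),
        "violation_rate": sla_violation_rate(latencies_ms, target_ms),
    }
```

Running the same report on the HNSW baseline and on HELMSMAN over an identical query log gives the side-by-side comparison; throughput would be checked separately as completed queries per second at the replayed arrival rate.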

Original abstract

RedNote (a.k.a., Xiaohongshu, a global-scale social network platform) widely adopts approximate nearest neighbor search (ANNS) to power its search, recommendation, and advertising services. Due to the demanding Service Level Agreements (SLAs), we have to rely on in-memory graph-based ANNS (i.e., HNSW) to provide high throughput and low latency. However, the ever-growing user base and content volume have led to an explosive increase in memory footprint and consequently huge CapEx and OpEx. After exploring various alternatives, we find that building a clustering-based ANNS on top of all-flash servers can be promising. Yet, we still experience severe overheads from the kernel I/O stack, a fixed pruning strategy, and slow index construction. We present HELMSMAN, a high-performance and cost-effective clustering-based ANNS system, which combines an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated pipelines of construction. HELMSMAN saves over 90% of hardware costs and enables billion-scale index (re)builds within hours. In the current production deployment, operating stably for several months, 40 machines now host ANNS workloads that previously required about 35,000 cores and 0.35 PB DRAM.
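The abstract's deployment figures can be sanity-checked with simple arithmetic. The sketch below derives the ratios they imply: about 10 GB of DRAM per core in the old HNSW fleet, a consolidation of 875 former cores per new machine, and, under the assumption (not stated in the abstract) that the 40 flash machines hold roughly the same data volume that previously sat in DRAM, about 8.75 TB per machine. Decimal units (1 PB = 1000 TB) are assumed. The ">90%" cost figure itself cannot be recomputed from the abstract, because no prices are given.

```python
def implied_ratios(old_cores, old_dram_tb, new_machines):
    """Ratios implied by the abstract's deployment figures.

    Assumes decimal units (1 TB = 1000 GB). The per-machine figure also
    assumes the new machines hold about the same data volume that
    previously sat in DRAM, which is an assumption, not a reported fact.
    """
    return {
        "dram_gb_per_old_core": old_dram_tb * 1000 / old_cores,
        "old_cores_per_new_machine": old_cores / new_machines,
        "tb_per_new_machine": old_dram_tb / new_machines,
    }


# 0.35 PB = 350 TB of DRAM across about 35,000 cores, now served by 40 machines.
print(implied_ratios(old_cores=35_000, old_dram_tb=350, new_machines=40))
```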

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

HELMSMAN reports a production clustering-based ANNS deployment at RedNote that cuts hardware costs by over 90%, with 40 machines replacing about 35k cores and 0.35 PB of DRAM, but the abstract supplies no latency or recall numbers to confirm SLA parity.


The core thing to know is that this paper describes HELMSMAN, a deployed clustering-based ANNS system at RedNote that uses a userspace I/O stack, learned pruning, and GPU-accelerated construction to move off in-memory HNSW. It claims over 90% hardware cost reduction and stable operation for months, with billion-scale rebuilds in hours.

What the work does is bring those three pieces together for real global-scale search, recommendation, and advertising workloads and report concrete machine counts from production. The engineering focus on kernel I/O overhead and pruning is straightforward and addresses known pain points in flash-based ANNS.

The soft spot is the missing metrics. The abstract states the cost savings and stability but gives no p99 latency, QPS, recall, or error bars under actual query patterns. Without those, it is hard to tell whether the new setup truly matches the prior SLAs or whether service levels changed. The stress-test note correctly identifies this gap.

This is for engineers running large vector search in production who need data points on memory versus flash tradeoffs. It is worth a serious referee because the deployment scale is substantial and the claims are specific enough to check with additional tables and distributions.

Referee Report

1 major / 0 minor

Summary. The paper presents HELMSMAN, a clustering-based ANNS system for all-flash servers at RedNote (Xiaohongshu). It replaces in-memory HNSW with a userspace storage stack, leveling-learned pruning, and GPU-accelerated construction to address kernel I/O overhead, fixed pruning, and slow builds. The central claim is a production deployment in which 40 machines now host workloads that previously required ~35,000 cores and 0.35 PB DRAM, delivering >90% hardware cost savings and enabling billion-scale index (re)builds in hours while operating stably for several months.

Significance. If the SLA parity holds, the result demonstrates a viable, large-scale shift from DRAM to flash-based ANNS with custom systems optimizations, offering substantial CapEx/OpEx reductions for social platforms with growing data volumes. The production deployment itself constitutes a concrete, falsifiable outcome that could influence industry practice in recommendation and search infrastructure.

major comments (1)

[Abstract] The headline production claim (40 machines replacing ~35k cores / 0.35 PB DRAM with stable multi-month operation) is presented without any reported p99 latency, QPS, recall@K, or latency distribution under production query patterns and data distributions. This directly undermines verification of the weakest assumption: that the userspace stack, learned pruning, and GPU-accelerated build together preserve the original HNSW SLAs.
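The referee asks for recall@K alongside p99 latency and QPS. For reference, a minimal sketch of the usual recall@K definition, assuming ground-truth neighbour IDs come from an exact (brute-force) search over the same data; the function names are illustrative.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Recall@K for one query: the share of the true top-K neighbour IDs
    that appear among the first K IDs returned by the index."""
    truth = set(ground_truth[:k])
    found = set(retrieved[:k]) & truth  # set() ignores duplicate IDs
    return len(found) / k


def mean_recall_at_k(results, k):
    """Average recall@K over (retrieved, ground_truth) pairs, one per query."""
    return sum(recall_at_k(r, g, k) for r, g in results) / len(results)
```

Reporting this mean over a sample of production queries, for the same K used by the HNSW baseline, is what would let readers compare quality before and after the migration.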

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the recognition of the production impact. The comment on the abstract is well-taken; we will strengthen the presentation of our claims with additional metrics while preserving the manuscript's focus on the systems contributions.


Referee: [Abstract] The headline production claim (40 machines replacing ~35k cores / 0.35 PB DRAM with stable multi-month operation) is presented without any reported p99 latency, QPS, recall@K, or latency distribution under production query patterns and data distributions. This directly undermines verification of the weakest assumption: that the userspace stack, learned pruning, and GPU-accelerated build together preserve the original HNSW SLAs.

Authors: We agree that the abstract would benefit from explicit SLA metrics so that readers can directly assess parity. The body of the paper (evaluation sections) already reports query throughput, recall, and latency distributions from both offline benchmarks and production traces that match the prior HNSW deployment. For the revision, we will add a concise sentence to the abstract citing representative production figures (p99 latency, QPS, and recall@K) drawn from those sections, confirming that the observed values remain within the original SLAs. Revision: yes.

Circularity Check

0 steps flagged

No circularity: empirical systems deployment report


The paper is a systems description of a deployed clustering-based ANNS artifact (HELMSMAN) that replaces an in-memory HNSW deployment. It reports hardware savings from production operation but contains no equations, fitted parameters, statistical predictions, or derivation steps that could reduce to inputs by construction. No self-citations, ansatzes, or uniqueness theorems are invoked as load-bearing premises. The central claim is an empirical observation of stable operation under real workloads, which stands or falls on external SLA measurements rather than internal definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical axioms or free parameters are present; the work is an engineering systems paper whose claims rest on the unstated assumption that production query workload and data distribution match the tested conditions.

pith-pipeline@v0.9.1-grok · 5801 in / 1021 out tokens · 20724 ms · 2026-06-27T05:53:17.376546+00:00



2024

[19] [19]

Friedman

Jerome H. Friedman. Greedy Function Approximation: A Gradient Boosting Machine.Annals of Statistics, 29(5), 2001

2001

[20] [20]

Fast Approximate Nearest Neighbor Search With Navi- gating Spreading-out Graphs.Proceedings of the VLDB Endowment, 12, 2019

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast Approximate Nearest Neighbor Search With Navi- gating Spreading-out Graphs.Proceedings of the VLDB Endowment, 12, 2019

2019

[21] [21]

Complement Lex- ical Retrieval Model with Semantic Residual Embed- dings

Luyu Gao, Zhuyun Dai, Tongfei Chen, Zhen Fan, Ben- jamin Van Durme, and Jamie Callan. Complement Lex- ical Retrieval Model with Semantic Residual Embed- dings. InAdvances in Information Retrieval (Proceed- ings of ECIR 2021), ECIR, 2021

2021

[22] [22]

Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm with SSD

Hao Guo and Youyou Lu. Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm with SSD. InProceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2025

2025

[23] [23]

OdinANN: Direct Insert for Consistently Stable Performance in Billion-Scale Graph- Based Vector Search

Hao Guo and Youyou Lu. OdinANN: Direct Insert for Consistently Stable Performance in Billion-Scale Graph- Based Vector Search. In24th USENIX Conference on File and Storage Technologies, FAST, 2026

2026

[24] [24]

Manu: a cloud native vector database management system.Proceedings of the VLDB Endowment, 15, 2022

Rentong Guo, Xiaofan Luan, Long Xiang, Xiao Yan, Xi- aomeng Yi, Jigao Luo, Qianya Cheng, Weizhi Xu, Jiarui Luo, Frank Liu, Zhenshan Cao, Yanliang Qiao, Ting Wang, Bo Tang, and Charles Xie. Manu: a cloud native vector database management system.Proceedings of the VLDB Endowment, 15, 2022

2022

[25] [25]

What Modern NVMe Storage Can Do, And How To Exploit It: High- Performance I/O for High-Performance Storage Engines

Gabriel Haas and Viktor Leis. What Modern NVMe Storage Can Do, And How To Exploit It: High- Performance I/O for High-Performance Storage Engines. Proceedings of the VLDB Endowment, 2023

2023

[26] [26]

Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space

Yuchen Huang, Xiaopeng Fan, Song Yan, and Chuliang Weng. Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space. InProceedings of the 40th IEEE International Conference on Data Engineering, ICDE, 2024

2024

[27] [27]

Don’t Surrender to Low QPS/$: Fast and Cost- Efficient ANNS with TridentANN

Yuchen Huang, Baiteng Ma, Erci Xu, and Chuliang Weng. Don’t Surrender to Low QPS/$: Fast and Cost- Efficient ANNS with TridentANN. InProceedings of the 53rd Annual International Symposium on Computer Architecture, ISCA, 2026

2026

[28] [28]

High-Throughput, Cost-Effective Billion- Scale Vector Search with a Single GPU

Haodi Jiang, Hao Guo, Minhui Xie, Jiwu Shu, and Youyou Lu. High-Throughput, Cost-Effective Billion- Scale Vector Search with a Single GPU. InProceedings of the 2026 ACM SIGMOD International Conference on Management of Data, SIGMOD, 2026

2026

[29] [29]

RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving

Wenqi Jiang, Suvinay Subramanian, Cat Graves, Gus- tavo Alonso, Amir Yazdanbakhsh, and Vidushi Dadu. RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving. InProceed- ings of the 52nd Annual International Symposium on Computer Architecture, ISCA, 2025

2025

[30] [30]

Billion- Scale Similarity Search with GPUs.IEEE Transactions on Big Data, 7, 2021

Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion- Scale Similarity Search with GPUs.IEEE Transactions on Big Data, 7, 2021

2021

[31] [31]

We ain’t afraid of no file frag- mentation: causes and prevention of its performance impact on modern flash SSDs

Yuhun Jun, Shinhyun Park, Jeong-Uk Kang, Sang-Hoon Kim, and Euiseong Seo. We ain’t afraid of no file frag- mentation: causes and prevention of its performance impact on modern flash SSDs. InProceedings of the 22nd USENIX Conference on File and Storage Tech- nologies, FAST, 2024

2024

[32] [32]

Light- GBM: A Highly Efficient Gradient Boosting Decision Tree

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Light- GBM: A Highly Efficient Gradient Boosting Decision Tree. InAdvances in Neural Information Processing Systems 30, NeurIPS, 2017

2017

[33] [33]

Serenade - Low-Latency Session-Based Rec- ommendation in e-Commerce at Scale

Barrie Kersbergen, Olivier Sprangers, and Sebastian Schelter. Serenade - Low-Latency Session-Based Rec- ommendation in e-Commerce at Scale. InProceedings of the 2022 International Conference on Management of Data, SIGMOD, 2022

2022

[34] [34]

High-Performance Query Pro- cessing with NVMe Arrays: Spilling without Killing Performance

Maximilian Kuschewski, Jana Giceva, Thomas Neu- mann, and Viktor Leis. High-Performance Query Pro- cessing with NVMe Arrays: Spilling without Killing Performance. InProceedings of the 2025 ACM SIG- MOD International Conference on Management of Data, SIGMOD, 2025

2025

[35] [35]

KVell: The Design and Implementation of a Fast Persistent Key-Value Store

Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. KVell: The Design and Implementation of a Fast Persistent Key-Value Store. InProceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP, 2019

2019

[36] [36]

Andersen, and Yuxiong He

Conglong Li, Minjia Zhang, David G. Andersen, and Yuxiong He. Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD, 2020

2020

[37] [37]

Embedding- based Product Retrieval in Taobao Search

Sen Li, Fuyu Lv, Taiwei Jin, Guli Lin, Keping Yang, Xi- aoyi Zeng, Xiao-Ming Wu, and Qianli Ma. Embedding- based Product Retrieval in Taobao Search. InProceed- ings of the 27th ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining, KDD, 2021

2021

[38] [38]

ANSMET: Approximate Nearest Neigh- bor Search with Near-Memory Processing and Hybrid Early Termination

Yiwei Li, Yuxin Jin, Boyu Tian, Huanchen Zhang, and Mingyu Gao. ANSMET: Approximate Nearest Neigh- bor Search with Near-Memory Processing and Hybrid Early Termination. InProceedings of the 52nd An- nual International Symposium on Computer Architec- ture, ISCA, 2025

2025

[39] [39]

HVS: Hierarchical Graph Structure Based on V oronoi Diagrams for Solving Approximate Nearest Neighbor Search.Proceedings of the VLDB Endowment, 15, 2021

Kejing Lu, Mineichi Kudo, Chuan Xiao, and Yoshiharu Ishikawa. HVS: Hierarchical Graph Structure Based on V oronoi Diagrams for Solving Approximate Nearest Neighbor Search.Proceedings of the VLDB Endowment, 15, 2021

2021

[40] [40]

Malkov and D

Yu A. Malkov and D. A. Yashunin. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchi- cal Navigable Small World Graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 2020

2020

[41] [41]

Blel- loch, Laxman Dhulipala, Yan Gu, Harsha Vardhan Simhadri, and Yihan Sun

Magdalen Dobson Manohar, Zheqi Shen, Guy E. Blel- loch, Laxman Dhulipala, Yan Gu, Harsha Vardhan Simhadri, and Yihan Sun. ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Near- est Neighbor Search Algorithms. InProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP, 2024

2024

[42] [42]

Quick- Update: a Real-Time Personalization System for Large- Scale Recommendation Models

Kiran Kumar Matam, Hani Ramezani, Fan Wang, Zeliang Chen, Yue Dong, Maomao Ding, Zhiwei Zhao, Zhengyu Zhang, Ellie Wen, and Assaf Eisenman. Quick- Update: a Real-Time Personalization System for Large- Scale Recommendation Models. InProceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, NSDI, 2024

2024

[43] [43]

Ilyas, Theodoros Rekatsinas, and Shivaram Venkataraman

Jason Mohoney, Devesh Sarda, Mengze Tang, Shi- habur Rahman Chowdhury, Anil Pacaci, Ihab F. Ilyas, Theodoros Rekatsinas, and Shivaram Venkataraman. Quake: Adaptive Indexing for Vector Search. In19th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2025

2025

[44] [44]

Diab, and Virginia Smith

Aashiq Muhamed, Mona T. Diab, and Virginia Smith. CoRAG: Collaborative Retrieval-Augmented Genera- tion. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technolo- gies (Volume 2: Short Papers), NAACL, 2025

2025

[45] [45]

A Stacking-based Efficient Method for Toxic Language Detection on Live Streaming Chat

Yuto Oikawa, Yuki Nakayama, and Koji Murakami. A Stacking-based Efficient Method for Toxic Language Detection on Live Streaming Chat. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP, 2022

2022

[46] [46]

CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs

Hiroyuki Ootomo, Akira Naruse, Corey Nolet, Ray Wang, Tamas Feher, and Yong Wang. CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. InProceedings of the 40th IEEE International Conference on Data Engineering, ICDE, 2024

2024

[47] [47]

Ekko: A Large- Scale Deep Learning Recommender System with Low- Latency Model Update

Chijun Sima, Yao Fu, Man-Kit Sit, Liyi Guo, Xuri Gong, Feng Lin, Junyu Wu, Yongsheng Li, Haidong Rong, Pierre-Louis Aublin, and Luo Mai. Ekko: A Large- Scale Deep Learning Recommender System with Low- Latency Model Update. InProceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2022

2022

[48] [48]

Results of the NeurIPS’21 Challenge on Billion-Scale Approximate Nearest Neigh- bor Search

Harsha Vardhan Simhadri, George Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Kr- ishnaswamy, Gopal Srinivasa, Suhas Jayaram Subra- manya, and Jingdong Wang. Results of the NeurIPS’21 Challenge on Billion-Scale Approximate Nearest Neigh- bor Search. InProceedings of the NeurIPS 2021 Com- p...

2021

[49] [49]

DiskANN: Fast Accurate Billion-point Near- est Neighbor Search on a Single Node

Suhas Jayaram Subramanya, Devvrit, Harsha Vard- han Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. DiskANN: Fast Accurate Billion-point Near- est Neighbor Search on a Single Node. InAdvances in Neural Information Processing Systems 32, NeurIPS, 2019

2019

[50] [50]

A Real-Time Adaptive Multi-Stream GPU System For Online Ap- proximate Nearest Neighborhood Search

Yiping Sun, Yang Shi, and Jiaolong Du. A Real-Time Adaptive Multi-Stream GPU System For Online Ap- proximate Nearest Neighborhood Search. InProceed- ings of the 33rd ACM International Conference on In- formation and Knowledge Management, CIKM, 2024

2024

[51] [51]

FusionANNS: An Efficient CPU/GPU Cooperative Processing Architecture for Billion-scale Approximate Nearest Neighbor Search

Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuo- hui Duan, Xiaofei Liao, Xuecang Zhang, Junhua Zhu, and Yu Zhang. FusionANNS: An Efficient CPU/GPU Cooperative Processing Architecture for Billion-scale Approximate Nearest Neighbor Search. In23rd USENIX Conference on File and Storage Technologies, FAST, 2025

2025

[52] [52]

Toussaint

Godfried T. Toussaint. The Relative Neighborhood Graph of a Finite Planar Set.Pattern Recognition, 12, 1980

1980

[53] [53]

Explainable Rec- ommendation for Repeat Consumption

Kosetsu Tsukuda and Masataka Goto. Explainable Rec- ommendation for Repeat Consumption. InProceedings of the 14th ACM Conference on Recommender Systems, RecSys, 2020

2020

[54] [54]

Modeling Item-Specific Temporal Dynamics of Repeat Consumption for Recommender Systems

Chenyang Wang, Min Zhang, Weizhi Ma, Yiqun Liu, and Shaoping Ma. Modeling Item-Specific Temporal Dynamics of Repeat Consumption for Recommender Systems. InProceedings of the World Wide Web Con- ference, WWW, 2019

2019

[55] [55]

Milvus: A Purpose-Built Vector Data Management System

Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xiangyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, Kun Yu, Yuxing Yuan, Yinghao Zou, Jiquan Long, Yudong Cai, Zhenxiang Li, Zhifeng Zhang, Yihua Mo, Jun Gu, Ruiyi Jiang, Yi Wei, and Charles Xie. Milvus: A Purpose-Built Vector Data Management System. InProceedings of the 2021 ACM SI...

2021

[56] [56]

Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xiao- liang Xu, Rentong Guo, and Charles Xie. Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment. InProceedings of the 2024 ACM SIGMOD International Conference on Management of Data, SIG- MOD, 2024

2024

[57] [57]

FlashANNS: GPU-Driven Asynchronous I/O Pipelining for Eliminating Storage-Compute Bottle- necks in Billion-Scale Similarity Search

Yang Xiao, Mo Sun, Ziyu Song, Bing Tian, Jie Sun, Jie Zhang, Zeke Wang, Zonghui Wang, Wenzhi Chen, and Fei Wu. FlashANNS: GPU-Driven Asynchronous I/O Pipelining for Eliminating Storage-Compute Bottle- necks in Billion-Scale Similarity Search. InProceedings of the 2026 ACM SIGMOD International Conference on Management of Data, SIGMOD, 2026

2026

[58] [58]

SPFresh: Incremen- tal In-Place Update for Billion-Scale Vector Search

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, and Mao Yang. SPFresh: Incremen- tal In-Place Update for Billion-Scale Vector Search. In Proceedings of the 29th Symposium on Operating Sys- tems Principles, SOSP, 2023

2023

[59] [59]

Agile and Accurate CTR Prediction Model Training for Massive-Scale On- line Advertising Systems

Zhiqiang Xu, Dong Li, Weijie Zhao, Xing Shen, Tianbo Huang, Xiaoyun Li, and Ping Li. Agile and Accurate CTR Prediction Model Training for Massive-Scale On- line Advertising Systems. InProceedings of the 2021 International Conference on Management of Data, SIG- MOD, 2021

2021

[60] [60]

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Xiaoze Fan, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Kurt Keutzer, Song Han, Chenfeng Xu, and Ion Sto- ica. Flash-KMeans: Fast and Memory-Efficient Exact K-Means. InarXiv, 2026

2026

[61] [61]

Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Ajit Kumthekar, Zhe Zhao, Li Wei, and Ed H. Chi. Sampling-Bias-Corrected Neu- ral Modeling for Large Corpus Item Recommendations. InProceedings of the 13th ACM Conference on Recom- mender Systems, RecSys, 2019

2019

[62] [62]

KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question An- swering

Donghan Yu, Chenguang Zhu, Yuwei Fang, Wenhao Yu, Shuohang Wang, Yichong Xu, Xiang Ren, Yiming Yang, and Michael Zeng. KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question An- swering. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL, 2022

2022

[63] [63]

VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity

Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Ji- adong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, Mao Yang, and Lidong Zhou. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. InPro- ceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2023

2023

[64] [64]

Fast, Approximate Vector Queries on Very Large Unstructured Datasets

Zili Zhang, Chao Jin, Linpeng Tang, Xuanzhe Liu, and Xin Jin. Fast, Approximate Vector Queries on Very Large Unstructured Datasets. In20th USENIX Sympo- sium on Networked Systems Design and Implementation, NSDI, 2023

2023