The Clustering Strikes Back: Building Cost-Effective and High-Performance ANNS at Scale with Helmsman
Pith reviewed 2026-06-27 05:53 UTC · model grok-4.3
The pith
A clustering-based ANNS on all-flash servers matches in-memory HNSW performance while cutting hardware costs by over 90 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HELMSMAN builds a high-performance clustering-based ANNS by layering an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated construction pipelines on top of all-flash servers. This combination removes kernel I/O overhead, replaces fixed pruning with adaptive learned decisions, and accelerates index rebuilds enough to keep billion-scale indexes current. The system therefore delivers the latency and throughput previously available only from in-memory HNSW while reducing hardware requirements by more than 90 percent.
What carries the argument
The ANNS-oriented userspace storage stack together with leveling-learned pruning and GPU-accelerated construction pipelines on a clustering-based index.
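A minimal sketch of the kind of index this argument rests on, assuming a generic IVF-style layout: vectors are grouped into posting lists by nearest centroid, and a query scans only the lists closest to it. The `nprobe` parameter shows a fixed pruning rule. The `ratio` rule is a simple hand-written stand-in for an adaptive one; it is not HELMSMAN's leveling-learned module, whose details are not given here. All function and parameter names are hypothetical.

```python
def dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_index(vectors, centroids):
    """Put every vector id into the posting list of its nearest centroid."""
    postings = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))
        postings[nearest].append(vid)
    return postings


def search(query, vectors, centroids, postings, k, nprobe=None, ratio=None):
    """Return (top-k vector ids, number of posting lists scanned).

    Fixed pruning: scan exactly `nprobe` lists.
    Adaptive pruning (illustrative stand-in, not the paper's method): scan
    every list whose centroid distance is within `ratio` times the distance
    to the closest centroid.
    """
    if nprobe is None and ratio is None:
        raise ValueError("set either nprobe or ratio")
    order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    if nprobe is not None:
        chosen = order[:nprobe]
    else:
        best = dist(query, centroids[order[0]])
        chosen = [c for c in order if dist(query, centroids[c]) <= ratio * best + 1e-12]
    candidates = [vid for c in chosen for vid in postings[c]]
    top = sorted(candidates, key=lambda vid: dist(query, vectors[vid]))[:k]
    return top, len(chosen)


# Toy data: three well-separated clusters of two vectors each.
vectors = [(0, 0), (1, 0), (10, 0), (10, 1), (0, 10), (1, 10)]
centroids = [(0, 0), (10, 0), (0, 10)]
postings = build_index(vectors, centroids)

fixed = search((0.1, 0.1), vectors, centroids, postings, k=2, nprobe=3)
adaptive = search((0.1, 0.1), vectors, centroids, postings, k=2, ratio=4.0)
```

On this toy data both calls return the same two neighbours, but the adaptive rule scans one posting list instead of three. On flash, each skipped list is a skipped read, which is why the choice of pruning rule matters for an index of this type.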
If this is right
- Billion-scale indexes can be rebuilt from scratch in hours rather than days.
- ANNS deployments can expand data volume without a proportional rise in DRAM capacity.
- Hardware spend for search, recommendation, and advertising services drops by more than 90 percent.
- Clustering methods regain practicality for high-SLA workloads once I/O and pruning overheads are removed.
Where Pith is reading between the lines
- Platforms facing similar memory growth in recommendation or retrieval systems could test whether the same three optimizations transfer to their indexes.
- The cost advantage may widen further if flash bandwidth continues to improve relative to DRAM prices.
- The learned pruning technique might be portable to other index families if the leveling logic generalizes beyond the current clustering structure.
Load-bearing premise
The flash-based clustering index with its userspace stack and learned pruning will deliver the same latency and throughput SLAs as the prior in-memory HNSW under real production query traffic.
What would settle it
A side-by-side replay of the production query log that shows HELMSMAN violating the existing latency or throughput targets on more than a negligible fraction of requests.
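The replay test described above can be sketched as a small check, assuming per-request latencies from both systems have already been collected from the same query log. The function names, the latency target, and the tolerance for a "negligible fraction" are all hypothetical placeholders; the source gives no concrete values.

```python
def violation_fraction(latencies_ms, target_ms):
    """Fraction of requests whose latency exceeds the target."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    return sum(1 for x in latencies_ms if x > target_ms) / len(latencies_ms)


def compare_replay(baseline_ms, candidate_ms, target_ms, tolerance):
    """Compare two replays of the same log against one latency target.

    `tolerance` is the largest violation fraction still treated as
    negligible (an assumed threshold, not taken from the paper).
    """
    base = violation_fraction(baseline_ms, target_ms)
    cand = violation_fraction(candidate_ms, target_ms)
    return {
        "baseline_violations": base,
        "candidate_violations": cand,
        "candidate_within_sla": cand <= tolerance,
    }


# Made-up numbers purely to exercise the functions.
report = compare_replay(
    baseline_ms=[5, 6, 7, 8],
    candidate_ms=[5, 6, 7, 12],
    target_ms=10,
    tolerance=0.01,
)
```

A throughput target could be checked the same way by bucketing the replay into time windows and counting windows that fall below the required QPS.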
Original abstract
RedNote (a.k.a., Xiaohongshu, a global-scale social network platform) widely adopts approximate nearest neighbor search (ANNS) to power its search, recommendation, and advertising services. Due to the demanding Service Level Agreements (SLAs), we have to rely on in-memory graph-based ANNS (i.e., HNSW) to provide high throughput and low latency. However, the ever-growing user base and content volume have led to an explosive increase in memory footprint and consequently huge CapEx and OpEx. After exploring various alternatives, we find that building a clustering-based ANNS on top of all-flash servers can be promising. Yet, we still experience severe overheads from the kernel I/O stack, a fixed pruning strategy, and slow index construction. We present HELMSMAN, a high-performance and cost-effective clustering-based ANNS system, which combines an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated pipelines of construction. HELMSMAN saves over 90% of hardware costs and enables billion-scale index (re)builds within hours. In the current production deployment, operating stably for several months, 40 machines now host ANNS workloads that previously required about 35,000 cores and 0.35 PB DRAM.
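The consolidation figures in the abstract can be put on a per-machine basis with simple arithmetic. This sketch assumes decimal units (1 PB = 1000 TB) and treats the "about 35,000 cores" figure as exact; the variable names are illustrative only.

```python
PRIOR_CORES = 35_000   # cores required by the earlier deployment (approximate)
PRIOR_DRAM_TB = 350    # 0.35 PB, assuming 1 PB = 1000 TB
NEW_MACHINES = 40      # machines in the current deployment

# Prior resources that each new machine now stands in for.
cores_per_machine = PRIOR_CORES / NEW_MACHINES
dram_tb_per_machine = PRIOR_DRAM_TB / NEW_MACHINES
```

Under these assumptions each of the 40 machines replaces roughly 875 cores and 8.75 TB of DRAM from the earlier deployment. This describes the capacity displaced, not the DRAM actually installed in the new machines.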
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents HELMSMAN, a clustering-based ANNS system for all-flash servers at RedNote (Xiaohongshu). It replaces in-memory HNSW with a userspace storage stack, leveling-learned pruning, and GPU-accelerated construction to address kernel I/O overhead, fixed pruning, and slow builds. The central claim is a production deployment in which 40 machines now host workloads that previously required ~35,000 cores and 0.35 PB DRAM, delivering >90% hardware cost savings and enabling billion-scale index (re)builds in hours while operating stably for several months.
Significance. If the SLA parity holds, the result demonstrates a viable, large-scale shift from DRAM to flash-based ANNS with custom systems optimizations, offering substantial CapEx/OpEx reductions for social platforms with growing data volumes. The production deployment itself constitutes a concrete, falsifiable outcome that could influence industry practice in recommendation and search infrastructure.
Major comments (1)
- [Abstract] The headline production claim (40 machines replacing ~35k cores / 0.35 PB DRAM with stable multi-month operation) is presented without any reported p99 latency, QPS, recall@K, or latency distribution under production query patterns and data distributions. Without those numbers, readers cannot verify the weakest assumption: that the combination of userspace stack, learned pruning, and GPU build preserves the original HNSW SLAs.
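For readers unfamiliar with the metrics the referee asks for, here is a minimal sketch of how recall@K and a p99 latency are commonly computed. It assumes exact nearest-neighbour ids are available as ground truth and uses the nearest-rank definition of a percentile; the function names are hypothetical and nothing here comes from the paper's evaluation.

```python
import math


def recall_at_k(retrieved, ground_truth, k):
    """Share of the true top-k neighbours that appear in the returned top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        raise ValueError("ground truth is empty")
    return len(truth.intersection(retrieved[:k])) / len(truth)


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up inputs purely to exercise the functions.
recall = recall_at_k(retrieved=[1, 2, 3, 9], ground_truth=[1, 2, 3, 4], k=4)
p99_ms = percentile(list(range(1, 101)), 99)
```

Reporting both numbers together matters because a pruning rule can lower p99 latency simply by scanning less data, at the cost of recall.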
Simulated Author's Rebuttal
We thank the referee for the detailed review and the recognition of the production impact. The comment on the abstract is well-taken; we will strengthen the presentation of our claims with additional metrics while preserving the manuscript's focus on the systems contributions.
Point-by-point responses
- Referee: [Abstract] The headline production claim (40 machines replacing ~35k cores / 0.35 PB DRAM with stable multi-month operation) is presented without any reported p99 latency, QPS, recall@K, or latency distribution numbers under production query patterns and data distributions. This directly undermines verification of the weakest assumption that the userspace stack + learned pruning + GPU build preserves the original HNSW SLAs.
  Authors: We agree that the abstract would benefit from explicit SLA metrics to allow readers to directly assess parity. The body of the paper (evaluation sections) already reports query throughput, recall, and latency distributions from both offline benchmarks and production traces that match the prior HNSW deployment. For the revision we will add a concise sentence to the abstract citing representative production figures (p99 latency, QPS, and recall@K) drawn from those sections, confirming that the observed values remain within the original SLAs.
  Revision: yes
Circularity Check
No circularity: empirical systems deployment report
Full rationale
The paper is a systems description of a deployed clustering-based ANNS artifact (HELMSMAN) that replaces an in-memory HNSW deployment. It reports hardware savings from production operation but contains no equations, fitted parameters, statistical predictions, or derivation steps that could reduce to inputs by construction. No self-citations, ansatzes, or uniqueness theorems are invoked as load-bearing premises. The central claim is an empirical observation of stable operation under real workloads, which stands or falls on external SLA measurements rather than internal definitional equivalence.