arxiv: 2606.13145 · v1 · pith:M4CB5QLMnew · submitted 2026-06-11 · 💻 cs.IR

The Clustering Strikes Back: Building Cost-Effective and High-Performance ANNS at Scale with Helmsman

Pith reviewed 2026-06-27 05:53 UTC · model grok-4.3

classification 💻 cs.IR
keywords: approximate nearest neighbor search · ANNS · clustering-based index · all-flash storage · userspace I/O stack · learned pruning · GPU-accelerated construction · cost reduction

The pith

A clustering-based ANNS on all-flash servers matches in-memory HNSW performance while cutting hardware costs by over 90 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that a production social platform can replace its memory-intensive HNSW graph index with a clustering-based approach running on flash storage. It achieves this by pairing the index with a custom userspace I/O stack, a learned pruning module that adapts to query patterns, and GPU pipelines for fast index construction. The result sustains the same low-latency and high-throughput service level agreements required for search, recommendation, and advertising. In live deployment the entire workload now runs on 40 machines instead of the previous fleet of roughly 35,000 cores and 0.35 PB of DRAM.

Core claim

HELMSMAN builds a high-performance clustering-based ANNS by layering an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated construction pipelines on top of all-flash servers. This combination removes kernel I/O overhead, replaces fixed pruning with adaptive learned decisions, and accelerates index rebuilds enough to keep billion-scale indexes current. The system therefore delivers the latency and throughput previously available only from in-memory HNSW while reducing hardware requirements by more than 90 percent.
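To make the fixed-pruning baseline concrete, here is a minimal sketch of a clustering-based search in which every query scans the same number of clusters. It is an illustration only and does not reproduce HELMSMAN's implementation: the function names, the `nprobe` parameter, and the toy data are all assumptions, and the leveling-learned pruning module described in the paper is not modeled.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_posting_lists(vectors, centroids):
    """Assign every vector id to the posting list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_l2(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists


def search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe clusters closest to the query; return top-k ids."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: squared_l2(query, centroids[c]))
    candidates = [vid for c in ranked[:nprobe] for vid in lists[c]]
    candidates.sort(key=lambda vid: squared_l2(query, vectors[vid]))
    return candidates[:k]


# Toy data (hypothetical): five 2-D vectors and three hand-picked centroids.
vectors = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
centroids = [(0, 0.5), (10, 10.5), (20, 0)]
lists = build_posting_lists(vectors, centroids)

# Fixed pruning: every query scans the same number of clusters (nprobe).
print(search((0.1, 0.2), vectors, centroids, lists, k=2, nprobe=1))
```

In a flash-resident index each scanned cluster costs a storage read, so the scan depth drives both latency and recall. An adaptive pruner would, in principle, pick that depth per query instead of using one constant for all traffic.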

What carries the argument

The argument rests on three components layered on a clustering-based index: the ANNS-oriented userspace storage stack, leveling-learned pruning, and GPU-accelerated construction pipelines.

If this is right

  • Billion-scale indexes can be rebuilt from scratch in hours rather than days.
  • ANNS deployments can expand data volume without a proportional rise in DRAM capacity.
  • Hardware spend for search, recommendation, and advertising services drops by more than 90 percent.
  • Clustering methods regain practicality for high-SLA workloads once I/O and pruning overheads are removed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Platforms facing similar memory growth in recommendation or retrieval systems could test whether the same three optimizations transfer to their indexes.
  • The cost advantage may widen further if flash bandwidth continues to improve relative to DRAM prices.
  • The learned pruning technique might be portable to other index families if the leveling logic generalizes beyond the current clustering structure.

Load-bearing premise

The flash-based clustering index with its userspace stack and learned pruning will deliver the same latency and throughput SLAs as the prior in-memory HNSW under real production query traffic.

What would settle it

A side-by-side replay of the production query log that shows HELMSMAN violating the existing latency or throughput targets on more than a negligible fraction of requests.
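The replay test described above can be sketched as a comparison of per-request latencies from the two systems against one target. This is a hedged illustration: `violation_fraction`, `breaks_sla`, the `tolerance` threshold standing in for "a negligible fraction", and the sample numbers are all hypothetical and do not come from the paper.

```python
def violation_fraction(latencies_ms, target_ms):
    """Fraction of replayed requests whose latency exceeds the target."""
    if not latencies_ms:
        raise ValueError("replay produced no latency samples")
    over = sum(1 for t in latencies_ms if t > target_ms)
    return over / len(latencies_ms)


def breaks_sla(candidate_ms, baseline_ms, target_ms, tolerance=0.001):
    """True if the candidate misses the target on a larger share of requests
    than the baseline does, by more than the tolerated margin."""
    candidate = violation_fraction(candidate_ms, target_ms)
    baseline = violation_fraction(baseline_ms, target_ms)
    return candidate > baseline + tolerance


# Hypothetical latencies (ms) for the same four replayed queries.
baseline = [1.0, 2.0, 3.0, 4.0]
candidate = [1.0, 2.0, 3.0, 20.0]
print(violation_fraction(candidate, target_ms=10.0))
print(breaks_sla(candidate, baseline, target_ms=10.0))
```

Comparing against the baseline's own violation rate, rather than against zero, keeps the check fair when the existing deployment already misses the target on some requests.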

Original abstract

RedNote (a.k.a., Xiaohongshu, a global-scale social network platform) widely adopts approximate nearest neighbor search (ANNS) to power its search, recommendation, and advertising services. Due to the demanding Service Level Agreements (SLAs), we have to rely on in-memory graph-based ANNS (i.e., HNSW) to provide high throughput and low latency. However, the ever-growing user base and content volume have led to an explosive increase in memory footprint and consequently huge CapEx and OpEx. After exploring various alternatives, we find that building a clustering-based ANNS on top of all-flash servers can be promising. Yet, we still experience severe overheads from the kernel I/O stack, a fixed pruning strategy, and slow index construction. We present HELMSMAN, a high-performance and cost-effective clustering-based ANNS system, which combines an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated pipelines of construction. HELMSMAN saves over 90% of hardware costs and enables billion-scale index (re)builds within hours. In the current production deployment, operating stably for several months, 40 machines now host ANNS workloads that previously required about 35,000 cores and 0.35 PB DRAM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents HELMSMAN, a clustering-based ANNS system for all-flash servers at RedNote (Xiaohongshu). It replaces in-memory HNSW with a userspace storage stack, leveling-learned pruning, and GPU-accelerated construction to address kernel I/O overhead, fixed pruning, and slow builds. The central claim is a production deployment in which 40 machines now host workloads that previously required ~35,000 cores and 0.35 PB DRAM, delivering >90% hardware cost savings and enabling billion-scale index (re)builds in hours while operating stably for several months.

Significance. If the SLA parity holds, the result demonstrates a viable, large-scale shift from DRAM to flash-based ANNS with custom systems optimizations, offering substantial CapEx/OpEx reductions for social platforms with growing data volumes. The production deployment itself constitutes a concrete, falsifiable outcome that could influence industry practice in recommendation and search infrastructure.

major comments (1)
  1. [Abstract] The headline production claim (40 machines replacing ~35k cores / 0.35 PB DRAM with stable multi-month operation) is presented without any reported p99 latency, QPS, recall@K, or latency distribution under production query patterns and data distributions. Without these numbers, readers cannot verify the weakest assumption: that the userspace stack, learned pruning, and GPU build together preserve the original HNSW SLAs.
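For readers unfamiliar with the metrics the referee asks for, the following sketch shows one common way to compute recall@K and a nearest-rank percentile such as p99. The helper names and inputs are hypothetical, and the nearest-rank definition is only one of several percentile conventions; the paper may use a different one.

```python
import math


def recall_at_k(approx_ids, exact_ids, k):
    """Share of the exact top-k neighbours found in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact result list is empty")
    return len(truth & set(approx_ids[:k])) / len(truth)


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical inputs: one query's result ids and 100 latency samples (ms).
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))
print(percentile(list(range(1, 101)), 99))
```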

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the recognition of the production impact. The comment on the abstract is well-taken; we will strengthen the presentation of our claims with additional metrics while preserving the manuscript's focus on the systems contributions.

Point-by-point responses
  1. Referee: [Abstract] The headline production claim (40 machines replacing ~35k cores / 0.35 PB DRAM with stable multi-month operation) is presented without any reported p99 latency, QPS, recall@K, or latency distribution under production query patterns and data distributions. Without these numbers, readers cannot verify the weakest assumption: that the userspace stack, learned pruning, and GPU build together preserve the original HNSW SLAs.

    Authors: We agree that the abstract would benefit from explicit SLA metrics to allow readers to directly assess parity. The body of the paper (evaluation sections) already reports query throughput, recall, and latency distributions from both offline benchmarks and production traces that match the prior HNSW deployment. For the revision we will add a concise sentence to the abstract citing representative production figures (p99 latency, QPS, and recall@K) drawn from those sections, confirming that the observed values remain within the original SLAs.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical systems deployment report

The paper is a systems description of a deployed clustering-based ANNS artifact (HELMSMAN) that replaces an in-memory HNSW deployment. It reports hardware savings from production operation but contains no equations, fitted parameters, statistical predictions, or derivation steps that could reduce to inputs by construction. No self-citations, ansatzes, or uniqueness theorems are invoked as load-bearing premises. The central claim is an empirical observation of stable operation under real workloads, which stands or falls on external SLA measurements rather than internal definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical axioms or free parameters are present; the work is an engineering systems paper whose claims rest on the unstated assumption that production query workload and data distribution match the tested conditions.

pith-pipeline@v0.9.1-grok · 5801 in / 1021 out tokens · 20724 ms · 2026-06-27T05:53:17.376546+00:00 · methodology
