pith. machine review for the scientific record.

arxiv: 2604.03007 · v1 · submitted 2026-04-03 · 💻 cs.DC · cs.DB

Recognition: no theorem link

CIDER: Boosting Memory-Disaggregated Key-Value Stores with Pessimistic Synchronization

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 18:19 UTC · model grok-4.3

classification 💻 cs.DC cs.DB
keywords memory disaggregation · key-value stores · pessimistic synchronization · redundant I/Os · write-combining · contention-aware synchronization · throughput optimization · YCSB benchmark

The pith

Switching to pessimistic synchronization cuts redundant I/Os that bottleneck memory-disaggregated key-value stores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that memory-disaggregated KV stores generate too many redundant network I/Os because their optimistic synchronization clashes with highly concurrent workloads. CIDER replaces this with pessimistic synchronization on the compute side, adds global write-combining to merge updates, and uses a contention-aware scheme to avoid unnecessary overhead when conflicts are rare. If the approach holds, the limited network bandwidth between compute and memory pools stops being the main limiter on throughput. A sympathetic reader cares because disaggregated memory promises flexible resource allocation but currently wastes that flexibility on avoidable traffic. The authors report that the change lifts the throughput of existing systems by up to 6.6× on standard YCSB workloads.
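The I/O gap described above can be illustrated with a toy cost model (the per-operation costs here are illustrative assumptions, not the paper's measurements): an optimistic update pays a remote read plus a CAS per attempt and retries on every conflict, while a queue-lock update pays a fixed acquire/write/release sequence.

```python
import random

def optimistic_ios(n_clients, conflict_prob, rng):
    """Remote I/Os when each attempt (read + CAS, 2 I/Os) fails and is
    retried with probability conflict_prob -- the retry storm that turns
    high contention into redundant network traffic."""
    total = 0
    for _ in range(n_clients):
        attempts = 1
        while rng.random() < conflict_prob:
            attempts += 1
        total += 2 * attempts
    return total

def pessimistic_ios(n_clients):
    """Remote I/Os under a queue lock: enqueue (1 atomic), write (1),
    release (1) -- constant per operation, no retries."""
    return 3 * n_clients

rng = random.Random(0)
print(optimistic_ios(512, 0.8, rng))  # grows sharply with contention
print(pessimistic_ios(512))           # 1536, regardless of contention
```

Note the flip side: under low contention the optimistic path is cheaper (2 vs 3 I/Os per operation), which is exactly why CIDER needs a contention-aware scheme rather than unconditional pessimism.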

Core claim

CIDER demonstrates that pessimistic synchronization, paired with global write-combining and contention-aware mechanisms, directly addresses the root cause of redundant I/Os in memory-disaggregated KV stores by aligning access control with high-concurrency patterns on disaggregated memory.

What carries the argument

The CIDER framework, which applies pessimistic synchronization together with global write-combining to merge cross-node writes, plus a contention-aware scheme that adapts locking behavior under varying conflict rates.
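A minimal sketch of the write-combining idea (last-writer-wins merging over the queue of pending updates; the paper's actual mechanism runs over an MCS lock and handles INSERT and DELETE separately for correctness, which this sketch omits):

```python
from collections import OrderedDict

def combine_updates(queued):
    """Merge queued (key, value) UPDATEs so each key is written to the
    memory pool exactly once.  The list order stands in for the lock
    acquisition order; a later update to a key supersedes earlier ones."""
    merged = OrderedDict()
    for key, value in queued:
        merged[key] = value  # last writer in queue order wins
    return list(merged.items())

queued = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k1", "d")]
print(combine_updates(queued))  # [('k1', 'd'), ('k2', 'b')]: 2 remote writes instead of 4
```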

If this is right

  • Throughput of existing memory-disaggregated KV stores rises by up to 6.6× under the YCSB benchmark.
  • Network traffic between compute and memory pools falls because redundant cross-node I/Os are eliminated.
  • Performance remains stable even when workloads shift between high and low contention levels.
  • No hardware changes are required; gains come from compute-side changes only.
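One way the stability point above could hold in practice is a compute-side, per-key mode controller with hysteresis. The thresholds below are assumptions for illustration; the review does not spell out CIDER's actual switching policy.

```python
class ModeController:
    """Per-key synchronization mode chooser (illustrative thresholds).

    Switches a key to pessimistic mode after repeated failed optimistic
    attempts, and back to optimistic mode after a quiet streak."""

    def __init__(self, up=3, down=8):
        self.up, self.down = up, down
        self.fails = 0      # consecutive conflicted (retried) operations
        self.quiet = 0      # consecutive uncontended operations
        self.pessimistic = False

    def record(self, conflicted: bool) -> str:
        if conflicted:
            self.fails += 1
            self.quiet = 0
            if self.fails >= self.up:
                self.pessimistic = True
        else:
            self.quiet += 1
            self.fails = 0
            if self.quiet >= self.down:
                self.pessimistic = False
        return "pessimistic" if self.pessimistic else "optimistic"
```

The hysteresis (separate up/down thresholds) keeps a key from oscillating between modes when contention hovers near the switching point.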

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pessimistic-plus-write-combining pattern could be tested on disaggregated databases or object stores facing similar network bottlenecks.
  • An adaptive version that switches between optimistic and pessimistic modes based on measured contention might extend the approach to mixed workloads.
  • Future systems with faster interconnects could still benefit if write-combining reduces total data movement rather than just latency.
  • Measuring energy use on the memory nodes before and after CIDER would show whether lower I/O volume also cuts power draw.

Load-bearing premise

The root cause of the redundant I/Os is the mismatch between the optimistic synchronization used by existing memory-disaggregated KV stores and the highly concurrent workloads on disaggregated memory (DM).

What would settle it

Run the same high-concurrency YCSB workload on a state-of-the-art memory-disaggregated KV store with and without CIDER, then compare the exact count of remote I/Os generated; a large drop would support the claim.
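The experiment above needs an exact remote-I/O count rather than throughput alone. On Linux, cumulative RDMA port counters are exposed in sysfs, so a before/after delta isolates the traffic attributable to one run (the device name mlx5_0 is an assumption about the testbed; the sysfs layout is the standard InfiniBand one):

```python
from pathlib import Path

def port_counter(name, dev="mlx5_0", port=1, root="/sys/class/infiniband"):
    """Read one cumulative InfiniBand port counter, e.g. port_xmit_packets."""
    path = Path(root) / dev / "ports" / str(port) / "counters" / name
    return int(path.read_text())

def measure(run, name="port_xmit_packets", **kw):
    """Return the counter delta across a workload run."""
    before = port_counter(name, **kw)
    run()
    return port_counter(name, **kw) - before
```

Comparing the `port_xmit_packets` delta for the same YCSB run with and without CIDER would directly test the redundant-I/O claim rather than inferring it from throughput.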

Figures

Figures reproduced from arXiv: 2604.03007 by Jiacheng Shen, Xin Wang, Xuchuan Luo, Yangfan Zhou, Yuxuan Du.

Figure 1. The throughput and retry count of the pointer array with optimistic synchronization under a highly-contended write-intensive workload.
Figure 7. The workflow of global WC.
Figure 8. The structures of the lock node, lock entry, and data pointer.
Figure 9. The workflows of SEARCH, INSERT, and UPDATE operations under the optimistic mode.
Figure 10. The workflow of the UPDATE and DELETE operations under the pessimistic mode. Solid lines indicate RDMA_READ and RDMA_WRITE; dashed lines indicate atomic RDMA_CAS and RDMA_FAA.
Figure 11. The throughput comparison on a pointer array.
Figure 13. The throughput and latency of CIDER and base…
Figure 16. The end-to-end throughput on RACE. [plot: P50/P99 latency for CIDER, O-SYNC, CAS, and ShiftLock vs. number of clients; panels (a) write-intensive, (b) read-intensive, (c) write-only]
Figure 18. The end-to-end throughput on SMART. [plot: P50/P99 latency for CIDER, O-SYNC, CAS, and ShiftLock vs. number of clients; panels (a) write-intensive, (b) read-intensive, (c) write-only]
Figure 21. The efficiency comparison of different WC mechanisms. [plot: TPC-C and TATP throughput for CAS, CIDER, and ShiftLock]
Figure 23. The performance comparison as a function of the array…
Figure 24. The performance comparison as a function of the value…
read the original abstract

Memory-disaggregated key-value (KV) stores suffer from a severe performance bottleneck due to their I/O redundancy issues. A huge amount of redundant I/Os are generated when synchronizing concurrent data accesses, making the limited network between the compute and memory pools of DM a performance bottleneck. We identify the root cause for the redundant I/O lies in the mismatch between the optimistic synchronization of existing memory-disaggregated KV stores and the highly concurrent workloads on DM. In this paper, we propose to boost memory-disaggregated KV stores with pessimistic synchronization. We propose CIDER, a compute-side I/O optimization framework, to verify our idea. CIDER adopts a global write-combining technique to further reduce cross-node redundant I/Os. A contention-aware synchronization scheme is designed to improve the performance of pessimistic synchronization under low contention scenarios. Experimental results show that CIDER effectively improves the throughput of state-of-the-art memory-disaggregated KV stores by up to $6.6\times$ under the YCSB benchmark.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that memory-disaggregated KV stores incur severe redundant cross-node I/Os under high concurrency because their optimistic synchronization protocols retry on conflicts. CIDER, a compute-side framework, switches to pessimistic synchronization, adds global write-combining to batch updates, and uses a contention-aware fallback to avoid pessimism overhead under low contention, yielding up to 6.6× throughput on YCSB.

Significance. If the experimental claims hold, the work supplies a concrete, deployable alternative to optimistic designs that directly targets the network bottleneck in disaggregated memory; the combination of pessimistic locking with write-combining is a practical insight that could be adopted by systems such as FaRM or HERD variants and is strengthened by the reproducible YCSB evaluation.

major comments (2)
  1. [§4] §4.1–4.3 and Figure 7: the reported 6.6× throughput gain is presented without naming the exact baseline implementations, the precise YCSB workload mix (read/write ratio, key distribution), number of runs, or error bars; without these the central performance claim cannot be fully assessed.
  2. [§3.2] §3.2, Algorithm 1: the global write-combining logic is described at a high level but lacks a proof or invariant showing that it eliminates the redundant I/Os identified in §2 without introducing new ordering or consistency violations under the assumed DM model.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'state-of-the-art memory-disaggregated KV stores' should name the concrete systems evaluated (the evaluation figures suggest RACE and SMART) so readers can map the 6.6× claim immediately.
  2. [§2.2] §2.2: the definition of 'redundant I/O' is informal; a short equation or pseudocode quantifying the extra round-trips per conflict would improve clarity.
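The quantification the second comment asks for is short: with an independent per-attempt conflict probability p, optimistic retries are geometric, so a protocol that spends r round trips per attempt (read + CAS would make r = 2, an assumption here) costs r/(1−p) expected round trips per operation, and everything beyond r is redundant.

```python
def expected_round_trips(p, per_attempt=2):
    """Expected remote round trips per optimistic operation when each
    attempt fails independently with probability p (geometric retries)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return per_attempt / (1.0 - p)

def redundant_round_trips(p, per_attempt=2):
    """Round trips beyond the conflict-free baseline of one attempt."""
    return expected_round_trips(p, per_attempt) - per_attempt

print(expected_round_trips(0.5))  # 4.0: half the traffic is redundant at p = 0.5
```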

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. We address the two major comments below, providing clarifications and committing to improvements in the revised manuscript.

read point-by-point responses
  1. Referee: [§4] §4.1–4.3 and Figure 7: the reported 6.6× throughput gain is presented without naming the exact baseline implementations, the precise YCSB workload mix (read/write ratio, key distribution), number of runs, or error bars; without these the central performance claim cannot be fully assessed.

    Authors: We agree that these experimental details are essential for full assessment and reproducibility. In the revised version we will explicitly name the baseline implementations (the specific state-of-the-art optimistic memory-disaggregated KV stores), state the exact YCSB parameters (50/50 read/write ratio, Zipfian key distribution with the reported skew), report that all results are averages over 5 independent runs, and add error bars to Figure 7. These additions will be incorporated without altering any performance numbers. revision: yes

  2. Referee: [§3.2] §3.2, Algorithm 1: the global write-combining logic is described at a high level but lacks a proof or invariant showing that it eliminates the redundant I/Os identified in §2 without introducing new ordering or consistency violations under the assumed DM model.

    Authors: We acknowledge that a more explicit invariant would strengthen the section. Under the DM model the memory pool is passive and all operations are performed via RDMA; the global write-combiner serializes updates to each key at the compute side before issuing a single write, thereby removing the redundant read-modify-write sequences that arise from optimistic retries. Because pessimistic locks already enforce mutual exclusion and the combiner produces a total order on writes to the same key, no new ordering or consistency violations are introduced. We will add a short paragraph in §3.2 stating this invariant and its relation to the DM assumptions. revision: partial
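The invariant the rebuttal appeals to can be stated as a checkable property: after combining, the single write per key carries the value of the last update in queue order, so the resulting state equals what a plain serial execution of the queue would produce. A property-style sketch (queue order standing in for lock acquisition order; names here are illustrative):

```python
import random
from collections import OrderedDict

def combined_writes(queued):
    """One remote write per key, carrying the last queued value."""
    merged = OrderedDict()
    for key, value in queued:
        merged[key] = value
    return list(merged.items())

def apply(writes):
    """Apply a sequence of (key, value) writes to an empty state."""
    state = {}
    for key, value in writes:
        state[key] = value
    return state

# Property check: combining reaches the same final state as serial
# execution of the full queue, with at most one write per distinct key.
rng = random.Random(42)
queue = [(f"k{rng.randrange(8)}", rng.randrange(1000)) for _ in range(200)]
writes = combined_writes(queue)
assert apply(writes) == apply(queue)
assert len(writes) <= 8
```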

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an experimental systems contribution. It identifies a root cause (optimistic synchronization generating redundant cross-node I/Os under high concurrency on disaggregated memory), proposes CIDER with pessimistic synchronization plus global write-combining and contention-aware fallback, and validates the design via YCSB throughput measurements (up to 6.6×). No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the abstract or described derivation. The central claims rest on external benchmark results rather than reducing to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces CIDER as a new optimization framework but does not explicitly list free parameters, axioms, or invented entities; it relies on standard distributed-systems assumptions about network I/O costs and contention behavior.

pith-pipeline@v0.9.0 · 5484 in / 1171 out tokens · 40519 ms · 2026-05-13T18:19:19.584361+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

69 extracted references · 62 distinct works

  1. TATP Benchmark. 2025. https://tatpbenchmark.sourceforge.net/. Accessed: 2025.
  2. Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Antoine Murat, Athanasios Xygkis, and Igor Zablotchi. 2023. uBFT: Microsecond-Scale BFT using Disaggregated Memory. In ASPLOS 2023, Vancouver, BC, Canada. ACM.
  3. Emmanuel Amaro, Christopher Branner-Augmon, Zhihong Luo, Amy Ousterhout, Marcos K. Aguilera, Aurojit Panda, Sylvia Ratnasamy, and Scott Shenker. 2020. Can far memory improve job throughput? In EuroSys '20, Heraklion, Greece. ACM, 14:1–14:16. https://doi.org/10.1145/3342195.3387522
  4. Emmanuel Amaro, Stephanie Wang, Aurojit Panda, and Marcos K. Aguilera. 2023. Logical Memory Pools: Flexible and Local Disaggregated Memory. In HotNets 2023, Cambridge, MA, USA. ACM, 25–32. https://doi.org/10.1145/3626111.3628201
  5. Hang An, Fang Wang, Dan Feng, Xiaomin Zou, Zefeng Liu, and Jianshun Zhang. 2023. Marlin: A Concurrent and Write-Optimized B+-tree Index on Disaggregated Memory. In ICPP 2023, Salt Lake City, UT, USA. ACM, 695–704. https://doi.org/10.1145/3605573.3605576
  6. InfiniBand Trade Association. Accessed: 2025. Enabling the Modern Data Center – RDMA for the Enterprise. https://www.infinibandta.org
  7. Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2012. Workload analysis of a large-scale key-value store. In SIGMETRICS/PERFORMANCE 2012. ACM, 53–64.
  8. Shai Bergman, Priyank Faldu, Boris Grot, Lluís Vilanova, and Mark Silberstein. 2022. Reconsidering OS memory optimizations in the presence of disaggregated memory. In ISMM '22, San Diego, CA, USA. ACM, 1–14. https://doi.org/10.1145/3520263.3534650
  9. Irina Calciu, M. Talha Imran, Ivan Puddu, Sanidhya Kashyap, Hasan Al Maruf, Onur Mutlu, and Aasheesh Kolli. 2021. Rethinking software runtimes for disaggregated memory. In ASPLOS '21, Virtual Event, USA. ACM.
  10. Lei Chen, Shi Liu, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, and Harry Xu. 2024. A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications. In OSDI 2024, Santa Clara, CA, USA. USENIX Association.
  11. Zhangyu Chen, Yu Hua, Bo Ding, and Pengfei Zuo. 2020. Lock-free Concurrent Level Hashing for Persistent Memory. In USENIX ATC 2020. USENIX Association, 799–812. https://www.usenix.org/conference/atc20/presentation/chen
  12. Dah-Ming Chiu and Raj Jain. 1989. Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks. Comput. Networks 17 (1989), 1–14. https://doi.org/10.1016/0169-7552(89)90019-6
  13. CXL Consortium. Accessed: 2025. Compute Express Link. https://www.computeexpresslink.org
  14. Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In SoCC 2010, Indianapolis, IN, USA. ACM, 143–154. https://doi.org/10.1145/1807128.1807152
  15. NVIDIA Corporation. Accessed: 2025. Advanced Transport. https://docs.nvidia.com/networking/display/ofedv502180/advanced+transport
  16. Ananth Devulapalli and Pete Wyckoff. 2005. Distributed Queue-Based Locking Using Advanced Network Features. In ICPP 2005, Oslo, Norway. IEEE Computer Society, 408–415. https://doi.org/10.1109/ICPP.2005.34
  17. Dmitry Duplyakin, Robert Ricci, Aleksander Maricq, Gary Wong, Jonathon Duerig, Eric Eide, Leigh Stoller, Mike Hibler, David Johnson, Kirk Webb, Aditya Akella, Kuang-Ching Wang, Glenn Ricart, Larry Landweber, Chip Elliott, Michael Zink, Emmanuel Cecchet, Snigdhaswin Kar, and Prabodh Mishra. 2019. The Design and Operation of CloudLab. In USENIX ATC 2019. USENIX Association.
  18. Bin Fan, David G. Andersen, and Michael Kaminsky. 2013. MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. In NSDI 2013, Lombard, IL, USA. USENIX Association, 371–384.
  19. Jian Gao, Qing Wang, and Jiwu Shu. 2025. ShiftLock: Mitigate One-sided RDMA Lock Contention via Handover. In FAST 2025, Santa Clara, CA. USENIX Association, 355–372. https://www.usenix.org/conference/fast25/presentation/gao
  20. Zhiyuan Guo, Yizhou Shan, Xuhao Luo, Yutong Huang, and Yiying Zhang. 2022. Clio: a hardware-software co-designed disaggregated memory system. In ASPLOS '22, Lausanne, Switzerland. ACM, 417–433.
  21. Junhyeok Jang, Hanjin Choi, Hanyeoreum Bae, Seungjun Lee, Miryeong Kwon, and Myoungsoo Jung. 2023. CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search. In USENIX ATC 2023, Boston, MA, USA. USENIX Association. https://www.usenix.org/conference/atc23/presentation/jang
  22. Seung-Seob Lee, Yanpeng Yu, Yupeng Tang, Anurag Khandelwal, Lin Zhong, and Abhishek Bhattacharjee. 2021. MIND: In-Network Memory Management for Disaggregated Data Centers. In SOSP '21, Virtual Event / Koblenz, Germany. ACM, 488–504. https://doi.org/10.1145/3477132.3483561
  23. Se Kwon Lee, Soujanya Ponnapalli, Sharad Singhal, Marcos K. Aguilera, Kimberly Keeton, and Vijay Chidambaram. 2022. DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory. Proc. VLDB Endow. 15, 13 (2022), 4023–4037. https://www.vldb.org/pvldb/vol15/p4023-lee.pdf
  24. Pengfei Li, Yu Hua, Pengfei Zuo, Zhangyu Chen, and Jiajie Sheng. 2023. ROLEX: A Scalable RDMA-oriented Learned Key-Value Store for Disaggregated Memory Systems. In FAST 2023, Santa Clara, CA, USA. USENIX Association, 99–114.
  25. Xuchuan Luo, Jiacheng Shen, Pengfei Zuo, Xin Wang, Michael R. Lyu, and Yangfan Zhou. 2024. CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated Memory. In SOSP 2024, Austin, TX, USA. ACM, 110–126. https://doi.org/10.1145/3694715.3695959
  26. Xuchuan Luo, Pengfei Zuo, Jiacheng Shen, Jiazhen Gu, Xin Wang, Michael R. Lyu, and Yangfan Zhou. 2023. SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory. In OSDI 2023, Boston, MA, USA. USENIX Association, 553–571.
  27. N.A. Lynch and A.A. Shvartsman. 1997. Robust emulation of shared memory using dynamic quorum-acknowledged broadcasts. In Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing. 272–281.
  28. Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, and Harry Xu. 2024. DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency. In OSDI 2024, Santa Clara, CA, USA. USENIX Association.
  29. Hasan Al Maruf, Yuhong Zhong, Hongyi Wang, Mosharaf Chowdhury, Asaf Cidon, and Carl A. Waldspurger. 2023. Memtrade: Marketplace for Disaggregated Memory Clouds. Proc. ACM Meas. Anal. Comput. Syst. 7, 2 (2023), 41:1–41:27. https://doi.org/10.1145/3589985
  30. John M. Mellor-Crummey and Michael L. Scott. 1991. Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Trans. Comput. Syst. 9, 1 (1991), 21–65. https://doi.org/10.1145/103727.103729
  31. Memcached Development Team. 2025. Memcached: a distributed memory object caching system. https://memcached.org/. Accessed: 2025.
  32. Xinhao Min, Kai Lu, Pengyu Liu, Jiguang Wan, Changsheng Xie, Daohui Wang, Ting Yao, and Huatao Wu. 2024. SepHash: A Write-Optimized Hash Index On Disaggregated Memory via Separate Segment Structure. Proc. VLDB Endow. 17, 5 (2024), 1091–1104. https://www.vldb.org/pvldb/vol17/p1091-lu.pdf
  33. Sumit Kumar Monga, Sanidhya Kashyap, and Changwoo Min. 2021. Birds of a Feather Flock Together: Scaling RDMA RPCs with Flock. In SOSP '21, Virtual Event / Koblenz, Germany. ACM, 212–227. https://doi.org/10.1145/3477132.3483576
  34. Sundeep Narravula, A. Marnidala, Abhinav Vishnu, Karthikeyan Vaidyanathan, and Dhabaleswar K. Panda. 2007. High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations. In CCGrid 2007, Rio de Janeiro, Brazil. IEEE Computer Society.
  35. Vlad Nitu, Boris Teabe, Alain Tchana, Canturk Isci, and Daniel Hagimont. 2018. Welcome to zombieland: practical and energy-efficient memory disaggregation in a datacenter. In EuroSys 2018, Porto, Portugal. ACM, 16:1–16:12. https://doi.org/10.1145/3190508.3190537
  36. Feng Ren, Mingxing Zhang, Kang Chen, Huaxia Xia, Zuoning Chen, and Yongwei Wu. 2024. Scaling Up Memory Disaggregated Applications with SMART. In ASPLOS 2024, La Jolla, CA, USA. ACM, 351–367.
  37. Zhenyuan Ruan, Malte Schwarzkopf, Marcos K. Aguilera, and Adam Belay. 2020. AIFM: High-Performance, Application-Integrated Far Memory. In OSDI 2020, Virtual Event. USENIX Association, 315–332. https://www.usenix.org/conference/osdi20/presentation/ruan
  38. Salvatore Sanfilippo and Redis Ltd. 2025. Redis. https://redis.io. Accessed: 2025.
  39. Michael L. Scott. 2013. Shared-Memory Synchronization. Morgan & Claypool Publishers. https://doi.org/10.2200/S00499ED1V01Y201304CAC023
  40. Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang. 2018. LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation. In OSDI 2018, Carlsbad, CA, USA. USENIX Association, 69–87. https://www.usenix.org/conference/osdi18/presentation/shan
  41. Jiacheng Shen, Pengfei Zuo, Xuchuan Luo, Yuxin Su, Jiazhen Gu, Hao Feng, Yangfan Zhou, and Michael R. Lyu. 2023. Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System. In SOSP 2023, Koblenz, Germany. ACM, 675–691. https://doi.org/10.1145/3600006.3613144
  42. Jiacheng Shen, Pengfei Zuo, Xuchuan Luo, Tianyi Yang, Yuxin Su, Yangfan Zhou, and Michael R. Lyu. 2023. FUSEE: A Fully Memory-Disaggregated Key-Value Store. In FAST 2023, Santa Clara, CA, USA. USENIX Association, 81–98. https://www.usenix.org/conference/fast23/presentation/shen
  43. Transaction Processing Performance Council. 2025. TPC Benchmark C (TPC-C). https://www.tpc.org/tpcc/. Accessed: 2025.
  44. Lluís Vilanova, Lina Maudlej, Shai Bergman, Till Miemietz, Matthias Hille, Nils Asmussen, Michael Roitzsch, Hermann Härtig, and Mark Silberstein. 2022. Slashing the disaggregation tax in heterogeneous data centers with FractOS. In EuroSys '22, Rennes, France. ACM, 352–367.
  45. Chenxi Wang, Haoran Ma, Shi Liu, Yuanqi Li, Zhenyuan Ruan, Khanh Nguyen, Michael D. Bond, Ravi Netravali, Miryung Kim, and Guoqing Harry Xu. 2020. Semeru: A Memory-Disaggregated Managed Runtime. In OSDI 2020, Virtual Event. USENIX Association, 261–280.
  46. Chenxi Wang, Haoran Ma, Shi Liu, Yifan Qiao, Jonathan Eyolfson, Christian Navasca, Shan Lu, and Guoqing Harry Xu. 2022. MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime. In OSDI 2022, Carlsbad, CA, USA. USENIX Association, 35–53.
  47. Qing Wang, Youyou Lu, and Jiwu Shu. 2022. Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory. In SIGMOD '22, Philadelphia, PA, USA. ACM, 1033–1048. https://doi.org/10.1145/3514221.3517824
  48. Qing Wang, Youyou Lu, and Jiwu Shu. 2025. Designing an Efficient Tree Index on Disaggregated Memory. Commun. ACM 68, 5 (2025), 92–100. https://doi.org/10.1145/3709647
  49. Qing Wang, Youyou Lu, Erci Xu, Junru Li, Youmin Chen, and Jiwu Shu. 2021. Concordia: Distributed Shared Memory with In-Network Cache Coherence. In FAST 2021. USENIX Association, 277–292. https://www.usenix.org/conference/fast21/presentation/wang
  50. Xingda Wei, Zhiyuan Dong, Rong Chen, and Haibo Chen. 2018. Deconstructing RDMA-enabled Distributed Transactions: Hybrid is Better! In OSDI 2018, Carlsbad, CA, USA. USENIX Association, 233–251. https://www.usenix.org/conference/osdi18/presentation/wei
  51. Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, and Haibo Chen. 2015. Fast in-memory transaction processing using RDMA and HTM. In SOSP 2015, Monterey, CA, USA. ACM, 87–104. https://doi.org/10.1145/2815400.2815419
  52. Juncheng Yang, Yao Yue, and KV Rashmi. 2020. A large scale analysis of hundreds of in-memory cache clusters at Twitter. In OSDI 2020. USENIX Association, 191–208.
  53. Dong Young Yoon, Mosharaf Chowdhury, and Barzan Mozafari. 2018. Distributed Lock Management with RDMA: Decentralization without Starvation. In SIGMOD 2018, Houston, TX, USA. ACM, 1571–1586. https://doi.org/10.1145/3183713.3196890
  54. Zhuolong Yu, Yiwen Zhang, Vladimir Braverman, Mosharaf Chowdhury, and Xin Jin. 2020. NetLock: Fast, Centralized Lock Management Using Programmable Switches. In SIGCOMM '20. ACM.
  55. Daniel Zahka and Ada Gavrilovska. 2022. FAM-Graph: Graph Analytics on Disaggregated Memory. In IPDPS 2022, Lyon, France. IEEE, 81–92. https://doi.org/10.1109/IPDPS53621.2022.00017
  56. Hanze Zhang, Ke Cheng, Rong Chen, and Haibo Chen. 2024. Fast and Scalable In-network Lock Management Using Lock Fission. In OSDI 2024, Santa Clara, CA, USA. USENIX Association, 251–268. https://www.usenix.org/conference/osdi24/presentation/zhang-hanze
  57. Ming Zhang, Yu Hua, and Zhijun Yang. 2024. Motor: Enabling Multi-Versioning for Distributed Transactions on Disaggregated Memory. In OSDI 2024, Santa Clara, CA, USA. USENIX Association, 801–819. https://www.usenix.org/conference/osdi24/presentation/zhang-ming
  58. Ming Zhang, Yu Hua, Pengfei Zuo, and Lurong Liu. 2022. FORD: Fast One-sided RDMA-based Distributed Transactions for Disaggregated Persistent Memory. In FAST 2022, Santa Clara, CA, USA. USENIX Association, 51–68. https://www.usenix.org/conference/fast22/presentation/zhang-ming
  59. Qizhen Zhang, Philip A. Bernstein, Daniel S. Berger, and Badrish Chandramouli. 2021. Redy: Remote Dynamic Memory Cache. Proc. VLDB Endow. 15, 4 (2021), 766–779. https://www.vldb.org/pvldb/vol15/p766-zhang.pdf
  60. Qizhen Zhang, Yifan Cai, Sebastian Angel, Vincent Liu, Ang Chen, and Boon Thau Loo. 2020. Rethinking Data Management Systems for Disaggregated Data Centers. In CIDR 2020, Amsterdam, The Netherlands. http://cidrdb.org/cidr2020/papers/p6-zhang-cidr20.pdf
  61. Qizhen Zhang, Xinyi Chen, Sidharth Sankhe, Zhilei Zheng, Ke Zhong, Sebastian Angel, Ang Chen, Vincent Liu, and Boon Thau Loo. 2022. Optimizing Data-intensive Systems in Disaggregated Data Centers with TELEPORT. In SIGMOD '22, Philadelphia, PA, USA. ACM, 1345–1359.
  62. Pengfei Zuo, Jiazhao Sun, Liu Yang, Shuangwu Zhang, and Yu Hua. 2021. One-sided RDMA-Conscious Extendible Hashing for Disaggregated Memory. In USENIX ATC 2021. USENIX Association, 15–29. https://www.usenix.org/conference/atc21/presentation/zuo