Octopus: Enhancing CXL Memory Pods via Sparse Topology
Pith reviewed 2026-05-23 05:10 UTC · model grok-4.3
The pith
Octopus uses sparse CXL connections and server islands to scale memory pods without switches while cutting RPC latency and costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Octopus constructs a sparse CXL topology in which each pooling device connects to a carefully chosen subset of servers grouped into islands; this arrangement delivers low-latency intra-island communication, sufficient inter-island overlap for pooling, and scalability to large pods without switches while staying inside 1.5 m cable constraints.
What carries the argument
Island grouping that creates low-latency clusters while allowing controlled overlap between clusters for memory pooling.
If this is right
- RPCs complete 3.2 times faster than in-rack RDMA on the hardware prototype.
- RPCs complete 2.4 times faster than through CXL switches on the hardware prototype.
- Simulated 96-server pods deliver 3 to 5.4 percent net server cost savings.
- CXL-switch designs produce a net cost increase in the same simulations.
- The topology respects 1.5 m copper cable limits while scaling to 96 servers.
Where Pith is reading between the lines
- The same sparse-island pattern could be tested on other disaggregated interconnects such as PCIe or NVLink to see if similar latency and cost gains appear.
- Dynamic adjustment of island boundaries at runtime might further improve overlap for changing workload patterns.
- Reducing reliance on switches could lower overall rack power draw even if the paper does not measure power directly.
- Standards bodies might consider specifying lower port counts on future CXL devices if sparse topologies prove reliable at scale.
Load-bearing premise
A carefully chosen sparse subset of connections per pooling device combined with island grouping can satisfy low-latency intra-island communication, sufficient inter-island pooling overlap, and physical cable length limits without hidden performance or scalability penalties.
What would settle it
Running the three-server hardware prototype at larger scale, such as 16 servers, and checking whether RPC latency remains 3.2 times faster than RDMA and net cost savings stay positive under measured device characteristics and 1.5 m cables.
Figures
read the original abstract
The Compute Express Link (CXL) interconnect enables compute "pods" that pool memory across servers to reduce cost and improve efficiency. These pods also facilitate pairwise communication whose needs conflict with pooling. Importantly, existing pod designs are small or require indirection through expensive switches. These conventional designs implicitly assume that pods must fully connect all servers to all CXL pooling devices. This paper breaks with this conventional wisdom by introducing Octopus pods. Octopus directly connects servers to low-port-count CXL pooling devices (e.g., 4 ports) yet scales to large pods without switches by constructing a sparse CXL topology in which each pooling device connects to a carefully chosen subset of servers. Octopus explicitly balances "overlap", where two servers connect to the same pooling device: overlap reduces pooling efficiency but enables low-latency communication. Octopus resolves this tension by grouping servers into "islands" with low-latency intra-island communication and interconnecting islands to favor pooling. We build a three-server CXL pod prototype and simulate scaled pods with 96 servers under measured device characteristics and physical constraints (1.5 m copper cables). On hardware, Octopus RPCs are 3.2x faster than in-rack RDMA and 2.4x faster than CXL switches. In simulation, Octopus achieves net server cost savings of 3-5.4% whereas CXL switches result in a net cost increase.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that conventional CXL memory pods require full connectivity or expensive switches, but Octopus enables scalable pods by using low-port-count pooling devices in a sparse topology: servers are grouped into islands for low-latency intra-island communication while islands are interconnected to maintain sufficient pooling overlap, all subject to 1.5 m cable limits. A 3-server hardware prototype demonstrates 3.2x faster RPCs than in-rack RDMA and 2.4x faster than CXL switches; 96-server simulations under measured device traits report 3-5.4% net server cost savings (versus net increases with switches).
Significance. If the central claims hold, the work could meaningfully reduce the cost and complexity of large CXL pods by eliminating switches while preserving both pooling efficiency and low-latency pairwise communication. Strengths include the hardware prototype driven by real device characteristics and physical cable constraints, plus the explicit framing of the overlap-versus-pooling tension. The island-based sparse construction is a concrete alternative to the full-connectivity assumption in prior designs.
major comments (2)
- [§4] §4 (Hardware Prototype): The three-server prototype permits near-full connectivity within the 1.5 m cable limit, so the measured 3.2x RPC speedup does not exercise the island-grouping or sparse-subset tension that the 96-server simulation claims to resolve; this leaves the simulation cost savings dependent on an unvalidated extrapolation.
- [§5] §5 (Simulation Results): The algorithm or procedure used to select the sparse per-device subsets and to define island boundaries is not described with sufficient detail (no pseudocode, parameters, or validation against cable and overlap constraints), so the reported 3-5.4% savings cannot be independently checked for robustness under the stated physical limits.
minor comments (2)
- [Abstract] Abstract: omits any mention of how sparse subsets or island boundaries are chosen, which is load-bearing for the claimed benefits.
- [§3] Figure captions and §3 notation for overlap and pooling metrics could be made more precise to aid readability.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review. We address each major comment below, indicating where revisions will be made to the manuscript.
read point-by-point responses
-
Referee: [§4] §4 (Hardware Prototype): The three-server prototype permits near-full connectivity within the 1.5 m cable limit, so the measured 3.2x RPC speedup does not exercise the island-grouping or sparse-subset tension that the 96-server simulation claims to resolve; this leaves the simulation cost savings dependent on an unvalidated extrapolation.
Authors: We agree with the referee that the three-server prototype, given its small scale, permits near-full connectivity within the cable length limit and therefore does not fully exercise the sparse topology or island-grouping mechanism central to the 96-server simulation. The prototype primarily serves to validate the performance of direct CXL connections using real hardware and measured device characteristics under physical constraints. The simulation then applies these traits to larger sparse configurations. This represents a genuine limitation in hardware validation of the scaling claims. In the revised manuscript, we will update §4 to clearly distinguish the prototype's contributions from the simulation, explicitly note the extrapolation involved, and discuss potential avenues for further validation. We will also add a brief sensitivity analysis in the simulation section if space permits. revision: partial
-
Referee: [§5] §5 (Simulation Results): The algorithm or procedure used to select the sparse per-device subsets and to define island boundaries is not described with sufficient detail (no pseudocode, parameters, or validation against cable and overlap constraints), so the reported 3-5.4% savings cannot be independently checked for robustness under the stated physical limits.
Authors: The referee is correct that the current manuscript lacks detailed description of the algorithm for selecting sparse subsets and defining island boundaries. No pseudocode or explicit parameters are provided, which hinders reproducibility and independent verification of the results under the cable and overlap constraints. We will revise the paper by adding a new subsection in §5 (or an appendix) that includes: (1) pseudocode for the island construction and subset selection procedure, (2) all relevant parameters and thresholds used (e.g., overlap targets, cable length enforcement), and (3) validation checks ensuring the generated topologies respect the 1.5 m limit and maintain required pooling overlap. This will allow readers to reproduce and assess the robustness of the 3-5.4% cost savings. revision: yes
Circularity Check
No significant circularity; empirical results from hardware and external measurements
full rationale
The paper reports latency and cost numbers directly from a 3-server hardware prototype and from simulations driven by measured device characteristics plus physical cable constraints (1.5 m). No equations, fitted parameters, or topology-construction procedure are shown to be self-definitional or to rename a prediction as a result. The central design choice (sparse island-based topology) is presented as an engineering tradeoff rather than a derived theorem, and quantitative claims do not reduce to the inputs by construction. This is the normal case of a measurement-driven systems paper.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption CXL pooling devices have low port counts (e.g., 4 ports)
- domain assumption Physical cable length is limited to 1.5 m copper
Forward citations
Cited by 1 Pith paper
-
Proxics: an efficient programming model for far memory accelerators
Proxics introduces lightweight virtual processors and low-latency communication channels as portable OS abstractions for programming near-data processing accelerators, demonstrated on real hardware for memory-intensiv...
Reference graph
Works this paper leans on
-
[1]
Mpi allgather utilizing cxl shared memory pool in multi-node computing systems
Hooyoung Ahn, Seonyoung Kim, Yoomi Park, Woojong Han, Shiny- oung Ahn, Tu Tran, Bharath Ramesh, Hari Subramoni, and Dha- baleswar K Panda. Mpi allgather utilizing cxl shared memory pool in multi-node computing systems. In 2024 IEEE International Conference on Big Data (BigData) , pages 332–337. IEEE, 2024
work page 2024
-
[2]
An examination of cxl memory use cases for in-memory database management systems using sap hana
Minseon Ahn, Thomas Willhalm, Norman May, Donghun Lee, Suprasad Mutalik Desai, Daniel Booss, Jungmin Kim, Navneet Singh, Daniel Ritter, and Oliver Rebholz. An examination of cxl memory use cases for in-memory database management systems using sap hana. Proceedings of the VLDB Endowment , 17(12):3827–3840, 2024
work page 2024
-
[3]
Logical memory pools: Flexible and local disaggregated memory
Emmanuel Amaro, Stephanie Wang, Aurojit Panda, and Marcos K Aguilera. Logical memory pools: Flexible and local disaggregated memory. In Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, pages 25–32, 2023
work page 2023
-
[4]
AMD. AMD EPYC 9124 Specifications. https://www.techpowerup. com/cpu-specs/epyc-9124.c2917. Accessed: 2025-01-15
work page 2025
-
[5]
Inc. Astera Labs. Aries pcie ®/cxl® smart cable modules™ . https: //www.asteralabs.com/products/aries-smart-cable-modules/ , 2024. Product Brief
work page 2024
-
[6]
Hardware support for cloud database systems in the post-moore’s law era (dagstuhl seminar 24162)
David F Bacon, Carsten Binnig, David Patterson, and Margo Seltzer. Hardware support for cloud database systems in the post-moore’s law era (dagstuhl seminar 24162). Dagstuhl Reports, 14(4):54–84, 2024
work page 2024
-
[7]
Global combine on mesh architectures with wormhole rout- ing
Michael Barnett, Rick Littlefield, David G Payne, and Robert van de Geijn. Global combine on mesh architectures with wormhole rout- ing. In [1993] Proceedings Seventh International Parallel Processing Symposium, pages 156–162. IEEE, 1993
work page 1993
-
[8]
So far and yet so near-accelerating distributed joins with cxl
Alexander Baumstark, Marcus Paradies, Kai-Uwe Sattler, Steffen Kläbe, and Stephan Baumann. So far and yet so near-accelerating distributed joins with cxl. In Proceedings of the 20th International Workshop on Data Management on New Hardware , pages 1–9, 2024
work page 2024
-
[9]
Design tradeoffs in cxl-based memory pools for public cloud platforms
Daniel S Berger, Daniel Ernst, Huaicheng Li, Pantea Zardoshti, Mon- ish Shah, Samir Rajadnya, Scott Lee, Lisa Hsu, Ishwar Agarwal, Mark D Hill, et al. Design tradeoffs in cxl-based memory pools for public cloud platforms. IEEE Micro, 43(2):30–38, 2023
work page 2023
-
[10]
Slim fly: A cost effective low- diameter network topology
Maciej Besta and Torsten Hoefler. Slim fly: A cost effective low- diameter network topology. In SC’14: proceedings of the international conference for high performance computing, networking, storage and analysis, pages 348–359. IEEE, 2014
work page 2014
-
[11]
Thomas Beth, Dieter Jungnickel, and Hanfried Lenz. Design Theory, volume 1. Cambridge University Press, Cambridge, UK, 2nd edition,
-
[12]
Covers constructions of combinatorial designs, including BIBDs 12 Octopus: Scalable Low-Cost CXL Memory Pooling using finite projective planes
-
[13]
A survey of research and practices of network-on-chip
Tobias Bjerregaard and Shankar Mahadevan. A survey of research and practices of network-on-chip. ACM Computing Surveys (CSUR), 38(1):1–es, 2006
work page 2006
-
[14]
A {High-Performance} design, implementation, deployment, and evaluation of the slim fly network
Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, et al. A {High-Performance} design, implementation, deployment, and evaluation of the slim fly network. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 1025–1044, 2024
work page 2024
-
[15]
R. C. Bose. On balanced incomplete block designs. Annals of Human Genetics, 9:353–399, 1939
work page 1939
-
[16]
Char- acterizing, modeling, and benchmarking {RocksDB} {Key-Value} workloads at facebook
Zhichao Cao, Siying Dong, Sagar Vemuri, and David HC Du. Char- acterizing, modeling, and benchmarking {RocksDB} {Key-Value} workloads at facebook. In 18th USENIX Conference on File and Storage Technologies (FAST 20), pages 209–223, 2020
work page 2020
-
[17]
Hyperscale tiered memory expander spec- ification for compute express link
Prakash Chauhan, Chris Petersen, Brian Morris, and Jerome Glisse. Hyperscale tiered memory expander spec- ification for compute express link. Available at https: //www.opencompute.org/documents/hyperscale-tiered-memory- expander-specification-for-compute-express-link-cxl-1-pdf , 2023. Open Compute Project, Revision 1, Effective October 27, 2023
work page 2023
-
[18]
The mpi mes- sage passing interface standard
Lyndon Clarke, Ian Glendinning, and Rolf Hempel. The mpi mes- sage passing interface standard. In Programming Environments for Massively Parallel Distributed Systems: Working Conference of the IFIP WG 10.3, April 25–29, 1994 , pages 213–218. Springer, 1994
work page 1994
-
[19]
Dictionary based cache line compression
Daniel Cohen, Sarel Cohen, Dalit Naor, Daniel Waddington, and Moshik Hershcovitch. Dictionary based cache line compression. In Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, pages 8–14, 2024
work page 2024
-
[20]
Charles J. Colbourn and Jeffrey H. Dinitz, editors. Handbook of Combinatorial Designs. Chapman and Hall/CRC, 2nd edition, 2006
work page 2006
-
[21]
Compute Express Link (CXL) Specification, Revision 2.0, November 2020
Compute Express Link Consortium. Compute Express Link (CXL) Specification, Revision 2.0, November 2020. Accessed: 2025-03-11
work page 2020
-
[22]
DG Corneil and RA Mathon. Algorithmic techniques for the genera- tion and analysis of strongly regular graphs and other combinatorial configurations. In Annals of Discrete Mathematics, volume 2, pages 1–32. Elsevier, 1978
work page 1978
-
[23]
Mscclang: Microsoft collective communication language
Meghan Cowan, Saeed Maleki, Madanlal Musuvathi, Olli Saarikivi, and Yifan Xiong. Mscclang: Microsoft collective communication language. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 502–514, 2023
work page 2023
-
[24]
William J. Dally. Performance analysis of k-ary n-cube intercon- nection networks. IEEE transactions on Computers , 39(6):775–785, 1990
work page 1990
-
[25]
An introduction to the compute express link (cxl) interconnect
Debendra Das Sharma, Robert Blankenship, and Daniel Berger. An introduction to the compute express link (cxl) interconnect. ACM Computing Surveys, 56(11):1–37, 2024
work page 2024
-
[26]
Seagate Composable Memory Appliance (CMA) Architecture
Mohamad El-Batal. Seagate Composable Memory Appliance (CMA) Architecture. https://www.youtube.com/watch?v=KCgE0WejXl0, June 2024
work page 2024
-
[27]
Disaggregated memory in the datacenter: A survey
Mohammad Ewais and Paul Chow. Disaggregated memory in the datacenter: A survey. IEEE Access, 11:20688–20712, 2023
work page 2023
-
[28]
Power provisioning for a warehouse-sized computer
Xiaobo Fan, Wolf-Dietrich Weber, and Luiz Andre Barroso. Power provisioning for a warehouse-sized computer. ACM SIGARCH com- puter architecture news, 35(2):13–23, 2007
work page 2007
-
[29]
R. A. Fisher and F. Yates. The construction of balanced incomplete block designs. Annals of Human Genetics, 9:30–43, 1938
work page 1938
-
[30]
Fugaku: Japan’s super- computer successor to the k computer
Riken Center for Computational Science. Fugaku: Japan’s super- computer successor to the k computer. Riken Press Release , 2020. Fugaku employs the Tofu interconnect, maintaining a 6D mesh/torus topology for scalability and performance
work page 2020
-
[31]
Mak- ing kernel bypass practical for the cloud with junction
Joshua Fried, Gohar Irfan Chaudhry, Enrique Saurez, Esha Choukse, Íñigo Goiri, Sameh Elnikety, Rodrigo Fonseca, and Adam Belay. Mak- ing kernel bypass practical for the cloud with junction. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 55–73, 2024
work page 2024
-
[32]
K computer: The world’s first 10-petaflop super- computer
Fujitsu and RIKEN. K computer: The world’s first 10-petaflop super- computer. Fujitsu Technical Journal, 2011. The K Computer uses the Tofu interconnect, a proprietary 6D mesh/torus network topology
work page 2011
-
[33]
Acid support for compute express link memory transactions
Ellis Giles and Peter Varman. Acid support for compute express link memory transactions. In SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 982–995. IEEE, 2024
work page 2024
-
[34]
Efficient memory disaggregation with infiniswap
Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, and Kang G Shin. Efficient memory disaggregation with infiniswap. In 14th USENIX Symposium on Networked Systems Design and Imple- mentation (NSDI 17), pages 649–667, 2017
work page 2017
-
[35]
Bcube: a high performance, server-centric network architecture for modular data centers
Chuanxiong Guo, Guohan Lu, Dan Li, Haitao Wu, Xuan Zhang, Yunfeng Shi, Chen Tian, Yongguang Zhang, and Songwu Lu. Bcube: a high performance, server-centric network architecture for modular data centers. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, SIGCOMM ’09, page 63–74, New York, NY, USA, 2009. Association for Computing Machinery
work page 2009
-
[36]
A cxl-powered database system: Op- portunities and challenges
Yunyan Guo and Guoliang Li. A cxl-powered database system: Op- portunities and challenges. In 2024 IEEE 40th International Conference on Data Engineering (ICDE) , pages 5593–5604. IEEE, 2024
work page 2024
-
[37]
Dynamic capacity service for improving cxl pooled memory efficiency
Minho Ha, Junhee Ryu, Jungmin Choi, Kwangjin Ko, Sunwoong Kim, Sungwoo Hyun, Donguk Moon, Byungil Koh, Hokyoon Lee, Myoungseo Kim, et al. Dynamic capacity service for improving cxl pooled memory efficiency. IEEE Micro, 43(2):39–47, 2023
work page 2023
-
[38]
Protean: {VM} allocation service at scale
Ori Hadary, Luke Marshall, Ishai Menache, Abhisek Pan, Esaias E Greeff, David Dion, Star Dorminey, Shailesh Joshi, Yang Chen, Mark Russinovich, et al. Protean: {VM} allocation service at scale. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 845–861, 2020
work page 2020
-
[39]
Design Considerations for CXL Device Hardware Coherency (HDM-DB)
David Hawkins and Matt Bromage. Design Considerations for CXL Device Hardware Coherency (HDM-DB). https://www.youtube.com/ watch?v=2ktM7dPcmqI, 2024
work page 2024
-
[40]
Ali Heydari. How generative ai and accelerated compute is creating the next generation liquid cooled data centers with focus on chal- lenges, opportunities and the road ahead. In 2024 IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Elec- tronic Systems (ITHERM) , Gaylord Rockies, CO, May 2024. Keynote presentation
work page 2024
-
[41]
Tao Huang, Yonggui Liang, Shubao Yu, and Kexin Chen. Txcocket: an innovative solution for efficient cross-node data transmission enabled by cxl-based shared memory. CCF Transactions on High Performance Computing, January 2025. Regular Paper, Published: 22 January 2025
work page 2025
-
[42]
Pasha: An efficient, scalable database architecture for cxl pods
Yibo Huang, Newton Ni, Vijay Chidambaram, Emmett Witchel, and Dixin Tang. Pasha: An efficient, scalable database architecture for cxl pods. In 15th Annual Conference on Innovative Data Systems Research (CIDR ’25), Amsterdam, The Netherlands, 2025. The University of Texas at Austin. Published under the Creative Commons Attribution 4.0 International (CC-BY ...
work page 2025
-
[43]
P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. Zookeeper: Wait- free coordination for internet-scale systems. In Proceedings of the USENIX Annual Technical Conference (ATC), pages 145–158. USENIX Association, 2010
work page 2010
-
[44]
The quest for bandwidth and capacity: Memory edition, 2023
Ronen Hyatt. The quest for bandwidth and capacity: Memory edition, 2023. https://www.hpcuserforum.com/wp-content/uploads/ 2023/09/Ronen-Hyatt_UnifabriX_The-Quest-for-Bandwidth-and- Capacity-Memory-Edition_Sept-2023-HPC-UF.pdf
work page 2023
-
[45]
Towards uni- versally accessible SAT technology
Alexey Ignatiev, Zi Li Tan, and Christos Karamanos. Towards uni- versally accessible SAT technology. In SAT, pages 4:1–4:11, 2024
work page 2024
-
[46]
CXL Switch for Scalable & Composable Memory Pool- ing/Sharing
JP Jiang. CXL Switch for Scalable & Composable Memory Pool- ing/Sharing. FMS presentation available at https://www.xconn- 13 Berger et al. tech.com/products, 2024
work page 2024
-
[47]
F. P. Junqueira, B. C. Reed, and M. Serafini. Zab: High-performance broadcast for primary-backup systems. InProceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) , pages 245–256. IEEE Computer Society, 2011
work page 2011
-
[48]
New supermicro x14 systems, 2024
Michael Kalodrich. New supermicro x14 systems, 2024. Accessed: 2024-12-18
work page 2024
-
[49]
CXL 2.0 Switch for a Composable Memory Sys- tem
Jim Kao. CXL 2.0 Switch for a Composable Memory Sys- tem. https://computeexpresslink.org/wp-content/uploads/ 2024/09/Xconn_CXL-2.0-Switch-for-a-Composable-Memory- System_FMS-2024_FINAL.pdf, October 2024
work page 2024
-
[50]
Lenovo has a cxl memory monster with 128x 128gb ddr5 dimms, 2024
Patrick Kennedy. Lenovo has a cxl memory monster with 128x 128gb ddr5 dimms, 2024. Accessed: 2024-12-18
work page 2024
-
[51]
John Kim, Wiliam J Dally, Steve Scott, and Dennis Abts. Technology- driven, highly-scalable dragonfly topology.ACM SIGARCH Computer Architecture News, 36(3):77–88, 2008
work page 2008
-
[52]
Flattened butterfly: a cost-efficient topology for high-radix networks
John Kim, William J Dally, and Dennis Abts. Flattened butterfly: a cost-efficient topology for high-radix networks. In Proceedings of the 34th annual international symposium on Computer architecture , pages 126–137, 2007
work page 2007
-
[53]
Microar- chitecture of a high radix router
John Kim, William J Dally, Brian Towles, and Amit K Gupta. Microar- chitecture of a high radix router. In 32nd International Symposium on Computer Architecture (ISCA’05), pages 420–431. IEEE, 2005
work page 2005
-
[54]
Dense server design for immersion cooling
Milin Kodnongbua, Zachary Englhardt, Ricardo Bianchini, Rodrigo Fonseca, Alvin Lebeck, Daniel S Berger, Vikram Iyer, Fiodar Kazhami- aka, and Adriana Schulz. Dense server design for immersion cooling. ACM Transactions on Graphics (TOG) , 43(6):1–20, 2024
work page 2024
-
[55]
Leo cxl smart memory controllers
Astera Labs. Leo cxl smart memory controllers. Available at https://www.asteralabs.com/products/leo-cxl-smart-memory- controllers/, December 2023. Product Brief
work page 2023
-
[56]
Polarfly: a cost-effective and flexible low-diameter topology
Kartik Lakhotia, Maciej Besta, Laura Monroe, Kelly Isham, Patrick Iff, Torsten Hoefler, and Fabrizio Petrini. Polarfly: a cost-effective and flexible low-diameter topology. In SC22: International Conference for High Performance Computing, Networking, Storage and Analysis , pages 1–15. IEEE, 2022
work page 2022
-
[57]
Leslie Lamport. The part-time parliament. ACM Transactions on Computer Systems (TOCS), 16(2):133–169, May 1998
work page 1998
-
[58]
The evolution of compet- itive advantage in the worldwide semiconductor industry
Richard Langlois and Edward Steinmueller. The evolution of compet- itive advantage in the worldwide semiconductor industry. Sources of industrial leadership. Cambridge University Press, Cambridge, UK , 1999
work page 1999
-
[59]
Elastic use of far memory for in-memory data- base management systems
Donghun Lee, Thomas Willhalm, Minseon Ahn, Suprasad Muta- lik Desai, Daniel Booss, Navneet Singh, Daniel Ritter, Jungmin Kim, and Oliver Rebholz. Elastic use of far memory for in-memory data- base management systems. In Proceedings of the 19th International Workshop on Data Management on New Hardware , pages 35–43, 2023
work page 2023
-
[60]
Dram scaling challenges and solu- tions
Donghyuk Lee and Onur Mutlu. Dram scaling challenges and solu- tions. IEEE Micro, 42(2):14–25, 2022
work page 2022
-
[61]
Memtis: Efficient memory tiering with dynamic page clas- sification and page size determination
Taehyung Lee, Sumit Kumar Monga, Changwoo Min, and Young Ik Eom. Memtis: Efficient memory tiering with dynamic page clas- sification and page size determination. In Proceedings of the 29th Symposium on Operating Systems Principles , pages 17–34, 2023
work page 2023
-
[62]
Fat-trees: Universal networks for hardware- efficient supercomputing
Charles E Leiserson. Fat-trees: Universal networks for hardware- efficient supercomputing. IEEE transactions on Computers , 100(10):892–901, 1985
work page 1985
-
[63]
Lenovo thinksystem sr860 v3 server, 2024
Lenovo. Lenovo thinksystem sr860 v3 server, 2024. Accessed: 2024- 12-18
work page 2024
-
[64]
Cxl and the return of scale-up database engines
Alberto Lerner and Gustavo Alonso. Cxl and the return of scale-up database engines. arXiv preprint arXiv:2401.01150, 2024
-
[65]
A case against cxl memory pooling
Philip Levis, Kun Lin, and Amy Tai. A case against cxl memory pooling. In Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, pages 18–24, 2023
work page 2023
-
[66]
Huaicheng Li, Daniel S. Berger, Lisa Hsu, Daniel Ernst, Pantea Zar- doshti, Stanko Novakovic, Monish Shah, Samir Rajadnya, Scott Lee, Ishwar Agarwal, Mark D. Hill, Marcus Fontoura, and Ricardo Bian- chini. Pond: CXL-Based Memory Pooling Systems for Cloud Plat- forms. In ASPLOS, 2023
work page 2023
-
[67]
System-level implications of disaggregated memory
Kevin Lim, Yoshio Turner, Jose Renato Santos, Alvin AuYoung, Jichuan Chang, Parthasarathy Ranganathan, and Thomas F Wenisch. System-level implications of disaggregated memory. In IEEE Inter- national Symposium on High-Performance Comp Architecture , pages 1–12. IEEE, 2012
work page 2012
-
[68]
Perftest: Infiniband verbs performance tests, 2025
linux rdma. Perftest: Infiniband verbs performance tests, 2025. Ac- cessed: 2025-03-12
work page 2025
-
[69]
B. Liskov and J. Cowling. Viewstamped replication revisited. Tech- nical Report MIT-CSAIL-TR-2012-021, MIT, July 2012
work page 2012
-
[70]
The primary-backup approach to fault-tolerant distributed systems
Brian Liskov and Robert Scheifler. The primary-backup approach to fault-tolerant distributed systems. In Proceedings of the 7th ACM Sym- posium on Operating Systems Principles (SOSP) , pages 48–55. ACM, 1979
work page 1979
-
[71]
Berger, Marie Nguyen, Xun Jian, Sam H
Jinshu Liu, Hamid Hadian, Yuyue Wang, Daniel S. Berger, Marie Nguyen, Xun Jian, Sam H. Noh, and Huaicheng Li. Systematic CXL memory characterization and performance analysis at scale. In Pro- ceedings of the 2025 ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASP- LOS’25), 2025
work page 2025
-
[72]
Nvidia collective communications library (nccl).https: //developer.nvidia.com/nccl, 2016
Nathan Luehr. Nvidia collective communications library (nccl).https: //developer.nvidia.com/nccl, 2016. Accessed 1/1/2025
work page 2016
-
[73]
Hyrax: {Fail-in-Place} server operation in cloud platforms
Jialun Lyu, Marisa You, Celine Irvene, Mark Jung, Tyler Narmore, Jacob Shapiro, Luke Marshall, Savyasachi Samal, Ioannis Manousakis, Lisa Hsu, et al. Hyrax: {Fail-in-Place} server operation in cloud platforms. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23) , pages 287–304, 2023
work page 2023
-
[74]
{HydraRPC}:{RPC} in the {CXL} era
Teng Ma, Zheng Liu, Chengkun Wei, Jialiang Huang, Youwei Zhuo, Haoyu Li, Ning Zhang, Yijin Guan, Dimin Niu, Mingxing Zhang, et al. {HydraRPC}:{RPC} in the {CXL} era. In 2024 USENIX Annual Technical Conference (USENIX ATC 24), pages 387–395, 2024
work page 2024
-
[75]
You don’t know ’jack’: Cxl fabric orchestration and management
Grant Mackey. You don’t know ’jack’: Cxl fabric orchestration and management. Available at https://files.futurememorystorage.com/ proceedings/2024/20240806_CXLT-102-1_Mackey.pdf, 2024. Pre- sented by Jackrabbit Labs
work page 2024
-
[76]
Telepathic datacenters: Fast rpcs using shared cxl memory, 2024
Suyash Mahar, Ehsan Hajyjasini, Seungjin Lee, Zifeng Zhang, Mingyao Shen, and Steven Swanson. Telepathic datacenters: Fast rpcs using shared cxl memory, 2024
work page 2024
-
[77]
Tpp: Trans- parent page placement for cxl-enabled tiered-memory
Hasan Al Maruf, Hao Wang, Abhishek Dhanotia, Johannes Weiner, Niket Agarwal, Pallab Bhattacharya, Chris Petersen, Mosharaf Chowdhury, Shobhit Kanaujia, and Prakash Chauhan. Tpp: Trans- parent page placement for cxl-enabled tiered-memory. InProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating...
work page 2023
-
[78]
Inc. Marvell Technology. Structera a 2504 memory-expansion controller. Available at https://www.marvell.com/content/dam/ marvell/en/public-collateral/assets/marvell-structera-a-2504-near- memory-accelerator-product-brief.pdf , 2024. Product Brief, P/N MV-SLA25041-A0-HF350AA-C000
work page 2024
-
[79]
Memverge unveils world’s first cxl-based multi-server shared memory at isc
MemVerge. Memverge unveils world’s first cxl-based multi-server shared memory at isc. Press release, May 2023. International Super- computing Conference (ISC), Hamburg, Germany
work page 2023
-
[80]
The power of two choices in randomized load balancing
Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems , 12(10):1094–1104, 2001
work page 2001
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.