EPIC: Abstraction and Polymorphism of In-Network Collectives on Ethernet
Pith reviewed 2026-05-20 08:04 UTC · model grok-4.3
The pith
EPIC introduces a standard-Ethernet-compatible abstraction for in-network collectives that supports polymorphic realizations across different hardware capabilities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EPIC (Ethernet Polymorphic In-network Collective) is an INC protocol specification and reference system built on the principle of 'Unified Abstraction, Polymorphic Realization'. It introduces an abstraction compatible with standard Ethernet that aligns functional boundaries with participant roles, while offering polymorphic realizations tailored to varying hardware capabilities. The design targets three challenges: a modular structure that lets implementations evolve from simple to complex, formal proofs of correctness for all modes, and a unified resource management model.
What carries the argument
The unified abstraction in EPIC that aligns functional boundaries with participant roles to enable polymorphic realizations for hardware of varying capabilities.
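One way to picture "Unified Abstraction, Polymorphic Realization" is a single role-level interface with interchangeable back ends. The sketch below is illustrative only and is not taken from the EPIC specification: the class names `CollectiveRealization`, `HostOnlyRealization`, and `InSwitchRealization`, the `slot_width` parameter, and the choice of a sum allreduce are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Vector = List[float]


class CollectiveRealization(ABC):
    """Hypothetical unified abstraction: one contract, many realizations."""

    name: str

    @abstractmethod
    def allreduce_sum(self, contributions: Sequence[Vector]) -> List[Vector]:
        """Return one copy of the element-wise sum per participant."""


class HostOnlyRealization(CollectiveRealization):
    """Assumed simplest tier: the network only forwards, endpoints reduce."""

    name = "host-only"

    def allreduce_sum(self, contributions: Sequence[Vector]) -> List[Vector]:
        total = [0.0] * len(contributions[0])
        for vec in contributions:
            for i, value in enumerate(vec):
                total[i] += value
        return [list(total) for _ in contributions]


class InSwitchRealization(CollectiveRealization):
    """Assumed richer tier: a switch aggregates fixed-width chunks in slots."""

    name = "in-switch"

    def __init__(self, slot_width: int = 2) -> None:
        self.slot_width = slot_width  # hypothetical per-slot capacity

    def allreduce_sum(self, contributions: Sequence[Vector]) -> List[Vector]:
        length = len(contributions[0])
        result: Vector = []
        for start in range(0, length, self.slot_width):
            # One aggregation slot covers at most slot_width elements.
            slot = [0.0] * min(self.slot_width, length - start)
            for vec in contributions:
                for j in range(len(slot)):
                    slot[j] += vec[start + j]
            result.extend(slot)
        # The aggregated result is delivered to every participant.
        return [list(result) for _ in contributions]


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
host_result = HostOnlyRealization().allreduce_sum(data)
switch_result = InSwitchRealization(slot_width=2).allreduce_sum(data)
```

Because both realizations return the same result for the same contributions, a caller written against the interface does not need to know which hardware tier sits underneath.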
If this is right
- Vendors can incrementally develop hardware from simple to complex implementations without losing compatibility.
- Formal verification confirms the correctness of all proposed polymorphic modes.
- A unified resource management model supports a wide range of in-network collective scenarios.
- Performance gains are realized in AI training and inference workloads.
Where Pith is reading between the lines
- This abstraction could be adapted to other networking standards to broaden INC adoption.
- Real-world deployments might reveal optimizations for specific AI model types.
- The modular approach could template other cross-layer network protocols.
Load-bearing premise
A modular design enables an evolutionary path from simple to complex implementations, allowing vendors to iterate their hardware incrementally.
What would settle it
A failure in formal verification of any polymorphic mode or the absence of performance gains in Tofino testbed experiments compared to standard Ethernet collectives would falsify the claims.
Original abstract
In-Network Collective (INC) acceleration holds immense potential for optimizing AI training and inference; however, its cross-layer nature has historically hindered investment and adoption within the open Ethernet ecosystem. To bridge this gap, we propose EPIC (Ethernet Polymorphic In-network Collective), an INC protocol specification and reference system built on the principle of "Unified Abstraction, Polymorphic Realization." EPIC introduces an abstraction compatible with standard Ethernet that aligns functional boundaries with participant roles, while offering polymorphic realizations tailored to varying hardware capabilities. We address three fundamental challenges: first, we employ a modular design that enables an evolutionary path from simple to complex implementations, allowing vendors to iterate their hardware incrementally; second, we apply formal verification methodologies to prove the correctness of all proposed polymorphic modes; and third, we develop a unified resource management model versatile enough for diverse INC scenarios. Extensive validation -- spanning model checking, packet/flow simulations, VM emulation, Tofino Testbed, and FPGA/RTL verification -- confirms EPIC's correctness, performance gain, and feasibility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes EPIC (Ethernet Polymorphic In-network Collective), a protocol specification and reference system for in-network collective acceleration on Ethernet. It introduces a unified abstraction compatible with standard Ethernet that aligns functional boundaries with participant roles and supports polymorphic realizations for different hardware capabilities. The work addresses three challenges: modular design for evolutionary hardware implementation, formal verification of correctness for all modes, and a unified resource management model. Validation spans model checking, simulations, VM emulation, Tofino testbed, and FPGA/RTL verification, claiming correctness, performance gains, and feasibility.
Significance. If the central claims hold, particularly the compatibility with unmodified Ethernet and the polymorphic approach enabling incremental vendor adoption, this could facilitate broader investment in and adoption of in-network collectives in the open Ethernet ecosystem for AI training and inference, removing a historical barrier that stems from the cross-layer nature of INC.
major comments (1)
- [Abstract] Abstract: The abstract states that EPIC is 'compatible with standard Ethernet' and offers 'polymorphic realizations tailored to varying hardware capabilities', with validation including 'Tofino Testbed, and FPGA/RTL verification'. If even the basic modes require programmable switch features (as implied by the testbeds used for all modes), this contradicts the claim of compatibility with unmodified standard Ethernet hardware and the modular evolutionary path starting from simple implementations on existing vendor silicon. This is load-bearing for the central claim and requires clarification or evidence of a non-programmable basic mode.
minor comments (1)
- [Abstract] Abstract: The abstract mentions 'extensive validation' but does not specify quantitative performance gains or specific metrics used to confirm feasibility.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for identifying a point that is central to the paper's claims. We address the major comment below and commit to revisions that improve clarity without altering the core technical contributions.
- Referee: [Abstract] Abstract: The abstract states that EPIC is 'compatible with standard Ethernet' and offers 'polymorphic realizations tailored to varying hardware capabilities', with validation including 'Tofino Testbed, and FPGA/RTL verification'. If even the basic modes require programmable switch features (as implied by the testbeds used for all modes), this contradicts the claim of compatibility with unmodified standard Ethernet hardware and the modular evolutionary path starting from simple implementations on existing vendor silicon. This is load-bearing for the central claim and requires clarification or evidence of a non-programmable basic mode.
Authors: We appreciate the referee highlighting this important clarification need. The EPIC abstraction is defined to be compatible with standard Ethernet by using unmodified Ethernet frame formats, standard multicast groups, and conventional switch forwarding tables for the basic mode; no programmable data-plane features are required for correct operation or role alignment in this mode. Polymorphic realizations then layer optional in-network computation on top when programmable hardware (Tofino, FPGA) is present. The listed testbeds were selected to exercise the full spectrum of modes and to provide rigorous verification of the advanced realizations, but they do not imply that the basic mode depends on programmability. We agree that the manuscript would benefit from an explicit statement of per-mode hardware requirements and a short illustrative example of the basic mode on commodity silicon. We will therefore revise the abstract, add a clarifying paragraph in Section 3, and include a table summarizing hardware prerequisites for each polymorphic variant. These changes will be incorporated in the revised manuscript.
Revision: yes
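The per-mode hardware prerequisites that the simulated rebuttal promises to tabulate could be expressed as a small capability lookup. The following is a minimal sketch under stated assumptions, not the authors' table: the mode name `in-network-compute`, the feature strings, the `offload_rank` field, and the `select_mode` helper are hypothetical, and only the split between a basic mode on conventional forwarding and optional computation on programmable hardware comes from the response above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Mode:
    name: str
    requires: FrozenSet[str]  # features a switch must expose (hypothetical labels)
    offload_rank: int         # higher means more work done in the network


# Features the rebuttal attributes to the basic mode, as illustrative strings.
BASIC_FEATURES = frozenset(
    {"standard_frames", "multicast_groups", "forwarding_tables"}
)

MODES = (
    Mode("basic", BASIC_FEATURES, 0),
    # Assumed advanced tier: the basic features plus a programmable data plane.
    Mode("in-network-compute", BASIC_FEATURES | {"programmable_dataplane"}, 1),
)


def select_mode(switch_features: FrozenSet[str]) -> Optional[Mode]:
    """Pick the most capable mode whose prerequisites the switch satisfies."""
    supported = [m for m in MODES if m.requires <= switch_features]
    if not supported:
        return None
    return max(supported, key=lambda m: m.offload_rank)


commodity = BASIC_FEATURES
programmable = BASIC_FEATURES | {"programmable_dataplane"}
```

In this sketch, `select_mode(commodity)` yields the basic mode and `select_mode(programmable)` yields the richer one, which is the kind of evidence the referee asks for: a mode whose prerequisite set contains no programmable feature.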
Circularity Check
No circularity: design proposal rests on independent specification and validation
The paper presents EPIC as a new protocol specification and reference system based on the principle of unified abstraction with polymorphic realizations. Claims about compatibility with standard Ethernet, modular evolutionary path, formal verification, and resource management are introduced as design choices rather than derived from fitted parameters or prior self-referential results. Validation spans model checking, simulations, emulation, Tofino testbed, and FPGA/RTL, providing external checks. No equations, self-citations, or reductions to inputs by construction appear in the abstract or described structure. The derivation chain is self-contained as a proposed architecture with stated assumptions that do not presuppose the target outcomes.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Formal verification methodologies can prove the correctness of all proposed polymorphic modes.
invented entities (1)
- EPIC abstraction and polymorphic realizations: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean (reality_from_one_distinction): unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "EPIC introduces an abstraction compatible with standard Ethernet that aligns functional boundaries with participant roles, while offering polymorphic realizations tailored to varying hardware capabilities."
- IndisputableMonolith/Foundation/BranchSelection.lean (branch_selection): unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We employ a modular design that enables an evolutionary path from simple to complex implementations"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.