LCMP: Distributed Long-Haul Cost-Aware Multi-Path Routing for Inter-Datacenter RDMA Networks
Pith reviewed 2026-05-10 18:17 UTC · model grok-4.3
The pith
LCMP places RDMA flows on multiple inter-DC paths and cuts median FCT slowdown by up to 76% on an 8-DC testbed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LCMP combines a control-plane path-quality score with compact on-switch congestion signals to place RDMA flows on multiple inter-datacenter paths for low-cost, low-latency, and congestion-responsive transmission, while resolving simultaneous flow decision collisions by filtering high-cost candidates and performing a diversity-preserving hash inside the reduced set.
What carries the argument
The LCMP routing framework, which unifies asymmetric path assessment through control-plane quality scores and enables responsive congestion reaction through on-switch signals, with collision resolution via high-cost filtering plus hashed selection.
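The filter-then-hash step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the weights `alpha` and `beta`, the `slack` threshold, the use of CRC32, and the function names are all assumptions chosen for illustration. Only the weighted-sum form of the cost, C(p) = α·Cpath(p) + β·Ccong(p), is taken from the passage quoted further down this page.

```python
import zlib

def combined_cost(c_path, c_cong, alpha=0.5, beta=0.5):
    """Weighted sum of a path-quality cost and a congestion cost.

    alpha and beta are hypothetical weights, not values from the paper.
    """
    return alpha * c_path + beta * c_cong

def select_path(flow_id, paths, slack=0.2):
    """Pick a path for one flow.

    paths maps a path name to a (c_path, c_cong) tuple.
    slack is a hypothetical tolerance: candidates whose cost exceeds
    the best cost by more than this fraction are filtered out.
    """
    costs = {name: combined_cost(*pc) for name, pc in paths.items()}
    best = min(costs.values())
    # Step 1: drop high-cost candidates.
    candidates = sorted(n for n, c in costs.items() if c <= best * (1 + slack))
    # Step 2: hash the flow identifier over the reduced set so that
    # flows deciding at the same time do not all pick the single best path.
    idx = zlib.crc32(flow_id.encode()) % len(candidates)
    return candidates[idx]

if __name__ == "__main__":
    example_paths = {"A": (1.0, 0.2), "B": (1.1, 0.1), "C": (3.0, 0.0)}
    for i in range(5):
        print(f"flow-{i}", "->", select_path(f"flow-{i}", example_paths))
```

In this sketch the expensive path "C" is never chosen, while flows spread across "A" and "B". A real switch would hash packet header fields rather than a string identifier.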
If this is right
- RDMA flows can use multiple paths effectively even when those paths differ in length and capacity.
- Distributed per-flow decisions can resolve simultaneous-flow collisions without requiring a central coordinator.
- Congestion response becomes fast enough to avoid the delayed-signal problems typical in long-haul links.
Where Pith is reading between the lines
- The same score-plus-signal pattern could be tested on other wide-area latency-sensitive workloads such as distributed training or database replication.
- If path scoring remains accurate at larger scales, the approach may reduce the need for over-provisioning inter-DC links.
Load-bearing premise
The control-plane path-quality score accurately unifies assessment across asymmetric paths, and the on-switch congestion signals plus filtered hash reliably prevent collisions and delayed feedback problems without introducing new overhead or misjudgments.
What would settle it
Run the 8-DC testbed or 2000 km NS-3 scenario with LCMP disabled versus enabled and check whether the reported reductions in median and tail FCT slowdown disappear or reverse when paths are highly asymmetric or when many flows decide simultaneously.
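To make the comparison concrete, the sketch below computes median and tail FCT slowdown and the percentage reduction between two runs. It assumes the common convention that slowdown is the measured flow completion time divided by the ideal completion time on an unloaded network, and it takes the 99th percentile as the tail. This page does not state which definitions the paper uses, so both are assumptions, as are the function names.

```python
import math
import statistics

def slowdowns(fcts, ideal_fcts):
    """Per-flow slowdown: measured FCT divided by ideal FCT (assumed definition)."""
    return [f / i for f, i in zip(fcts, ideal_fcts)]

def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def reduction_pct(baseline, treated):
    """Percentage by which `treated` is lower than `baseline`."""
    return 100.0 * (baseline - treated) / baseline

def compare(baseline_fcts, treated_fcts, ideal_fcts, tail_q=99):
    """Return (median reduction %, tail reduction %) between two runs."""
    base = slowdowns(baseline_fcts, ideal_fcts)
    new = slowdowns(treated_fcts, ideal_fcts)
    median_red = reduction_pct(statistics.median(base), statistics.median(new))
    tail_red = reduction_pct(percentile(base, tail_q), percentile(new, tail_q))
    return median_red, tail_red
```

Running `compare` separately on the asymmetric-path and simultaneous-arrival subsets of a trace would show whether the reductions hold in exactly the conditions the test targets.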
Original abstract
RDMA-empowered cloud services are gradually deployed across datacenters (DCs) with multiple paths, which exhibit new properties of path asymmetry, delayed congestion signals, and simultaneous flow routing collisions, and further fail existing routing methods. We present LCMP, a distributed long-haul cost-aware multi-path routing framework that aims to place RDMA flows on multiple inter-DC paths, achieving low-cost, low-latency, and congestion-responsive transmission. LCMP combines a control-plane path-quality score with compact on-switch congestion signals, where the former unifies quality assessment for asymmetric paths and the latter enables responsive reaction to path congestion. LCMP further resolves the simultaneous flow decision collision problem by filtering high-cost candidates, and performing a diversity-preserving hash inside the reduced set. On an 8-DC testbed, LCMP reduces median and tail FCT slowdown by up to 76% and 64%, respectively compared to state-of-the-art (SOTA) DCN routing strategies. And large-scale NS-3 simulations under the 2000 km inter-DC scenario confirm similar improvements.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents LCMP, a distributed long-haul cost-aware multi-path routing framework for RDMA flows in inter-datacenter networks. It combines a control-plane path-quality score to unify assessment of asymmetric paths, compact on-switch congestion signals for responsive reaction to congestion, and a cost-filtered hash to resolve simultaneous flow decision collisions while preserving path diversity. On an 8-DC testbed, LCMP is reported to reduce median and tail FCT slowdown by up to 76% and 64% versus SOTA DCN routing strategies, with similar gains confirmed in large-scale NS-3 simulations for a 2000 km inter-DC scenario.
Significance. If the mechanisms are shown to remain stable under realistic long-haul delay variance and the performance numbers are reproducible with full experimental details, the work would offer a practical advance for RDMA deployment across geographically distributed datacenters. The explicit handling of path asymmetry and delayed feedback distinguishes it from intra-DC solutions and could improve tail latency and cost efficiency for cloud services.
major comments (2)
- [Design and Evaluation sections] The path-quality score unification and on-switch signal responsiveness are load-bearing for the central FCT claims, yet the manuscript provides no quantitative validation (e.g., accuracy metrics or mis-ranking rates) of the score when propagation delays reach 2000 km and feedback latency exceeds multiple RTTs; this directly affects whether the 76%/64% reductions generalize beyond the reported testbed.
- [Design section] The collision-avoidance mechanism (cost filtering plus diversity-preserving hash) is presented as eliminating simultaneous-flow problems without new overhead or misjudgments, but no measurements of collision rates, hash diversity, or overhead under bursty arrivals are supplied; without these, the reported gains cannot be attributed confidently to the proposed components.
minor comments (2)
- [Abstract and Evaluation] The abstract and evaluation summary omit the specific SOTA baselines, traffic workloads, statistical methods, and raw data sources used for the FCT slowdown figures, which reduces reproducibility.
- [Design section] Notation for the path-quality score and congestion signals is introduced without an explicit equation or pseudocode listing, making the unification claim harder to follow.
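The first major comment asks for mis-ranking rates of the path-quality score. One way such a metric could be defined, offered here only as a sketch, is the fraction of path pairs that the score orders differently from an observed ground truth such as measured per-path FCT. The function name and the choice of pairwise discordance are assumptions; the paper may define the metric differently.

```python
from itertools import combinations

def misranking_rate(scores, observed):
    """Fraction of path pairs ordered differently by score and by observation.

    scores and observed map path names to numbers where lower is better.
    Pairs tied in either mapping are skipped.
    """
    discordant = 0
    comparable = 0
    for a, b in combinations(scores.keys(), 2):
        ds = scores[a] - scores[b]
        do = observed[a] - observed[b]
        if ds == 0 or do == 0:
            continue  # ties carry no ordering information
        comparable += 1
        if (ds > 0) != (do > 0):
            discordant += 1
    return discordant / comparable if comparable else 0.0
```

A rate near zero at 2000 km would suggest that the score still ranks paths correctly when feedback is stale.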
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional validation where the current presentation is incomplete.
Point-by-point responses
- Referee: [Design and Evaluation sections] The path-quality score unification and on-switch signal responsiveness are load-bearing for the central FCT claims, yet the manuscript provides no quantitative validation (e.g., accuracy metrics or mis-ranking rates) of the score when propagation delays reach 2000 km and feedback latency exceeds multiple RTTs; this directly affects whether the 76%/64% reductions generalize beyond the reported testbed.
  Authors: We agree that the manuscript lacks explicit quantitative metrics (accuracy, mis-ranking rates) for the path-quality score under 2000 km propagation delays and multi-RTT feedback. While the NS-3 simulations already cover the 2000 km inter-DC scenario and show overall FCT gains, they do not isolate score accuracy under high delay variance. In the revision we will add a dedicated analysis subsection that reports these metrics from the existing simulation traces (and additional runs if needed) to directly address stability and generalization of the reported improvements.
  Revision: yes
- Referee: [Design section] The collision-avoidance mechanism (cost filtering plus diversity-preserving hash) is presented as eliminating simultaneous-flow problems without new overhead or misjudgments, but no measurements of collision rates, hash diversity, or overhead under bursty arrivals are supplied; without these, the reported gains cannot be attributed confidently to the proposed components.
  Authors: The referee is correct that the manuscript supplies no direct measurements of collision rates, hash diversity, or overhead for the cost-filtered hash under bursty arrivals. The design description argues that filtering reduces candidates while the hash preserves diversity without added overhead, but empirical data are absent. We will revise the Design section to include these measurements, obtained from both the 8-DC testbed and NS-3 simulations under bursty traffic patterns, allowing clearer attribution of performance gains to each component.
  Revision: yes
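The measurements promised in the second response could take several forms. As one hedged illustration, the sketch below summarizes how a burst of simultaneous flows was spread over the candidate paths: `max_share` is the largest fraction of the burst that landed on a single path, and `diversity` is the Shannon entropy of the assignment normalized by its maximum. Both metric definitions and all names are assumptions, not the authors' stated methodology.

```python
import math
from collections import Counter

def burst_stats(assignments, num_candidates):
    """Summarize path assignments for one burst of simultaneous flows.

    assignments: list of chosen path names, one per flow.
    num_candidates: number of paths that survived cost filtering.
    """
    counts = Counter(assignments)
    total = len(assignments)
    shares = [c / total for c in counts.values()]
    # Shannon entropy of the observed distribution over paths.
    entropy = sum(-s * math.log(s) for s in shares)
    # Normalize by the entropy of a perfectly even spread.
    max_entropy = math.log(num_candidates) if num_candidates > 1 else 1.0
    return {"max_share": max(shares), "diversity": entropy / max_entropy}
```

A `max_share` close to 1 indicates that most flows in the burst collided on one path, while a `diversity` close to 1 indicates an even spread over the filtered set.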
Circularity Check
No circularity: empirical results measured against external SOTA baselines
Full rationale
The paper's central claims consist of measured FCT reductions on a physical 8-DC testbed and NS-3 simulations under 2000 km scenarios, compared directly to existing DCN routing strategies. No equations, first-principles derivations, or predictions are presented that reduce to fitted parameters, self-citations, or renamed inputs. The path-quality score and filtered-hash mechanisms are presented as design choices motivated by stated network properties (asymmetry, delayed signals, collisions), with performance validated externally rather than by construction. This is the common case of a systems paper whose value lies in implementation and benchmarking, not in a closed mathematical loop.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: RDMA multi-path inter-DC networks exhibit path asymmetry, delayed congestion signals, and simultaneous flow routing collisions that defeat existing methods.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: LCMP combines a control-plane path-quality score with compact on-switch congestion signals... filtering high-cost candidates, and performing a diversity-preserving hash inside the reduced set.
- IndisputableMonolith/Foundation/BranchSelection.lean, theorem branch_selection (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: C(p) = α·C_path(p) + β·C_cong(p)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Paper venue and authors (from the paper's running header): EUROSYS ’26, April 27–30, 2026, Edinburgh, Scotland, UK; Dong-Yang Yu et al.