
arxiv: 2604.07836 · v1 · submitted 2026-04-09 · 💻 cs.NI

LCMP: Distributed Long-Haul Cost-Aware Multi-Path Routing for Inter-Datacenter RDMA Networks

Pith reviewed 2026-05-10 18:17 UTC · model grok-4.3

classification 💻 cs.NI
keywords RDMA, multi-path routing, inter-datacenter networks, congestion signals, flow completion time, path asymmetry, distributed routing, long-haul networks

The pith

LCMP places RDMA flows on multiple inter-DC paths and cuts median FCT slowdown by up to 76% on an 8-DC testbed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that prior routing methods fail for RDMA traffic across datacenters because of path asymmetry, delayed congestion feedback, and simultaneous flow collisions. LCMP is a distributed framework that computes a path-quality score in the control plane to assess asymmetric paths uniformly and uses compact on-switch signals to react quickly to congestion. It resolves collisions by filtering out high-cost path candidates and applying a diversity-preserving hash to the rest. Results from an 8-DC testbed and large-scale NS-3 simulations of a 2000 km inter-DC scenario show substantial reductions in flow completion time (FCT) slowdown versus existing strategies.

Core claim

LCMP combines a control-plane path-quality score with compact on-switch congestion signals to place RDMA flows on multiple inter-datacenter paths for low-cost, low-latency, and congestion-responsive transmission, while resolving simultaneous flow decision collisions by filtering high-cost candidates and performing a diversity-preserving hash inside the reduced set.

What carries the argument

The LCMP routing framework, which unifies asymmetric path assessment through control-plane quality scores and enables responsive congestion reaction through on-switch signals, with collision resolution via high-cost filtering plus hashed selection.
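As a reading aid, the selection logic described above can be sketched in Python. This is an illustrative sketch, not the authors' implementation: the `Path` fields, the linear blend weighted by `alpha`, the `slack` filter threshold, and the CRC32 flow hash are all assumptions introduced here, since the material on this page does not specify them.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    egress: str        # hypothetical egress port name
    quality_cost: int  # control-plane path-quality cost, lower is better (assumed 0-255)
    congestion: int    # on-switch congestion signal, lower is better (assumed 0-255)


def combined_cost(p: Path, alpha: float = 0.5) -> float:
    """Blend the slow control-plane term with the fast data-plane term.

    The linear form is an assumption; alpha = 0 drops the path-quality term.
    """
    return alpha * p.quality_cost + (1.0 - alpha) * p.congestion


def select_path(paths, flow_key: bytes, alpha: float = 0.5, slack: float = 16.0) -> Path:
    """Filter out high-cost candidates, then hash the flow into the reduced set."""
    costs = {p: combined_cost(p, alpha) for p in paths}
    best = min(costs.values())
    # Assumed filter rule: keep candidates within `slack` of the cheapest path.
    candidates = sorted((p for p in paths if costs[p] <= best + slack),
                        key=lambda p: p.egress)
    # A deterministic per-flow hash spreads simultaneous flows over the survivors.
    return candidates[zlib.crc32(flow_key) % len(candidates)]
```

Because the hash runs over the reduced candidate set rather than over all paths, flows that decide at the same instant are spread across the low-cost paths instead of all choosing the single cheapest one.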

If this is right

  • RDMA flows can use multiple paths effectively even when those paths differ in length and capacity.
  • Distributed decisions stay collision-free without requiring a central coordinator.
  • Congestion response becomes fast enough to avoid the delayed-signal problems typical in long-haul links.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same score-plus-signal pattern could be tested on other wide-area latency-sensitive workloads such as distributed training or database replication.
  • If path scoring remains accurate at larger scales, the approach may reduce the need for over-provisioning inter-DC links.

Load-bearing premise

The control-plane path-quality score accurately unifies assessment across asymmetric paths, and the on-switch congestion signals plus filtered hash reliably prevent collisions and delayed feedback problems without introducing new overhead or misjudgments.

What would settle it

Run the 8-DC testbed or 2000 km NS-3 scenario with LCMP disabled versus enabled and check whether the reported reductions in median and tail FCT slowdown disappear or reverse when paths are highly asymmetric or when many flows decide simultaneously.

Figures

Figures reproduced from arXiv: 2604.07836 by Dong-Yang Yu, Haipeng Yao, Jun Wang, Ke Xu, Wendong Wang, Wenfei Wu, Xiaodi Wang, Yuchao Zhang.

Figure 1: [Motivation] Capacity–delay asymmetry causes ECMP and UCMP to make poor placement choices. LCMP balances utilization and reduces both median and tail FCT. To illustrate, consider an inter-DC scenario (…
Figure 2: LCMP architecture overview. … the final egress from a low-cost candidate set. The abstraction directly targets the three challenges identified in §2.3: Addressing C1 heterogeneous, asymmetric topologies. We separate slowly-varying path attributes from transient congestion by precomputing a compact per-path quality score in the control plane (§3.2). Encoding delay and provisioned capacity into a score al…
Figure 3: Switch bootstrap tables and mappings. Control plane installs a small set of vectors. Level score table. A linear mapping from level index to a 0–255 score is precomputed. This avoids per-packet floating computation. Trend normalization tables. For each coarse link-rate bucket (e.g., 25/100/400 Gbps), a small per-level trend threshold vector is created. These tables normalize the raw trend accumulator into…
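The two tables named in the Figure 3 excerpt can be illustrated with a short Python sketch. Only the 0–255 score range and the 25/100/400 Gbps buckets come from the excerpt; the number of levels (`NUM_LEVELS`), the threshold values in `TREND_THRESHOLDS`, and every function name are hypothetical choices made for illustration.

```python
NUM_LEVELS = 8   # assumed number of levels; not given in the excerpt
MAX_SCORE = 255  # the excerpt states a 0-255 score range


def build_level_score_table(num_levels: int = NUM_LEVELS, max_score: int = MAX_SCORE):
    """Precompute an integer score per level index using a linear mapping."""
    if num_levels < 2:
        raise ValueError("need at least two levels")
    # Integer arithmetic only: level 0 maps to 0, the last level to max_score.
    return [(level * max_score) // (num_levels - 1) for level in range(num_levels)]


# Hypothetical ascending threshold vectors, one per coarse link-rate bucket (Gbps).
TREND_THRESHOLDS = {
    25: [4, 8, 16, 32, 64, 128, 256],
    100: [16, 32, 64, 128, 256, 512, 1024],
    400: [64, 128, 256, 512, 1024, 2048, 4096],
}

LEVEL_SCORE = build_level_score_table()


def quantize_trend(raw_trend: int, thresholds) -> int:
    """Map a raw trend accumulator to a level by counting the thresholds it reaches."""
    return sum(1 for t in thresholds if raw_trend >= t)


def trend_score(raw_trend: int, link_rate_gbps: int) -> int:
    """Two table lookups stand in for any per-packet floating-point computation."""
    level = quantize_trend(raw_trend, TREND_THRESHOLDS[link_rate_gbps])
    return LEVEL_SCORE[level]
```

With these assumed tables, the same raw accumulator value yields a lower score on a faster link, which is the normalizing effect the excerpt attributes to per-bucket thresholds.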
Figure 5: Median and tail FCT slowdown for Web Search on the testbed topology under 30%, 50%, 80% load. … adjust traffic splitting ratios at edge routers to mitigate sub-second traffic bursts. Metrics. Our primary metric is FCT slowdown [22]. It means a flow’s actual FCT normalized by its ideal FCT. Ideal FCT is the FCT of the same flow when run alone in the network with the shortest propagation delay in its topology,…
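The FCT slowdown metric in the Figure 5 excerpt is a simple ratio, and a minimal Python sketch makes the bookkeeping concrete. The excerpt's definition of ideal FCT is truncated, so `ideal_fct` below uses an assumed model (propagation delay plus serialization time at the bottleneck rate); the nearest-rank percentile is likewise a choice made here, not something the page states.

```python
import math
import statistics


def ideal_fct(size_bytes: int, bottleneck_bps: float, prop_delay_s: float) -> float:
    """Assumed ideal FCT for a flow running alone: propagation plus serialization."""
    return prop_delay_s + (size_bytes * 8) / bottleneck_bps


def fct_slowdown(actual_fct_s: float, ideal_fct_s: float) -> float:
    """Actual FCT normalized by ideal FCT; 1.0 means no slowdown."""
    if ideal_fct_s <= 0:
        raise ValueError("ideal FCT must be positive")
    return actual_fct_s / ideal_fct_s


def percentile(values, pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(slowdowns):
    """Median and tail (P99) slowdown, the two statistics the figures report."""
    return {"median": statistics.median(slowdowns), "p99": percentile(slowdowns, 99)}
```

For example, a 1 MB flow over a 100 Gbps bottleneck with 10 ms of propagation delay has an assumed ideal FCT of about 10.08 ms, so an observed FCT of 20.16 ms corresponds to a slowdown of 2.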
Figure 6: [Simulator fidelity] NS-3 vs testbed FCT slowdown. 6.2 Large-Scale NS-3 Simulations. Real-world topology. Fig. 4b provides a realistic European network topology (BSONetworkSolutions) drawn from the Internet Topology Zoo [24]. This topology contains backbone, customer and transit links across regions and therefore captures realistic heterogeneity in both delay and capacity. There are 13 DCs and we set inter…
Figure 7: [System-wide validation] Median and tail FCT slowdown across all inter-DC flows at 30%, 50% and 80% loads. 6.2.2 Representative DC-Pair Case Study: (DC1, DC13). Setup. To highlight LCMP’s mechanism, we filter the same runs used above to extract flows between DC1 and DC13, which exhibit multiple candidate routes. Results. When we focus on a representative DC-pair with multiple candidate routes (DC1–DC13), L…
Figure 8: [DC-pair case study] Median and tail FCT slowdown for flows between DC pair (DC1, DC13) at 30%, 50% and 80% loads.
Figure 10: Congestion-control orthogonality: median and tail FCT slowdown under different CCs. 7 Sensitivity Analysis and Discussion. We present ablation and parameter-sensitivity results in this section. These experiments show how to configure LCMP and why each component matters in practice. The experiments measure the impact of the control-plane path-quality term and the data-plane congestion term. They also ident…
Figure 11: [Sensitivity analysis] Median and tail FCT slowdown for WebSearch on the 8-DC topology at 30% load. Key findings. Fig. 11a shows two clear failure modes. First, the rm-alpha run (path-quality removed) severely degrades performance across almost all flow sizes. For example, the median for a 3,438 B flow rises from 6.8 (normal) to 26.0 when 𝛼 = 0 (+280%). The P99 for the same size rises from 12.1 to 50.0 (…
Original abstract

RDMA-empowered cloud services are gradually deployed across datacenters (DCs) with multiple paths, which exhibit new properties of path asymmetry, delayed congestion signals, and simultaneous flow routing collisions, and further fail existing routing methods. We present LCMP, a distributed long-haul cost-aware multi-path routing framework that aims to place RDMA flows on multiple inter-DC paths, achieving low-cost, low-latency, and congestion-responsive transmission. LCMP combines a control-plane path-quality score with compact on-switch congestion signals, where the former unifies quality assessment for asymmetric paths and the latter enables responsive reaction to path congestion. LCMP further resolves the simultaneous flow decision collision problem by filtering high-cost candidates, and performing a diversity-preserving hash inside the reduced set. On an 8-DC testbed, LCMP reduces median and tail FCT slowdown by up to 76% and 64%, respectively compared to state-of-the-art (SOTA) DCN routing strategies. And large-scale NS-3 simulations under the 2000 km inter-DC scenario confirm similar improvements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents LCMP, a distributed long-haul cost-aware multi-path routing framework for RDMA flows in inter-datacenter networks. It combines a control-plane path-quality score to unify assessment of asymmetric paths, compact on-switch congestion signals for responsive reaction to congestion, and a cost-filtered hash to resolve simultaneous flow decision collisions while preserving path diversity. On an 8-DC testbed, LCMP is reported to reduce median and tail FCT slowdown by up to 76% and 64% versus SOTA DCN routing strategies, with similar gains confirmed in large-scale NS-3 simulations for a 2000 km inter-DC scenario.

Significance. If the mechanisms are shown to remain stable under realistic long-haul delay variance and the performance numbers are reproducible with full experimental details, the work would offer a practical advance for RDMA deployment across geographically distributed datacenters. The explicit handling of path asymmetry and delayed feedback distinguishes it from intra-DC solutions and could improve tail latency and cost efficiency for cloud services.

major comments (2)
  1. [Design and Evaluation sections] The path-quality score unification and on-switch signal responsiveness are load-bearing for the central FCT claims, yet the manuscript provides no quantitative validation (e.g., accuracy metrics or mis-ranking rates) of the score when propagation delays reach 2000 km and feedback latency exceeds multiple RTTs; this directly affects whether the 76%/64% reductions generalize beyond the reported testbed.
  2. [Design section] The collision-avoidance mechanism (cost filtering plus diversity-preserving hash) is presented as eliminating simultaneous-flow problems without new overhead or misjudgments, but no measurements of collision rates, hash diversity, or overhead under bursty arrivals are supplied; without these, the reported gains cannot be attributed confidently to the proposed components.
minor comments (2)
  1. [Abstract and Evaluation] The abstract and evaluation summary omit the specific SOTA baselines, traffic workloads, statistical methods, and raw data sources used for the FCT slowdown figures, which reduces reproducibility.
  2. [Design section] Notation for the path-quality score and congestion signals is introduced without an explicit equation or pseudocode listing, making the unification claim harder to follow.
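To make the first major comment concrete, one possible mis-ranking metric is sketched below in Python. It is an assumption of this page, not something the paper defines: it counts the fraction of path pairs whose order by control-plane score disagrees with their order by measured FCT slowdown. The function name and the lower-is-better convention for both inputs are hypothetical.

```python
from itertools import combinations


def misranking_rate(score_cost, measured_slowdown) -> float:
    """Fraction of comparable path pairs ordered differently by score and by measurement.

    Both arguments map a path name to a value where lower is better.
    """
    discordant = 0
    comparable = 0
    for a, b in combinations(sorted(score_cost), 2):
        ds = score_cost[a] - score_cost[b]
        dm = measured_slowdown[a] - measured_slowdown[b]
        if ds == 0 or dm == 0:
            continue  # ties carry no ordering information
        comparable += 1
        if (ds > 0) != (dm > 0):
            discordant += 1
    return discordant / comparable if comparable else 0.0
```

A rate near zero across delay regimes would support the claim that the score ranks asymmetric paths correctly; a rate that grows with propagation delay would point to the generalization risk the referee raises.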

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional validation where the current presentation is incomplete.

Point-by-point responses
  1. Referee: [Design and Evaluation sections] The path-quality score unification and on-switch signal responsiveness are load-bearing for the central FCT claims, yet the manuscript provides no quantitative validation (e.g., accuracy metrics or mis-ranking rates) of the score when propagation delays reach 2000 km and feedback latency exceeds multiple RTTs; this directly affects whether the 76%/64% reductions generalize beyond the reported testbed.

    Authors: We agree that the manuscript lacks explicit quantitative metrics (accuracy, mis-ranking rates) for the path-quality score under 2000 km propagation delays and multi-RTT feedback. While the NS-3 simulations already cover the 2000 km inter-DC scenario and show overall FCT gains, they do not isolate score accuracy under high delay variance. In the revision we will add a dedicated analysis subsection that reports these metrics from the existing simulation traces (and additional runs if needed) to directly address stability and generalization of the reported improvements. revision: yes

  2. Referee: [Design section] The collision-avoidance mechanism (cost filtering plus diversity-preserving hash) is presented as eliminating simultaneous-flow problems without new overhead or misjudgments, but no measurements of collision rates, hash diversity, or overhead under bursty arrivals are supplied; without these, the reported gains cannot be attributed confidently to the proposed components.

    Authors: The referee is correct that the manuscript supplies no direct measurements of collision rates, hash diversity, or overhead for the cost-filtered hash under bursty arrivals. The design description argues that filtering reduces candidates while the hash preserves diversity without added overhead, but empirical data are absent. We will revise the Design section to include these measurements, obtained from both the 8-DC testbed and NS-3 simulations under bursty traffic patterns, allowing clearer attribution of performance gains to each component. revision: yes
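The measurements promised in this response could take many forms. The Python sketch below shows one hypothetical way to quantify them for a single burst of simultaneous flows: a collision rate (the share of flows that end up sharing a path) and a hash-diversity figure (normalized Shannon entropy of the placement). The definitions, the CRC32 hash, and all names are assumptions made here, not metrics taken from the paper.

```python
import math
import zlib
from collections import Counter


def burst_placement(flow_keys, candidates) -> Counter:
    """Place each flow of a burst on a candidate path using a CRC32 flow hash."""
    return Counter(candidates[zlib.crc32(k) % len(candidates)] for k in flow_keys)


def collision_rate(placement: Counter) -> float:
    """Fraction of flows that share their path with at least one other flow."""
    total = sum(placement.values())
    shared = sum(count for count in placement.values() if count > 1)
    return shared / total if total else 0.0


def hash_diversity(placement: Counter, num_candidates: int) -> float:
    """Normalized Shannon entropy: 1.0 is an even spread, 0.0 is a single path."""
    total = sum(placement.values())
    if num_candidates < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in placement.values())
    return abs(entropy) / math.log2(num_candidates)
```

Reporting both numbers with and without the cost filter, under the same bursty arrivals, would help attribute any gain to the filtering step versus the hash itself.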

Circularity Check

0 steps flagged

No circularity: empirical results measured against external SOTA baselines

full rationale

The paper's central claims consist of measured FCT reductions on a physical 8-DC testbed and NS-3 simulations under 2000 km scenarios, compared directly to existing DCN routing strategies. No equations, first-principles derivations, or predictions are presented that reduce to fitted parameters, self-citations, or renamed inputs. The path-quality score and filtered-hash mechanisms are presented as design choices motivated by stated network properties (asymmetry, delayed signals, collisions), with performance validated externally rather than by construction. This is the common case of a systems paper whose value lies in implementation and benchmarking, not in a closed mathematical loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on domain assumptions about RDMA multi-path problems and the effectiveness of the proposed scoring and filtering mechanisms; no explicit free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption: RDMA multi-path inter-DC networks exhibit path asymmetry, delayed congestion signals, and simultaneous flow routing collisions that defeat existing methods.
    Explicitly stated as the motivation and failure mode of prior approaches.

pith-pipeline@v0.9.0 · 5516 in / 1307 out tokens · 61794 ms · 2026-05-10T18:17:35.491841+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages

  1. [1]

    Yixiao Gao, Qiang Li, Lingbo Tang, Yongqing Xi, Pengcheng Zhang, Wenwen Peng et al. 2021. When Cloud Storage Meets RDMA. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). USENIX Association, 519–533

  2. [2]

    Wei Bai, Shanim Sainul Abdeen, Ankit Agrawal, Krishan Kumar Attre, Paramvir Bahl, Ameya Bhagat et al. 2023. Empowering Azure Storage with RDMA. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). USENIX Association, Boston, MA, 49–67

  3. [3]

    Adithya Gangidi, Rui Miao, Shengbao Zheng, Sai Jayesh Bondu, Guilherme Goes, Hany Morsy et al. 2024. RDMA over Ethernet for Distributed Training at Meta Scale. In Proceedings of the ACM SIGCOMM 2024 Conference. Association for Computing Machinery, New York, NY, USA, 57–70

  4. [4]

    Yibo Zhu, Haggai Eran, Daniel Firestone, Chuanxiong Guo, Marina Lipshteyn, Yehonatan Liron et al. 2015. Congestion Control for Large-Scale RDMA Deployments. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, Vol. 45. Association for Computing Machinery, New York, NY, USA, 523–536

  5. [5]

    Chuanxiong Guo, Haitao Wu, Zhong Deng, Gaurav Soni, Jianxi Ye, Jitu Padhye et al. 2016. RDMA over Commodity Ethernet at Scale. In Proceedings of the 2016 ACM SIGCOMM Conference (Florianopolis, Brazil) (SIGCOMM ’16). Association for Computing Machinery, New York, NY, USA, 202–215

  6. [6]

    Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. 2008. A Scalable, Commodity Data Center Network Architecture. In Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication. Association for Computing Machinery, New York, NY, USA, 63–74

  7. [7]

    Christian Hopps. 2000. Analysis of an Equal-Cost Multi-Path Algorithm. RFC 2992

  8. [8]

    Jialong Li, Haotian Gong, Federico De Marchi, Aoyu Gong, Yiming Lei, Wei Bai et al. 2024. Uniform-Cost Multi-Path Routing for Reconfigurable Data Center Networks. In Proceedings of the ACM SIGCOMM 2024 Conference. Association for Computing Machinery, New York, NY, USA, 433–448

  9. [9]

    Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh et al. 2013. B4: Experience with a Globally-Deployed Software Defined Wan. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM. Association for Computing Machinery, Hong Kong, China and New York, NY, USA, 3–14

  10. [10]

    Andrew D. Ferguson, Steve Gribble, Chi-Yao Hong, Charles Killian, Waqar Mohsin, Henrik Muehe et al. 2021. Orion: Google’s Software-Defined Networking Control Plane. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). USENIX Association, 83–98

  11. [11]

    Cha Hwan Song, Xin Zhe Khooi, Raj Joshi, Inho Choi, Jialin Li, and Mun Choon Chan. 2023. Network Load Balancing with In-Network Reordering Support for RDMA. In Proceedings of the ACM SIGCOMM 2023 Conference. Association for Computing Machinery, New York, NY, USA, 816–831

  12. [12]

    Wenxue Li, Xiangzhou Liu, Yunxuan Zhang, Zihao Wang, Wei Gu, Tao Qian et al. 2025. Revisiting RDMA Reliability for Lossy Fabrics. In Proceedings of the ACM SIGCOMM 2025 Conference. Association for Computing Machinery, New York, NY, USA, 85–98

  13. [13]

    Junlan Zhou, Malveeka Tewari, Min Zhu, Abdul Kabbani, Leon Poutievski, Arjun Singh et al. 2014. WCMP: Weighted Cost Multipathing for Improved Fairness in Data Centers. In Proceedings of the Ninth European Conference on Computer Systems. Association for Computing Machinery, New York, NY, USA, 14 pages

  14. [14]

    Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon et al. 2015. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. Association for Computing Machinery, London, United Kingdom and New Y...

  15. [15]

    Kok-Kiong Yap, Murtaza Motiwala, Jeremy Rahe, Steve Padgett, Matthew Holliman, Gary Baldus et al. 2017. Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. Association for Computing Machinery, Los Angeles, CA, USA ...

  16. [16]

    Yuchao Zhang, Junchen Jiang, Ke Xu, Xiaohui Nie, Martin J. Reed, Haiyang Wang et al. 2018. BDS: A Centralized near-Optimal Overlay Network for Inter-Datacenter Data Replication. In Proceedings of the Thirteenth EuroSys Conference. Association for Computing Machinery, New York, NY, USA, 1–14. [Source page header: EUROSYS ’26, April 27–30, 2026, Edinburgh, Scotland UK. Dong-Yang Yu et al.]

  17. [17]

    Yuchao Zhang, Xiaohui Nie, Junchen Jiang, Wendong Wang, Ke Xu, Youjian Zhao et al. 2021. BDS+: An Inter-Datacenter Data Replication System With Dynamic Bandwidth Separation. IEEE/ACM Transactions on Networking 29, 2 (April 2021), 918–934

  18. [18]

    Srikanth Kandula, Dina Katabi, Shantanu Sinha, and Arthur Berger

  19. [19]

    Dynamic load balancing without packet reordering. SIGCOMM Comput. Commun. Rev. 37, 2 (March 2007), 51–62

  20. [20]

    Peihao Huang, Guo Chen, Xin Zhang, Can Liu, Hongyu Wang, Huijun Shen et al. 2025. Fast and Scalable Selective Retransmission for RDMA. In IEEE INFOCOM 2025 - IEEE Conference on Computer Communications. 1–10

  21. [21]

    Shawn Shuoshuo Chen, Keqiang He, Rui Wang, Srinivasan Seshan, and Peter Steenkiste. 2024. Precise Data Center Traffic Engineering with Constrained Hardware Resources. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA, 669–690

  22. [22]

    Fei Gui, Songtao Wang, Dan Li, Li Chen, Kaihui Gao, Congcong Min et al. 2024. RedTE: Mitigating Subsecond Traffic Bursts with Real-Time and Distributed Traffic Engineering. In Proceedings of the ACM SIGCOMM 2024 Conference. Association for Computing Machinery, New York, NY, USA, 71–85

  23. [23]

    Yuliang Li, Rui Miao, Hongqiang Harry Liu, Yan Zhuang, Fei Feng, Lingbo Tang et al. 2019. HPCC: High Precision Congestion Control. In Proceedings of the ACM Special Interest Group on Data Communication. ACM, Beijing China, 44–58

  24. [24]

    Zeling Zhang, Dongqi Cai, Yiran Zhang, Mengwei Xu, Shangguang Wang, and Ao Zhou. 2024. FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission. In Proceedings of the 4th Workshop on Machine Learning and Systems. Association for Computing Machinery, New York, NY, USA, 126–133

  25. [25]

    Simon Knight, Hung X. Nguyen, Nickolas Falkner, Rhys Bowden, and Matthew Roughan. 2011. The Internet Topology Zoo. IEEE Journal on Selected Areas in Communications 29, 9 (2011), 1765–1775

  26. [26]

    Arjun Roy, Hongyi Zeng, Jasmeet Bagga, George Porter, and Alex C. Snoeren. 2015. Inside the Social Network’s (Datacenter) Network. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. Association for Computing Machinery, New York, NY, USA, 123–137

  27. [27]

    Mohammad Alizadeh, Albert Greenberg, David A. Maltz et al. 2010. Data Center TCP (DCTCP). Proceedings of the ACM SIGCOMM 2010 conference 40, 4 (Aug. 2010), 63–74

  28. [28]

    Radhika Mittal, Alexander Shpiner, Aurojit Panda, Eitan Zahavi, Arvind Krishnamurthy, Sylvia Ratnasamy et al. 2018. Revisiting Network Support for RDMA. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication. Association for Computing Machinery, New York, NY, USA, 313–326

  29. [29]

    Zilong Wang, Layong Luo, Qingsong Ning, Chaoliang Zeng, Wenxue Li, Xinchen Wan et al. 2023. SRNIC: A Scalable Architecture for RDMA NICs. In 20th USENIX Symposium on Networked Systems Design and Implementation. USENIX Association, Boston, MA, 1–14

  30. [30]

    Peihao Huang, Xin Zhang, Zhigang Chen, Can Liu, and Guo Chen

  31. [31]

    LEFT: LightwEight and FasT Packet Reordering for RDMA. In Proceedings of the 8th Asia-Pacific Workshop on Networking. Association for Computing Machinery, New York, NY, USA, 67–73

  32. [32]

    Deepak Narayanan, Fiodar Kazhamiaka, Firas Abuzaid, Peter Kraft, Akshay Agrawal, Srikanth Kandula et al. 2021. Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (Virtual Event, Germany) (SOSP ’21). Association for Computing Machinery, New York, N...

  33. [33]

    Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, and Minlan Yu. 2023. Teal: Learning-Accelerated Optimization of WAN Traffic Engineering. In Proceedings of the ACM SIGCOMM 2023 Conference. ACM, New York NY USA, 378–393

  34. [34]

    Bo He, Jingyu Wang, Qi Qi, Haifeng Sun, and Jianxin Liao. 2023. RTHop: Real-time Hop-by-Hop Mobile Network Routing by Decentralized Learning with Semantic Attention. IEEE Transactions on Mobile Computing 22, 3 (March 2023), 1731–1747

  35. [35]

    Xinglong Diao, Huaxi Gu, Wenting Wei, Guoyong Jiang, and Baochun Li. 2024. Deep Reinforcement Learning Based Dynamic Flowlet Switching for DCN. IEEE Transactions on Cloud Computing 12, 2 (April 2024), 580–593

  36. [36]

    Jianmin Liu, Dan Li, and Yongjun Xu. 2024. Deep Distributional Reinforcement Learning-Based Adaptive Routing with Guaranteed Delay Bounds. IEEE/ACM Transactions on Networking 32, 6 (Dec. 2024), 4692–4706

  37. [37]

    Yanqing Chen, Chen Tian, Jiaqing Dong, Song Feng, Xu Zhang, Chang Liu et al. 2022. Swing: Providing long-range lossless rdma via pfc-relay. IEEE Transactions on Parallel and Distributed Systems 34, 1 (2022), 63–75

  38. [38]

    Chengyuan Huang, Feiyang Xue, Peiwen Yu, Xiaoliang Wang, Yanqing Chen, Tao Wu et al. 2024. Minimizing buffer utilization for lossless inter-DC links. IEEE/ACM Transactions on Networking (2024)

  39. [39]

    Minfei Long, Jiangping Han, Wentao Wang, Jiayu Yang, and Kaiping Xue. 2024. Lscc: Link-segmented congestion control for rdma in cross-datacenter networks. In 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS). IEEE, 1–10

  40. [40]

    Gaoxiong Zeng, Wei Bai, Ge Chen, Kai Chen, Dongsu Han, Yibo Zhu et al. 2022. Congestion Control for Cross-Datacenter Networks. IEEE/ACM Transactions on Networking30, 5 (2022), 2074–2089

  41. [41]

    Yantao Geng, Han Zhang, Xingang Shi, Jilong Wang, Xia Yin, Dongbiao He et al. 2023. Delay Based Congestion Control for Cross-Datacenter Networks. In 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS). 1–4

  42. [42]

    Minfei Long, Jiangping Han, Wentao Wang, Jiayu Yang, and Kaiping Xue. 2024. LSCC: Link-Segmented Congestion Control for RDMA in Cross-Datacenter Networks. In 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS). 1–10

  43. [43]

    Kai Lv, Jinyang Li, Pengyi Zhang, Heng Pan, Luyang Li, Shuihai Hu et al. 2025. OmniDMA: Scalable RDMA Transport over WAN. In Proceedings of the 9th Asia-Pacific Workshop on Networking. Association for Computing Machinery, New York, NY, USA, 135–141

  44. [44]

    Yuanwei Lu, Guo Chen, Bojie Li, Kun Tan, Yongqiang Xiong, Peng Cheng et al. 2018. Multi-Path Transport for RDMA in Datacenters. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). USENIX Association, Renton, WA, 357–371

  45. [45]

    Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut et al. 2014. CONGA: Distributed Congestion-Aware Load Balancing for Datacenters. In Proceedings of the 2014 ACM Conference on SIGCOMM. Association for Computing Machinery, New York, NY, USA, 503–514

  46. [46]

    Naga Katta, Mukesh Hira, Changhoon Kim, Anirudh Sivaraman, and Jennifer Rexford. 2016. HULA: Scalable Load Balancing Using Programmable Data Planes. In Proceedings of the Symposium on SDN Research. Association for Computing Machinery, New York, NY, USA, Article 10, 12 pages

  47. [47]

    Soudeh Ghorbani, Zibin Yang, P. Brighten Godfrey, Yashar Ganjali, and Amin Firoozshahian. 2017. DRILL: Micro Load Balancing for Low-Latency Data Center Networks. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. Association for Computing Machinery, Los Angeles, CA, USA and New York, NY, USA, 225–238

  48. [48]

    Naga Katta, Aditi Ghag, Mukesh Hira, Isaac Keslassy, Aran Bergman, Changhoon Kim et al. 2017. Clove: Congestion-Aware Load Balancing at the Virtual Edge. In Proceedings of the 13th International Conference on Emerging Networking Experiments and Technologies. Association for Computing Machinery, In...

  49. [49]

    Hong Zhang, Junxue Zhang, Wei Bai, Kai Chen, and Mosharaf Chowdhury. 2017. Resilient Datacenter Load Balancing in the Wild. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. Association for Computing Machinery, Los Angeles, CA, USA and New York, NY, USA, 253–266

  50. [50]

    Zhehui Zhang, Haiyang Zheng, Jiayao Hu, Xiangning Yu, Chenchen Qi, Xuemei Shi et al. 2021. Hashing Linearity Enables Relative Path Control in Data Centers. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). USENIX Association, 855–862

  51. [51]

    David Wetherall, Abdul Kabbani, Van Jacobson, Jim Winget, Yuchung Cheng, Charles B. Morrey III et al. 2023. Improving Network Availability with Protective ReRoute. In Proceedings of the ACM SIGCOMM 2023 Conference. Association for Computing Machinery, New York, NY, USA and New York, NY, USA, 684–695

  52. [52]

    Yadong Liu, Yunming Xiao, Xuan Zhang, Weizhen Dang, Huihui Liu, Xiang Li et al. 2025. Unlocking ECMP Programmability for Precise Traffic Control. In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25). USENIX Association, Philadelphia, PA, 87–106

  53. [53]

    Huimin Luo, Jiao Zhang, Mingxuan Yu, Yongchen Pan, Tian Pan, and Tao Huang. 2025. SeqBalance: Congestion-Aware Load Balancing with No Reordering in Data Center Networks. IEEE Internet of Things Journal 12, 13 (2025), 25707–25719

  54. [54]

    Radhika Mittal, Vinh The Lam, Nandita Dukkipati, Emily Blem, Hassan Wassel, Monia Ghobadi et al. 2015. TIMELY: RTT-Based Congestion Control for the Datacenter. ACM SIGCOMM Computer Communication Review 45, 4 (Sept. 2015), 537–550

  55. [55]

    Gautam Kumar, Nandita Dukkipati, Keon Jang, Hassan M. G. Wassel, Xian Wu, Behnam Montazeri et al. 2020. Swift: Delay Is Simple and Effective for Congestion Control in the Datacenter. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Co...

  56. [56]

    Ahmed Saeed, Varun Gupta, Prateesh Goyal, Milad Sharif, Rong Pan, Mostafa Ammar et al. 2020. Annulus: A Dual Congestion Control Loop for Datacenter and WAN Traffic Aggregates. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Commun...

  57. [57]

    Parvin Taheri, Danushka Menikkumbura, Erico Vanini, Sonia Fahmy, Patrick Eugster, and Tom Edsall. 2020. RoCC: Robust Congestion Control for RDMA. In Proceedings of the 16th International Conference on Emerging Networking EXperiments and Technologies. ACM, Barcelona Spain, 17–30

  58. [58]

    Vamsi Addanki, Oliver Michel, and Stefan Schmid. 2022. PowerTCP: Pushing the Performance Limits of Datacenter Networks. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, Renton, WA, 51–70

  59. [59]

    Prateesh Goyal, Preey Shah, Kevin Zhao, Georgios Nikolaidis, Mohammad Alizadeh, and Thomas E. Anderson. 2022. Backpressure Flow Control. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, Renton, WA, 779–805

  60. [60]

    Xiaolong Zhong, Jiao Zhang, Yali Zhang, Zixuan Guan, and Zirui Wan. 2022. PACC: Proactive and Accurate Congestion Feedback for RDMA Congestion Control. In IEEE INFOCOM 2022 - IEEE Conference on Computer Communications. 2228–2237

  61. [61]

    Yanqing Chen, Chen Tian, Jiaqing Dong, Song Feng, Xu Zhang, Chang Liu et al. 2023. Swing: Providing Long-Range Lossless RDMA via PFC-Relay. IEEE Transactions on Parallel and Distributed Systems 34, 1 (Jan. 2023), 63–75

  62. [62]

    Jiao Zhang, Xiaolong Zhong, Zirui Wan, Yu Tian, Tian Pan, and Tao Huang. 2023. RCC: Enabling Receiver-Driven RDMA Congestion Control With Congestion Divide-and-Conquer in Datacenter Networks. IEEE/ACM Transactions on Networking 31, 1 (Feb. 2023), 103–117

  63. [63]

    Ke Wu, Dezun Dong, and Weixia Xu. 2024. COER: A Network Interface Offloading Architecture for RDMA and Congestion Control Protocol Codesign. ACM Transactions on Architecture and Code Optimization 21, 3 (Sept. 2024), 49:1–49:26

  64. [64]

    Jiao Zhang, Yuqing Wang, Xiaolong Zhong, Mingxuan Yu, Haoyu Pan, Yali Zhang et al. 2024. PACC: A Proactive CNP Generation Scheme for Datacenter Networks. IEEE/ACM Transactions on Networking 32, 3 (June 2024), 2586–2599

  65. [65]

    Shaojun Zou, Yi Jiang, Jiacheng Qu, Tao Zhang, Yuanzhen Hu, and Yujie Peng. 2024. Achieving Ultra-Low Latency for Timeout-Less Congestion Control in Data Center Networks. In 2024 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA). 1439–1444

  66. [66]

    Zirui Wan, Jiao Zhang, Yuxiang Wang, Kefei Liu, Haoyu Pan, Yongchen Pan et al. 2025. RHCC: Revisiting Intra-Host Congestion Control in RDMA Networks. IEEE Transactions on Networking 33, 3 (2025), 1–14

  67. [67]

    Yuchao Zhang, Chenyue Zheng, Wenfei Wu, Zhuo Jiang, Lei Wang, Huichen Dai et al. 2025. MORS: Traffic-Aware Routing based on Temporal Attributes for Model Training Clusters. In 2025 IEEE 33rd International Conference on Network Protocols (ICNP). 1–12

  68. [68]

    Chuhao Chen, Jiarui Ye, Yongbo Gao, Sen Liu, and Yang Xu. 2024. HF^2T: Host-Based Flowlet Fine-Tuning for RDMA Load Balancing. In Proceedings of the 8th Asia-Pacific Workshop on Networking. ACM, Sydney Australia, 9–15

  69. [69]

    Maciej Besta, Marcel Schneider, Marek Konieczny, Karolina Cynk, Erik Henriksson, Salvatore Di Girolamo et al. 2020. FatPaths: Routing in Supercomputers and Data Centers When Shortest Paths Fall Short. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1–18.