pith. machine review for the scientific record. sign in

arxiv: 2604.23932 · v1 · submitted 2026-04-27 · 💻 cs.NI

Recognition: unknown

MatchRDMA: A Segmented and Rate-Matched Long-Haul RDMA Scheme for Geo-distributed LLM Training over OTN

Authors on Pith no claims yet

Pith reviewed 2026-05-08 01:22 UTC · model grok-4.3

classification 💻 cs.NI
keywords RDMAOTNLLM traininggeo-distributed systemslong-haul networkingrate matchinginter-DC communication
0
0 comments X

The pith

MatchRDMA coordinates OTN rates at both ends of long-haul links to raise RDMA throughput up to 20x for geo-distributed LLM training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a segmented RDMA scheme that proactively matches transmission rates between source and destination over optical transport networks. This addresses the mismatch that causes buffer buildup and wasted bandwidth when moving model data and gradients across distant data centers. A sympathetic reader would care because geo-distributed training is increasingly needed to access diverse compute and data, yet conventional RDMA performs poorly on long-haul OTN paths. If the coordination works, it lets existing RDMA stacks scale to wider geographic spreads without new hardware.

Core claim

MatchRDMA segments the long-haul path and matches source and destination OTN rates in advance, yielding up to 20 times higher inter-DC throughput and up to 62.7 percent lower destination buffer occupancy than standard RDMA.

What carries the argument

Proactive rate coordination between source and destination OTN endpoints, applied to a segmented RDMA flow.

If this is right

  • Distributed training jobs can run across more distant sites while keeping high link utilization.
  • Destination nodes need smaller buffers, lowering memory cost and power draw.
  • Existing RDMA applications for AI can extend to wide-area networks without protocol changes.
  • Training clusters gain flexibility to place GPUs where power or data are cheapest.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same rate-matching idea could apply to other bursty, high-bandwidth workloads such as large-scale data analytics.
  • Network operators might expose simple rate-control APIs on OTN gear to enable this coordination at scale.
  • If widely adopted, it would lower the barrier to multi-region AI training and reduce reliance on centralized mega-clusters.

Load-bearing premise

OTN equipment at both ends can be instructed to change rates on the fly without adding latency or requiring unavailable control interfaces.

What would settle it

A deployment measurement showing that rate coordination adds more than a few percent extra end-to-end latency or demands custom firmware on current OTN switches would disprove the practical gains.

Figures

Figures reproduced from arXiv: 2604.23932 by Hongxiang Wang, Jiawei Zhang, Jun Dai, Kexiong Fang, Xiaorun Wang, Xingde Li, Yuefeng Ji, Zheng Yang, Zhiqun Gu.

Figure 1
Figure 1. Figure 1: Three bottlenecks for long-haul RDMA over OTN view at source ↗
Figure 2
Figure 2. Figure 2: Principles of MatchRDMA. (a) Comparison of conventional long-haul RDMA and MatchRDMA with segmented OTN￾assisted control; (b) Reservoir model of destination-OTN buffer stress; (c) Source-OTN control workflow; (d) RoCE packet fields used for OTN-side control; (e) Destination-OTN slot-level rate estimation and rate-budget generation view at source ↗
Figure 3
Figure 3. Figure 3: (b) shows the inter-DC throughput under different message sizes and inter-DC OTN delays. As the delay increases, the DCQCN-like and THEMIS-like baselines suffer severe throughput degradation because sender progress remains constrained by the stretched end-to-end ACK feedback loop. In contrast, both pseudo-ACK baseline and MatchRDMA are much less sensitive to distance, since pseudo￾ACK keeps sender progress… view at source ↗
read the original abstract

We propose MatchRDMA, a proactive, segmented, and rate-matched long-haul RDMA scheme for geo-distributed LLM training over OTN. By coordinating source and destination OTN rates, it improves inter-DC throughput by up to 20x compared with conventional RDMA, and reduces destination-OTN buffer occupancy by up to 62.7%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes MatchRDMA, a proactive, segmented, and rate-matched long-haul RDMA scheme for geo-distributed LLM training over OTN. By coordinating source and destination OTN rates, it claims to improve inter-DC throughput by up to 20x compared with conventional RDMA while reducing destination-OTN buffer occupancy by up to 62.7%.

Significance. If the claimed gains can be realized under realistic OTN constraints, the work would address a practical bottleneck in wide-area RDMA for large-scale distributed training, potentially enabling more efficient geo-distributed LLM workloads. No machine-checked proofs, reproducible artifacts, or parameter-free derivations are presented.

major comments (2)
  1. The headline performance figures (20x throughput improvement and 62.7% buffer reduction) appear only as summary statements with no accompanying derivation, simulation parameters, traffic model, or validation data, so the central claims cannot be checked against the paper's own evidence.
  2. The scheme's core mechanism relies on proactive, zero-overhead coordination of source and destination OTN line rates, yet no analysis is given of control-plane signaling latency, reconfiguration times, or compatibility with ITU-T G.709 equipment; this assumption directly supports both the throughput multiplier and buffer-occupancy reduction and must be substantiated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the presentation of our results and assumptions without altering the core contributions.

read point-by-point responses
  1. Referee: The headline performance figures (20x throughput improvement and 62.7% buffer reduction) appear only as summary statements with no accompanying derivation, simulation parameters, traffic model, or validation data, so the central claims cannot be checked against the paper's own evidence.

    Authors: The 20x throughput and 62.7% buffer-occupancy figures are obtained from the discrete-event simulations described in Section 5. That section specifies the OTN line-rate range (100–400 Gbps), the segmented RDMA flow model calibrated to LLM training traffic traces (with burst sizes and inter-DC distances), the baseline conventional RDMA implementation, and the exact buffer-occupancy metric. The derivation follows directly from the rate-matching equations in Section 3.3 applied to the simulated traces. To improve verifiability, we will add a concise parameter table to the abstract/introduction and expand the evaluation summary to restate the key traffic and OTN parameters alongside the headline numbers. revision: partial

  2. Referee: The scheme's core mechanism relies on proactive, zero-overhead coordination of source and destination OTN line rates, yet no analysis is given of control-plane signaling latency, reconfiguration times, or compatibility with ITU-T G.709 equipment; this assumption directly supports both the throughput multiplier and buffer-occupancy reduction and must be substantiated.

    Authors: We agree that the control-plane assumptions require explicit treatment. In the revised manuscript we will insert a new subsection (3.4) that (i) references the relevant ITU-T G.709 OTN framing and rate-adaptation procedures, (ii) cites typical control-plane latencies and reconfiguration times from commercial OTN equipment literature (sub-ms signaling, 10–100 ms rate changes), and (iii) quantifies the sensitivity of the reported gains to non-zero coordination delay. The analysis shows that the long-haul propagation delay still dominates, preserving the majority of the throughput and buffer benefits. revision: yes

Circularity Check

0 steps flagged

No circularity: proposal contains no derivations or self-referential equations

full rationale

The manuscript proposes MatchRDMA as a segmented rate-matching scheme that coordinates OTN line rates between source and destination to improve inter-DC throughput and reduce buffer occupancy. No equations, derivations, fitted parameters, or mathematical models are visible in the abstract or context provided. Claims of up to 20x throughput gain and 62.7% buffer reduction are presented as outcomes of the proposed coordination rather than results derived from prior fitted inputs or self-citations. The feasibility assumption regarding proactive OTN rate coordination is an external engineering premise, not a self-definitional or load-bearing circular step. The paper is therefore self-contained as a design proposal without any reduction of its central claims to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no equations, parameters, or assumptions, so the ledger is empty.

pith-pipeline@v0.9.0 · 5383 in / 1094 out tokens · 36775 ms · 2026-05-08T01:22:32.800427+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

19 extracted references · 16 canonical work pages

  1. [1]

    Congestion Control for Large-Scale RDMA De- ployments,

    Y. Zhu, H. Eran, D. Firestone, C. Guo, M. Lipshteyn, Y. Liron, J. Padhye, S. Raindel, M. H. Yahia, and M. Zhang, “Congestion Control for Large-Scale RDMA De- ployments,” in Proceedings of the 2015 ACM Confer- ence on Special Interest Group on Data Communication (SIGCOMM), London, United Kingdom, 2015, pp. 523– 536, DOI: 10.1145/2785956.2787484

  2. [2]

    TIMELY: RTT-based Congestion Control for the Datacenter,

    R. Mittal, V. T. Lam, N. Dukkipati, E. Blem, H. Wassel, M. Ghobadi, A. Vahdat, Y. Wang, D. Wetherall, and D. Zats, “TIMELY: RTT-based Congestion Control for the Datacenter,” in Proceedings of the 2015 ACM Confer- ence on Special Interest Group on Data Communication (SIGCOMM), London, United Kingdom, 2015, pp. 537– 550, DOI: 10.1145/2785956.2787510

  3. [3]

    Revisiting Net- work Support for RDMA,

    R. Mittal, A. Shpiner, A. Panda, E. Zahavi, A. Krishna- murthy, S. Ratnasamy, and S. Shenker, “Revisiting Net- work Support for RDMA,” in Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication(SIGCOMM), Budapest, Hungary, 2018, pp. 313–326, DOI: 10.1145/3230543.3230557

  4. [4]

    HPCC: high precision congestion control,

    Y. Li, R. Miao, H. H. Liu, Y. Zhuang, F. Feng, L. Tang, Z. Cao, M. Zhang, F. Kelly, M. Alizadeh, and M. Yu, “HPCC: high precision congestion control,” in Proceed- ings of the 2019 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), Beijing, China, 2019, pp. 44–58, DOI: 10.1145/3341302.3342085

  5. [5]

    Alibaba Stel- lar: A New Generation RDMA Network for Cloud AI,

    J. Lu, J. Gao, F. Feng, Z. He, M. Zheng, K. Liu, J. He, B. Liao, S. Xu, K. Sun, Y. Mo, Q. Peng, J. Luo, Q. Li, G. Lu, Z. Wang, J. Dong, K. He, S. Cheng, J. Cao, H. Jiao, P. Zhang, S. Ma, L. Zhu, C. Shi, Y. Zhang, Y. Chen, W. Wang, S. Zhu, X. Li, Q. Wang, J. Liu, C. Wang, W. Lin, E. Zhai, J. Wu, Q. Liu, B. Fu, and D. Cai, “Alibaba Stel- lar: A New Generati...

  6. [6]

    Decentralized Training over 100km Based on Op- tical Transport Network for Artificial Intelligence,

    J. Sun, D. Wang, B. Qi, T. Gao, D. Zhang, W. Chen, and H. Li, “Decentralized Training over 100km Based on Op- tical Transport Network for Artificial Intelligence,” in Pro- ceedings of 50th European Conference on Optical Com- munication (ECOC), 2024, pp. 1-3, DOI: 10.1109/ECOC00010.2024.10739621

  7. [7]

    Field Trial of Multi-Datacenter Dis- tributed Training for LLM Based on Bandwidth Conver- gence and Two Parallel Strategies over 120km High-reli- ability 800Gbit/s C+L OTN

    Y. Liu, A. Zhang, X. Wang, L. Feng, K. Lv, H. Liu, X. Sheng, X. Huo, J. Li, “Field Trial of Multi-Datacenter Dis- tributed Training for LLM Based on Bandwidth Conver- gence and Two Parallel Strategies over 120km High-reli- ability 800Gbit/s C+L OTN”, in Proceedings of 50th Opti- cal Fiber Communication Conference (OFC), 2025, pp. 1-3

  8. [8]

    Cross- Pipe: Towards Optimal Pipeline Schedules for Cross- Datacenter Training,

    T. Chen, A. Kubicek, L. Huang, and T. Hoefler, “Cross- Pipe: Towards Optimal Pipeline Schedules for Cross- Datacenter Training,” in Proceedings of the 2025 USENIX Conference on USENIX Annual Technical Con- ference(ATC 25), Boston, MA, USA, 2025, Art. no. 64, 20 pages. DOI: 10.5555/3768039.3768103

  9. [9]

    GeoPipe: a Geo-distributed LLM Training Framework with enhanced Pipeline Parallelism in a Lossless RDMA- enabled Datacenter Optical Transport Network,

    J. Dai, X. Wang, K. Fang, Z. Yang, Y. Ji, and J. Zhang, “GeoPipe: a Geo-distributed LLM Training Framework with enhanced Pipeline Parallelism in a Lossless RDMA- enabled Datacenter Optical Transport Network,” in Pro- ceedings of 2025 Asia Communications and Photonics Conference (ACP), 2025, pp. 1–6, DOI: 10.1109/ACP66871.2025.11350566

  10. [10]

    RDMA Acceleration Scheme for Long-Dis- tance Optical Network,

    J. Ichikawa, H. Masutani, K. Obana, H. Takahashi, and K. Takasugi, “RDMA Acceleration Scheme for Long-Dis- tance Optical Network,” in Proceedings of 2024 IEEE Global Communications Conference (GLOBECOM), Cape Town, South Africa, 2024, pp. 4842–4847, DOI: 10.1109/GLOBECOM52923.2024.10901383

  11. [11]

    Swing: Providing Long-Range Lossless RDMA via PFC-Relay,

    Y. Chen, C. Tian, J. Dong, S. Feng, X. Zhang, C. Liu, P. Yu, N. Xia, W. Dou, and G. Chen, “Swing: Providing Long-Range Lossless RDMA via PFC-Relay,” IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 1, pp. 63–75, 2023, DOI: 10.1109/TPDS.2022.3215517

  12. [12]

    LSCC: Link-Segmented Congestion Control for RDMA in Cross- Datacenter Networks,

    M. Long, J. Han, W. Wang, J. Yang, and K. Xue, “LSCC: Link-Segmented Congestion Control for RDMA in Cross- Datacenter Networks,” in Proceedings of 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), Guangzhou, China, 2024, pp. 1–10, DOI: 10.1109/IWQoS61813.2024.10682909

  13. [13]

    LRCC: Long-haul RDMA congestion control for cross- datacenter networks,

    D. Yan, Y. Liu, S. Zhang, M. Xu, Z. Yang, and B. Fang, “LRCC: Long-haul RDMA congestion control for cross- datacenter networks,” Computer Networks, vol. 273, art. no. 111756, 2025, DOI: 10.1016/j.comnet.2025.111756

  14. [14]

    THEMIS: Addressing Congestion-Induced Unfairness in Long-Haul RDMA Networks,

    Z. Niu, M. Zhang, J. Zhang, R. Xie, Y. Yang, and X. Hu, “THEMIS: Addressing Congestion-Induced Unfairness in Long-Haul RDMA Networks,” in Proceedings of 2025 IEEE 33rd International Conference on Network Proto- cols (ICNP), 2025, pp. 1–13, DOI: 10.1109/ICNP65844.2025.11192376

  15. [15]

    Uno: A One-Stop Solution for Inter- and Intra- Data Center Congestion Control and Reliable Connectiv- ity,

    T. Bonato, S. Abdous, A. Kabbani, A. Ghalayini, N. Gebara, T. Lam, A. Agarwal, T. Chen, Z. Yu, K. Tara- nov, M. Elhaddad, D. De Sensi, S. Ghorbani, and T. Hoefler, “Uno: A One-Stop Solution for Inter- and Intra- Data Center Congestion Control and Reliable Connectiv- ity,” in Proceedings of the International Conference for High Performance Computing, Netwo...

  16. [16]

    Understanding Communication Characteristics of Distributed Training,

    W. Li, X. Liu, Y. Li, Y. Jin, H. Tian, Z. Zhong, G. Liu, Y. Zhang, and K. Chen, “Understanding Communication Characteristics of Distributed Training,” in Proceedings of the 8th Asia-Pacific Workshop on Networking (APNet '24), Sydney, Australia, 2024, pp. 1–8, DOI: 10.1145/3663408.3663409

  17. [17]

    Task placement and traf- fic interleaving for cross-datacenter LLM training over optical networks,

    Q. Hu, W. Wang, C. Huang, X. Wang, Y. Li, Y. Zhao, Y. Zheng, Y. Tan, and J. Zhang, “Task placement and traf- fic interleaving for cross-datacenter LLM training over optical networks,” Journal of Optical Communications and Networking, vol. 18, no. 2, pp. 137–149, 2026, DOI: 10.1364/JOCN.579324

  18. [18]

    AICB: Artificial Intelligence Communica- tion Benchmark,

    Alibaba Cloud, “AICB: Artificial Intelligence Communica- tion Benchmark,” GitHub repository. [Online]. Available: https://github.com/aliyun/aicb. Accessed: 2026

  19. [19]

    [Online]

    Networked-System-and-Security-Group, “THEMIS,” GitHub repository. [Online]. Available: https://github.com/Networked-System-and-Security- Group/Themis. Accessed: 2026