Enabling Reconfiguration-Communication Overlap for Collective Communication in Optical Networks

Changbo Wu; Gongming Zhao; Hongli Xu; Zhuolong Yu

arXiv: 2510.19322 · v3 · submitted 2025-10-22 · 💻 cs.NI · cs.AI · cs.DC

Pith reviewed 2026-05-18 05:06 UTC · model grok-4.3

classification 💻 cs.NI · cs.AI · cs.DC

keywords optical networks · collective communication · reconfigurable topologies · distributed machine learning · network reconfiguration · communication overlap

The pith

SWOT overlaps network reconfiguration with data transmission inside collective communication to reduce completion times.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes SWOT as a demand-aware framework for optical networks that performs topology reconfiguration during collective communication rather than before or after it. It introduces intra-collective reconfiguration and three supporting techniques to conceal reconfiguration latency within ongoing data transfers. This matters for distributed machine learning because collective communication dominates runtime and reconfigurable optical networks offer high bandwidth only if their topology changes can be made cheap.

Core claim

SWOT uses intra-collective reconfiguration to align optical topology resources with the traffic patterns that arise inside a single collective communication algorithm. Heterogeneous message splitting, asynchronous overlapping, and topology bypassing allow the time for reconfiguration to be hidden inside data transmission phases, producing up to 89.7 percent lower communication completion time than static topologies across multiple algorithms.

What carries the argument

Intra-collective reconfiguration, which performs topology changes concurrently with data movement inside a collective operation rather than as a separate phase, enabled by heterogeneous message splitting, asynchronous overlapping, and topology bypassing.
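
The timing intuition behind overlapping can be sketched with a toy model. This is not the paper's scheduler or its MILP: it assumes each node has `k` identical circuits, that a reconfiguration takes a fixed `delay`, and that every step ends at a barrier. The function names, the two-group alternation policy, and the unit-free numbers are all hypothetical. Message splitting and topology bypassing are not modeled.

```python
from typing import List


def serialized_cct(step_units: List[float], k: int, link_bw: float, delay: float) -> float:
    """Reconfigure all k circuits before every step, then send on all of them."""
    return sum(delay + m / (k * link_bw) for m in step_units)


def overlapped_cct(step_units: List[float], k: int, link_bw: float, delay: float) -> float:
    """Alternate two groups of k/2 circuits: one group carries step i while
    the other is already being reconfigured for step i + 1."""
    if k < 2 or k % 2:
        raise ValueError("this sketch assumes an even number of circuits")
    group_bw = (k // 2) * link_bw
    ready = [delay, delay]   # both groups start reconfiguring at t = 0
    now = 0.0                # end of the previous step (logical barrier)
    for i, m in enumerate(step_units):
        g = i % 2
        start = max(now, ready[g])   # wait for the barrier and for the circuit
        now = start + m / group_bw
        ready[g] = now + delay       # group g retargets to step i + 2
    return now


if __name__ == "__main__":
    small = [2.0] * 4    # reconfiguration delay comparable to transfer time
    large = [40.0] * 4   # transfer time dominates
    for name, steps in (("small", small), ("large", large)):
        print(name,
              serialized_cct(steps, 4, 1.0, 1.0),
              overlapped_cct(steps, 4, 1.0, 1.0))
```

With these made-up inputs the overlapped schedule finishes sooner for the small messages (5.0 versus 6.0 time units) and later for the large ones (81.0 versus 44.0), because each step only uses half of the circuits. The model therefore shows a trade-off between hiding the delay and giving up bandwidth, and says nothing about how SWOT itself resolves it.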

If this is right

Communication completion time drops by as much as 89.7 percent compared with static optical topologies.
The gains remain stable when the number of optical resources and the reconfiguration delay both vary.
The same approach works for a range of collective communication algorithms used in distributed training.
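
To put the headline number in perspective, a percentage reduction in completion time can be converted into a speedup factor. The helper below is plain arithmetic, not something taken from the paper, and the function name is hypothetical.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage drop in completion time into a speedup factor."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # 89.7% is the best-case reduction quoted above.
    print(round(speedup_from_reduction(89.7), 2))
```

An 89.7 percent reduction corresponds to roughly a 9.7x shorter completion time in the best reported case.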

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The overlapping strategy may make reconfigurable optical fabrics practical for production-scale AI clusters without requiring ultra-low-latency switches.
Similar hiding techniques could be explored for reconfiguration costs in wireless or hybrid network settings.
Hardware prototypes would be the next concrete step to check whether simulation assumptions hold when synchronization is implemented in firmware.

Load-bearing premise

Reconfiguration latency can be hidden by overlapping it with data transmission without introducing synchronization overhead or correctness issues that appear in real hardware.

What would settle it

A physical testbed experiment on optical switches that measures whether the proposed overlapping techniques produce the simulated communication time reductions or instead incur extra synchronization costs.

Figures

Figures reproduced from arXiv: 2510.19322 by Changbo Wu, Gongming Zhao, Hongli Xu, Zhuolong Yu.

**Figure 1.** Example of OCS interconnect. Left and right sides denote the same nodes (TX and RX paths, respectively). Diagram labels: Node1 to NodeN, OCS1 to OCSk, k interfaces.

**Figure 3.** For an 8-node AllReduce with a 40 MB payload, Rabenseifner’s algorithm proceeds in six communica…

**Figure 4.** OCS configuration corresponding to traffic pattern in 8-node Rabenseifner’s AllReduce (see Fig 3)

**Figure 5.** Design Intuition of reconfiguration-communication overlapping design. Communication…

**Figure 6.** System Integration of SWOT. The framework consists of an Offline Scheduler for plan generation and a runtime Shim layer. The Shim mediates optical hardware with existing communication libraries, ensuring synchronization between data streams (via NICs) and topology reconfiguration (via OCSs). Phase 2: Runtime Orchestration via Shim. At runtime, SWOT introduces a lightweight Shim Layer to orchestrate executi…

**Figure 7.** CCT vs message size for different collective operations algorithm on a dedicated cluster of 256 nodes

**Figure 8.** Impact of cluster size on CCT for different CC algorithm on a dedicated cluster physically fully…

**Figure 9.** Normalized CCT for different collective primitives. For each primitive, the results are derived using the…

**Figure 10.** Improvement (%) over Strawman-ICR versus message size (128 KB to 512 MB) for k = 2, 4, 8. Panels: AllReduce-Rabenseifner's, AlltoAll-Pairwise, AlltoAll-Bruc…

**Figure 11.** Reconfiguration Overhead vs. Message Size under varying Reconfiguration Latencies (…

Original abstract

Collective communication (CC) is critical for scaling distributed machine learning (DML). The predictable traffic patterns of DML present a great opportunity for applying optical network technologies. Optical networks with reconfigurable topologies promise high bandwidth and low latency for collective communications. However, existing approaches face inherent limitations: static topologies are inefficient for dynamic communication patterns within CC algorithm, while frequent topology reconfiguration matching every step of the algorithm incurs significant overhead. In this paper, we propose SWOT, a demand-aware optical network framework that employs "intra-collective reconfiguration" to dynamically align network resources with CC traffic patterns. SWOT hides reconfiguration latency by overlapping it with data transmission through three key techniques: Heterogeneous Message Splitting, Asynchronous Overlapping, and Topology Bypassing. Extensive simulations demonstrate that SWOT reduces communication completion time up to 89.7% across diverse CC algorithm compared to static baselines, demonstrating strong robustness to varying optical resources and reconfiguration delay.
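
The "dynamic communication patterns" the abstract refers to can be made concrete with the 8-node, 40 MB AllReduce mentioned in the Figure 3 caption. The sketch below uses the textbook form of Rabenseifner's algorithm for a power-of-two node count (recursive-halving reduce-scatter followed by recursive-doubling allgather). It is an illustration, not the paper's code: the function names are hypothetical, and the figure's exact step labels are not available on this page.

```python
from typing import List, Tuple


def rabenseifner_schedule(n: int, payload_mb: float) -> List[Tuple[str, int, float]]:
    """Return (phase, peer_distance, megabytes_sent_per_node) for each step."""
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two node count")
    steps: List[Tuple[str, int, float]] = []
    dist, size = n // 2, payload_mb / 2
    while dist >= 1:                 # reduce-scatter by recursive halving
        steps.append(("reduce-scatter", dist, size))
        dist, size = dist // 2, size / 2
    dist, size = 1, payload_mb / n
    while dist < n:                  # allgather by recursive doubling
        steps.append(("allgather", dist, size))
        dist, size = dist * 2, size * 2
    return steps


def peers(n: int, dist: int) -> List[Tuple[int, int]]:
    """Pairs (rank, rank XOR dist) that exchange data at a given distance."""
    return [(r, r ^ dist) for r in range(n) if r < (r ^ dist)]


if __name__ == "__main__":
    for phase, dist, size in rabenseifner_schedule(8, 40.0):
        print(f"{phase:15s} distance={dist} size={size} MB peers={peers(8, dist)}")
```

For 8 nodes this yields six steps with peer distances 4, 2, 1, 1, 2, 4 and per-node message sizes of 20, 10, 5, 5, 10, 20 MB. Both the set of communicating pairs and the volume change from step to step, which is why a single fixed circuit configuration cannot match every step.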

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

SWOT gives a concrete way to hide optical reconfiguration latency inside collective communication steps via message splitting and async overlap, but the gains rest on simulations whose hardware realism is untested.

The main point is that SWOT uses intra-collective reconfiguration to match optical topology changes to the predictable but varying traffic inside collective communication algorithms, rather than sticking with a fixed topology or reconfiguring at every step. The three techniques—heterogeneous message splitting, asynchronous overlapping, and topology bypassing—are meant to let data keep moving while the network adjusts, which directly targets the overhead that has limited reconfigurable optics in ML training clusters so far.

Referee Report

1 major / 1 minor

Summary. The paper proposes SWOT, a demand-aware optical network framework that uses intra-collective reconfiguration to align resources with collective communication traffic patterns in distributed machine learning. It hides reconfiguration latency via three techniques—Heterogeneous Message Splitting, Asynchronous Overlapping, and Topology Bypassing—and reports simulation results showing up to 89.7% reduction in communication completion time versus static baselines, along with robustness to varying optical resources and reconfiguration delays.

Significance. If the simulation results hold under realistic conditions, the work could meaningfully advance the use of reconfigurable optical networks for collective communication by demonstrating practical ways to overlap reconfiguration with data transmission. The extensive simulations across diverse CC algorithms and parameter sweeps provide a useful initial demonstration of robustness.

major comments (1)

[Evaluation section] The headline result of up to 89.7% reduction in completion time rests on the assumption that Heterogeneous Message Splitting, Asynchronous Overlapping, and Topology Bypassing can hide reconfiguration latency without introducing measurable synchronization overhead, additional coordination messages, or buffer-management costs. The simulations use idealized optical models and controlled traffic patterns but provide no quantification or sensitivity analysis of these potential real-hardware costs; if they scale with message size or node count, the reported gains would shrink or disappear.
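
The referee's concern can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that coordination adds a fixed cost per node per step and that this cost is fully serialized on top of the overlapped completion time. The function names and all input values are hypothetical. They are not measurements from the paper.

```python
def reduction_pct(static_cct: float, overlapped_cct: float, steps: int,
                  nodes: int, per_node_cost: float) -> float:
    """Percent CCT reduction after charging steps * nodes * per_node_cost."""
    charged = overlapped_cct + steps * nodes * per_node_cost
    return 100.0 * (1.0 - charged / static_cct)


def break_even_cost(static_cct: float, overlapped_cct: float,
                    steps: int, nodes: int) -> float:
    """Per-node, per-step cost at which the reduction reaches zero."""
    return (static_cct - overlapped_cct) / (steps * nodes)


if __name__ == "__main__":
    # Hypothetical inputs in microseconds: a 1000 us static CCT and a 103 us
    # overlapped CCT (an 89.7% reduction), for 6 steps on 8 nodes.
    static_us, overlapped_us, steps, nodes = 1000.0, 103.0, 6, 8
    print(reduction_pct(static_us, overlapped_us, steps, nodes, 0.0))
    print(break_even_cost(static_us, overlapped_us, steps, nodes))
```

Under this worst-case additive assumption, the break-even cost shrinks in proportion to the product of node count and step count. That is the scaling the referee asks the authors to quantify.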

minor comments (1)

[Abstract] 'diverse CC algorithm' should read 'diverse CC algorithms'.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of our work to advance reconfigurable optical networks for collective communication. We address the major comment below and have revised the manuscript to strengthen the evaluation discussion.

Referee: [Evaluation section] The headline result of up to 89.7% reduction in completion time rests on the assumption that Heterogeneous Message Splitting, Asynchronous Overlapping, and Topology Bypassing can hide reconfiguration latency without introducing measurable synchronization overhead, additional coordination messages, or buffer-management costs. The simulations use idealized optical models and controlled traffic patterns but provide no quantification or sensitivity analysis of these potential real-hardware costs; if they scale with message size or node count, the reported gains would shrink or disappear.

Authors: We thank the referee for this important observation. Our simulations employ idealized optical models, as is standard for evaluating novel network architectures at scale, to isolate the benefits of intra-collective reconfiguration. The three techniques are explicitly designed to limit overhead: Heterogeneous Message Splitting permits partial message transmission during reconfiguration without requiring global barriers, Asynchronous Overlapping pipelines data movement with control operations, and Topology Bypassing avoids full-mesh coordination by preserving stable sub-topologies. In the original evaluation we modeled only the core reconfiguration delay. To address the concern, the revised manuscript adds a sensitivity analysis (new subsection 5.4) that injects parameterized coordination-message and buffer costs scaling with node count and message size. Even under conservative assumptions (e.g., 5–10 µs per-node coordination latency), SWOT retains at least a 60% reduction in completion time versus static baselines. We agree that real-hardware measurements would be valuable, but they lie outside the simulation scope of this paper.

revision: partial

standing simulated objections not resolved

Direct empirical quantification of synchronization, coordination-message, and buffer-management costs on physical optical hardware

Circularity Check

0 steps flagged

No significant circularity detected; results stem from simulation of proposed techniques

The paper proposes the SWOT framework and three specific techniques (Heterogeneous Message Splitting, Asynchronous Overlapping, Topology Bypassing) to overlap reconfiguration latency with data transmission in optical networks for collective communication. Performance claims (up to 89.7% reduction) are supported by extensive simulations comparing against static baselines under varying conditions. No mathematical derivations, equations, fitted parameters presented as predictions, or load-bearing self-citations appear in the abstract or described content. The central claims rest on novel algorithmic design and empirical evaluation rather than any reduction of outputs to inputs by construction. This is a standard self-contained systems design paper with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that DML traffic patterns are sufficiently predictable to allow demand-aware intra-collective reconfiguration; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Predictable traffic patterns of DML present a great opportunity for applying optical network technologies.
Invoked in the opening paragraph of the abstract as the premise enabling the entire approach.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction · unclear

SWOT hides reconfiguration latency by overlapping it with data transmission through three key techniques: Heterogeneous Message Splitting, Asynchronous Overlapping, and Topology Bypassing. ... MILP formulation ... constraints (1)–(11)

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

min CCT = max_i tstep_e_i subject to transmission-reconfiguration precedence, serial resource usage, logical step barrier

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Broadcom Ethernet Network Adapter User Guide

2025. Broadcom Ethernet Network Adapter User Guide. https://techdocs.broadcom.com/us/en/storage-and-ethernet- connectivity/ethernet-nic-controllers/bcm957xxx/adapters.html

work page 2025

[2] [2]

https://www

2025.Ethernet Adapters and Controllers - FastLinQ Performance NICs - 45000 FastLinQ CNA - Marvell. https://www. marvell.com/products/ethernet-adapters-and-controllers/fastlinq-cna-adapters/45000-fastlinq-cna/documents.html

work page 2025

[3] [3]

Intel 2025.Intel®Ethernet 800 Series Product Guide. Intel. https://www.intel.com/content/www/us/en/content- details/709766/intel-ethernet-800-series-product-guide.html

work page 2025

[4] [4]

https://docs.nvidia.com/networking/display/connectx8supernic/port- splitting-configurations

2025.Port Splitting Configurations - NVIDIA Docs. https://docs.nvidia.com/networking/display/connectx8supernic/port- splitting-configurations

work page 2025

[5] [5]

Rukshani Athapathu and George Porter. 2025. Reconfigurability within Collective Communication Algorithms. In Proceedings of the 2nd Workshop on Networks for AI Computing (NAIC ’25). Association for Computing Machinery, New York, NY, USA, 43–49

work page 2025

[6] [6]

Hitesh Ballani, Paolo Costa, Raphael Behrendt, Daniel Cletheroe, Istvan Haller, Krzysztof Jozwik, Fotini Karinou, Sophie Lange, Kai Shi, Benn Thomsen, and Hugh Williams. 2020. Sirius: A Flat Datacenter Network with Nanosecond Optical Switching. InProceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applicatio...

work page 2020

[7] [7]

Bruck, Ching-Tien Ho, S

J. Bruck, Ching-Tien Ho, S. Kipnis, E. Upfal, and D. Weathersby. 1997. Efficient Algorithms for All-to-All Commu- nications in Multiport Message-Passing Systems.IEEE Transactions on Parallel and Distributed Systems8, 11 (1997), 1143–1156

work page 1997

[8] [8]

Zixian Cai, Zhengyang Liu, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Jacob Nelson, and Olli Saarikivi

work page

[9] [9]

InProceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’21)

Synthesizing Optimal Collective Algorithms. InProceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’21). Association for Computing Machinery, New York, NY, USA, 62–75

work page

[10] [10]

Jiamin Cao, Shangfeng Shi, Jiaqi Gao, Weisen Liu, Yifan Yang, Yichi Xu, Zhilong Zheng, Yu Guan, Kun Qian, Ying Liu, Mingwei Xu, Tianshu Wang, Ning Wang, Jianbo Dong, Binzhang Fu, Dennis Cai, and Ennan Zhai. 2025. SyCCL: Exploiting Symmetry for Efficient Collective Communication Scheduling. InProceedings of the ACM SIGCOMM 2025 Conference (SIGCOMM ’25). As...

work page 2025

[11] [11]

Marco Cococcioni and Lorenzo Fiaschi. 2021. The Big-M Method with the Numerical Infinite M.Optimization Letters 15, 7 (2021), 2455–2468

work page 2021

[12] [12]

Manya Ghobadi. 2022. Emerging Optical Interconnects for AI Systems. InOptical Fiber Communication Conference (OFC) 2022 (Optical Fiber Communication Conference (OFC) 2022). Optica Publishing Group, Th1G.1

work page 2022

[13] [13]

Roger W Hockney. 1994. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. 20, 3 (1994), 389–398

work page 1994

[14] [14]

Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin J...

work page 2024

[15] [15]

Norm Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Clifford Young, Xiang Zhou, Zongwei Zhou, and David A Patterson. 2023. TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings. InProceedings of the 50th Annual Inter...

work page 2023

[16] [16]

Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling Laws for Neural Language Models

work page 2020

[17] [17]

Mehrdad Khani, Manya Ghobadi, Mohammad Alizadeh, Ziyi Zhu, Madeleine Glick, Keren Bergman, Amin Vahdat, Benjamin Klenk, and Eiman Ebrahimi. 2021. SiP-ML: High-Bandwidth Optical Network Interconnects for Machine Learning Training. InProceedings of the 2021 ACM SIGCOMM 2021 Conference (SIGCOMM ’21). Association for Computing Machinery, New York, NY, USA, 657–675

work page 2021

[18] [18]

Leon Poutievski, Omid Mashayekhi, Joon Ong, Arjun Singh, Mukarram Tariq, Rui Wang, Jianan Zhang, Virginia Beauregard, Patrick Conner, Steve Gribble, Rishi Kapoor, Stephen Kratzer, Nanfang Li, Hong Liu, Karthik Nagaraj, Jason Ornstein, Samir Sawhney, Ryohei Urata, Lorenzo Vicisano, Kevin Yasumura, Shidong Zhang, Junlan Zhou, and Amin Vahdat. 2022. Jupiter Evolving: Transforming Google’s Datacenter Network via Optical Circuit Switches and Software-Defined Networking. In Proceedings of the ACM SIGCOMM 2022 Conference (SIGCOMM ’22). Association for Computing Machinery, New York, NY, USA, 66–85

[20]

Xudong Liao, Yijun Sun, Han Tian, Xinchen Wan, Yilun Jin, Zilong Wang, Zhenghang Ren, Xinyang Huang, Wenxue Li, Kin Fai Tse, Zhizhen Zhong, Guyue Liu, Ying Zhang, Xiaofeng Ye, Yiming Zhang, and Kai Chen. 2025. MixNet: A Runtime Reconfigurable Optical-Electrical Fabric for Distributed Mixture-of-Experts Training. In Proceedings of , Vol. 1, No. 1, Article ....

[21]

Hong Liu, Ryohei Urata, Kevin Yasumura, Xiang Zhou, Roy Bannon, Jill Berger, Pedram Dashti, Norm Jouppi, Cedric Lam, Sheng Li, Erji Mao, Daniel Nelson, George Papen, Mukarram Tariq, and Amin Vahdat. 2023. Lightwave Fabrics: At-Scale Optical Circuit Switching for Datacenter and Machine Learning Systems. In Proceedings of the ACM SIGCOMM 2023 Conference (ACM...

[22]

William M. Mellette, Rob McGuinness, Arjun Roy, Alex Forencich, George Papen, Alex C. Snoeren, and George Porter. 2017. RotorNet: A Scalable, Low-complexity, Optical Datacenter Network. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’17). Association for Computing Machinery, New York, NY, USA, 267–280

[24]

Stuart Mitchell, Michael O’Sullivan, and Iain Dunning. 2011. PuLP: A Linear Programming Toolkit for Python. 65 (2011). https://www.dit.uoi.gr/e-class/modules/document/file.php/216/PAPERS/2011.%20PuLP%20-%20A%20Linear%20Programming%20Toolkit%20for%20Python.pdf

[25]

Rolf Rabenseifner. 2004. Optimization of Collective Reduction Operations. In Computational Science - ICCS 2004, Takeo Kanade, Josef Kittler, Jon M. Kleinberg, Friedemann Mattern, John C. Mitchell, Moni Naor, Oscar Nierstrasz, C. Pandu Rangan, Bernhard Steffen, Madhu Sudan, Demetri Terzopoulos, Dough Tygar, Moshe Y. Vardi, Gerhard Weikum, Marian Bubak, Geer...

[26]

Peter Sanders, Jochen Speck, and Jesper Larsson Träff. 2009. Two-Tree Algorithms for Full Bandwidth Broadcast, Reduction and Scan. Parallel Computing 35, 12 (2009), 581–594. https://doi.org/10.1016/j.parco.2009.09.001

[27]

Daniele De Sensi, Tommaso Bonato, David Saam, and Torsten Hoefler. 2024. Swing: Short-cutting Rings for Higher Bandwidth Allreduce. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 1445–1462

[28]

Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, and Rachee Singh. 2023. TACCL: Guiding Collective Algorithm Synthesis Using Communication Sketches. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). 593–612

[29]

Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Jorgen Thelin, Nikhil Devanur, and Ion Stoica. 2020. Blink: Fast and Generic Collectives for Distributed ML. In Conference on Machine Learning and Systems (MLSys 2020)

[30]

Hao Wang, Han Tian, Jingrong Chen, Xinchen Wan, Jiacheng Xia, Gaoxiong Zeng, Wei Bai, Junchen Jiang, Yong Wang, and Kai Chen. 2024. Towards Domain-Specific Network Transport for Distributed DNN Training. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 1421–1443

[31]

Weiyang Wang, Moein Khazraee, Zhizhen Zhong, Manya Ghobadi, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, and Anthony Kewitsch. 2023. TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). 739–767

[32]

Zhenguo Wu, Benjamin Klenk, Larry Dennison, and Keren Bergman. 2025. ACTINA: Adapting Circuit-Switching Techniques for AI Networking Architectures. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (New York, NY, USA, 2025-11-15) (SC ’25). Association for Computing Machinery, 1211–1222. https://dl...

doi:10.1145/3712285.3759842

[33]

Xuwei Xue, Shaojuan Zhang, Bingli Guo, Wei Ji, Rui Yin, Bin Chen, and Shanguo Huang. 2023. Optical Switching Data Center Networks: Understanding Techniques and Challenges. arXiv:2302.05298 [cs, eess]

[34]

Xuting Liu, Behnaz Arzani, Siva Kesava Reddy Kakarla, Liangyu Zhao, Vincent Liu, Miguel Castro, Srikanth Kandula, and Luke Marshall. 2024. Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem. In Proceedings of the ACM SIGCOMM 2024 Conference (ACM SIGCOMM ’24). Association for Computing Machinery, New York, NY, USA, 16–37

[35]

Zhisheng Ye, Wei Gao, Qinghao Hu, Peng Sun, Xiaolin Wang, Yingwei Luo, Tianwei Zhang, and Yonggang Wen. 2024. Deep Learning Workload Scheduling in GPU Datacenters: A Survey. Comput. Surveys 56, 6 (2024), 146:1–146:38

[36]

Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, Prithwish Basu, Joud Khoury, and Arvind Krishnamurthy. 2025. Efficient Direct-Connect Topologies for Collective Communications. In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25). 705–737

[37]

Yazhou Zu, Alireza Ghaffarkhah, Hoang-Vu Dang, Brian Towles, Steven Hand, Safeen Huda, Adekunle Bello, Alexander Kolbasov, Arash Rezaei, Dayou Du, Steve Lacy, Hang Wang, Aaron Wisner, Chris Lewis, and Henri Bahini. 2024. Resiliency at Scale: Managing Google’s TPUv4 Machine Learning Supercomputer. In 21st USENIX Symposium on Networked Systems Design and...
