pith. sign in

arxiv: 2605.26960 · v1 · pith:FLCEAHK2new · submitted 2026-05-26 · 💻 cs.NI · cs.DC

Extreme-Scale Interconnection Networks

Pith reviewed 2026-07-01 15:57 UTC · model grok-4.3

classification 💻 cs.NI cs.DC
keywords interconnection networksleaf-spine topologiesfat-treedragonflyall-to-all collectiveextreme-scale data centersnon-minimal routingnetwork simulator evaluation
0
0 comments X

The pith

Multipass Random Leaf-Spine networks deliver up to 100 percent faster All2All collectives than Dragonfly or Fat-Tree at 100,000 endpoints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Multipass Random Leaf-Spine networks as an alternative to standard Fat-Tree designs for connecting very large numbers of endpoints in data centers. These networks combine features from Orthogonal Fat-Tree and Random Folded Clos topologies and apply non-minimal routing to improve throughput and flexibility. Simulator tests show concrete gains on collective communication patterns that dominate extreme-scale workloads. If the measured speedups hold, systems could support larger AI and scientific computations with the same hardware budget or lower cost for the same performance target.

Core claim

MRLS topologies inherit the scalability advantages of their source constructions while exceeding Fat-Tree throughput and routing flexibility when paired with non-minimal paths; exhaustive simulator runs establish a 50 percent reduction in All2All completion time against Fat-Tree and a 100 percent reduction against Dragonfly for a 100,000-endpoint collective.

What carries the argument

The Multipass Random Leaf-Spine topology together with non-minimal routing strategies that exploit multiple passes through the leaf and spine layers.

If this is right

  • Extreme-scale systems can reach higher endpoint counts without proportional increases in switch radix or cabling cost.
  • Collective operations that dominate AI training and scientific simulation workloads complete in substantially less time under the same traffic load.
  • Routing flexibility increases because non-minimal paths become viable without sacrificing overall network capacity.
  • Designers gain additional topology choices that trade minimal path length for reduced congestion on common traffic patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the simulator results translate to hardware, MRLS could reduce the number of network tiers needed for a given scale, lowering both power and failure rates.
  • The same non-minimal routing approach might be applied to other folded-Clos variants to test whether the observed gains are specific to the random-leaf construction.
  • At scales beyond 100k endpoints the relative advantage may grow or shrink depending on how congestion scales with diameter; targeted larger simulations would clarify the trend.

Load-bearing premise

The interconnection network simulator accurately reproduces the latency, throughput, and congestion behavior that the proposed topologies would exhibit on real extreme-scale hardware.

What would settle it

Execution of the same 100,000-endpoint All2All collective on physical switches wired according to an MRLS layout versus an equivalent Fat-Tree, with direct measurement of completion time.

Figures

Figures reproduced from arXiv: 2605.26960 by Alejandro Cano, Carmen Mart\'inez, Cristina Brinza, Crist\'obal Camarero, Ram\'on Beivide.

Figure 1
Figure 1. Figure 1: Orthogonal Fat Tree with parameter q = 2. It has n = 14 leaf switches and k = 3 up-links at the leaf switches [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: MRLS with n = 14 leaf switches and k = 3 up-links at the leaf switches. Highlighted a pair of nodes at distance 4 and one path. switches that are not connected to any endpoint. In contrast, in direct networks (DN) such as Dragonfly (DF) [5], there is no distinction among switches, and all of them have endpoints connected. The FT is a particularly notable indirect network for being rearrangeably non-blockin… view at source ↗
Figure 3
Figure 3. Figure 3: Scalability spectrum for MRLS with R = 36 and f = 1 together some references. Regions are marked with D = x D∗ = y, meaning that a MRLS in the region has a leaf to leaf diameter of x and a maximum distance between arbitrary switches of y. 0 50 100 150 200 103 104 105 106 36 router radix (number of ports) total number of compute nodes MRLS D=2 MRLS D=D∗=4 MRLS D∗=5 2-level FT 3-level FT 2-level OFT [PITH_F… view at source ↗
Figure 4
Figure 4. Figure 4: Scalability of low-diameter multistage topologies. The bottom part of [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of indirect networks connecting 11k endpoints. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of indirect networks connecting 100k endpoints. [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of a MRLS against current top direct networks. All connect 16k endpoints with Cost [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
read the original abstract

Extreme-scale data centers are the backbone of next-generation computing, enabling breakthroughs in science, artificial intelligence, and global innovation through unprecedented processing power and scalability. This work examines leaf-spine network topologies that offer extreme scalability--connecting a vast number of endpoints--while delivering strong performance at low cost. It takes as a starting point two alternatives to the widely used Fat-Tree topology: the Orthogonal Fat-Tree and the Random Folded Clos. The resulting Multipass Random Leaf-Spine (MRLS) networks inherit their advantages and surpass Fat-Trees in both throughput and flexibility. To fully leverage the topological properties of these networks, various non-minimal routing strategies are considered. An exhaustive evaluation using an interconnection network simulator provides insight into the trade-offs and scalability of these topologies under realistic conditions, positioning them as a promising solution for extreme-scale systems. The MRLS achieves a 50% speedup against a Fat-Tree for an All2All collective comprising 100k endpoints, and 100% against Dragonfly networks for the same collective.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes Multipass Random Leaf-Spine (MRLS) networks, extending Orthogonal Fat-Tree and Random Folded Clos designs, as scalable, low-cost alternatives to Fat-Tree topologies for extreme-scale data centers. It examines non-minimal routing strategies and reports, via exhaustive interconnection-network simulation, a 50% speedup over Fat-Tree and 100% over Dragonfly for All2All collectives at 100k endpoints.

Significance. If the simulation results prove reliable, MRLS could supply a flexible, high-throughput topology option for next-generation interconnects. The work's emphasis on scalability and routing trade-offs addresses a practical need in large-scale systems, though the absence of validation details weakens the evidential basis for the reported gains.

major comments (1)
  1. [Abstract / Evaluation] Abstract and evaluation description: The central performance claims (50% speedup vs. Fat-Tree, 100% vs. Dragonfly for 100k-endpoint All2All) are presented solely as outcomes of an interconnection-network simulator without any reference to calibration against hardware, cross-validation, modeled latency/congestion accuracy at scale, traffic-pattern specifics, or error bars. This directly undermines the load-bearing claim of topological superiority, as the deltas could be simulator artifacts.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful review and constructive feedback. We address the major comment point-by-point below and will revise the manuscript to improve clarity on the evaluation methodology.

read point-by-point responses
  1. Referee: [Abstract / Evaluation] Abstract and evaluation description: The central performance claims (50% speedup vs. Fat-Tree, 100% vs. Dragonfly for 100k-endpoint All2All) are presented solely as outcomes of an interconnection-network simulator without any reference to calibration against hardware, cross-validation, modeled latency/congestion accuracy at scale, traffic-pattern specifics, or error bars. This directly undermines the load-bearing claim of topological superiority, as the deltas could be simulator artifacts.

    Authors: We acknowledge that the abstract and evaluation description emphasize the simulation outcomes without sufficient methodological detail. The manuscript evaluates MRLS using a standard interconnection-network simulator under All2All traffic at 100k endpoints, as described in the full evaluation section. To strengthen the evidential basis, we will revise the abstract and add an expanded subsection on the simulator. This will include: references to prior literature validating the simulator's latency and congestion models at scale; explicit traffic-pattern parameters for the All2All collective; and results with error bars from multiple independent runs. Direct hardware calibration at this scale is not feasible, as no 100k-endpoint systems of this type exist for testing, which is why simulation is the established method in the field for extreme-scale topology studies. These additions will clarify that the reported 50% and 100% gains are simulation-derived but rest on transparent, reproducible modeling choices rather than artifacts. revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims are direct simulator outputs

full rationale

The paper reports MRLS speedups (50% vs Fat-Tree, 100% vs Dragonfly) exclusively as outcomes of an interconnection-network simulator run at 100k endpoints. The abstract and reader's summary contain no equations, fitted parameters, self-citations used as load-bearing premises, or definitional reductions. No derivation chain reduces a claimed result to its own inputs by construction; the evaluation is presented as an independent empirical step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities; the central claims rest on the unstated domain assumption that simulator results generalize to hardware and on the prior literature for the two starting topologies.

axioms (1)
  • domain assumption Leaf-spine topologies can be scaled to extreme endpoint counts while preserving strong performance at low cost.
    Invoked when the paper positions MRLS as a promising solution for extreme-scale systems.

pith-pipeline@v0.9.1-grok · 5721 in / 1157 out tokens · 48406 ms · 2026-07-01T15:57:44.692592+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

35 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    An analysis and comparison of the development status of green data centers in China,

    J. Lin, S. Hu, Z. Lu, and R. Deng, “An analysis and comparison of the development status of green data centers in China,” in2023 IEEE 8th Interna- tional Conference on Smart Cloud (SmartCloud), IEEE. IEEE Computer Society, 2023, pp. 130– 135

  2. [2]

    Top 10 largest AI datacenters in 2026,

    SemiAnalysis, “Top 10 largest AI datacenters in 2026,” https://www.youtube.com/watch?v= a-9egkpaZUw, Jan. 2026

  3. [3]

    Trends in ai supercomputers,

    K. F. Pilz, J. Sanders, R. Rahman, and L. Heim, “Trends in AI supercomputers,”arXiv preprint arXiv:2504.16026, 2025

  4. [4]

    Top500: The list,

    TOP500 Team, “Top500: The list,” https://www. top500.org/, 2025

  5. [5]

    Technology-driven, highly-scalable dragonfly topology,

    J. Kim, W. J. Dally, S. Scott, and D. Abts, “Technology-driven, highly-scalable dragonfly topology,” in2008 International Symposium on Computer Architecture. IEEE Computer Society, 2008, pp. 77–88

  6. [6]

    Fault-tolerant orthogonal fat-trees as intercon- nection networks,

    M. Valerio, L. Moser, and P. Melliar-Smith, “Fault-tolerant orthogonal fat-trees as intercon- nection networks,” inProceedings 1st Interna- tional Conference on Algorithms and Architec- tures for Parallel Processing, vol. 2, IEEE. IEEE Computer Society, 1995, pp. 749–754

  7. [7]

    Cost-effective diameter-two topologies: analysis and evalua- tion,

    G. Kathareios, C. Minkenberg, B. Prisacari, G. Rodriguez, and T. Hoefler, “Cost-effective diameter-two topologies: analysis and evalua- tion,” inSC ’15: Proceedings of the Interna- tional Conference for High Performance Comput- ing, Networking, Storage and Analysis. Associa- tion for Computing Machinery, 2015, pp. 1–11

  8. [8]

    Random folded Clos topologies for datacenter networks,

    C. Camarero, C. Mart´ ınez, and R. Beivide, “Random folded Clos topologies for datacenter networks,” in2017 IEEE International Sympo- sium on High Performance Computer Architecture (HPCA). IEEE Computer Society, 2017, pp. 193– 204

  9. [9]

    Projective networks: Topologies for large parallel computer systems,

    C. Camarero, C. Mart´ ınez, E. Vallejo, and R. Bei- vide, “Projective networks: Topologies for large parallel computer systems,”IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 7, pp. 2003–2016, 02 2017

  10. [10]

    Slim Fly: A cost effec- tive low-diameter network topology,

    M. Besta and T. Hoefler, “Slim Fly: A cost effec- tive low-diameter network topology,” inSC ’14: Proceedings of the International Conference for High Performance Computing, Networking, Stor- age and Analysis. IEEE Computer Society, 2014, pp. 348–359

  11. [11]

    BCube: a high performance, server-centric network architecture for modular data centers,

    C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu, “BCube: a high performance, server-centric network architecture for modular data centers,” inProceedings of the ACM SIGCOMM 2009 Conference on Data Communication, ser. SIGCOMM ’09. New York, NY, USA: Association for Computing Machinery, 2009, p. 63–74. [Online]. Available: https:...

  12. [12]

    Fat-trees: Universal networks for hardware-efficient supercomputing,

    C. E. Leiserson, “Fat-trees: Universal networks for hardware-efficient supercomputing,”IEEE Trans- actions on Computers, vol. C-34, no. 10, pp. 892– 901, 1985

  13. [13]

    A study of non-blocking switching networks,

    C. Clos, “A study of non-blocking switching networks,”The Bell System Technical Journal, vol. 32, no. 2, pp. 406–424, 1953

  14. [14]

    Communication requirements and interconnect optimization for high-end scientific applications,

    S. Kamil, L. Oliker, A. Pinar, and J. Shalf, “Communication requirements and interconnect optimization for high-end scientific applications,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 2, pp. 188–202, Feb. 2010

  15. [15]

    Characterizing parallel scien- tific applications on commodity clusters: An em- pirical study of a tapered fat-tree,

    E. A. Le´ on, I. Karlin, A. Bhatele, S. H. Langer, C. Chambreau, L. H. Howell, T. D’Hooge, and M. L. Leininger, “Characterizing parallel scien- tific applications on commodity clusters: An em- pirical study of a tapered fat-tree,” inSC ’16: Proceedings of the International Conference for High Performance Computing, Networking, Stor- age and Analysis, Nov....

  16. [16]

    Effective quality- of-service policy for capacity high-performance computing systems,

    A. Jokanovic, J. C. Sancho, J. Labarta, G. Ro- driguez, and C. Minkenberg, “Effective quality- of-service policy for capacity high-performance computing systems,” in2012 IEEE 14th Interna- tional Conference on High Performance Comput- ing and Communication 2012 IEEE 9th Interna- tional Conference on Embedded Software and Sys- tems, Jun. 2012, pp. 598–607

  17. [17]

    Reducing complexity in tree- like computer interconnection networks,

    J. Navaridas, J. Miguel-Alonso, F. J. Ridruejo, and W. Denzel, “Reducing complexity in tree- like computer interconnection networks,”Parallel computing, vol. 36, no. 2-3, pp. 71–85, 2010

  18. [18]

    High throughput data center topology design,

    A. Singla, P. B. Godfrey, and A. Kolla, “High throughput data center topology design,” in11th USENIX Symposium on Networked Systems De- sign and Implementation (NSDI 14). USENIX: The Advanced Computing Systems Association, 2014, pp. 29–41

  19. [19]

    On Moore graphs with diameters 2 and 3,

    A. J. Hoffman and R. R. Singleton, “On Moore graphs with diameters 2 and 3,”IBM Journal of Research and Development, vol. 4, no. 5, pp. 497– 504, 1960

  20. [20]

    Principles and practices of interconnection network,

    W. Dally and B. Towles, “Principles and practices of interconnection network,” 01 2004

  21. [21]

    Minimal rewiring: Efficient live expansion for Clos data center networks,

    S. Zhao, R. Wang, J. Zhou, J. Ong, J. C. Mogul, and A. Vahdat, “Minimal rewiring: Efficient live expansion for Clos data center networks,” in16th USENIX Symposium on Networked Systems De- sign and Implementation (NSDI 19). USENIX: 13 The Advanced Computing Systems Association, 2019, pp. 221–234

  22. [22]

    Jellyfish: networking data centers randomly,

    A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey, “Jellyfish: networking data centers randomly,” in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’12. USA: USENIX Association, 2012, p. 17

  23. [23]

    On random wiring in practicable folded Clos networks for modern datacenters,

    C. Camarero, C. Martınez, and R. Beivide, “On random wiring in practicable folded Clos networks for modern datacenters,”IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 8, pp. 1780–1793, 2018

  24. [24]

    Generating random regular graphs quickly,

    A. Steger and N. C. Wormald, “Generating random regular graphs quickly,”Combinatorics, Probability and Computing, vol. 8, no. 04, pp. 377– 396, 1999

  25. [25]

    Modelling standard and random- ized slimmed folded Clos networks,

    C. Camarero, J. Corral, C. Mart´ ınez, and R. Beivide, “Modelling standard and random- ized slimmed folded Clos networks,” inEuro- pean Conference on Parallel Processing, Springer. Springer, 2020, pp. 185–199

  26. [26]

    Bollob´ as,Random Graphs, 2nd ed

    B. Bollob´ as,Random Graphs, 2nd ed. Cambridge studies in advanced mathematics, 2001

  27. [27]

    Multi- path routing in the Jellyfish network,

    Z. ALzaid, S. Bhowmik, and X. Yuan, “Multi- path routing in the Jellyfish network,” in2021 IEEE international parallel and distributed pro- cessing symposium workshops (IPDPSW), IEEE. IEEE Computer Society, 2021, pp. 832–841

  28. [28]

    Po- larized routing: an efficient and versatile algo- rithm for large direct networks,

    C. Camarero, C. Mart´ ınez, and R. Beivide, “Po- larized routing: an efficient and versatile algo- rithm for large direct networks,” in2021 IEEE Symposium on High-Performance Interconnects (HOTI). IEEE Computer Society, 2021, pp. 52– 59

  29. [29]

    RNG: Flat Datacenter Networks at Scale

    G. Bernardi, R. Mahajan, C. Seshadhri, E. Car- lesso, C. M. Joseph, S. Kumar, P. Manikonda, L. Popa, R. Ram, S. Robinson, and E. Tennent, “Expanding into reality: Random graphs for datacenter networks,”arXiv preprint arXiv:2604.15261, 2026

  30. [30]

    A scheme for fast parallel communication,

    L. G. Valiant, “A scheme for fast parallel communication,”SIAM Journal on Computing, vol. 11, no. 2, pp. 350–361, 1982. [Online]. Available: https://doi.org/10.1137/0211027

  31. [31]

    The CAMINOS interconnection networks sim- ulator,

    C. Camarero, D. Postigo, and P. Fuentes, “The CAMINOS interconnection networks sim- ulator,”Journal of Parallel and Distributed Computing, vol. 204, p. 105136, 2025. [On- line]. Available: https://www.sciencedirect.com/ science/article/pii/S0743731525001030

  32. [32]

    Dragonfly+: Low cost topology for scaling datacenters,

    A. Shpiner, Z. Haramaty, S. Eliad, V. Zdornov, B. Gafni, and E. Zahavi, “Dragonfly+: Low cost topology for scaling datacenters,” in2017 IEEE 3rd International Workshop on High-Performance Interconnection Networks in the Exascale and Big- Data Era (HiPINEB). IEEE, 2017, pp. 1–8

  33. [33]

    Op- timization of collective communication operations in mpich,

    R. Thakur, R. Rabenseifner, and W. Gropp, “Op- timization of collective communication operations in mpich,”The International Journal of High Per- formance Computing Applications, vol. 19, no. 1, pp. 49–66, 2005

  34. [34]

    Beyond fat- trees without antennae, mirrors, and disco-balls,

    S. Kassing, A. Valadarsky, G. Shahaf, M. Schapira, and A. Singla, “Beyond fat- trees without antennae, mirrors, and disco-balls,” inProceedings of the Conference of the ACM Special Interest Group on Data Communication, 2017, pp. 281–294

  35. [35]

    Martingale approach to the coupon col- lection problem,

    N. Kan, “Martingale approach to the coupon col- lection problem,”Journal of Mathematical Sci- ences, vol. 127, no. 1, pp. 1737–1744, 2005. A Average Distance and Proba- bilities This appendix explains the mathematical details be- hind the graphs shown in Figures 3 and 4. Most of Figure 3 can be estimated experimentally by generat- ing many networks near t...