Characterizing the Impact of Congestion in Modern HPC Interconnects

Aldo Artigiani; Dancheng Zhang; Daniele De Sensi; Dirk Pleiter; Francesco Iannone; Karthee Sivalingam; Kexue Zhao; Lorenzo Piarulli; Marco Faltelli; Matteo Turisini

arxiv: 2604.11432 · v1 · submitted 2026-04-13 · 💻 cs.DC


Pith reviewed 2026-05-10 15:50 UTC · model grok-4.3

classification 💻 cs.DC

keywords HPC interconnects · network congestion · InfiniBand · Cray Slingshot · bursty traffic · collective communication · AI workloads · congestion characterization


The pith

Modern HPC interconnect fabrics exhibit distinct scale-dependent responses to both steady and bursty congestion patterns typical of AI workloads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how congestion arises and propagates in current high-performance computing networks when they carry mixed simulation and AI training traffic. It applies both constant heavy loads and controlled intermittent bursts that vary in length, strength, and gaps between them to five different fabrics. The study tracks how these conditions affect collective communication operations and how the effects change as the number of nodes grows. The resulting observations are meant to set realistic expectations for application performance and to highlight where congestion-control and load-balancing improvements would be most useful.

Core claim

Across EDR, HDR, and NDR InfiniBand, Cray Slingshot, and emerging Ethernet fabrics, congestion behavior is not uniform: each fabric shows its own sensitivity to burst duration, intensity, and pause intervals, and these sensitivities become more pronounced at larger system scales, directly influencing the completion time of collective operations.

What carries the argument

Controlled injection of steady congestion and parameterized bursty traffic patterns (varying duration, intensity, and pause length) applied at multiple system sizes to measure fabric-specific responses in collective communication performance.
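
A minimal sketch of how such a parameterized burst pattern could be represented, assuming a simple on/off aggressor. The names `BurstPattern`, `schedule`, `duty_cycle`, and `sweep`, the use of seconds, and the 0-to-1 intensity scale are illustrative assumptions and are not taken from the paper. Steady congestion is modeled here as the special case of a zero-length pause.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class BurstPattern:
    """Hypothetical description of one aggressor pattern."""
    burst_s: float    # length of each burst, in seconds (assumed unit)
    pause_s: float    # idle gap between bursts; 0.0 means steady congestion
    intensity: float  # fraction of aggressor capacity in use, in (0, 1]

    def __post_init__(self) -> None:
        if self.burst_s <= 0 or self.pause_s < 0:
            raise ValueError("burst_s must be > 0 and pause_s must be >= 0")
        if not 0 < self.intensity <= 1:
            raise ValueError("intensity must be in (0, 1]")


def schedule(pattern: BurstPattern, total_s: float) -> Iterator[Tuple[float, float]]:
    """Yield (start, end) windows in which the aggressor injects traffic."""
    period = pattern.burst_s + pattern.pause_s
    t = 0.0
    while t < total_s:
        yield (t, min(t + pattern.burst_s, total_s))
        t += period


def duty_cycle(pattern: BurstPattern) -> float:
    """Fraction of time the aggressor is active."""
    return pattern.burst_s / (pattern.burst_s + pattern.pause_s)


def sweep(bursts: Sequence[float], pauses: Sequence[float],
          intensities: Sequence[float]) -> List[BurstPattern]:
    """Full matrix of patterns: every duration x pause x intensity combination."""
    return [BurstPattern(b, p, i) for b, p, i in product(bursts, pauses, intensities)]
```

Varying the three parameters independently, as `sweep` does, isolates each one's contribution, but it does not by itself reproduce traffic whose parameters are correlated.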

If this is right

Collective performance models must incorporate fabric-specific and scale-dependent congestion terms rather than assuming uniform behavior.
Congestion-control algorithms should be tuned differently for short intense bursts versus long sustained loads on each fabric type.
Load-balancing strategies for mixed workloads can be refined by using the observed pause-length and intensity thresholds that trigger performance drops.
Ethernet-based designs aligned with emerging standards display congestion traits that can be compared directly against proprietary fabrics for future procurement decisions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If production AI workloads are dominated by short, high-intensity bursts separated by pauses, then fabrics that recover quickly from such bursts will deliver measurably shorter training times at scale.
The scale-dependent effects suggest that network architects may need to provision more adaptive routing or buffering as system sizes exceed current test configurations.
Direct comparison of the five fabrics under identical burst parameters provides a baseline that future congestion-mitigation proposals can be measured against.

Load-bearing premise

The chosen steady and bursty traffic patterns accurately reproduce the congestion that real production AI training and simulation workloads create on large systems.

What would settle it

Running the same collective operations inside an actual large-scale AI training job on one of the tested fabrics and comparing the measured congestion durations, intensities, and resulting slowdowns against the values recorded in the controlled experiments.

Figures

Figures reproduced from arXiv: 2604.11432 by Aldo Artigiani, Dancheng Zhang, Daniele De Sensi, Dirk Pleiter, Francesco Iannone, Karthee Sivalingam, Kexue Zhao, Lorenzo Piarulli, Marco Faltelli, Matteo Turisini.

**Figure 1.** Comparison of time distribution between AlltoAll and AllReduce operations. Surrounding paper text: “… to measure and analyze the effects of congestion. Attackers are provided with two types of collectives for noise injection: AlltoAll and Incast. The first is used to send as many messages as possible to all nodes, creating a general state of network noise. The latter, instead, focuses the traffic on a single node, attempting to gene…”

**Figure 2.** Bursty congestion injection visualization, bursty aggressor on the bottom and victim on the top. Surrounding paper text (Section E, Evaluation Environments): “The fabrics have been evaluated under many different architectures, node counts, and topologies. This allowed us to understand congestion under a variety of scenarios. The systems considered were CINECA’s Leonardo, ENEA’s CRESCO8, LUMI, Huawei AI and Computing at Goethe University (…”

**Figure 3.** 4-node HAICGU sawtooth behavior on 128 MiB messages with …

**Figure 4.** Steady NSLB analysis in an AlltoAll congestion with 4 victim and 4 aggressor nodes. Surrounding paper text: “… 7.39% of Leonardo’s Booster partition, 8.60% of LUMI’s GPU partition, and 33.68% of the CRESCO8 CPU partition. These results are shown in …”

**Figure 5.** Ratio between uncongested and congested runtimes on CRESCO8, Leonardo and LUMI, from 16 to 256 nodes, and vectors ranging from 8 …

**Figure 6.** Ratio between uncongested and congested runtime of 512 bytes, 32 KiB and 2 MiB …

**Figure 7.** Ratio between uncongested and congested runtime. 128 nodes.

**Figure 8.** Ratio between uncongested and congested runtime. 256 nodes.
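
The paper excerpt attached to Figure 1 names two aggressor patterns, AlltoAll and Incast. The sketch below illustrates how they differ in where traffic lands. It is a toy model under a loud assumption that is not from the paper: each sender has one unit of injection bandwidth and splits it evenly across its destinations. The function names `alltoall_flows`, `incast_flows`, and `offered_load` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Flow = Tuple[int, int]  # (source node, destination node)


def alltoall_flows(nodes: List[int]) -> List[Flow]:
    """Every node sends to every other node: diffuse, network-wide noise."""
    return [(s, d) for s in nodes for d in nodes if s != d]


def incast_flows(nodes: List[int], target: int) -> List[Flow]:
    """Every other node sends to one target: traffic focused on a single node."""
    if target not in nodes:
        raise ValueError("target must be one of the aggressor nodes")
    return [(s, target) for s in nodes if s != target]


def offered_load(flows: List[Flow]) -> Dict[int, float]:
    """Inbound load per destination, in units of one sender's bandwidth.

    Assumption: each sender splits one unit of bandwidth evenly
    across all of its destinations.
    """
    fanout: Dict[int, int] = defaultdict(int)
    for src, _ in flows:
        fanout[src] += 1
    load: Dict[int, float] = defaultdict(float)
    for src, dst in flows:
        load[dst] += 1.0 / fanout[src]
    return dict(load)
```

Under this simplification, AlltoAll offers each node about one unit of inbound load, while Incast with N aggressors offers the target N - 1 units, which is the oversubscription an incast pattern is meant to provoke.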

Original abstract

High-performance computing (HPC) systems increasingly support both scalable AI training and large-scale simulation workloads. Both typically rely heavily on collective communication operations. On modern supercomputers, however, network congestion has emerged as a major limitation, driven by heterogeneous traffic patterns resulting from diverse workload mixes. As system scale and active users continue to grow, understanding how today's interconnect technologies respond to congestion is essential for establishing realistic performance expectations and informing future system design. This paper presents a comprehensive characterization of congestion behavior across four major HPC fabrics: EDR InfiniBand, HDR InfiniBand, NDR InfiniBand, Cray Slingshot, and emerging Ethernet fabrics. These fabrics span high-performance proprietary interconnects as well as adaptive Ethernet-based designs aligned with emerging standards such as Ultra Ethernet. We evaluate their responses to both steady congestion and a wide range of bursty patterns that vary in duration, intensity, and pause length, capturing the bursty communication typical of AI workloads. Our study covers multiple scales, examining how congestion manifests differently as system size increases and identifying scale-dependent behaviors that influence collective performance. By analyzing the challenges that arise under these controlled stress conditions, we aim to provide a practical overview of congestion issues and possible optimizations. The insights derived from this evaluation can guide researchers and HPC architects in designing more effective congestion-control mechanisms and network load-balancing strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper presents a comprehensive empirical characterization of congestion behavior across EDR InfiniBand, HDR InfiniBand, NDR InfiniBand, Cray Slingshot, and emerging Ethernet fabrics. It evaluates responses to steady congestion as well as bursty patterns that vary in duration, intensity, and pause length, at multiple system scales, with the aim of capturing traffic typical of AI workloads and informing congestion-control mechanisms and load-balancing strategies.

Significance. If the experimental design and results hold, the work would be significant for providing a broad, multi-fabric comparison of how modern HPC interconnects handle both steady and bursty congestion. The multi-scale analysis and focus on patterns relevant to AI training could offer practical guidance for system design and optimization. The empirical approach using real hardware across proprietary and standards-aligned fabrics is a clear strength.

major comments (2)

Abstract and experimental methodology section: The abstract describes a broad campaign but provides no details on measurement methodology, error bars, statistical significance, or the process for selecting burst parameters; without these, it is impossible to verify whether the reported congestion behaviors are reliable or reproducible.
Abstract: The claim that the controlled bursty patterns 'capture the bursty communication typical of AI workloads' rests on an unverified assumption that independent variation of duration, intensity, and pause length reproduces the correlated, phase-locked all-reduce traffic (identical message sizes, simultaneous incast from thousands of ranks) found in production AI training; this mismatch risks altering queue buildup, credit starvation, and adaptive routing in ways not captured by the steady/bursty matrix.

minor comments (3)

Results sections: Specify the exact quantitative metrics (e.g., latency increase, throughput degradation, packet loss) used to characterize congestion impact for each fabric and pattern.
Figure captions: Include details on the specific burst parameter values plotted and any normalization applied to enable direct comparison across scales and fabrics.
Introduction: Expand the discussion of how the selected fabrics align with emerging Ultra Ethernet standards to strengthen the forward-looking claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us strengthen the clarity and rigor of the manuscript. We address each major comment below and have made revisions to incorporate the suggestions where appropriate.


Referee: Abstract and experimental methodology section: The abstract describes a broad campaign but provides no details on measurement methodology, error bars, statistical significance, or the process for selecting burst parameters; without these, it is impossible to verify whether the reported congestion behaviors are reliable or reproducible.

Authors: We agree that the abstract and methodology section would benefit from greater specificity to support reproducibility. We have revised the Experimental Methodology section to explicitly describe the measurement approach (high-resolution hardware counters and software timers synchronized across nodes), the computation of error bars (standard deviation across 10 independent runs per configuration), the statistical tests applied (two-tailed t-tests with p < 0.05 threshold for significance), and the burst-parameter selection process (derived from analysis of publicly available AI training traces to span realistic ranges of duration, intensity, and inter-burst pause). A single sentence summarizing these elements has been added to the abstract.

revision: yes

Referee: Abstract: The claim that the controlled bursty patterns 'capture the bursty communication typical of AI workloads' rests on an unverified assumption that independent variation of duration, intensity, and pause length reproduces the correlated, phase-locked all-reduce traffic (identical message sizes, simultaneous incast from thousands of ranks) found in production AI training; this mismatch risks altering queue buildup, credit starvation, and adaptive routing in ways not captured by the steady/bursty matrix.

Authors: The referee is correct that independently varying burst parameters does not reproduce the tightly synchronized, phase-locked all-reduce traffic characteristic of production AI training. Our controlled design was chosen to isolate the individual contributions of duration, intensity, and pause length to congestion phenomena, thereby providing interpretable data that can inform both analytical models and more complex workload studies. We have revised the abstract to state that the patterns 'span a range of bursty behaviors relevant to AI workloads' and have added a dedicated paragraph in the Discussion section that acknowledges the limitation, discusses potential differences in queue dynamics and routing behavior, and outlines how the results remain useful for congestion-control design.

revision: partial
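
The first simulated response mentions error bars from repeated runs and two-tailed t-tests, and the figure captions report a ratio between uncongested and congested runtimes. A minimal sketch of those summaries follows, using only the Python standard library. The function names are hypothetical, the direction of the ratio (uncongested divided by congested) is read from the caption wording, and the choice of Welch's unequal-variance form is an assumption, since the response does not say which t-test variant is meant.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def summarize(runs: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of repeated runtimes."""
    return mean(runs), stdev(runs)


def runtime_ratio(uncongested: Sequence[float], congested: Sequence[float]) -> float:
    """Mean uncongested runtime over mean congested runtime.

    Values below 1.0 mean the collective ran slower under congestion.
    """
    return mean(uncongested) / mean(congested)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: unequal-variance form. Requires at least two samples
    per group and non-zero variance in at least one group.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Turning the statistic into a two-tailed p-value needs the t distribution's CDF, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) can compute the same test directly.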

Circularity Check

0 steps flagged

No circularity: purely empirical characterization without derivations or fitted predictions


The paper conducts direct hardware measurements of congestion responses on multiple interconnect fabrics (EDR/HDR/NDR InfiniBand, Slingshot, Ethernet) under controlled steady and bursty traffic patterns. No equations, models, parameter fits, or first-principles derivations are described; results are reported from experimental runs on external systems. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claims rest on observed behavior rather than any reduction to inputs by construction, satisfying the criteria for a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the chosen synthetic traffic patterns are representative of real workloads; no free parameters, mathematical axioms, or invented entities are introduced.

axioms (1)

domain assumption: The selected burst durations, intensities, and pause lengths capture the essential communication behavior of AI training and simulation collectives.
Invoked in the abstract when stating that the patterns 'capture the bursty communication typical of AI workloads'.


Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages

[1]

Ultra ethernet,

U. E. Consortium, “Ultra ethernet,” 2024, https://ultraethernet.org/

work page 2024
[2]

TOP500: Ranking of the World’s 500 Fastest Super- computers,

TOP500 Project, “TOP500: Ranking of the World’s 500 Fastest Super- computers,” https://top500.org/, 2025, accessed July 22, 2025

work page 2025
[3]

Ultra ethernet specification version 1.0,

“Ultra ethernet specification version 1.0,” Ultra Ethernet Consortium, Technical Specification, 2024, available from the Ultra Ethernet Con- sortium

work page 2024
[4]

Congestion Control for Large- Scale RDMA Deployments,

Y . Zhu, H. Eran, D. Firestone, C. Guo, M. Lipshteyn, Y . Liron, J. Padhye, S. Raindel, M. H. Yahia, and M. Zhang, “Congestion Control for Large- Scale RDMA Deployments,” inProceedings of the ACM SIGCOMM 2015 Conference (SIGCOMM ’15), London, United Kingdom, 2015, pp. 523–536

work page 2015
[5]

TIMELY: RTT- based Congestion Control for the Datacenter,

R. Mittal, V . T. Lam, N. Dukkipati, E. Blem, H. Wassel, M. Ghobadi, A. Vahdat, Y . Wang, D. Wetherall, and D. Zats, “TIMELY: RTT- based Congestion Control for the Datacenter,” inProceedings of the ACM SIGCOMM 2015 Conference (SIGCOMM ’15), London, United Kingdom, 2015, pp. 537–550

work page 2015
[6]

HPCC: High Precision Congestion Control,

Y . Li, R. Miao, H. H. Liu, Y . Zhuang, F. Feng, L. Tang, Z. Cao, M. Zhang, F. Kelly, M. Alizadeh, and M. Yu, “HPCC: High Precision Congestion Control,” inProceedings of the ACM SIGCOMM 2019 Conference (SIGCOMM ’19), Beijing, China, 2019, pp. 44–58

work page 2019
[7]

Under submission

D. omitted for double-blind reviewing, “Under submission.”

work page
[8]

Ai ecn threshold of lossless queues,

Huawei Support, “Ai ecn threshold of lossless queues,” https://support.huawei.com/enterprise/en/doc/EDOC1100420118/7ade444e/ai- ecn-threshold-of-lossless-queues, 2024, accessed: 2025-12-20

work page 2024
[9]

Analysis of an Equal-Cost Multi-Path Al- gorithm,

C. Hopps, “Analysis of an Equal-Cost Multi-Path Al- gorithm,” RFC 2992, Nov. 2009. [Online]. Available: https://www.ietf.org/rfc/rfc2992.txt

work page 2009
[10]

Hedera: dynamic flow scheduling for data center networks,

M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, “Hedera: dynamic flow scheduling for data center networks,” inPro- ceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’10. USA: USENIX Association, 2010, p. 19

work page 2010
[11]

Conga: distributed congestion-aware load balancing for datacenters,

M. Alizadeh, T. Edsall, S. Dharmapurikar, R. Vaidyanathan, K. Chu, A. Fingerhut, V . T. Lam, F. Matus, R. Pan, N. Yadav, and G. Varghese, “Conga: distributed congestion-aware load balancing for datacenters,” inProceedings of the 2014 ACM Conference on SIGCOMM, ser. SIGCOMM ’14. New York, NY , USA: Association for Computing Machinery, 2014, p. 503–514. [On...

work page doi:10.1145/2619239.2626316 2014
[12]

Rao, Bruno Ribeiro, and Mohit Tawar- malani

A. Gangidi, R. Miao, S. Zheng, S. J. Bondu, G. Goes, H. Morsy, R. Puri, M. Riftadi, A. J. Shetty, J. Yang, S. Zhang, M. J. Fernandez, S. Gandham, and H. Zeng, “Rdma over ethernet for distributed training at meta scale,” inProceedings of the ACM SIGCOMM 2024 Conference, ser. ACM SIGCOMM ’24. New York, NY , USA: Association for Computing Machinery, 2024, p....

work page doi:10.1145/3651890.3672233 2024
[13]

Data center ethernet and remote direct memory access: Issues at hyperscale,

T. Hoefler, D. Roweth, K. Underwood, R. Alverson, M. Griswold, V . Tabatabaee, M. Kalkunte, S. Anubolu, S. Shen, M. McLaren, A. Kab- bani, and S. Scott, “Data center ethernet and remote direct memory access: Issues at hyperscale,”Computer, vol. 56, no. 7, pp. 67–77, 2023

work page 2023
[14]

Improving datacenter performance and robustness with multipath tcp,

C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley, “Improving datacenter performance and robustness with multipath tcp,”SIGCOMM Comput. Commun. Rev., vol. 41, no. 4, p. 266–277, aug 2011. [Online]. Available: https://doi.org/10.1145/2043164.2018467

work page doi:10.1145/2043164.2018467 2011
[15]

Plb: congestion signals are simple and effective for network load balancing,

M. A. Qureshi, Y . Cheng, Q. Yin, Q. Fu, G. Kumar, M. Moshref, J. Yan, V . Jacobson, D. Wetherall, and A. Kabbani, “Plb: congestion signals are simple and effective for network load balancing,” inProceedings of the ACM SIGCOMM 2022 Conference, ser. SIGCOMM ’22. New York, NY , USA: Association for Computing Machinery, 2022, p. 207–218. [Online]. Available:...

work page doi:10.1145/3544216.3544226 2022
[16]

Flowbender: Flow-level adaptive routing for improved latency and throughput in datacenter networks,

A. Kabbani, B. Vamanan, J. Hasan, and F. Duchene, “Flowbender: Flow-level adaptive routing for improved latency and throughput in datacenter networks,” inProceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’14. New York, NY , USA: Association for Computing Machinery, 2014, p. 149–160. [O...

work page doi:10.1145/2674005.2674985 2014
[17]

Let it flow: Resilient asymmetric load balancing with flowlet switching,

E. Vanini, R. Pan, M. Alizadeh, P. Taheri, and T. Edsall, “Let it flow: Resilient asymmetric load balancing with flowlet switching,” in14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). Boston, MA: USENIX Association, Mar. 2017, pp. 407–420. [On- line]. Available: https://www.usenix.org/conference/nsdi17/technical- sessions/pr...

work page 2017
[18]

Presto: Edge-based load balancing for fast datacenter networks,

K. He, E. Rozner, K. Agarwal, W. Felter, J. Carter, and A. Akella, “Presto: Edge-based load balancing for fast datacenter networks,” in Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, ser. SIGCOMM ’15. New York, NY , USA: Association for Computing Machinery, 2015, p. 465–478. [Online]. Available: https://doi.org/10....

work page doi:10.1145/2785956.2787507 2015
[19]

Flowcut Switching: High-Performance Adaptive Routing With In-Order Delivery Guarantees ,

T. Bonato, D. De Sensi, S. Di Girolamo, A. Bataineh, D. Hewson, D. Roweth, and T. Hoefler, “ Flowcut Switching: High-Performance Adaptive Routing With In-Order Delivery Guarantees ,”IEEE Transac- tions on Networking, no. 01, pp. 1–14, Dec. 2025. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/TON.2025.3636209

work page doi:10.1109/ton.2025.3636209 2025
[20]

Multi-path transport for rdma in datacenters,

Y . Lu, G. Chen, B. Li, K. Tan, Y . Xiong, P. Cheng, J. Zhang, E. Chen, and T. Moscibroda, “Multi-path transport for rdma in datacenters,” in Proceedings of the 15th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’18. USA: USENIX Association, 2018, p. 357–371

work page 2018
[21]

On the impact of packet spraying in data center networks,

A. Dixit, P. Prakash, Y . C. Hu, and R. R. Kompella, “On the impact of packet spraying in data center networks,” in2013 Proceedings IEEE INFOCOM, 2013, pp. 2130–2138

work page 2013
[22]

Network load balancing technologies for intelligent computing centers,

W. Wang, F. Chen, P. Cao, L. Shan, T. Wu, and H. Wen, “Network load balancing technologies for intelligent computing centers,”Communica- tions of Huawei Research, no. Issue 9, pp. 13–22, 2025

work page 2025
[23]

InfiniBand Congestion Control: Mod- elling and Validation,

E. G. Gran and S.-A. Reinemo, “InfiniBand Congestion Control: Mod- elling and Validation,” inOMNeT++ 2011 (Workshop at SIMUTools), Barcelona, Spain, 2011

work page 2011
[24]

A Measure- ment Study of Congestion in an InfiniBand Network,

F. Alali, F. Mizero, M. Veeraraghavan, and J. M. Dennis, “A Measure- ment Study of Congestion in an InfiniBand Network,” in2017 Network Traffic Measurement and Analysis Conference (TMA), 2017, pp. 1–9

work page 2017
[25]

Adaptive routing in infiniband hardware,

J. Rocher-Gonz ´alez, E. G. Gran, S.-A. Reinemo, T. Skeie, J. Escudero- Sahuquillo, P. J. Garc´ıa, and F. J. Q. Flor, “Adaptive routing in infiniband hardware,” in2022 22nd IEEE/ACM International Symposium on Clus- ter, Cloud and Internet Computing (CCGrid), 2022, pp. 463–472

work page 2022
[26]

An in-depth analysis of the slingshot interconnect,

D. De Sensi, S. Di Girolamo, K. H. McMahon, D. Roweth, and T. Hoefler, “An in-depth analysis of the slingshot interconnect,” inSC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020, pp. 1–14

work page 2020
[27]

Hpe slingshot launched into network space,

D. Roweth, “Hpe slingshot launched into network space,” Cray User Group (CUG) Proceedings, 2022

work page 2022
[28]

Gpcnet: Designing a benchmark suite for inducing and measuring contention in hpc networks,

S. Chunduri, T. Groves, P. Mendygral, B. Austin, J. Balma, K. Kandalla, K. Kumaran, G. Lockwood, S. Parker, S. Warren, N. Wichmann, and N. J. Wright, “Gpcnet: Designing a benchmark suite for inducing and measuring contention in hpc networks,” inProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC...

work page 2019
[29]

Open MPI: Open Source High Performance Computing,

“Open MPI: Open Source High Performance Computing,” https://www.open-mpi.org/, Associated with Software in the Public Interest, 2025, accessed: 2025-09-02

work page 2025
[30]

, title =

L. Pichetti, D. De Sensi, K. Sivalingam, S. Nassyr, D. Cesarini, M. Turisini, D. Pleiter, A. Artigiani, and F. Vella, “Benchmarking ethernet interconnect for hpc/ai workloads,” inProceedings of the SC ’24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, ser. SC- W ’24. IEEE Press, 2025, p. 869–875. [...

work page doi:10.1109/scw63240.2024.00124 2025
[31]

Leonardo: A pan-european pre-exascale supercomputer for hpc and ai applications,

M. Turisini, G. Amati, and M. Cestari, “Leonardo: A pan-european pre-exascale supercomputer for hpc and ai applications,” 2023. [Online]. Available: https://arxiv.org/abs/2307.16885

work page arXiv 2023
[32]

CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout,

F. Iannone, F. Ambrosino, G. Bracco, M. De Rosa, A. Funel, G. Guarnieri, S. Migliori, F. Palombi, G. Ponti, G. Santomauro, and P. Procacci, “CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout,” in2019 International Conference on High Performance Computing Simulation (HPCS), 2019, pp. 1051–1052

work page 2019
[33]

LUMI supercomputer,

LUMI Consortium, “LUMI supercomputer,” https://lumi- supercomputer.eu/, 2024, accessed: 2025-01

work page 2024
[34]

Open edge and hpc initiative,

“Open edge and hpc initiative,” https://www.open-edge-hpc- initiative.org/, 2025, accessed: 2025-08-28

work page 2025
[35]

Analysis of the increase and decrease algorithms for congestion avoidance in com- puter networks,

D.-M. Chiu and R. Jain, “Analysis of the increase and decrease algorithms for congestion avoidance in com- puter networks,”Computer Networks and ISDN Systems, vol. 17, no. 1, pp. 1–14, 1989. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0169755289900196

work page arXiv 1989
[36]

Understanding performance variability on the aries dragonfly network,

T. Groves, Y . Gu, and N. J. Wright, “Understanding performance variability on the aries dragonfly network,” in2017 IEEE International Conference on Cluster Computing (CLUSTER), Sept 2017, pp. 809–813

work page 2017
[37]

The impact of network noise at large-scale communication performance,

T. Hoefler, T. Schneider, and A. Lumsdaine, “The impact of network noise at large-scale communication performance,” in2009 IEEE Inter- national Symposium on Parallel Distributed Processing, May 2009, pp. 1–8

work page 2009
[38]

Unimem: runtime data managementon non-volatile memory-based heterogeneous main memory,

S. Chunduri, K. Harms, S. Parker, V . Morozov, S. Oshin, N. Cherukuri, and K. Kumaran, “Run-to-run variability on xeon phi based cray xc systems,” inProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’17. New York, NY , USA: ACM, 2017, pp. 52:1–52:13. [Online]. Available: http://doi.acm.or...

work page doi:10.1145/3126908.3126926 2017
[39]

Understanding the causes of performance variability in hpc workloads,

D. Skinner and W. Kramer, “Understanding the causes of performance variability in hpc workloads,” inIEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005., Oct 2005, pp. 137–149

work page 2005
[40]

Noise in the clouds: Influence of network performance variability on application scalability,

D. De Sensi, T. De Matteis, K. Taranov, S. Di Girolamo, T. Rahn, and T. Hoefler, “Noise in the clouds: Influence of network performance variability on application scalability,”Proc. ACM Meas. Anal. Comput. Syst., vol. 6, no. 3, Dec. 2022

work page 2022
[41]

Canary: Congestion-aware in-network allreduce using dynamic trees,

D. De Sensi, E. Costa Molero, S. Di Girolamo, L. Van- bever, and T. Hoefler, “Canary: Congestion-aware in-network allreduce using dynamic trees,”Future Generation Computer Systems, vol. 152, pp. 70–82, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X23003850

work page 2024
[42]

Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks,

B. Prisacari, G. Rodriguez, P. Heidelberger, D. Chen, C. Minkenberg, and T. Hoefler, “Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks,” inProceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, ser. HPDC ’14. New York, NY , USA: ACM, 2014, pp. 129–

work page 2014
[43]

Available: http://doi.acm.org/10.1145/2600212.2600225

[Online]. Available: http://doi.acm.org/10.1145/2600212.2600225

work page doi:10.1145/2600212.2600225
[44]

Evaluation of an interference-free node allocation policy on fat-tree clusters,

S. D. Pollard, N. Jain, S. Herbein, and A. Bhatele, “Evaluation of an interference-free node allocation policy on fat-tree clusters,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, ser. SC ’18. Piscataway, NJ, USA: IEEE Press, 2018, pp. 26:1–26:13. [Online]. Available: http://dl.acm.org/ci...

work page arXiv 2018
[45]

Quantifying network contention on large parallel machines,

A. Bhatele and L. V . Kal ´e, “Quantifying network contention on large parallel machines,”Parallel Processing Letters, vol. 19, no. 04, pp. 553–572, 2009. [Online]. Available: https://doi.org/10.1142/S0129626409000419

work page doi:10.1142/s0129626409000419 2009
[46]

Watch out for the bully! job interference study on dragonfly network,

X. Yang, J. Jenkins, M. Mubarak, R. B. Ross, and Z. Lan, “Watch out for the bully! job interference study on dragonfly network,” inSC ’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Nov 2016, pp. 750–760

work page 2016
[47]

Trade-off study of localizing communication and balancing network traffic on a dragonfly system,

X. Wang, M. Mubarak, X. Yang, R. B. Ross, and Z. Lan, “Trade-off study of localizing communication and balancing network traffic on a dragonfly system,” in2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2018, pp. 1113–1122

work page 2018
[48]

Mitigating network noise on dragonfly networks through application-aware routing,

D. De Sensi, S. Di Girolamo, and T. Hoefler, “Mitigating network noise on dragonfly networks through application-aware routing,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’19. New York, NY , USA: ACM, 2019, pp. 16:1–16:32. [Online]. Available: http://doi.acm.org/10.1145/3295500.3356196

work page doi:10.1145/3295500.3356196 2019
[49]

Mitigating inter-job interference using adaptive flow-aware routing,

S. A. Smith, C. E. Cromey, D. K. Lowenthal, J. Domke, N. Jain, J. J. Thiagarajan, and A. Bhatele, “Mitigating inter-job interference using adaptive flow-aware routing,” inProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’18, 2018

work page 2018
[50]

Network performance counter monitoring and analysis on the cray xc platform

J. M. Brandt, E. Froese, A. C. Gentile, L. Kaplan, B. A. Allan, and E. J. Walsh, “Network performance counter monitoring and analysis on the cray xc platform.” 5 2016

work page 2016
[51]

Analyzing network health and congestion in dragonfly-based supercomputers,

A. Bhatele, N. Jain, Y . Livnat, V . Pascucci, and P. Bremer, “Analyzing network health and congestion in dragonfly-based supercomputers,” in 2016 IEEE International Parallel and Distributed Processing Sympo- sium (IPDPS), May 2016, pp. 93–102

work page 2016
[52]

Overtime: A tool for analyzing performance variation due to network interference,

R. E. Grant, K. T. Pedretti, and A. Gentile, “Overtime: A tool for analyzing performance variation due to network interference,” in Proceedings of the 3rd Workshop on Exascale MPI, ser. ExaMPI ’15. New York, NY, USA: ACM, 2015, pp. 4:1–4:10. [Online]. Available: http://doi.acm.org/10.1145/2831129.2831133

work page doi:10.1145/2831129.2831133 2015
[53]

Exploring gpu-to-gpu communication: Insights into supercomputer interconnects,

D. De Sensi, L. Pichetti, F. Vella, T. De Matteis, Z. Ren, L. Fusco, M. Turisini, D. Cesarini, K. Lust, A. Trivedi, D. Roweth, F. Spiga, S. Di Girolamo, and T. Hoefler, “Exploring gpu-to-gpu communication: Insights into supercomputer interconnects,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Ana...

work page 2024

[1]

Ultra ethernet,

Ultra Ethernet Consortium, “Ultra ethernet,” 2024, https://ultraethernet.org/

work page 2024

[2]

TOP500: Ranking of the World’s 500 Fastest Supercomputers,

TOP500 Project, “TOP500: Ranking of the World’s 500 Fastest Supercomputers,” https://top500.org/, 2025, accessed July 22, 2025

work page 2025

[3]

Ultra ethernet specification version 1.0,

“Ultra ethernet specification version 1.0,” Ultra Ethernet Consortium, Technical Specification, 2024, available from the Ultra Ethernet Consortium

work page 2024

[4]

Congestion Control for Large-Scale RDMA Deployments,

Y. Zhu, H. Eran, D. Firestone, C. Guo, M. Lipshteyn, Y. Liron, J. Padhye, S. Raindel, M. H. Yahia, and M. Zhang, “Congestion Control for Large-Scale RDMA Deployments,” in Proceedings of the ACM SIGCOMM 2015 Conference (SIGCOMM ’15), London, United Kingdom, 2015, pp. 523–536

work page 2015

[5]

TIMELY: RTT-based Congestion Control for the Datacenter,

R. Mittal, V. T. Lam, N. Dukkipati, E. Blem, H. Wassel, M. Ghobadi, A. Vahdat, Y. Wang, D. Wetherall, and D. Zats, “TIMELY: RTT-based Congestion Control for the Datacenter,” in Proceedings of the ACM SIGCOMM 2015 Conference (SIGCOMM ’15), London, United Kingdom, 2015, pp. 537–550

work page 2015

[6]

HPCC: High Precision Congestion Control,

Y. Li, R. Miao, H. H. Liu, Y. Zhuang, F. Feng, L. Tang, Z. Cao, M. Zhang, F. Kelly, M. Alizadeh, and M. Yu, “HPCC: High Precision Congestion Control,” in Proceedings of the ACM SIGCOMM 2019 Conference (SIGCOMM ’19), Beijing, China, 2019, pp. 44–58

work page 2019

[7]

Under submission

D. omitted for double-blind reviewing, “Under submission.”

work page

[8]

Ai ecn threshold of lossless queues,

Huawei Support, “Ai ecn threshold of lossless queues,” https://support.huawei.com/enterprise/en/doc/EDOC1100420118/7ade444e/ai-ecn-threshold-of-lossless-queues, 2024, accessed: 2025-12-20

work page 2024

[9]

Analysis of an Equal-Cost Multi-Path Algorithm,

C. Hopps, “Analysis of an Equal-Cost Multi-Path Algorithm,” RFC 2992, Nov. 2000. [Online]. Available: https://www.ietf.org/rfc/rfc2992.txt

work page 2000

[10]

Hedera: dynamic flow scheduling for data center networks,

M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, “Hedera: dynamic flow scheduling for data center networks,” in Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’10. USA: USENIX Association, 2010, p. 19

work page 2010

[11]

Conga: distributed congestion-aware load balancing for datacenters,

M. Alizadeh, T. Edsall, S. Dharmapurikar, R. Vaidyanathan, K. Chu, A. Fingerhut, V. T. Lam, F. Matus, R. Pan, N. Yadav, and G. Varghese, “Conga: distributed congestion-aware load balancing for datacenters,” in Proceedings of the 2014 ACM Conference on SIGCOMM, ser. SIGCOMM ’14. New York, NY, USA: Association for Computing Machinery, 2014, pp. 503–514. [On...

work page doi:10.1145/2619239.2626316 2014

[12]

Rdma over ethernet for distributed training at meta scale,

A. Gangidi, R. Miao, S. Zheng, S. J. Bondu, G. Goes, H. Morsy, R. Puri, M. Riftadi, A. J. Shetty, J. Yang, S. Zhang, M. J. Fernandez, S. Gandham, and H. Zeng, “Rdma over ethernet for distributed training at meta scale,” in Proceedings of the ACM SIGCOMM 2024 Conference, ser. ACM SIGCOMM ’24. New York, NY, USA: Association for Computing Machinery, 2024, p....

work page doi:10.1145/3651890.3672233 2024

[13]

Data center ethernet and remote direct memory access: Issues at hyperscale,

T. Hoefler, D. Roweth, K. Underwood, R. Alverson, M. Griswold, V. Tabatabaee, M. Kalkunte, S. Anubolu, S. Shen, M. McLaren, A. Kabbani, and S. Scott, “Data center ethernet and remote direct memory access: Issues at hyperscale,” Computer, vol. 56, no. 7, pp. 67–77, 2023

work page 2023

[14]

Improving datacenter performance and robustness with multipath tcp,

C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley, “Improving datacenter performance and robustness with multipath tcp,” SIGCOMM Comput. Commun. Rev., vol. 41, no. 4, pp. 266–277, aug 2011. [Online]. Available: https://doi.org/10.1145/2043164.2018467

work page doi:10.1145/2043164.2018467 2011

[15]

Plb: congestion signals are simple and effective for network load balancing,

M. A. Qureshi, Y. Cheng, Q. Yin, Q. Fu, G. Kumar, M. Moshref, J. Yan, V. Jacobson, D. Wetherall, and A. Kabbani, “Plb: congestion signals are simple and effective for network load balancing,” in Proceedings of the ACM SIGCOMM 2022 Conference, ser. SIGCOMM ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 207–218. [Online]. Available:...

work page doi:10.1145/3544216.3544226 2022

[16]

Flowbender: Flow-level adaptive routing for improved latency and throughput in datacenter networks,

A. Kabbani, B. Vamanan, J. Hasan, and F. Duchene, “Flowbender: Flow-level adaptive routing for improved latency and throughput in datacenter networks,” in Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’14. New York, NY, USA: Association for Computing Machinery, 2014, pp. 149–160. [O...

work page doi:10.1145/2674005.2674985 2014

[17]

Let it flow: Resilient asymmetric load balancing with flowlet switching,

E. Vanini, R. Pan, M. Alizadeh, P. Taheri, and T. Edsall, “Let it flow: Resilient asymmetric load balancing with flowlet switching,” in 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). Boston, MA: USENIX Association, Mar. 2017, pp. 407–420. [Online]. Available: https://www.usenix.org/conference/nsdi17/technical-sessions/pr...

work page 2017

[18]

Presto: Edge-based load balancing for fast datacenter networks,

K. He, E. Rozner, K. Agarwal, W. Felter, J. Carter, and A. Akella, “Presto: Edge-based load balancing for fast datacenter networks,” in Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, ser. SIGCOMM ’15. New York, NY, USA: Association for Computing Machinery, 2015, pp. 465–478. [Online]. Available: https://doi.org/10....

work page doi:10.1145/2785956.2787507 2015

[19]

Flowcut Switching: High-Performance Adaptive Routing With In-Order Delivery Guarantees,

T. Bonato, D. De Sensi, S. Di Girolamo, A. Bataineh, D. Hewson, D. Roweth, and T. Hoefler, “Flowcut Switching: High-Performance Adaptive Routing With In-Order Delivery Guarantees,” IEEE Transactions on Networking, no. 01, pp. 1–14, Dec. 2025. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/TON.2025.3636209

work page doi:10.1109/ton.2025.3636209 2025

[20]

Multi-path transport for rdma in datacenters,

Y. Lu, G. Chen, B. Li, K. Tan, Y. Xiong, P. Cheng, J. Zhang, E. Chen, and T. Moscibroda, “Multi-path transport for rdma in datacenters,” in Proceedings of the 15th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’18. USA: USENIX Association, 2018, pp. 357–371

work page 2018

[21]

On the impact of packet spraying in data center networks,

A. Dixit, P. Prakash, Y. C. Hu, and R. R. Kompella, “On the impact of packet spraying in data center networks,” in 2013 Proceedings IEEE INFOCOM, 2013, pp. 2130–2138

work page 2013

[22]

Network load balancing technologies for intelligent computing centers,

W. Wang, F. Chen, P. Cao, L. Shan, T. Wu, and H. Wen, “Network load balancing technologies for intelligent computing centers,” Communications of Huawei Research, no. 9, pp. 13–22, 2025

work page 2025

[23]

InfiniBand Congestion Control: Modelling and Validation,

E. G. Gran and S.-A. Reinemo, “InfiniBand Congestion Control: Modelling and Validation,” in OMNeT++ 2011 (Workshop at SIMUTools), Barcelona, Spain, 2011

work page 2011

[24]

A Measurement Study of Congestion in an InfiniBand Network,

F. Alali, F. Mizero, M. Veeraraghavan, and J. M. Dennis, “A Measurement Study of Congestion in an InfiniBand Network,” in 2017 Network Traffic Measurement and Analysis Conference (TMA), 2017, pp. 1–9

work page 2017

[25]

Adaptive routing in infiniband hardware,

J. Rocher-González, E. G. Gran, S.-A. Reinemo, T. Skeie, J. Escudero-Sahuquillo, P. J. García, and F. J. Q. Flor, “Adaptive routing in infiniband hardware,” in 2022 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022, pp. 463–472

work page 2022

[26]

An in-depth analysis of the slingshot interconnect,

D. De Sensi, S. Di Girolamo, K. H. McMahon, D. Roweth, and T. Hoefler, “An in-depth analysis of the slingshot interconnect,” in SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020, pp. 1–14

work page 2020

[27]

Hpe slingshot launched into network space,

D. Roweth, “Hpe slingshot launched into network space,” Cray User Group (CUG) Proceedings, 2022

work page 2022

[28]

Gpcnet: Designing a benchmark suite for inducing and measuring contention in hpc networks,

S. Chunduri, T. Groves, P. Mendygral, B. Austin, J. Balma, K. Kandalla, K. Kumaran, G. Lockwood, S. Parker, S. Warren, N. Wichmann, and N. J. Wright, “Gpcnet: Designing a benchmark suite for inducing and measuring contention in hpc networks,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC...

work page 2019

[29]

Open MPI: Open Source High Performance Computing,

“Open MPI: Open Source High Performance Computing,” https://www.open-mpi.org/, Associated with Software in the Public Interest, 2025, accessed: 2025-09-02

work page 2025

[30]

Benchmarking ethernet interconnect for hpc/ai workloads,

L. Pichetti, D. De Sensi, K. Sivalingam, S. Nassyr, D. Cesarini, M. Turisini, D. Pleiter, A. Artigiani, and F. Vella, “Benchmarking ethernet interconnect for hpc/ai workloads,” in Proceedings of the SC ’24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, ser. SC-W ’24. IEEE Press, 2025, pp. 869–875. [...

work page doi:10.1109/scw63240.2024.00124 2025

[31]

Leonardo: A pan-european pre-exascale supercomputer for hpc and ai applications,

M. Turisini, G. Amati, and M. Cestari, “Leonardo: A pan-european pre-exascale supercomputer for hpc and ai applications,” 2023. [Online]. Available: https://arxiv.org/abs/2307.16885

work page arXiv 2023

[32]

CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout,

F. Iannone, F. Ambrosino, G. Bracco, M. De Rosa, A. Funel, G. Guarnieri, S. Migliori, F. Palombi, G. Ponti, G. Santomauro, and P. Procacci, “CRESCO ENEA HPC clusters: a working example of a multifabric GPFS Spectrum Scale layout,” in 2019 International Conference on High Performance Computing & Simulation (HPCS), 2019, pp. 1051–1052

work page 2019

[33]

LUMI supercomputer,

LUMI Consortium, “LUMI supercomputer,” https://lumi-supercomputer.eu/, 2024, accessed: 2025-01

work page 2024

[34]

Open edge and hpc initiative,

“Open edge and hpc initiative,” https://www.open-edge-hpc-initiative.org/, 2025, accessed: 2025-08-28

work page 2025

[35]

Analysis of the increase and decrease algorithms for congestion avoidance in computer networks,

D.-M. Chiu and R. Jain, “Analysis of the increase and decrease algorithms for congestion avoidance in computer networks,” Computer Networks and ISDN Systems, vol. 17, no. 1, pp. 1–14, 1989. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0169755289900196

work page arXiv 1989

[36]

Understanding performance variability on the aries dragonfly network,

T. Groves, Y. Gu, and N. J. Wright, “Understanding performance variability on the aries dragonfly network,” in 2017 IEEE International Conference on Cluster Computing (CLUSTER), Sept 2017, pp. 809–813

work page 2017

[37]

The impact of network noise at large-scale communication performance,

T. Hoefler, T. Schneider, and A. Lumsdaine, “The impact of network noise at large-scale communication performance,” in 2009 IEEE International Symposium on Parallel & Distributed Processing, May 2009, pp. 1–8

work page 2009

[38]

Run-to-run variability on xeon phi based cray xc systems,

S. Chunduri, K. Harms, S. Parker, V. Morozov, S. Oshin, N. Cherukuri, and K. Kumaran, “Run-to-run variability on xeon phi based cray xc systems,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’17. New York, NY, USA: ACM, 2017, pp. 52:1–52:13. [Online]. Available: http://doi.acm.or...

work page doi:10.1145/3126908.3126926 2017

[39]

Understanding the causes of performance variability in hpc workloads,

D. Skinner and W. Kramer, “Understanding the causes of performance variability in hpc workloads,” in IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005, Oct 2005, pp. 137–149

work page 2005

[40]

Noise in the clouds: Influence of network performance variability on application scalability,

D. De Sensi, T. De Matteis, K. Taranov, S. Di Girolamo, T. Rahn, and T. Hoefler, “Noise in the clouds: Influence of network performance variability on application scalability,” Proc. ACM Meas. Anal. Comput. Syst., vol. 6, no. 3, Dec. 2022

work page 2022

[41]

Canary: Congestion-aware in-network allreduce using dynamic trees,

D. De Sensi, E. Costa Molero, S. Di Girolamo, L. Vanbever, and T. Hoefler, “Canary: Congestion-aware in-network allreduce using dynamic trees,” Future Generation Computer Systems, vol. 152, pp. 70–82, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X23003850

work page 2024

[42]

Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks,

B. Prisacari, G. Rodriguez, P. Heidelberger, D. Chen, C. Minkenberg, and T. Hoefler, “Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks,” in Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, ser. HPDC ’14. New York, NY, USA: ACM, 2014, pp. 129–

work page 2014

[43]

Available: http://doi.acm.org/10.1145/2600212.2600225

[Online]. Available: http://doi.acm.org/10.1145/2600212.2600225

work page doi:10.1145/2600212.2600225

[44]

Evaluation of an interference-free node allocation policy on fat-tree clusters,

S. D. Pollard, N. Jain, S. Herbein, and A. Bhatele, “Evaluation of an interference-free node allocation policy on fat-tree clusters,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, ser. SC ’18. Piscataway, NJ, USA: IEEE Press, 2018, pp. 26:1–26:13. [Online]. Available: http://dl.acm.org/ci...

work page arXiv 2018

[45]

Quantifying network contention on large parallel machines,

A. Bhatele and L. V. Kalé, “Quantifying network contention on large parallel machines,” Parallel Processing Letters, vol. 19, no. 04, pp. 553–572, 2009. [Online]. Available: https://doi.org/10.1142/S0129626409000419

work page doi:10.1142/s0129626409000419 2009
