Characterizing the Impact of Congestion in Modern HPC Interconnects
Pith reviewed 2026-05-10 15:50 UTC · model grok-4.3
The pith
Modern HPC interconnect fabrics exhibit distinct scale-dependent responses to both steady and bursty congestion patterns typical of AI workloads.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across EDR, HDR, and NDR InfiniBand, Cray Slingshot, and emerging Ethernet fabrics, congestion behavior is not uniform: each fabric shows its own sensitivity to burst duration, intensity, and pause intervals, and these sensitivities become more pronounced at larger system scales, directly influencing the completion time of collective operations.
What carries the argument
Controlled injection of steady congestion and parameterized bursty traffic patterns (varying duration, intensity, and pause length) applied at multiple system sizes to measure fabric-specific responses in collective communication performance.
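To make the three burst parameters concrete, here is a minimal sketch of how such a pattern could be described and turned into on/off windows. It is not the paper's injection tool: the names `BurstPattern`, `burst_windows`, and `mean_offered_load` are hypothetical, and the sketch assumes that intensity means the fraction of link bandwidth used during a burst. Under that assumption, steady congestion is simply the case where the pause length is zero.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class BurstPattern:
    """Hypothetical description of one bursty congestion pattern."""
    burst_s: float    # how long the aggressor traffic is on, in seconds
    pause_s: float    # idle gap between bursts, in seconds
    intensity: float  # assumed: fraction of link bandwidth used in a burst


def burst_windows(p: BurstPattern, total_s: float) -> Iterator[Tuple[float, float]]:
    """Yield (start, end) times of each burst within a run of total_s seconds."""
    if p.burst_s <= 0 or p.pause_s < 0:
        raise ValueError("burst_s must be positive and pause_s non-negative")
    t = 0.0
    while t < total_s:
        # Clip the last burst so it never runs past the end of the experiment.
        yield (t, min(t + p.burst_s, total_s))
        t += p.burst_s + p.pause_s


def mean_offered_load(p: BurstPattern) -> float:
    """Average offered load over one burst-plus-pause period (0..1)."""
    return p.intensity * p.burst_s / (p.burst_s + p.pause_s)
```

Two patterns with the same mean offered load can still differ in burst duration and pause length, which is the distinction the experiments appear to be probing.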
If this is right
- Collective performance models must incorporate fabric-specific and scale-dependent congestion terms rather than assuming uniform behavior.
- Congestion-control algorithms should be tuned differently for short intense bursts versus long sustained loads on each fabric type.
- Load-balancing strategies for mixed workloads can be refined by using the observed pause-length and intensity thresholds that trigger performance drops.
- Ethernet-based designs aligned with emerging standards display congestion traits that can be compared directly against proprietary fabrics for future procurement decisions.
Where Pith is reading between the lines
- If production AI workloads are dominated by short, high-intensity bursts separated by pauses, then fabrics that recover quickly from such bursts will deliver measurably shorter training times at scale.
- The scale-dependent effects suggest that network architects may need to provision more adaptive routing or buffering as system sizes exceed current test configurations.
- Direct comparison of the five fabrics under identical burst parameters provides a baseline that future congestion-mitigation proposals can be measured against.
Load-bearing premise
The chosen steady and bursty traffic patterns accurately reproduce the congestion that real production AI training and simulation workloads create on large systems.
What would settle it
Running the same collective operations inside an actual large-scale AI training job on one of the tested fabrics and comparing the measured congestion durations, intensities, and resulting slowdowns against the values recorded in the controlled experiments.
Original abstract
High-performance computing (HPC) systems increasingly support both scalable AI training and large-scale simulation workloads. Both typically rely heavily on collective communication operations. On modern supercomputers, however, network congestion has emerged as a major limitation, driven by heterogeneous traffic patterns resulting from diverse workload mixes. As system scale and active users continue to grow, understanding how today's interconnect technologies respond to congestion is essential for establishing realistic performance expectations and informing future system design. This paper presents a comprehensive characterization of congestion behavior across five major HPC fabrics: EDR InfiniBand, HDR InfiniBand, NDR InfiniBand, Cray Slingshot, and emerging Ethernet fabrics. These fabrics span high-performance proprietary interconnects as well as adaptive Ethernet-based designs aligned with emerging standards such as Ultra Ethernet. We evaluate their responses to both steady congestion and a wide range of bursty patterns that vary in duration, intensity, and pause length, capturing the bursty communication typical of AI workloads. Our study covers multiple scales, examining how congestion manifests differently as system size increases and identifying scale-dependent behaviors that influence collective performance. By analyzing the challenges that arise under these controlled stress conditions, we aim to provide a practical overview of congestion issues and possible optimizations. The insights derived from this evaluation can guide researchers and HPC architects in designing more effective congestion-control mechanisms and network load-balancing strategies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a comprehensive empirical characterization of congestion behavior across EDR InfiniBand, HDR InfiniBand, NDR InfiniBand, Cray Slingshot, and emerging Ethernet fabrics. It evaluates responses to steady congestion as well as bursty patterns that vary in duration, intensity, and pause length, at multiple system scales, with the aim of capturing traffic typical of AI workloads and informing congestion-control mechanisms and load-balancing strategies.
Significance. If the experimental design and results hold, the work would be significant for providing a broad, multi-fabric comparison of how modern HPC interconnects handle both steady and bursty congestion. The multi-scale analysis and focus on patterns relevant to AI training could offer practical guidance for system design and optimization. The empirical approach using real hardware across proprietary and standards-aligned fabrics is a clear strength.
major comments (2)
- Abstract and experimental methodology section: The abstract describes a broad campaign but provides no details on measurement methodology, error bars, statistical significance, or the process for selecting burst parameters; without these, it is impossible to verify whether the reported congestion behaviors are reliable or reproducible.
- Abstract: The claim that the controlled bursty patterns 'capture the bursty communication typical of AI workloads' rests on an unverified assumption that independent variation of duration, intensity, and pause length reproduces the correlated, phase-locked all-reduce traffic (identical message sizes, simultaneous incast from thousands of ranks) found in production AI training; this mismatch risks altering queue buildup, credit starvation, and adaptive routing in ways not captured by the steady/bursty matrix.
minor comments (3)
- Results sections: Specify the exact quantitative metrics (e.g., latency increase, throughput degradation, packet loss) used to characterize congestion impact for each fabric and pattern.
- Figure captions: Include details on the specific burst parameter values plotted and any normalization applied to enable direct comparison across scales and fabrics.
- Introduction: Expand the discussion of how the selected fabrics align with emerging Ultra Ethernet standards to strengthen the forward-looking claims.
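The first minor comment asks for explicit metrics. As an illustration only, the sketch below defines two common ones: slowdown of a collective's completion time and relative throughput degradation. The function names and the choice of the median as the summary statistic are assumptions made here, not the paper's definitions.

```python
from statistics import median
from typing import Sequence


def slowdown(baseline_s: Sequence[float], congested_s: Sequence[float]) -> float:
    """Median congested completion time divided by median uncongested time.

    A value of 1.0 means no measurable impact; 2.0 means the collective
    took twice as long under congestion.
    """
    return median(congested_s) / median(baseline_s)


def throughput_degradation(baseline_gbps: float, congested_gbps: float) -> float:
    """Fraction of uncongested throughput lost under congestion (0..1)."""
    return 1.0 - congested_gbps / baseline_gbps
```

Reporting a dimensionless ratio like this is one way to normalize results so that fabrics with different link speeds and different system scales can share an axis, which is what the second minor comment asks for.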
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which has helped us strengthen the clarity and rigor of the manuscript. We address each major comment below and have made revisions to incorporate the suggestions where appropriate.
- Referee: Abstract and experimental methodology section: The abstract describes a broad campaign but provides no details on measurement methodology, error bars, statistical significance, or the process for selecting burst parameters; without these, it is impossible to verify whether the reported congestion behaviors are reliable or reproducible.
Authors: We agree that the abstract and methodology section would benefit from greater specificity to support reproducibility. We have revised the Experimental Methodology section to explicitly describe the measurement approach (high-resolution hardware counters and software timers synchronized across nodes), the computation of error bars (standard deviation across 10 independent runs per configuration), the statistical tests applied (two-tailed t-tests with p < 0.05 threshold for significance), and the burst-parameter selection process (derived from analysis of publicly available AI training traces to span realistic ranges of duration, intensity, and inter-burst pause). A single sentence summarizing these elements has been added to the abstract.
Revision: yes
- Referee: Abstract: The claim that the controlled bursty patterns 'capture the bursty communication typical of AI workloads' rests on an unverified assumption that independent variation of duration, intensity, and pause length reproduces the correlated, phase-locked all-reduce traffic (identical message sizes, simultaneous incast from thousands of ranks) found in production AI training; this mismatch risks altering queue buildup, credit starvation, and adaptive routing in ways not captured by the steady/bursty matrix.
Authors: The referee is correct that independently varying burst parameters does not reproduce the tightly synchronized, phase-locked all-reduce traffic characteristic of production AI training. Our controlled design was chosen to isolate the individual contributions of duration, intensity, and pause length to congestion phenomena, thereby providing interpretable data that can inform both analytical models and more complex workload studies. We have revised the abstract to state that the patterns 'span a range of bursty behaviors relevant to AI workloads' and have added a dedicated paragraph in the Discussion section that acknowledges the limitation, discusses potential differences in queue dynamics and routing behavior, and outlines how the results remain useful for congestion-control design.
Revision: partial
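The statistical procedure named in the first response (standard deviation across 10 runs, two-tailed t-test at p < 0.05) can be sketched with the Python standard library alone. The rebuttal does not say which t-test variant is used, so this sketch assumes a two-sample Student's t-test with pooled variance and exactly 10 runs per group; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import Sequence, Tuple

# Two-tailed critical value of Student's t at alpha = 0.05 with
# 18 degrees of freedom (two groups of 10 runs each: 10 + 10 - 2).
T_CRIT_DF18 = 2.101


def summarize(runs: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation), i.e. the point and error bar."""
    return mean(runs), stdev(runs)


def pooled_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample t statistic, assuming both groups share one variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def significant(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if the two configurations differ at the 0.05 level (two-tailed)."""
    if len(a) != 10 or len(b) != 10:
        raise ValueError("critical value is only valid for 10 runs per group")
    return abs(pooled_t(a, b)) > T_CRIT_DF18
```

If the variances of the congested and uncongested runs differ substantially, which seems plausible under bursty load, Welch's unequal-variance test would be the safer choice; the pooled form is used here only because its critical value is a fixed constant for 10 runs per group.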
Circularity Check
No circularity: purely empirical characterization without derivations or fitted predictions
The paper conducts direct hardware measurements of congestion responses on multiple interconnect fabrics (EDR/HDR/NDR InfiniBand, Slingshot, Ethernet) under controlled steady and bursty traffic patterns. No equations, models, parameter fits, or first-principles derivations are described; results are reported from experimental runs on external systems. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claims rest on observed behavior rather than any reduction to inputs by construction, satisfying the criteria for a self-contained empirical study.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The selected burst durations, intensities, and pause lengths capture the essential communication behavior of AI training and simulation collectives.