A Study of Network Congestion in Two Supercomputing High-Speed Interconnects

Ann Gentile; Archit Patke; Eric Roman; Jim Brandt; Mike Showerman; Ravishankar K. Iyer; Saurabh Jha; William T. Kramer; Zbigniew T. Kalbarczyk

arxiv: 1907.05312 · v1 · pith:CLTIU6NJnew · submitted 2019-07-11 · 💻 cs.DC · cs.NI

Saurabh Jha, Archit Patke, Jim Brandt, Ann Gentile, Mike Showerman, Eric Roman, Zbigniew T. Kalbarczyk, William T. Kramer, Ravishankar K. Iyer

Pith reviewed 2026-05-24 22:51 UTC · model grok-4.3

classification 💻 cs.DC cs.NI

keywords network congestion · high-speed interconnects · supercomputing · petascale systems · Cray Gemini · Cray Aries · torus topology · DragonFly topology

The pith

This paper provides an end-to-end framework for long-term monitoring of network congestion and uses it to study real conditions in two different petascale interconnects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Prior studies of network congestion have relied on proxy applications and benchmarks that may not reflect real conditions in high-speed interconnects. This paper introduces an end-to-end framework for monitoring and analyzing congestion over long periods in actual field settings. It demonstrates the framework through an empirical study of two petascale supercomputers, one using Cray Gemini with a 3-D torus topology and the other using Cray Aries with a DragonFly topology. The work aims to provide data that better represents how congestion affects performance in production use. If correct, this shifts the basis for developing congestion control from artificial benchmarks to observed behavior.

Core claim

The paper establishes an end-to-end framework for monitoring and analysis to support long-term field-congestion characterization studies and applies it to an empirical study of network congestion in petascale systems across Cray Gemini 3-D torus and Cray Aries DragonFly interconnect technologies.

What carries the argument

End-to-end framework for monitoring and analysis of network congestion in high-speed interconnects.

If this is right

Congestion control at the network level can be informed by real field data.
Application placement, mapping, and scheduling at the system level can use actual congestion characteristics.
Long-term studies of congestion become possible with the provided framework.
Comparisons between topologies such as the 3-D torus and DragonFly can be based on production workloads.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The framework might enable similar studies on other interconnect technologies beyond the two examined.
Real congestion data could lead to revised models of performance variation in supercomputing applications.
Future interconnect designs could incorporate lessons from the observed patterns in these systems.

Load-bearing premise

Proxy applications and benchmarks are not representative of the congestion characteristics observed in actual high-speed interconnects during field use.

What would settle it

If measurements using the framework show that congestion patterns match those from proxy applications and benchmarks, the motivation for the new approach would be undermined.

Figures

Figures reproduced from arXiv: 1907.05312 by Ann Gentile, Archit Patke, Eric Roman, Jim Brandt, Mike Showerman, Ravishankar K. Iyer, Saurabh Jha, William T. Kramer, Zbigniew T. Kalbarczyk.

**Figure 1.** Congested link durations vs. PTS threshold for Blue Waters (Gemini) and Edison (Aries). [Plot omitted. x-axis: PTS Threshold (%); y-axis: Duration (minutes); series: median, 99%ile, 99.9%ile. Panels (a) and (b) cover the X and Y link directions; the remaining panel labels are truncated in the source.]

**Figure 2.** [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]

**Figure 3.** Congested link durations for different link types in Aries.

Body text captured with Figure 3: "… the 99.9th percentile duration is approximately 1 minute for Edison and 400 minutes for Blue Waters. However, while Aries manages long bouts of congestion better than Gemini does, application runtime variability due to network performance remains a concern [15]. • Detection of long-duration congestion using traffic measurements can facilitate inte…"
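
The duration statistics reported in Figures 1 and 3 can be illustrated with a minimal sketch. It assumes PTS is a per-link percentage sampled at a fixed interval (the abbreviation is not expanded on this page), treats a link as congested whenever its PTS is at or above the threshold, and counts each maximal run of congested samples as one congestion episode. The function names, the `>=` comparison, the one-minute sampling interval, the nearest-rank percentile, and the sample values are all illustrative assumptions, not the authors' implementation or data.

```python
from itertools import groupby
import math


def congested_durations(pts_series, threshold, interval_minutes=1.0):
    """Return the length in minutes of each maximal run of samples with PTS >= threshold.

    Assumption: samples are evenly spaced, interval_minutes apart.
    """
    durations = []
    for is_congested, run in groupby(pts_series, key=lambda pts: pts >= threshold):
        if is_congested:
            durations.append(sum(1 for _ in run) * interval_minutes)
    return durations


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def duration_summary(pts_series, thresholds, interval_minutes=1.0):
    """Map each threshold to its median, 99th and 99.9th percentile episode duration.

    A threshold with no congested episode maps to None.
    """
    summary = {}
    for threshold in thresholds:
        durations = congested_durations(pts_series, threshold, interval_minutes)
        if durations:
            summary[threshold] = {q: percentile(durations, q) for q in (50, 99, 99.9)}
        else:
            summary[threshold] = None
    return summary


# Synthetic PTS samples for one hypothetical link (not measured data).
sample_pts = [0, 12, 15, 3, 30, 35, 40, 2, 11]
summary = duration_summary(sample_pts, thresholds=[10, 30, 50])
```

Raising the threshold can only shorten or remove episodes, which is why a plot of duration against threshold is a natural way to compare how long congestion persists on different interconnects.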

Original abstract

Network congestion in high-speed interconnects is a major source of application run time performance variation. Recent years have witnessed a surge of interest from both academia and industry in the development of novel approaches for congestion control at the network level and in application placement, mapping, and scheduling at the system-level. However, these studies are based on proxy applications and benchmarks that are not representative of field-congestion characteristics of high-speed interconnects. To address this gap, we present (a) an end-to-end framework for monitoring and analysis to support long-term field-congestion characterization studies, and (b) an empirical study of network congestion in petascale systems across two different interconnect technologies: (i) Cray Gemini, which uses a 3-D torus topology, and (ii) Cray Aries, which uses the DragonFly topology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper supplies field measurements of congestion on two real petascale interconnects plus a monitoring framework. That fills a stated gap, but the strength of the findings can only be judged from the full results.

The main thing here is real congestion data collected from production petascale runs on Cray Gemini (3-D torus) and Aries (DragonFly), along with a framework built to support ongoing field studies. That is the concrete addition over prior proxy-based work the abstract flags as insufficient.

The paper does a straightforward job laying out why benchmarks fall short and then delivering measurements across the two topologies in actual systems. The motivation and the two-part contribution line up without contradiction. The argument stays consistent: the empirical component is meant to supply the representativeness that proxies lack. No load-bearing assumptions appear to be hidden in the structure.

Soft spots are mostly about visibility. The abstract gives no numbers, error bars, or method details, so it is difficult to judge how conclusive or reproducible the congestion characterizations turn out to be. That is typical for an empirical systems paper at the abstract stage rather than a flaw in the approach itself.

The work is aimed at the HPC systems community that cares about interconnect performance variation and scheduling. Someone who needs actual machine data to calibrate models or placement algorithms would get direct value. It shows clear thinking on the problem and engages the existing literature on why proxies are limited.

I would bring it to a reading group focused on measurement and networks. I would not cite it in my own papers unless I needed a reference to this exact Gemini-Aries comparison. It deserves a serious referee to examine the data collection, framework implementation, and any statistical support in the full text.

Referee Report

0 major / 2 minor

Summary. The paper presents (a) an end-to-end framework for monitoring and analysis to enable long-term field studies of network congestion in high-speed interconnects and (b) an empirical study of congestion characteristics on petascale systems using two Cray interconnects: Gemini (3-D torus topology) and Aries (DragonFly topology). The work is motivated by the claim that prior studies rely on proxy applications and benchmarks that fail to capture real field-congestion behavior.

Significance. If the framework and empirical findings hold, the contribution is significant for systems research in high-performance computing. It supplies actual field data from production petascale machines rather than proxies, directly addressing a stated gap in the literature on congestion control and application mapping. The dual-technology comparison (torus vs. DragonFly) provides concrete topology-specific observations that can inform future scheduling and routing work. The empirical focus and provision of a reusable monitoring framework are explicit strengths.

minor comments (2)

Abstract: the description of the two interconnect technologies could include the specific machine names or node counts to allow readers to assess scale immediately.
The framework description would benefit from an explicit statement of measurement overhead and intrusiveness, as this directly affects suitability for long-term field studies.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. The provided summary accurately reflects the paper's focus on an end-to-end monitoring framework and the empirical characterization of congestion on production petascale systems using Gemini and Aries interconnects.

Circularity Check

0 steps flagged

Empirical framework and field study with no derivation chain

The paper presents an end-to-end monitoring framework and an empirical characterization of congestion on Gemini and Aries interconnects. No equations, fitted parameters, predictions, or mathematical derivations appear in the abstract or described contributions. The central claim is the delivery of real-world data collection and analysis to address the stated motivation that proxies are unrepresentative; this is a direct empirical contribution rather than a reduction of any result to its own inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The derivation chain is empty, consistent with an observational systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that proxies fail to represent field behavior; no free parameters, new entities, or additional axioms are stated in the abstract.

axioms (1)

domain assumption: Proxy applications and benchmarks are not representative of field-congestion characteristics of high-speed interconnects.
Explicitly stated in the abstract as the motivation for developing the new framework and conducting the empirical study.

pith-pipeline@v0.9.0 · 5702 in / 1180 out tokens · 23642 ms · 2026-05-24T22:51:08.388068+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1] A. Bhatele, K. Mohror, S. H. Langer, and K. E. Isaacs, “There goes the neighborhood: performance degradation due to nearby jobs,” in Proc. International Conference for High Performance Computing, Networking, Storage and Analysis, 2013, pp. 41:1–41:12.

[2] N. Jain, A. Bhatele, S. White, T. Gamblin, and L. V. Kale, “Evaluating HPC networks via simulation of parallel workloads,” in High Performance Computing, Networking, Storage and Analysis, SC16: International Conference for. IEEE, 2016, pp. 154–165.

[3] T. Hoefler, T. Schneider, and A. Lumsdaine, “Characterizing the influence of system noise on large-scale applications by simulation,” in Proc. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2010, pp. 1–11.

[4] T. Agarwal, A. Sharma, A. Laxmikant, and L. V. Kalé, “Topology-aware task mapping for reducing communication contention on large parallel machines,” in Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International. IEEE, 2006, pp. 10–pp.

[5] M. Mubarak, P. Carns, J. Jenkins, J. K. Li, N. Jain, S. Snyder, R. Ross, C. D. Carothers, A. Bhatele, and K.-L. Ma, “Quantifying I/O and communication traffic interference on dragonfly networks equipped with burst buffers,” in Cluster Computing, 2017 IEEE Int’l Conf. on. IEEE, 2017, pp. 204–215.

[6] X. Yang, J. Jenkins, M. Mubarak, R. B. Ross, and Z. Lan, “Watch out for the bully!: job interference study on dragonfly network,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Press, 2016, p. 64.

[7] S. Jha, V. Formicola, C. Di Martino, M. Dalton, W. T. Kramer, Z. Kalbarczyk, and R. K. Iyer, “Resiliency of HPC Interconnects: A Case Study of Interconnect Failures and Recovery in Blue Waters,” IEEE Transactions on Dependable and Secure Computing, 2017.

[8] S. Jha, J. Brandt, A. Gentile, Z. Kalbarczyk, and R. Iyer, “Characterizing supercomputer traffic networks through link-level analysis,” in 2018 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2018, pp. 562–570.

[9] A. Agelastos et al., “Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications,” in SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 2014, pp. 154–165.

[10] S. Jha, A. Patke, B. Lim, J. Brandt, A. Gentile, G. Bauer, M. Showerman, L. Kaplan, Z. Kalbarczyk, W. T. Kramer, and R. Iyer, “Measuring congestion in high-performance datacenter interconnects,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), Feb 2020.

[11] “Blue Waters,” https://bluewaters.ncsa.illinois.edu

[12] http://www.nersc.gov/users/computational-systems/edison/

[13] Cray Inc., “Managing System Software for the Cray Linux Environment,” Cray Doc S-2393-5202axx, 2014.

[14] K. Pedretti, C. Vaughan, R. Barrett, K. Devine, and S. Hemmert, “Using the Cray Gemini Performance Counters,” in Proc. Cray User’s Group, 2013.

[15] D. Wang, A. Bhatele, and D. Ghosal, “Performance variability due to job placement on edison,” Poster presented at SC14, Nov, pp. 16–21, 2014.

[16] T. Agarwal, A. Sharma, and L. V. Kalé, “Topology-aware task mapping for reducing communication contention on large parallel machines,” in Proc. Int’l IEEE Parallel and Distributed Processing Symposium, 2006.
