On the Power Saving in High-Speed Ethernet-based Networks for Supercomputers and Data Centers

Francisco J. Alfaro-Cortés; Francisco J. Andújar; Jesus Escudero-Sahuquillo; José L. Sánchez; Miguel Sánchez de La Rosa

arxiv: 2510.19783 · v4 · submitted 2025-10-22 · 💻 cs.NI · cs.PF

Miguel Sánchez de La Rosa, Francisco J. Andújar, Jesus Escudero-Sahuquillo, José L. Sánchez, Francisco J. Alfaro-Cortés

Pith reviewed 2026-05-18 04:28 UTC · model grok-4.3

classification 💻 cs.NI cs.PF

keywords: Energy Efficient Ethernet, power saving, PerfBound, HPC networks, data centers, supercomputers, dynamic power-down, post-exascale networks

The pith

An enhanced PerfBound mechanism reduces energy use in high-speed Ethernet networks for supercomputers and data centers with minimal or no performance penalty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines power-saving techniques for HPC and datacenter networks based on the Energy Efficient Ethernet protocol and its flexibility for conventional or upcoming interconnects. It identifies weaknesses in dynamic power-down methods and proposes enhancements to PerfBound that deliver greater energy reduction. Simulations using traffic patterns from HPC and machine learning applications measure the effects on both performance and energy consumption. A sympathetic reader would care because large-scale systems face rising sustainability and cost pressures, and techniques that cut power without slowing computations could ease those pressures in post-exascale deployments.

Core claim

The central claim is that dynamic power-down mechanisms contain identifiable weaknesses that an enhancement to the PerfBound technique can address, yielding improved energy reduction with minimal or no performance penalty; this is shown through modeling in a simulation framework and experiments on traffic generated by selected HPC and machine learning applications, while targeting emerging post-exascale networks.

What carries the argument

The PerfBound power-saving mechanism, analyzed for weaknesses in dynamic power-down and then extended to improve the energy-performance trade-off.

If this is right

The enhanced technique can extend to conventional Ethernet and to upcoming Ethernet-derived versions of BXI and Omni-Path.
Energy consumption varies with the specific traffic patterns of HPC and machine learning applications.
System and network energy use can be lowered while keeping performance degradation at minimal or zero levels.
The approach supports analysis of post-exascale network scales.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the simulation results hold on physical hardware, data-center operators could adopt the enhancement to lower operational energy costs.
Similar analysis of dynamic power-down weaknesses might extend to other high-speed interconnect families beyond Ethernet derivatives.
Broader workload testing could reveal whether the energy gains remain stable under mixed or bursty traffic not covered in the selected patterns.

Load-bearing premise

The simulation framework and selected traffic patterns from HPC and machine learning applications accurately represent real hardware behavior and workloads in supercomputers and data centers.

What would settle it

A hardware measurement on real high-speed Ethernet interconnects running the same workloads that shows substantially larger performance penalties than the simulations predict would disprove the minimal-penalty claim.

Figures

Figures reproduced from arXiv: 2510.19783 by Francisco J. Alfaro-Cortés, Francisco J. Andújar, Jesus Escudero-Sahuquillo, José L. Sánchez, Miguel Sánchez de La Rosa.

**Figure 2.** Port switching between Wake and Sleep states due to packet transmissions. This figure shows that the link state starts at Sleep, and at a certain time, it starts transitioning to its Wake state. After tw, the link is active and able to send several packets. Once the link has finished the transmission, it returns to Sleep. Note that every packet burst happening while the link is in Sleep state spends tw wai…

**Figure 3.** Visual representation of Deep Sleep and Fast Wake power states. The different link power levels are a basis for implementing LPI, included in the EEE standard [11]. This has led to several proposals tailored for HPC so that end-to-end latency is not greatly increased by the overhead of turning links on [25, 4]. The idea for HPC environments is to employ Fast Wake so that latency is not greatly affected by …

**Figure 4.** Diagram of LPI using PDT. … ready to send them. Meanwhile, for inactivity periods that are on the order of seconds, the overhead on latency for state transitioning is irrelevant. A tPDT value of microseconds is also insignificant compared to the inactivity period in terms of power saving. The issue, as we have mentioned, is the intermittent nature of network activity during communication. When the tPDT is …

**Figure 5.** Port state synchronization between two switch ports.

**Figure 6.** Network efficiency for the LAMMPS application.

**Figure 7.** Impact of different tPDT values for LAMMPS. … savings with smaller tPDT values. Finally, note that with fixed tPDT values larger than 1 ms there are barely any energy savings. 4.1.2. Results using PerfBound and PerfBoundCorrect: As shown in Figure 8a, we can see the overhead on the application when using PerfBound and PerfBoundCorrect techniques. Just like with a fixed tPDT, the difference between Fast Wake an…

**Figure 8.** Impact of PerfBound and PerfBoundCorrect on the LAMMPS trace.

**Figure 9.** Network efficiency for the PATMOS application.

**Figure 10.** Impact of different tPDT values for PATMOS. Regarding the total energy consumed, leaving links operational after transmission is potentially beneficial for future outgoing packets at the cost of more energy consumed over time. Figure 10b shows the effect each PDT value has on the total energy consumed by the system. As is the case with execution time, the ports experience few transitions because of the tr…

**Figure 11.** Impact of PerfBound and PerfBoundCorrect on the PATMOS trace.

**Figure 12.** Network efficiency for the MLWF application.

**Figure 13.** Impact of different tPDT values for MLWF. … we increase tPDT values, the latency starts dropping to lower, more acceptable values. The results for energy saving for this application are shown in Figure 13c. As shown, tPDT values of 10 and 100 µs increase the energy saved in both power-saving states. Lower values provide marginal power-saving values or even increased energy usage in that proportion. Indeed…

**Figure 14.** Impact of PerfBound and PerfBoundCorrect on the MLWF trace.

**Figure 15.** Network efficiency for the ALEXNET application.

**Figure 16.** Impact of different tPDT values for ALEXNET. Legend: Circular Hist., Regular Hist., Self-clearing Hist. Panel (a): Execution time increase.

**Figure 17.** Impact of PerfBound and PerfBoundCorrect on the ALEXNET trace.
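The Sleep/Wake behavior described in the captions of Figures 2 and 4 can be sketched as a small trace-driven model. This is a minimal illustration, not the paper's simulator: the `Link` fields, the `simulate` function, the power values, and the assumption that a link draws active power while waking are all hypothetical. A link that has been idle for longer than `t_pdt` goes to Sleep, and the next burst then pays the wake-up time `t_w` before it can be sent.

```python
from dataclasses import dataclass


@dataclass
class Link:
    t_w: float       # wake-up time (Sleep -> Wake), arbitrary time units
    t_pdt: float     # power-down timer: idle time tolerated before sleeping
    p_active: float  # power while awake or waking (assumed value)
    p_sleep: float   # power while asleep (assumed value)


def simulate(link, bursts):
    """Replay (arrival, duration) bursts, sorted by arrival, on one link.

    The link starts asleep at time 0. Returns a tuple
    (wakeups, energy, end_time), where energy is accounted up to the
    end of the last transmission.
    """
    now = 0.0       # time up to which energy has been accounted
    asleep = True
    wakeups = 0
    energy = 0.0
    for arrival, duration in bursts:
        if not asleep and arrival > now + link.t_pdt:
            # The timer expired before this burst: the link stayed awake
            # for t_pdt after its last transmission, then went to Sleep.
            energy += link.p_active * link.t_pdt
            now += link.t_pdt
            asleep = True
        if asleep:
            # Sleep until the burst arrives, then spend t_w waking up.
            energy += link.p_sleep * (arrival - now)
            energy += link.p_active * link.t_w
            now = arrival + link.t_w
            wakeups += 1
            asleep = False
        # Transmit as soon as the link is awake and free.
        start = max(arrival, now)
        energy += link.p_active * (start - now + duration)
        now = start + duration
    return wakeups, energy, now


# Hypothetical numbers chosen only to make the arithmetic easy to follow.
link = Link(t_w=1.0, t_pdt=2.0, p_active=10.0, p_sleep=1.0)
print(simulate(link, [(5.0, 1.0), (7.0, 1.0), (20.0, 1.0)]))  # (2, 85.0, 22.0)
```

Sweeping `t_pdt` in such a model exposes the trade-off the captions describe: a short timer causes more wake-ups and therefore more added latency, while a long timer keeps the link powered during idle time.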

Original abstract

The increase in computation and storage has led to a significant growth in the scale of systems powering applications and services, raising concerns about sustainability and operational costs. In this paper, we explore power-saving techniques in high-performance computing (HPC) and datacenter networks, and their relation with performance degradation. From this premise, we propose leveraging Energy Efficient Ethernet (EEE) protocol, with the flexibility to extend to conventional Ethernet or upcoming Ethernet-derived interconnect versions of BXI and Omnipath. We analyze the PerfBound power-saving mechanism, identifying possible improvements and modeling it into a simulation framework. Through different experiments, we examine its impact on performance and determine the most appropriate interconnect. We also study traffic patterns generated by selected HPC and machine learning applications to evaluate the behavior of power-saving techniques. From these experiments, we provide an analysis of how applications affect system and network energy consumption. Based on this, we disclose the weakness of dynamic power-down mechanisms and propose an approach that improves energy reduction with minimal or no performance penalty. To the best of our knowledge, this work presents the first thorough analysis of PerfBound and an enhancement to the technique, while also targeting emerging post-exascale networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This simulation study of an enhanced PerfBound for EEE power saving gives some useful numbers on HPC/ML traffic but rests on unanchored assumptions about real hardware behavior.

The core of the paper is a simulation analysis of PerfBound under EEE, plus a proposed enhancement that aims for better energy reduction with little or no performance cost on the workloads they tested. They model the mechanism, run experiments on traffic from selected HPC and machine learning applications, and compare energy and performance impacts across different interconnect options including extensions toward BXI and OmniPath-style links for post-exascale systems. That part is straightforward and gives concrete data points on how app-generated flows affect idle periods and power draw.

Referee Report

2 major / 2 minor

Summary. The paper claims that dynamic power-down mechanisms in Energy Efficient Ethernet (EEE) for high-speed interconnects in supercomputers and data centers have weaknesses that can be addressed by analyzing and enhancing the PerfBound technique. Using a simulation framework, the authors evaluate its performance and energy impacts on traffic patterns from selected HPC and machine learning applications, identify limitations of existing approaches, and propose an enhancement that achieves greater energy reduction with minimal or no performance penalty. They position this as the first thorough analysis of PerfBound and extend the scope to emerging post-exascale networks based on Ethernet derivatives such as BXI and OmniPath.

Significance. If the simulation-based findings hold under real hardware conditions, the work could inform practical energy-saving strategies for large-scale networks where interconnect power is a growing fraction of total consumption. The emphasis on application-specific traffic analysis and post-exascale relevance addresses a timely sustainability concern, though the absence of hardware anchoring limits immediate deployability.

major comments (2)

[Simulation framework] Simulation framework section: The central claim that the proposed PerfBound enhancement yields improved energy reduction with minimal/no performance penalty rests on simulation results, yet the manuscript provides no hardware measurements or analytical bounds to validate EEE state transition latencies, wake-up overheads, or per-port power draw against real devices. This leaves the generalization to supercomputers and data centers unsupported.
[Experiments and traffic analysis] Traffic patterns and experiments section: The selected HPC and ML application traces are described as representative, but the paper offers no quantitative analysis or proof that they reproduce production-level burstiness, synchronization effects, or idle-period distributions at scale. Without this, the observed performance-energy trade-offs cannot be shown to generalize.

minor comments (2)

[Abstract] Abstract and introduction: The claim of presenting the 'first thorough analysis' of PerfBound would benefit from explicit comparison to prior EEE studies in the related-work section to substantiate novelty.
[Throughout] Notation: Consistent use of abbreviations (e.g., EEE, PerfBound) and clear definition of simulation parameters (e.g., idle thresholds) would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the paper to improve clarity on simulation assumptions and traffic pattern justification.

Referee: [Simulation framework] Simulation framework section: The central claim that the proposed PerfBound enhancement yields improved energy reduction with minimal/no performance penalty rests on simulation results, yet the manuscript provides no hardware measurements or analytical bounds to validate EEE state transition latencies, wake-up overheads, or per-port power draw against real devices. This leaves the generalization to supercomputers and data centers unsupported.

Authors: We agree that hardware measurements would strengthen the validation. The simulation parameters are drawn from IEEE 802.3az specifications, vendor datasheets, and prior EEE studies; we have now added explicit citations and derived analytical bounds for transition latencies in the revised Simulation Framework section. We have also inserted a Limitations subsection that qualifies the generalization to production systems and notes the simulation-based nature of the results. Direct hardware experiments are outside the current scope but are identified as future work.

Revision: partial

Referee: [Experiments and traffic analysis] Traffic patterns and experiments section: The selected HPC and ML application traces are described as representative, but the paper offers no quantitative analysis or proof that they reproduce production-level burstiness, synchronization effects, or idle-period distributions at scale. Without this, the observed performance-energy trade-offs cannot be shown to generalize.

Authors: The traces originate from publicly documented HPC and ML workloads with citations in the manuscript. In the revision we have added quantitative statistics, including idle-period histograms, burst-size distributions, and comparisons to metrics from prior large-scale network studies. While these additions provide stronger justification, a complete proof of representativeness for every production environment would require proprietary traces beyond our access; we therefore frame the results as indicative for the studied application classes rather than universally generalizable.

Revision: yes
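As an illustration of the kind of statistic the rebuttal mentions, idle-period lengths can be derived from a burst trace and binned by order of magnitude. This is a hedged sketch: the trace format `(start_us, duration_us)` and the function names `idle_periods` and `decade_histogram` are assumptions made for this example, and are not taken from the paper or from the VEF trace tooling.

```python
import math
from collections import Counter


def idle_periods(bursts):
    """Return the gaps between consecutive bursts on one link.

    bursts: list of (start_us, duration_us), sorted by start time.
    Overlapping or back-to-back bursts produce no gap.
    """
    gaps = []
    busy_until = None
    for start, duration in bursts:
        if busy_until is not None and start > busy_until:
            gaps.append(start - busy_until)
        end = start + duration
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps


def decade_histogram(gaps):
    """Count gaps per power-of-ten bin.

    Key k means 10**k <= gap < 10**(k + 1), in the same unit as the gaps.
    """
    return Counter(math.floor(math.log10(g)) for g in gaps)


# Hypothetical trace in microseconds: two short gaps and one long gap.
trace = [(0, 1), (4, 1), (10, 1), (2000, 1)]
gaps = idle_periods(trace)
print(gaps)                    # [3, 5, 1989]
print(decade_histogram(gaps))  # two gaps in bin 0, one gap in bin 3
```

A histogram of this shape makes it visible whether idle time is dominated by many short gaps, where sleeping mostly adds wake-up latency, or by a few long ones, where sleeping saves the most energy.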

Circularity Check

0 steps flagged

No circularity: claims rest on independent simulation experiments

The paper conducts a simulation-based study of PerfBound and EEE power-saving mechanisms using traffic patterns from selected HPC and ML applications. No mathematical derivations, equations, or predictions appear that reduce by construction to fitted parameters, self-definitions, or self-citation chains. The proposed enhancement is evaluated directly through experiments in the described framework, with conclusions drawn from observed performance and energy impacts rather than any self-referential logic. This structure is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, no explicit free parameters, axioms, or invented entities are identifiable. The approach builds on the existing EEE protocol and PerfBound mechanism without introducing new postulated entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "We analyze the PerfBound proposal, identifying possible improvements and modeling it into a simulation framework... propose an approach that improves energy reduction with minimal or no performance penalty."

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery and embed_strictMono_of_one_lt · unclear

Paper passage: "The recurrent calculation of a PDT timer expiration value for every port... histogram formed of inactivity period lengths"
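The quoted passage describes a per-port timer that is recomputed from a histogram of inactivity period lengths. The sketch below illustrates that general idea only. It is not the PerfBound algorithm as published, and the selection rule, the function name `pick_t_pdt`, and its parameters are hypothetical. It picks the smallest timer value for which the wake-up time paid over an observation window stays within a chosen overhead budget, assuming each idle period longer than the timer costs exactly one wake-up of `t_w`.

```python
def pick_t_pdt(gaps, t_w, window, max_overhead):
    """Pick a power-down timer from observed idle-period lengths.

    gaps:         idle-period lengths seen on one port during the window
    t_w:          wake-up time charged for each idle period longer than
                  the timer (same time unit as gaps)
    window:       length of the observation window
    max_overhead: tolerated fraction of the window spent waking up

    Returns the smallest candidate timer whose total wake-up time fits
    the budget. Candidates are 0 and every distinct observed gap.
    """
    if max_overhead < 0:
        raise ValueError("max_overhead must be non-negative")
    budget = max_overhead * window
    candidates = [0] + sorted(set(gaps))
    for t_pdt in candidates:
        # Idle periods longer than the timer put the port to sleep,
        # so the next burst pays one wake-up.
        wakeups = sum(1 for g in gaps if g > t_pdt)
        if wakeups * t_w <= budget:
            return t_pdt
    # Not reached for a non-negative budget: the largest gap yields
    # zero wake-ups. Kept as a defensive fallback.
    return candidates[-1]


# Hypothetical values in microseconds.
observed = [3, 5, 1989]
print(pick_t_pdt(observed, t_w=4, window=2000, max_overhead=0.005))  # 3
print(pick_t_pdt(observed, t_w=4, window=2000, max_overhead=0.001))  # 1989
```

With the looser budget the port may sleep after 3 µs of idleness; with the tighter one the timer grows until no observed gap triggers a sleep, which trades energy savings for bounded overhead.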

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

[1] Dennis Abts, Mike Marty, Philip Wells, Peter Klausler, and Hong Liu. Energy Proportional Datacenter Networks. In Proceedings of the International Symposium on Computer Architecture, pages 338–347, 2010.
[2] Francisco J. Andújar, Juan A. Villar, Jose L. Sánchez, Francisco J. Alfaro, and Jesús Escudero-Sahuquillo. VEF Traces: A Framework for Modelling MPI Traffic in Interconnection Network Simulators. In The 1st IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, pages 841–848, Chicago, IL, USA, Sep 2015.
[3] Francisco J. Andújar, Juan A. Villar, Jose L. Sánchez, Francisco J. Alfaro, and Jesús Escudero-Sahuquillo. An open-source family of tools to reproduce MPI-based workloads in interconnection network simulators. Journal of Supercomputing, 72(12):4601–4628, 2016.
[4] Francisco J. Andújar, Salvador Coll, Marina Alonso, Juan-Miguel Martínez, Pedro López, José L. Sánchez, and Francisco J. Alfaro. Energy efficient hpc network topologies with on/off links. Future Generation Computer Systems, 139:126–138, 2023.
[5] Francisco J. Andújar, Miguel Sánchez de la Rosa, Jesús Escudero-Sahuquillo, and José L. Sánchez. Extending the VEF traces framework to model data center network workloads. J. Supercomput., 79(1):814–831, 2023.
[6] Atos. High performance interconnect for extreme HPC workloads. Last accessed: 2022-10-05.
[7] Hugh Barrass. Options for EEE in 100G IEEE 802.3bj (Draft), September 2012.
[8] Mark S. Birrittella, Mark Debbage, Ram Huggahalli, James Kunz, Tom Lovett, Todd Rimmer, Keith D. Underwood, and Robert C. Zak. Intel® omni-path architecture: Enabling scalable, high performance fabrics. In 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, pages 1–9, 2015.
[9] Emeric Brun, Stéphane Chauveau, and Fausto Malvagi. Patmos: A prototype monte carlo transport code to test high performance architectures. In 2017 - International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering, 2017.
[10] Chelsio Communications. RDMA – iWARP.
[11] Ken Christensen, Pedro Reviriego, Bruce Nordman, Michael Bennett, Mehrgan Mostowfi, and Juan Antonio Maestro. IEEE 802.3az: The road to energy efficient Ethernet. Communications Magazine, IEEE, 48:50–56, December 2010.
[12] Saïd Derradji, Thibaut Palfer-Sollier, Jean-Pierre Panziera, Axel Poudes, and François Wellenreiter Atos. The bxi interconnect architecture. In Proceedings of the 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, HOTI ’15, page 18–25, USA, 2015. IEEE Computer Society.
[13] Branimir Dickov, Miquel Pericàs, Paul Carpenter, Nacho Navarro, and Eduard Ayguade. Software-Managed Power Reduction in Infiniband Links. In Proceedings of the International Conference on Parallel Processing, volume 2014, September 2014.
[14] María Engracia Gómez et al. RED-SEA project: Towards a new-generation european interconnect. Microprocess. Microsystems, 110:105102, 2024.
[15] Eviden SAS. BullSequana eXascale Interconnect V3: Intelligent Network Management Accelerates GPU Performance in AI-HPC. Technical report, Eviden, November 2024.
[16] Mario Flajslik, Eric Borch, and Mike A. Parker. Megafly: A Topology for Exascale Systems. In Rio Yokota, Michèle Weiland, David Keyes, and Carsten Trinitis, editors, High Performance Computing, pages 289–310, Cham, 2018. Springer International Publishing.
[17] Chamara Gunaratne, Ken Christensen, Bruce Nordman, and Stephen Suen. Reducing the Energy Consumption of Ethernet with Adaptive Link Rate (ALR). Computers, IEEE Transactions on, 57:448–461, May 2008.
[18] Robert Hays. Active/idle toggling with low-power idle. http://www.ieee802.org/3/az/public/jan08/hays_01_0108.pdf, January 2008. Presented at IEEE 802.3az Task Force Meeting, January 2008.
[19] Jaeha Kim and Mark A. Horowitz. Adaptive supply serial links with sub-1-V operation and per-pin clock recovery. IEEE Journal of Solid-State Circuits, 37(11):1403–1413, 2002.
[20] Jongryoul Kim, William Dally, Brian Towles, and Amit Gupta. Microarchitecture of a High-Radix Router. In ACM SIGARCH Computer Architecture News, volume 33, pages 420–431, July 2005.
[21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, May 2017.
[22] Michael Langguth, Ankit Patnala, Sebastian Lehner, Markus Dabernig, Konrad Mayer, Irene Schicker, GeoSphere Austria, and Paula Harder. A benchmark dataset for meteorological downscaling. In International Conference on Learning Representations, number FZJ-2024-07378. Jülich Supercomputing Center, 2024.
[23] Md Nahid Newaz, Md Atiqul Mollah, Peyman Faizian, and Zhou Tong. Improving adaptive routing performance on large scale megafly topology. In 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 406–416, 2021.
[24] NVIDIA Corporation. RoCE vs. iWARP Competitive Analysis (Whitepaper), February 2017.
[25] Pedro Reviriego, José Alberto Hernandez, David Larrabeiti, and Juan Antonio Maestro. Performance evaluation of energy efficient ethernet. IEEE Communications Letters, 13(9):697–699, 2009.
[26] Efraim Rotem, Alon Naveh, Avinash Ananthakrishnan, Eliezer Weissmann, and Doron Rajwan. Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge. IEEE Micro, 32(2):20–27, 2012.
[27] Karthikeyan Saravanan, Paul Carpenter, and Alex Ramirez. Power/performance evaluation of energy efficient Ethernet (EEE) for High Performance Computing. In ISPASS 2013 - IEEE International Symposium on Performance Analysis of Systems and Software, pages 205–214, April 2013.
[28] Karthikeyan P. Saravanan and Paul M. Carpenter. Perfbound: Conserving energy with bounded overheads in on/off-based hpc interconnects. IEEE Transactions on Computers, 67(7):960–974, 2018.
[29] Alexander Sergeev and Mike Del Balso. Horovod: fast and easy distributed deep learning in TensorFlow. CoRR, abs/1802.05799, 2018.
[30, 31] Li Shang, Li-Shiuan Peh, and N.K. Jha. Dynamic voltage scaling with links for power optimization of interconnection networks. In The Ninth International Symposium on High-Performance Computer Architecture, HPCA-9 2003. Proceedings., pages 91–102, 2003.
[32] Alexander Shpiner, Zachy Haramaty, Saar Eliad, Vladimir Zdornov, Barak Gafni, and Eitan Zahavi. Dragonfly+: Low cost topology for scaling datacenters. In 2017 IEEE 3rd International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB), pages 1–8, 2017.
[33] Physical Layer Specifications. IEEE Draft P802.3az™/D2., 2009.
[34] Estela Suarez, Norbert Eicker, and Thomas Lippert. Modular Supercomputing Architecture: from Idea to Production; 3rd, volume 3, pages 223–251. CRC Press, FL, USA, 2019.
[35] Top500.org. Green500 list.
[36] Top500.org. Top500 list. Last accessed: 2025-06-17.
[37] Juan Antonio Villar, German Maglione Mathey, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Alfaro, José Luis Sánchez Garcia, and Francisco J. Quiles. Topgen: A library to provide simulation tools with the modeling of interconnection network topologies. In 2018 International Conference on High Performance Computing & Simulation, HPCS 20...
[38] Pedro Yébenes, Jesus Escudero-Sahuquillo, Pedro J. García, Francisco J. Alfaro-Cortés, and Francisco J. Quiles. Providing differentiated services, congestion management, and deadlock freedom in dragonfly networks with adaptive routing. Concurrency and Computation: Practice and Experience, 29(13):e4066, 2017.
[39] Eitan Zahavi. Fat-trees routing and node ordering providing contention free traffic for MPI global collectives. Journal of Parallel and Distributed Computing, 72, 05 2011.
[40] Yibo Zhu, Haggai Eran, Daniel Firestone, Chuanxiong Guo, Marina Lipshteyn, Yehonatan Liron, Jitendra Padhye, Shachar Raindel, Mohamad Haj Yahia, and Ming Zhang. Congestion control for large-scale RDMA deployments. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, page 523–536, 2015.

[1] [1]

Energy Proportional Datacenter Networks

Dennis Abts, Mike Marty, Philip Wells, Peter Klausler, and Hong Liu. Energy Proportional Datacenter Networks. InProceedings of the Inter- national Symposium on Computer Architecture, pages 338–347, 2010

work page 2010

[2] [2]

Andújar, Juan A

Francisco J. Andújar, Juan A. Villar, Jose L. Sánchez, Francisco J. Alfaro, and Jesús Escudero-Sahuquillo. VEF Traces: A Framework for Modelling MPI Traffic in Interconnection Network Simulators. InThe 1st IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, pages 841–848, Chicago, IL, USA, Sep 2015

work page 2015

[3] [3]

Andújar, Juan A

Francisco J. Andújar, Juan A. Villar, Jose L. Sánchez, Francisco J. Alfaro, andJesúsEscudero-Sahuquillo. Anopen-sourcefamilyoftoolsto reproduce MPI-based workloads in interconnection network simulators. Journal of Supercomputing, 72(12):4601–4628, 2016

work page 2016

[4] [4]

Andújar, Salvador Coll, Marina Alonso, Juan-Miguel Martínez, PedroLópez, JoséL.Sánchez, andFranciscoJ.Alfaro

Francisco J. Andújar, Salvador Coll, Marina Alonso, Juan-Miguel Martínez, PedroLópez, JoséL.Sánchez, andFranciscoJ.Alfaro. Energy efficient hpc network topologies with on/off links.Future Generation Computer Systems, 139:126–138, 2023

work page 2023

[5] [5]

Andújar, Miguel Sánchez de la Rosa, Jesús Escudero- Sahuquillo, and José L

Francisco J. Andújar, Miguel Sánchez de la Rosa, Jesús Escudero- Sahuquillo, and José L. Sánchez. Extending the VEF traces framework to model data center network workloads.J. Supercomput., 79(1):814– 831, 2023

work page 2023

[6] [6]

High performance interconnect for extreme HPC workloads

Atos. High performance interconnect for extreme HPC workloads. Last accessed: 2022-10-05. 35

work page 2022

[7] [7]

Options for EEE in 100G IEEE 802.3bj (Draft), Septem- ber 2012

Hugh Barrass. Options for EEE in 100G IEEE 802.3bj (Draft), Septem- ber 2012

work page 2012

[8] [8]

Birrittella, Mark Debbage, Ram Huggahalli, James Kunz, Tom Lovett, Todd Rimmer, Keith D

Mark S. Birrittella, Mark Debbage, Ram Huggahalli, James Kunz, Tom Lovett, Todd Rimmer, Keith D. Underwood, and Robert C. Zak. Intel® omni-path architecture: Enabling scalable, high performance fabrics. In2015 IEEE 23rd Annual Symposium on High-Performance Intercon- nects, pages 1–9, 2015

work page 2015

[9] Emeric Brun, Stéphane Chauveau, and Fausto Malvagi. Patmos: A prototype Monte Carlo transport code to test high performance architectures. In 2017 - International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering, 2017.

[10] Chelsio Communications. RDMA – iWARP.

[11] Ken Christensen, Pedro Reviriego, Bruce Nordman, Michael Bennett, Mehrgan Mostowfi, and Juan Antonio Maestro. IEEE 802.3az: The road to energy efficient Ethernet. Communications Magazine, IEEE, 48:50–56, December 2010.

[12] Saïd Derradji, Thibaut Palfer-Sollier, Jean-Pierre Panziera, Axel Poudes, and François Wellenreiter Atos. The BXI interconnect architecture. In Proceedings of the 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, HOTI ’15, pages 18–25, USA, 2015. IEEE Computer Society.

[13] Branimir Dickov, Miquel Pericàs, Paul Carpenter, Nacho Navarro, and Eduard Ayguade. Software-Managed Power Reduction in Infiniband Links. In Proceedings of the International Conference on Parallel Processing, volume 2014, September 2014.

[14] María Engracia Gómez et al. RED-SEA project: Towards a new-generation European interconnect. Microprocess. Microsystems, 110:105102, 2024.

[15] Eviden SAS. BullSequana eXascale Interconnect V3: Intelligent Network Management Accelerates GPU Performance in AI-HPC. Technical report, Eviden, November 2024.

[16] Mario Flajslik, Eric Borch, and Mike A. Parker. Megafly: A Topology for Exascale Systems. In Rio Yokota, Michèle Weiland, David Keyes, and Carsten Trinitis, editors, High Performance Computing, pages 289–310, Cham, 2018. Springer International Publishing.

[17] Chamara Gunaratne, Ken Christensen, Bruce Nordman, and Stephen Suen. Reducing the Energy Consumption of Ethernet with Adaptive Link Rate (ALR). Computers, IEEE Transactions on, 57:448–461, May 2008.

[18] Robert Hays. Active/idle toggling with low-power idle. http://www.ieee802.org/3/az/public/jan08/hays_01_0108.pdf, January 2008. Presented at IEEE 802.3az Task Force Meeting, January 2008.

[19] Jaeha Kim and Mark A. Horowitz. Adaptive supply serial links with sub-1-V operation and per-pin clock recovery. IEEE Journal of Solid-State Circuits, 37(11):1403–1413, 2002.

[20] Jongryoul Kim, William Dally, Brian Towles, and Amit Gupta. Microarchitecture of a High-Radix Router. In ACM SIGARCH Computer Architecture News, volume 33, pages 420–431, July 2005.

[21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, May 2017.

[22] Michael Langguth, Ankit Patnala, Sebastian Lehner, Markus Dabernig, Konrad Mayer, Irene Schicker, GeoSphere Austria, and Paula Harder. A benchmark dataset for meteorological downscaling. In International Conference on Learning Representations, number FZJ-2024-07378. Jülich Supercomputing Center, 2024.

[23] Md Nahid Newaz, Md Atiqul Mollah, Peyman Faizian, and Zhou Tong. Improving adaptive routing performance on large scale megafly topology. In 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 406–416, 2021.

[24] NVIDIA Corporation. RoCE vs. iWARP Competitive Analysis (Whitepaper), February 2017.

[25] Pedro Reviriego, José Alberto Hernandez, David Larrabeiti, and Juan Antonio Maestro. Performance evaluation of energy efficient Ethernet. IEEE Communications Letters, 13(9):697–699, 2009.

[26] Efraim Rotem, Alon Naveh, Avinash Ananthakrishnan, Eliezer Weissmann, and Doron Rajwan. Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge. IEEE Micro, 32(2):20–27, 2012.

[27] Karthikeyan Saravanan, Paul Carpenter, and Alex Ramirez. Power/performance evaluation of energy efficient Ethernet (EEE) for High Performance Computing. In ISPASS 2013 - IEEE International Symposium on Performance Analysis of Systems and Software, pages 205–214, April 2013.

[28] Karthikeyan P. Saravanan and Paul M. Carpenter. Perfbound: Conserving energy with bounded overheads in on/off-based HPC interconnects. IEEE Transactions on Computers, 67(7):960–974, 2018.

[29] Alexander Sergeev and Mike Del Balso. Horovod: fast and easy distributed deep learning in TensorFlow. CoRR, abs/1802.05799, 2018.

[30] Li Shang, Li-Shiuan Peh, and N.K. Jha. Dynamic voltage scaling with links for power optimization of interconnection networks. In The Ninth International Symposium on High-Performance Computer Architecture, HPCA-9 2003. Proceedings., pages 91–102, 2003.

[32] Alexander Shpiner, Zachy Haramaty, Saar Eliad, Vladimir Zdornov, Barak Gafni, and Eitan Zahavi. Dragonfly+: Low cost topology for scaling datacenters. In 2017 IEEE 3rd International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB), pages 1–8, 2017.

[33] Physical Layer Specifications. IEEE Draft P802.3az™/D2., 2009.

[34] Estela Suarez, Norbert Eicker, and Thomas Lippert. Modular Supercomputing Architecture: from Idea to Production; 3rd, volume 3, pages 223–251. CRC Press, FL, USA, 2019.

[35] Top500.org. Green500 list.

[36] Top500.org. Top500 list. Last accessed: 2025-06-17.

[37] Juan Antonio Villar, German Maglione Mathey, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Alfaro, José Luis Sánchez Garcia, and Francisco J. Quiles. Topgen: A library to provide simulation tools with the modeling of interconnection network topologies. In 2018 International Conference on High Performance Computing & Simulation, HPCS 20...

[38] Pedro Yébenes, Jesus Escudero-Sahuquillo, Pedro J. García, Francisco J. Alfaro-Cortés, and Francisco J. Quiles. Providing differentiated services, congestion management, and deadlock freedom in dragonfly networks with adaptive routing. Concurrency and Computation: Practice and Experience, 29(13):e4066, 2017.

[39] Eitan Zahavi. Fat-trees routing and node ordering providing contention free traffic for MPI global collectives. Journal of Parallel and Distributed Computing, 72, 05 2011.

[40] Yibo Zhu, Haggai Eran, Daniel Firestone, Chuanxiong Guo, Marina Lipshteyn, Yehonatan Liron, Jitendra Padhye, Shachar Raindel, Mohamad Haj Yahia, and Ming Zhang. Congestion control for large-scale RDMA deployments. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, pages 523–536, 2015.