EasyRider: Mitigating Power Transients in Datacenter-Scale Training Workloads
Pith reviewed 2026-05-10 09:22 UTC · model grok-4.3
The pith
EasyRider uses rack-level auxiliary energy storage and passive components to keep GPU power swings within grid safety limits without software changes or energy waste.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EasyRider attenuates rack-level power transients from synchronized GPU workloads to levels that satisfy grid infrastructure requirements by combining passive components with actively controlled auxiliary energy storage. A software monitor maximizes storage lifetime, training frameworks need no modification, and no energy is dissipated.
What carries the argument
EasyRider rack power architecture: passive filters plus actively controlled auxiliary energy storage whose charge/discharge is governed by a lifetime-maximizing software monitor.
If this is right
- Datacenters could deploy the same rack hardware across mixed GPU generations and workload profiles without rewriting training code.
- Grid operators would see reduced risk of equipment stress from AI clusters even as training jobs scale to larger synchronized groups.
- Energy storage sizing can be chosen to cover the worst-case millisecond transients observed in published traces and testbed runs.
- No extra energy is lost to resistive dissipation because the storage buffers rather than dumps the excess power.
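As a rough illustration of the storage-sizing point above, the sketch below estimates the energy a rack's storage would have to absorb or supply when the load steps instantly while the grid-side draw is only allowed to ramp linearly. The step-load model, the linear ramp limit, the function name, and the numbers in the usage lines are assumptions for illustration, not values from the paper.

```python
def buffer_energy_joules(swing_w: float, ramp_limit_w_per_s: float) -> float:
    """Energy the storage must buffer for one step change in rack load.

    Assumes (hypothetically) that the load steps by `swing_w` watts at once
    while the grid-side power ramps linearly at `ramp_limit_w_per_s`.
    The storage covers the triangular gap between the two curves.
    """
    if swing_w < 0:
        raise ValueError("swing_w must be non-negative")
    if ramp_limit_w_per_s <= 0:
        raise ValueError("ramp_limit_w_per_s must be positive")
    ramp_time_s = swing_w / ramp_limit_w_per_s  # time for the grid side to catch up
    return 0.5 * swing_w * ramp_time_s          # area of the triangle, in joules


if __name__ == "__main__":
    # Hypothetical inputs: a 100 kW swing and a 50 kW/s grid-side ramp limit.
    energy_j = buffer_energy_joules(100_000.0, 50_000.0)
    print(f"{energy_j:.0f} J ({energy_j / 3600:.1f} Wh)")
```

Under this model the required energy grows with the square of the swing, so a worst-case swing dominates the sizing; a real design would also need margin for converter losses and usable state-of-charge range, which this sketch ignores.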
Where Pith is reading between the lines
- If the approach works at rack scale, operators could avoid costly grid upgrades when adding more AI capacity.
- The same storage layer might later support brief ride-through during utility outages if sized and controlled appropriately.
- Heterogeneous clusters mixing training and inference jobs would still benefit because the hardware acts on measured power regardless of job type.
Load-bearing premise
The auxiliary energy storage can survive the frequent charge and discharge cycles created by real AI training patterns without wearing out quickly, and hardware control alone is enough to hold power within grid limits.
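A back-of-the-envelope calculation shows why this premise carries so much weight. The sketch below converts a rated cycle count and a daily cycle rate into years of service. It assumes every charge/discharge event counts as one full rated cycle and ignores depth of discharge, temperature, and calendar aging; the function name and all numbers are hypothetical, since the page does not state the storage technology or its ratings.

```python
def years_to_rated_cycles(rated_cycles: float, cycles_per_day: float) -> float:
    """Years until the storage reaches its rated cycle count.

    Simplifying assumption: each cycle consumes exactly one rated cycle.
    """
    if rated_cycles <= 0:
        raise ValueError("rated_cycles must be positive")
    if cycles_per_day <= 0:
        raise ValueError("cycles_per_day must be positive")
    return rated_cycles / (cycles_per_day * 365.0)


if __name__ == "__main__":
    # Hypothetical comparison of two storage ratings at 2,000 cycles per day.
    for rated in (5_000, 1_000_000):
        years = years_to_rated_cycles(rated, 2_000)
        print(f"rated {rated:>9,} cycles -> {years:.3f} years")
```

The spread between the two hypothetical ratings is the point: at thousands of cycles per day, the choice of storage technology and the monitoring policy decide whether the hardware lasts days or years.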
What would settle it
A multi-week run on a production-scale rack where either the storage capacity falls below usable levels from cycle wear or measured power ramp rates still exceed grid safety thresholds.
Original abstract
Large-scale AI model training workloads use thousands of GPUs operating in tightly synchronized loops. During synchronous communication, start-up, shut-down, and checkpointing, GPU power consumption can swing from peak to idle within milliseconds. These large and rapid load swings endanger grid infrastructure as they induce steep power ramp rates, voltage and frequency shifts, and reactive power transients that can damage transformers, converters, and protection equipment. To solve this problem, we introduce EasyRider, a power architecture to mitigate power fluctuations at the rack level. EasyRider uses passive components and actively-controlled auxiliary energy storage to attenuate rack power swings. A software system continually monitors the energy storage system to maximize its lifetime in the presence of frequent charge/discharge cycles. EasyRider filters rack power variations to be within grid safety requirements without requiring software modifications to AI training frameworks or wasting energy. We evaluate EasyRider on a 400VDC-rated prototype system against published workload traces and our own GPU testbed, demonstrating its effectiveness across heterogeneous power levels and workload power profiles.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EasyRider, a rack-level power architecture that combines passive components with actively controlled auxiliary energy storage and a monitoring software layer to attenuate millisecond-scale power transients from synchronized GPU training workloads. The design aims to keep rack power variations, voltage excursions, and dP/dt within grid safety limits without modifying AI frameworks or dissipating energy. Effectiveness is asserted via evaluation on a 400 VDC prototype against published traces and a small GPU testbed across heterogeneous power levels.
Significance. If the quantitative claims hold, the work addresses a timely infrastructure bottleneck for hyperscale AI training: rapid load swings that threaten grid equipment. The combination of hardware filtering with lifetime-aware software control, without framework hooks or energy waste, would be a practical contribution to datacenter power management.
major comments (3)
- [Evaluation] Evaluation section: the manuscript asserts that the prototype demonstrates effectiveness against traces and the testbed, yet supplies no quantitative metrics (e.g., achieved dP/dt reduction, peak voltage deviation, or fraction of transients kept inside grid limits), error bars, or statistical analysis of the results. This absence leaves the central effectiveness claim without supporting evidence.
- [System Design / Software Monitoring] Auxiliary storage and lifetime monitoring: the design relies on the storage surviving thousands of high-rate charge/discharge cycles per day and on the software successfully extending its lifetime, but no cycle-life data, degradation model, or closed-loop lifetime measurements are presented. These assumptions are load-bearing for practicality.
- [Control Architecture] Reactive control analysis: the paper claims passive elements plus real-time active control suffice without advance knowledge of collective GPU events (barriers, checkpoints), yet provides no response-time measurements, rack-scale simulation, or worst-case transient response data to confirm excursions remain within limits.
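The metrics requested in the first major comment could be computed from paired power traces along the following lines. This is a minimal sketch: the function names, the sampling interval, the ramp limit, and the sample traces are all hypothetical and do not come from the paper.

```python
from typing import Sequence


def ramp_rates(power_w: Sequence[float], dt_s: float) -> list[float]:
    """Finite-difference dP/dt in W/s between consecutive samples."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    return [(b - a) / dt_s for a, b in zip(power_w, power_w[1:])]


def peak_ramp(power_w: Sequence[float], dt_s: float) -> float:
    """Largest absolute ramp rate in the trace (0.0 for fewer than 2 samples)."""
    return max((abs(r) for r in ramp_rates(power_w, dt_s)), default=0.0)


def fraction_within_limit(
    power_w: Sequence[float], dt_s: float, limit_w_per_s: float
) -> float:
    """Share of sample-to-sample steps whose |dP/dt| stays at or below the limit."""
    rates = ramp_rates(power_w, dt_s)
    if not rates:
        return 1.0
    return sum(abs(r) <= limit_w_per_s for r in rates) / len(rates)


if __name__ == "__main__":
    dt = 0.001     # hypothetical 1 ms sampling interval
    limit = 20e6   # hypothetical ramp limit in W/s
    baseline = [100e3, 100e3, 20e3, 20e3, 100e3]  # made-up unfiltered rack power (W)
    filtered = [100e3, 90e3, 80e3, 70e3, 60e3]    # made-up filtered rack power (W)
    reduction = peak_ramp(baseline, dt) / peak_ramp(filtered, dt)
    print(f"peak dP/dt reduction: {reduction:.1f}x")
    print(f"baseline within limit: {fraction_within_limit(baseline, dt, limit):.0%}")
    print(f"filtered within limit: {fraction_within_limit(filtered, dt, limit):.0%}")
```

Reporting these numbers per trace, with spread across repeated runs, would give the before/after comparison the referee asks for.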
minor comments (3)
- [Hardware Architecture] Specify the exact chemistry or technology of the auxiliary storage (supercapacitor, lithium-ion, etc.) and its key ratings (ESR, cycle life at the observed C-rates).
- [Prototype Implementation] Add component values, schematic details, and measured efficiency of the passive filter network in the prototype description.
- [Figures] Ensure all figures include quantitative axes, legends, and clear comparison between baseline and EasyRider traces.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments identify key areas where additional evidence and analysis would strengthen the manuscript. We address each major comment below and will revise the paper to incorporate the requested quantitative data, models, and measurements.
Point-by-point responses
- Referee: [Evaluation] Evaluation section: the manuscript asserts that the prototype demonstrates effectiveness against traces and the testbed, yet supplies no quantitative metrics (e.g., achieved dP/dt reduction, peak voltage deviation, or fraction of transients kept inside grid limits), error bars, or statistical analysis of the results. This absence leaves the central effectiveness claim without supporting evidence.
Authors: We agree that the evaluation relies primarily on visual comparisons in figures without accompanying numerical summaries or statistical support. In the revised manuscript we will add explicit metrics including achieved dP/dt reductions (with before/after values), peak voltage deviations, the fraction of transients remaining inside grid limits, error bars from repeated runs, and basic statistical analysis across the workload traces and testbed experiments.
revision: yes
- Referee: [System Design / Software Monitoring] Auxiliary storage and lifetime monitoring: the design relies on the storage surviving thousands of high-rate charge/discharge cycles per day and on the software successfully extending its lifetime, but no cycle-life data, degradation model, or closed-loop lifetime measurements are presented. These assumptions are load-bearing for practicality.
Authors: The software layer applies conservative limits on state-of-charge, temperature, and cycle counts using standard degradation models from the literature. We acknowledge that the manuscript does not present the explicit model or projected lifetime numbers. We will add a new subsection describing the degradation model employed, the cycle-life projections under the observed high-rate cycling, and how the monitoring policy extends usable lifetime, supported by references to established battery models.
revision: yes
- Referee: [Control Architecture] Reactive control analysis: the paper claims passive elements plus real-time active control suffice without advance knowledge of collective GPU events (barriers, checkpoints), yet provides no response-time measurements, rack-scale simulation, or worst-case transient response data to confirm excursions remain within limits.
Authors: The architecture description includes the time constants of the passive elements and the sub-millisecond sampling rate of the active controller. We recognize that explicit worst-case response data and simulation results are missing. In the revision we will include measured response times from the 400 VDC prototype, results from rack-scale transient simulations of synchronized GPU workloads, and analysis demonstrating that voltage and dP/dt excursions remain within grid limits under reactive control alone.
revision: yes
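To make the role of a passive time constant concrete, the sketch below models the passive stage as a single first-order low-pass filter and applies it to a step in rack load. The first-order model, the forward-Euler discretization, the function names, and every numeric value are assumptions for illustration; the page does not give the prototype's filter topology or component values.

```python
def low_pass(load_w: list[float], dt_s: float, tau_s: float) -> list[float]:
    """Forward-Euler first-order low-pass: y += (dt / tau) * (x - y)."""
    if dt_s <= 0 or tau_s <= 0:
        raise ValueError("dt_s and tau_s must be positive")
    if dt_s > tau_s:
        raise ValueError("dt_s must not exceed tau_s in this Euler sketch")
    alpha = dt_s / tau_s
    y = load_w[0] if load_w else 0.0  # start settled at the first sample
    out = []
    for x in load_w:
        y += alpha * (x - y)
        out.append(y)
    return out


def peak_ramp(power_w: list[float], dt_s: float) -> float:
    """Largest absolute sample-to-sample ramp rate in W/s."""
    return max((abs(b - a) / dt_s for a, b in zip(power_w, power_w[1:])), default=0.0)


if __name__ == "__main__":
    dt = 0.0005   # hypothetical 0.5 ms sampling interval
    tau = 0.05    # hypothetical 50 ms filter time constant
    load = [100e3] * 10 + [20e3] * 400  # made-up step from 100 kW to 20 kW
    grid = low_pass(load, dt, tau)
    print(f"load peak dP/dt: {peak_ramp(load, dt):.3g} W/s")
    print(f"grid peak dP/dt: {peak_ramp(grid, dt):.3g} W/s")
```

In this model a step of size ΔP produces a filtered peak ramp of about ΔP/τ, so the time constant directly sets how steep a ramp the grid side sees. Whether excursions stay within limits in the real system depends on the active controller as well, which this sketch leaves out.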
Circularity Check
No circularity: design proposal with empirical evaluation, no derivations or self-referential reductions.
The paper presents a hardware-software architecture for attenuating rack-level power transients using passive components and auxiliary storage, with a monitoring system for lifetime management. Evaluation relies on prototype measurements against workload traces and a GPU testbed. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text or abstract. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claims rest on the proposed design's measured performance rather than any reduction to prior inputs by construction. This is a standard systems paper with no mathematical chain to inspect for circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Rapid power swings from synchronized GPU training can be attenuated to grid-safe levels by rack-level auxiliary energy storage without software changes to training frameworks.
invented entities (1)
- EasyRider power architecture: no independent evidence