pith.

arxiv: 2604.15594 · v1 · submitted 2026-04-17 · 💻 cs.DC · cs.AI

DataCenterGym: A Physics-Grounded Simulator for Multi-Objective Data Center Scheduling

Pith reviewed 2026-05-10 08:30 UTC · model grok-4.3

classification 💻 cs.DC cs.AI
keywords scheduling, compute, thermal, data, datacentergym, dynamics, geo-distributed, h-mpc

The pith

DataCenterGym is a Gymnasium-compatible simulator integrating compute queueing, building thermal dynamics, localized HVAC, and temperature-dependent degradation for multi-objective geo-distributed data center scheduling, demonstrated with an H-MPC algorithm that outperforms baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Data centers are large facilities packed with servers that run everything from web searches to AI training. Deciding which server handles which task is complex because moving work changes how much power is drawn, how much heat is produced, and how hard the cooling systems must work. These factors interact, yet many existing schedulers treat them in isolation. DataCenterGym builds a single simulation that links job queues, building heat flow, local air-conditioning responses, and the slowdown servers suffer at high temperatures. It uses the standard Gymnasium interface, so researchers can plug in their own scheduling algorithms and test them under the simulator's modeled conditions. The authors also built a hierarchical model predictive control (H-MPC) method that plans job placements while looking ahead at thermal and power consequences. In simulated experiments under nominal operation and varying workload levels, this method outperforms the baseline schedulers it is compared against.
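The closed loop described here follows Gymnasium's `reset()`/`step()` contract: observe the state, choose an action, then let the coupled dynamics advance. In that contract, `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`.

The sketch below is a hypothetical stand-in that mimics this contract in plain Python without importing `gymnasium`. `ToyGeoSchedulingEnv`, its parameters and its queue-only dynamics are illustrative assumptions, not DataCenterGym's actual API. Thermal and power effects are deliberately left out.

```python
# Hypothetical toy environment: it only mirrors the Gymnasium reset/step
# contract and is NOT the DataCenterGym API (names, spaces, and dynamics
# here are illustrative assumptions; no thermal or power model).
class ToyGeoSchedulingEnv:
    def __init__(self, n_clusters=3, capacity=4, arrival_rate=6, horizon=50):
        self.n_clusters = n_clusters
        self.capacity = capacity          # jobs each cluster can finish per step (assumed)
        self.arrival_rate = arrival_rate  # jobs arriving per step (assumed constant)
        self.horizon = horizon
        self.queues = [0] * n_clusters
        self.t = 0

    def reset(self, *, seed=None, options=None):
        # Gymnasium convention: return (observation, info).
        # `seed` is accepted for API compatibility; this toy model is deterministic.
        self.queues = [0] * self.n_clusters
        self.t = 0
        return list(self.queues), {}

    def step(self, action):
        # The action routes this step's arrivals to one cluster.
        self.queues[action] += self.arrival_rate
        served = [min(q, self.capacity) for q in self.queues]
        self.queues = [q - s for q, s in zip(self.queues, served)]
        self.t += 1
        reward = -sum(self.queues)          # penalize leftover backlog
        terminated = False                  # no terminal failure state in this toy
        truncated = self.t >= self.horizon  # episode time limit
        # Gymnasium convention: (obs, reward, terminated, truncated, info).
        return list(self.queues), reward, terminated, truncated, {"served": sum(served)}


def shortest_queue_policy(obs):
    """Baseline scheduler: send arrivals to the cluster with the shortest queue."""
    return min(range(len(obs)), key=lambda i: obs[i])


# Closed loop: observe, act, advance, repeat until the episode ends.
env = ToyGeoSchedulingEnv()
obs, info = env.reset(seed=0)
total_reward = 0
while True:
    obs, reward, terminated, truncated, info = env.step(shortest_queue_policy(obs))
    total_reward += reward
    if terminated or truncated:
        break
```

With these assumed defaults, arrivals (6 per step) exceed a single cluster's capacity (4 per step). As a result, the shortest-queue baseline alternates between clusters and carries a backlog of 2 jobs after every step. A real Gymnasium environment would also subclass `gymnasium.Env` and declare `observation_space` and `action_space`.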

Core claim

We present DataCenterGym, a physics-grounded simulation environment for job scheduling in geo-distributed data centers... We also develop a Hierarchical Model Predictive Control (H-MPC) scheduling algorithm that performs distributed job placement while explicitly accounting for thermal and power dynamics. Through experiments on nominal operation and workload sensitivity, we demonstrate how H-MPC improves scheduling performance relative to baseline schedulers.
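The core claim describes job placement that looks ahead at thermal and power consequences. The paper's H-MPC formulation (cost terms, horizon length, solver) is not reproduced on this page. The sketch below is therefore only a hypothetical two-level receding-horizon scheme in the same spirit:

- **Upper level:** predicts each datacenter's temperature over a short horizon with an assumed first-order thermal model, then picks the site with the lowest energy-plus-violation cost.
- **Lower level:** spreads the jobs across that site's clusters, coolest first.

`Site`, `hmpc_step` and every constant are invented for illustration.

```python
from dataclasses import dataclass

# All names, parameters, and cost terms below are hypothetical; this is a
# sketch of a two-level receding-horizon scheduler, not the paper's H-MPC.
@dataclass
class Site:
    temp: float              # aggregate zone temperature (deg C)
    outdoor: float           # outdoor temperature (deg C)
    price: float             # electricity price per unit of energy
    heat_per_job: float      # temperature rise per job per step (assumed linear)
    leak: float = 0.1        # fraction of the indoor/outdoor gap closed per step
    cooling: float = 1.0     # fixed HVAC heat removal per step (deg C)


def predict_temp(site, temp, load):
    """One step of a discretized first-order (lumped RC) thermal model."""
    return temp + site.leak * (site.outdoor - temp) + site.heat_per_job * load - site.cooling


def horizon_cost(site, load, horizon, t_max, energy_per_job=1.0, penalty=10.0):
    """Energy cost plus thermal-violation penalty if `load` is sent here every step."""
    temp, cost = site.temp, 0.0
    for _ in range(horizon):
        temp = predict_temp(site, temp, load)
        cost += site.price * energy_per_job * load
        cost += penalty * max(0.0, temp - t_max)
    return cost


def upper_level_choose(sites, load, horizon=5, t_max=27.0):
    """Global level: pick the datacenter with the lowest predicted horizon cost."""
    costs = [horizon_cost(s, load, horizon, t_max) for s in sites]
    return min(range(len(sites)), key=costs.__getitem__)


def lower_level_place(cluster_temps, n_jobs, heat_per_job):
    """Local level: greedily give each job to the coolest cluster, updating estimates."""
    temps = list(cluster_temps)
    counts = [0] * len(temps)
    for _ in range(n_jobs):
        i = min(range(len(temps)), key=temps.__getitem__)
        counts[i] += 1
        temps[i] += heat_per_job
    return counts


def hmpc_step(sites, cluster_temps, n_jobs):
    """Receding horizon: plan over the horizon, but commit only this step's placement."""
    d = upper_level_choose(sites, n_jobs)
    return d, lower_level_place(cluster_temps[d], n_jobs, sites[d].heat_per_job)


hot_cheap = Site(temp=26.0, outdoor=35.0, price=0.05, heat_per_job=0.5)
cool_pricier = Site(temp=22.0, outdoor=15.0, price=0.10, heat_per_job=0.5)
site, split = hmpc_step([hot_cheap, cool_pricier], [[24.0, 25.0], [24.0, 22.0]], 3)
```

The planner is deliberately crude. It holds one placement fixed across the horizon and enumerates sites instead of solving an optimization problem. It does, however, keep the receding-horizon pattern of planning ahead while committing only the current step. In the example, the cheaper site loses because its predicted temperature crosses the assumed 27 °C limit on the first step.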

Load-bearing premise

The integrated models of compute queueing, building thermal dynamics, localized HVAC behavior, and temperature-dependent service degradation are sufficiently accurate representations of real geo-distributed data center physics to make simulation results transferable to practice.
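To make the coupling in this premise concrete, here is a hypothetical single-cluster update in which four effects feed into each other: queueing, IT heat, a saturating local HVAC response, and temperature-dependent slowdown. The linear derating above a throttle temperature, the proportional HVAC controller and all constants are assumptions chosen for illustration. They are exactly the kind of modeling choices whose fidelity the premise depends on.

```python
# Hypothetical per-cluster dynamics coupling queueing, heat, local HVAC, and
# temperature-dependent slowdown. Functional forms and constants are assumptions
# chosen for illustration, not DataCenterGym's calibrated models.

def service_rate(base_rate, temp, t_throttle=30.0, slope=0.1, floor=0.2):
    """Full speed up to t_throttle, then lose `slope` of capacity per degree, never below `floor`."""
    if temp <= t_throttle:
        return base_rate
    return base_rate * max(floor, 1.0 - slope * (temp - t_throttle))


def hvac_removal(temp, setpoint, gain, max_removal):
    """Proportional local HVAC response that saturates at its capacity."""
    return min(max_removal, max(0.0, gain * (temp - setpoint)))


def step_cluster(queue, temp, arrivals, *, base_rate=10.0, outdoor=30.0, leak=0.05,
                 heat_per_job=0.25, setpoint=24.0, gain=0.5, max_removal=2.0):
    """Advance one cluster by one time step; returns (queue, temp, served)."""
    rate = service_rate(base_rate, temp)        # degradation uses the current temperature
    served = min(queue + arrivals, rate)        # work-conserving queue
    queue = queue + arrivals - served
    temp = (temp
            + leak * (outdoor - temp)           # envelope heat exchange
            + heat_per_job * served             # IT heat from finished work
            - hvac_removal(temp, setpoint, gain, max_removal))
    return queue, temp, served


# Sustained overload: arrivals exceed the base rate, and full-speed heat
# (0.25 * 10 per step) exceeds the HVAC capacity (2.0 per step).
queue, temp = 0.0, 24.0
for _ in range(500):
    queue, temp, served = step_cluster(queue, temp, arrivals=12.0)
```

Under this overload, the heat from full-speed service exceeds the assumed HVAC capacity. The temperature therefore settles a little above the 30 °C throttle point, throughput drops below the base rate of 10 jobs per step, and the queue keeps growing.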

Figures

Figures reproduced from arXiv: 2604.15594 by Nilavra Pathak, Nirmalya Roy, Samadrita Biswas.

Figure 1. Closed-loop interaction in DataCenterGym. The scheduler observes the system state, selects job assignment and cooling actions, and the environment advances via coupled workload execution, thermal dynamics, and power evolution. A. Problem Formulation: We consider an online scheduling problem over C compute clusters distributed across D geo-distributed datacenters. Time is discretized into intervals t = 0, …
Figure 2. Thermal response under increasing workload. H-MPC actively tracks temperature setpoints, maintaining tightly bounded distributions and preserving …
Figure 3. System saturation under increasing load. Each curve traces operating …
Original abstract

Modern datacenters schedule heterogeneous workloads across geo-distributed sites with diverse compute capacities, electricity prices, and thermal conditions. Compute utilization, heat generation, cooling demand, and energy consumption are tightly coupled, yet most existing schedulers abstract these effects and treat them independently. We present DataCenterGym, a physics-grounded simulation environment for job scheduling in geo-distributed data centers, designed as a reusable testbed for future research. The simulator integrates compute queueing, building thermal dynamics, localized HVAC behavior, and temperature-dependent service degradation within a Gymnasium-compatible interface. We also develop a Hierarchical Model Predictive Control (H-MPC) scheduling algorithm that performs distributed job placement while explicitly accounting for thermal and power dynamics. Through experiments on nominal operation and workload sensitivity, we demonstrate how H-MPC improves scheduling performance relative to baseline schedulers.
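The abstract lists several coupled quantities that a multi-objective scheduler has to trade off: energy consumption, electricity prices and thermal conditions. A common way to expose such objectives through a single Gymnasium reward is a weighted sum, and the sketch below assumes that scalarization. `SiteStep`, `step_reward`, the 27 °C limit and the weights are hypothetical, since the paper's actual objective is not stated on this page.

```python
from dataclasses import dataclass

# Hypothetical per-site snapshot and weights; the paper's actual objective,
# units, and weighting are not given here.
@dataclass
class SiteStep:
    it_power_kw: float       # server power draw
    cooling_power_kw: float  # HVAC power draw
    price_per_kwh: float     # local electricity price
    temp_c: float            # zone temperature
    queued_jobs: int         # backlog at the end of the step


def step_reward(sites, dt_hours=0.25, t_max=27.0,
                w_energy=1.0, w_thermal=5.0, w_queue=0.1):
    """Weighted-sum scalarization of energy cost, thermal violation, and backlog."""
    energy_cost = sum((s.it_power_kw + s.cooling_power_kw) * dt_hours * s.price_per_kwh
                      for s in sites)
    thermal_violation = sum(max(0.0, s.temp_c - t_max) for s in sites)
    backlog = sum(s.queued_jobs for s in sites)
    # Negate so that lower cost means higher reward, since RL agents maximize reward.
    return -(w_energy * energy_cost + w_thermal * thermal_violation + w_queue * backlog)


sites = [SiteStep(100.0, 40.0, 0.10, 26.0, 10),
         SiteStep(80.0, 50.0, 0.20, 29.0, 0)]
reward = step_reward(sites)
```

Changing the weights shifts which objective dominates. Sweeping them is one way to compare schedulers across trade-offs rather than at a single operating point.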

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Circularity Check

0 steps flagged

No circularity in derivation or prediction chain

full rationale

The paper introduces DataCenterGym as a new Gymnasium-compatible simulator that integrates standard literature models for compute queueing, thermal dynamics, HVAC, and temperature-dependent degradation, plus a new H-MPC algorithm. No equations, first-principles derivations, or predictions are shown that reduce by construction to fitted parameters, self-definitions, or self-citation chains. Performance claims are simulator-internal comparisons under nominal and sensitivity workloads; the contribution is the reusable testbed and algorithm, not a tautological result. This is self-contained engineering work with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the simulator rests on standard domain assumptions about queueing and thermal modeling; no free parameters, invented entities, or ad-hoc axioms are explicitly listed.

axioms (1)
  • Domain assumption: Standard models of compute queueing, building thermal dynamics, localized HVAC, and temperature-dependent service degradation are adequate for the simulation.
    Invoked implicitly by the description of the integrated simulator.

pith-pipeline@v0.9.0 · 5449 in / 1311 out tokens · 28503 ms · 2026-05-10T08:30:47.987361+00:00 · methodology


Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  1. [1] D. Maji, W. A. Hanafy, L. Wu, D. Irwin, P. Shenoy, and R. K. Sitaraman, “Data centers carbon emissions at crossroads: An empirical study,” ACM SIGENERGY Energy Informatics Review, 2025

  2. [2] G. Kamiya and V. C. Coroamă, “Data centre energy use: Critical review of models and results,” IEA 4E TCP Efficient, Demand Flexible Networked Appliances (EDNA), 2025

  3. [3] J. D. Moore, J. S. Chase, P. Ranganathan, and R. K. Sharma, “Making scheduling ‘cool’: Temperature-aware workload placement in data centers,” in USENIX Annual Technical Conference, 2005

  4. [4] A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes, “Large-scale cluster management at Google with Borg,” in ACM EuroSys, 2015

  5. [5] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, “Mesos: A platform for fine-grained resource sharing in the data center,” in 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI 11), 2011

  6. [6] NVIDIA Corporation, “NVIDIA data center GPUs: Power and thermal design,” 2022

  7. [7] Q. Zhang, Z. Meng, X. Hong, Y. Zhan, J. Liu, J. Dong, T. Bai, J. Niu, and M. J. Deen, “A survey on data center cooling systems: Technology, power consumption modeling and control strategy optimization,” Journal of Systems Architecture, 2021

  8. [8] A. Verma, P. Ahuja, and A. Neogi, “pMapper: Power and migration cost aware application placement in virtualized systems,” in ACM Middleware. Springer, 2008

  9. [9] N. Lazic, C. Boutilier, T. Lu, E. Wong, B. Roy, M. Ryu, and G. Imwalle, “Data center cooling using model-predictive control,” NeurIPS, 2018

  10. [10] Z. Li, M. Brady, A. Makarova, S. Choi, and C. Callison-Burch, “SustainGym: Benchmarking reinforcement learning for sustainable energy systems,” in NeurIPS, 2023

  11. [11] A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, and B. Maggs, “Cutting the electric bill for internet-scale systems,” in ACM SIGCOMM, 2009

  12. [12] Y. Wu, S. Tang, C. Yu, B. Yang, C. Sun, J. Xiao, and H. Wu, “Task scheduling in geo-distributed computing: A survey,” arXiv preprint arXiv:2501.15504, 2025

  13. [13] H. Yuan, J. Bi, and M. Zhou, “Profit-sensitive spatial scheduling of multi-application tasks in distributed green clouds,” IEEE Transactions on Automation Science and Engineering, 2020

  14. [14] M. Niu, B. Cheng, Y. Feng, and J. Chen, “Gmta: A geo-aware multi-agent task allocation approach for scientific workflows in container-based cloud,” IEEE Transactions on Network and Service Management, 2020

  15. [15] S. M. Mirhoseini Nejad, H. Moazamigoodarzi, G. H. Badawy, and D. G. Down, “Joint data center cooling and workload management: A thermal-aware approach,” Future Generation Computer Systems, 2020

  16. [16] A. Banerjee, T. Mukherjee, G. Varsamopoulos, and S. K. Gupta, “Cooling-aware and thermal-aware workload placement for green HPC data centers,” in International Conference on Green Computing. IEEE, 2010

  17. [17] N. Pathak, A. Ba, J. Ploennigs, and N. Roy, “Forecasting gas usage for big buildings using generalized additive models and deep learning,” in IEEE SMARTCOMP, 2018

  18. [18] N. Pathak, J. Foulds, N. Roy, N. Banerjee, and R. Robucci, “A Bayesian data analytics approach to buildings’ thermal parameter estimation,” in ACM e-Energy, 2019

  19. [19] A. Souza, S. Jasoria, B. Chakrabarty, A. Bridgwater, A. Lundberg, F. Skogh, A. Ali-Eldin, D. Irwin, and P. Shenoy, “Casper: Carbon-aware scheduling and provisioning for distributed web services,” in IEEE IGSC, 2023

  20. [20] W. A. Hanafy et al., “Going green for less green: Optimizing the cost of reducing cloud carbon emissions,” in ACM ASPLOS, 2024

  21. [21] X. Zhai et al., “F2s-wss: A forecast-driven two-stage workload scheduling scheme for carbon-aware geo-distributed data centers with wind power integration,” Sustainable Computing: Informatics and Systems, 2025

  22. [22] Y.-T. Chen, L.-L. Luo, D.-K. Guo, and Q. He, “Carbon-aware energy cost optimization of data analytics across geo-distributed data centers,” Journal of Computer Science and Technology, 2025

  23. [23] H. Xu, X. Jin, and Q. Deng, “Hierarchical demand response for colocation data centers,” in IEEE SMARTCOMP, 2017

  24. [24] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, “Resource management with deep reinforcement learning,” in ACM HotNets, 2016

  25. [25] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, “CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms,” Software: Practice and Experience, 2011

  26. [26] D. Kliazovich, P. Bouvry, and S. U. Khan, “GreenCloud: A packet-level simulator of energy-aware cloud computing data centers,” The Journal of Supercomputing, 2012

  27. [27] D. Alves, K. Obraczka, and A. Kabbani, “An open-source simulation platform for benchmarking geo-distributed data center schedulers,” Simulation, 2024

  28. [28] M. Valdez-Vivas, V. Sharma, N. Stanisha, S. Li, L. Mi, W. Jiang, A. Kalinin, and J. Metzler, “Clockwork: A delay-based global scheduling framework for more consistent landing times in the data warehouse,” in ACM SIGKDD, 2021

  29. [29] F. Mastenbroek, G. Andreadis, S. Jounaid, W. Lai, J. Burley, J. Bosch, E. Van Eyk, L. Versluis, V. Van Beek, and A. Iosup, “OpenDC 2.0: Convenient modeling and simulation of emerging technologies in cloud datacenters,” in 2021 IEEE/ACM CCGrid. IEEE, 2021

  30. [30] Alibaba Group, “Alibaba cluster trace program,” 2018, production cluster trace data from Alibaba cloud infrastructure

  31. [31] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, “Learning-based model predictive control: Toward safe learning in control,” Annual Review of Control, Robotics, and Autonomous Systems, 2020

  32. [32] L. A. Barroso, U. Hölzle, and P. Ranganathan, The datacenter as a computer: Designing warehouse-scale machines. Springer Nature, 2019