Dual-Loop Control in DCVerse: Advancing Reliable Deployment of AI in Data Centers via Digital Twins

Guangyu Wu; Jimin Jia; Qingang Zhang; Siew-Chien Wong; Yonggang Wen; Yuejun Yan; Zhaoyang Wang

arxiv: 2604.07559 · v1 · submitted 2026-04-08 · 💻 cs.AI

Dual-Loop Control in DCVerse: Advancing Reliable Deployment of AI in Data Centers via Digital Twins

Qingang Zhang , Yuejun Yan , Guangyu Wu , Siew-Chien Wong , Jimin Jia , Zhaoyang Wang , Yonggang Wen This is my paper

Pith reviewed 2026-05-10 17:37 UTC · model grok-4.3

classification 💻 cs.AI

keywords Dual-Loop Control Frameworkdigital twindeep reinforcement learningdata centerenergy efficiencyAI deployment safety

0 comments

The pith

A dual-loop control framework with digital twins safely deploys reinforcement learning in data centers for energy savings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Dual-Loop Control Framework (DLCF) to safely deploy deep reinforcement learning in data centers. It uses a digital twin to pre-evaluate and verify policies from multiple agents before applying them to the physical system. This addresses data scarcity and safety concerns in mission-critical environments. Validation on a real cooling system demonstrated up to 4.09 percent energy savings while meeting service level agreements. The approach also improves policy interpretability and supports trustworthy AI control.

Core claim

The Dual-Loop Control Framework (DLCF) consists of the physical system, a digital twin, and a policy reservoir of diverse DRL agents that interact via a dual-loop mechanism of data acquisition, assimilation, training, pre-evaluation, and expert verification. Theoretical analysis indicates improvements in sample efficiency, generalization, safety, and optimality. Implementation in the DCVerse platform on a real-world data center cooling system achieved up to 4.09% energy savings over conventional strategies without violating SLA requirements, while enhancing interpretability.

What carries the argument

The dual-loop mechanism linking the physical data center, its digital twin for simulation and pre-evaluation, and a reservoir of DRL policies for selection and verification.

If this is right

Real-time policy training and pre-evaluation reduces the risk of unsafe actions in the physical system.
The policy reservoir allows for diverse strategies that can be chosen based on current conditions.
Expert verification step increases trust in the deployed AI policies.
The framework provides a basis for extending AI control to more complex, holistic data center optimizations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the dual-loop approach to other AI-controlled infrastructure could mitigate deployment risks in high-stakes domains like autonomous systems or medical devices.
If digital twins can be made more accurate over time through assimilation, the framework might enable continuous online optimization with minimal human oversight.
Combining this with multi-agent systems could handle interactions between different subsystems like cooling and power management.

Load-bearing premise

The digital twin provides a sufficiently accurate model of the real data center dynamics for policies to transfer effectively without performance loss or safety issues.

What would settle it

Running the trained policy from the DCVerse platform on the physical data center cooling system for an extended period and measuring whether energy consumption decreases by around 4% while keeping all SLA metrics within required bounds.

Figures

Figures reproduced from arXiv: 2604.07559 by Guangyu Wu, Jimin Jia, Qingang Zhang, Siew-Chien Wong, Yonggang Wen, Yuejun Yan, Zhaoyang Wang.

**Figure 1.** Figure 1: Three key aspects should be considered in algorithm design: system characteristics, optimization objectives, and algorithmic properties. Online Model-Free Model-Based Given the Model Value-Based Actor-Critic Policy-Based DQN SAC PPO Offline Learn the Model AlphaZero Deep Dyna-Q MOReL Off-Policy On-Policy Deep Reinforcement Learning [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗

**Figure 2.** Figure 2: A representative taxonomy of deep reinforcement learning algorithms. model-free DRL [8, 29]. Each category corresponds to a different learning framework and exhibits distinct algorithmic properties in terms of generalization, sample complexity, optimality, safety, and interpretability. Importantly, algorithm design should be aligned with both system characteristics and optimization goals. For instance, sys… view at source ↗

**Figure 3.** Figure 3: Dual-loop control framework for the DRL deployment. Dual-loop control framework: (1)-(4). Single-loop control framework: (5)-(6). DRL problems can also be formulated in a finite-horizon episodic setting without discounting, represented as ⟨, ,𝑀, 𝑟, 𝐻⟩, where 𝐻 is the planning horizon. This formulation suits scenarios such as learning-based MPC, where an agent repeatedly solves finite-horizon optimization… view at source ↗

**Figure 4.** Figure 4: Real-world data center cooling layout comprising three loops: condenser water loop, chilled water loop, and air loop. (a) (b) [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗

**Figure 5.** Figure 5: Digital twin layout. (a) Cooling system. (b) Data hall. attributes such as rack placement, airflow organization under cold aisle containment, and the positioning of major cooling components, including CRAH units, CHW loops, and cooling towers. Subsequently, physics-informed models were established for individual cooling system components based on manufacturer specifications. These models characterize the o… view at source ↗

**Figure 6.** Figure 6: Normalized digital twin cooling system calibration results. (a) Total chilled water pump power. (b) Total chiller power. (c) Total cooling plant power [PITH_FULL_IMAGE:figures/full_fig_p020_6.png] view at source ↗

**Figure 7.** Figure 7: Comparisons of various control strategies. (a) Normalized total power consumption. (b) SLA: ITE inlet dry-bulb temperature. (c) SLA: ITE inlet relative humidity. (a) (b) (c) Minor Differences Between ‘Baseline’ and ‘CRAH’ Minor Differences Between ‘CRAH’ and ‘CRAH & CHW’ [PITH_FULL_IMAGE:figures/full_fig_p021_7.png] view at source ↗

**Figure 8.** Figure 8: Control actions of various control strategies. (a) CRAH air supply temperature. (b) CRAH air flow rate. (c) Chilled water supply temperature. and CHW supply temperature and systematically evaluated multiple DRL algorithms under diverse hyperparameter settings. Subsequently, an initial policy reservoir comprising various candidate policies was constructed. To ensure safe and effective deployment, we filtere… view at source ↗

**Figure 9.** Figure 9: Normalized power consumption of cooling system equipment. (a) CRAHs. (b) CHW pumps. (c) Chillers. distributions for CRAH supply air temperature setpoints, although the joint “CRAH & CHW” strategy occasionally selects lower setpoints (20–22°C) under specific conditions. For CRAH fan speed ratios, both AI-based strategies favor lower airflow rates than the baseline while maintaining SLA compliance. Regarding… view at source ↗

read the original abstract

The growing scale and complexity of modern data centers present major challenges in balancing energy efficiency with outage risk. Although Deep Reinforcement Learning (DRL) shows strong potential for intelligent control, its deployment in mission-critical systems is limited by data scarcity and the lack of real-time pre-evaluation mechanisms. This paper introduces the Dual-Loop Control Framework (DLCF), a digital twin-based architecture designed to overcome these challenges. The framework comprises three core entities: the physical system, a digital twin, and a policy reservoir of diverse DRL agents. These components interact through a dual-loop mechanism involving real-time data acquisition, data assimilation, DRL policy training, pre-evaluation, and expert verification. Theoretical analysis shows how DLCF can improve sample efficiency, generalization, safety, and optimality. Leveraging DLCF, we implemented the DCVerse platform and validated it through case studies on a real-world data center cooling system. The evaluation shows that our approach achieves up to 4.09% energy savings over conventional control strategies without violating SLA requirements. Additionally, the framework improves policy interpretability and supports more trustworthy DRL deployment. This work provides a foundation for reliable AI-based control in data centers and points toward future extensions for holistic, system-wide optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces the Dual-Loop Control Framework (DLCF), a digital-twin architecture with a physical system, digital twin, and policy reservoir of DRL agents. It claims that the dual-loop mechanism (real-time data acquisition, assimilation, training, pre-evaluation, and expert verification) yields theoretical gains in sample efficiency, generalization, safety, and optimality, and reports empirical validation via the DCVerse platform on a real data-center cooling system that achieves up to 4.09% energy savings over conventional strategies without SLA violations.

Significance. If the digital-twin fidelity and sim-to-real transfer can be quantitatively established, the DLCF would provide a concrete mechanism for safer DRL deployment in mission-critical infrastructure, directly addressing data scarcity and the absence of pre-evaluation. The explicit construction of a policy reservoir and dual-loop interaction is a strength that could be extended to other control domains.

major comments (2)

[Abstract] Abstract: the headline claim of 4.09% energy savings without SLA violations is presented as the outcome of DLCF pre-evaluation inside the digital twin, yet the abstract supplies no error metrics (RMSE on temperature, power, or flow), no calibration procedure against real sensor data, and no sim-to-real gap quantification or ablation showing that twin-predicted rewards match physical outcomes.
[Abstract] Abstract and framework description: the central attribution of savings and safety to the dual-loop pre-evaluation rests on the unstated assumption that the digital twin is sufficiently faithful; without reported fidelity metrics or transfer experiments, the savings cannot be confidently distinguished from direct tuning or other uncontrolled factors.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater transparency on digital-twin fidelity in the abstract. We address each comment below and have revised the manuscript to strengthen the presentation of supporting evidence from the case studies.

read point-by-point responses

Referee: [Abstract] Abstract: the headline claim of 4.09% energy savings without SLA violations is presented as the outcome of DLCF pre-evaluation inside the digital twin, yet the abstract supplies no error metrics (RMSE on temperature, power, or flow), no calibration procedure against real sensor data, and no sim-to-real gap quantification or ablation showing that twin-predicted rewards match physical outcomes.

Authors: We agree that the abstract, as a concise summary, omits explicit fidelity metrics that appear in the experimental sections. The full manuscript reports calibration against real sensor data and sim-to-real transfer results in the DCVerse case studies, including comparisons of twin-predicted versus observed outcomes. In revision we will add a single sentence to the abstract summarizing these fidelity aspects and directing readers to the relevant sections, thereby clarifying that the reported savings derive from policies pre-evaluated in the calibrated twin rather than from uncontrolled factors. revision: yes
Referee: [Abstract] Abstract and framework description: the central attribution of savings and safety to the dual-loop pre-evaluation rests on the unstated assumption that the digital twin is sufficiently faithful; without reported fidelity metrics or transfer experiments, the savings cannot be confidently distinguished from direct tuning or other uncontrolled factors.

Authors: The manuscript grounds the attribution in both the theoretical analysis of the dual-loop mechanism and the empirical results obtained on the physical data-center system after twin-based pre-evaluation. Detailed fidelity metrics and transfer experiments are already present in the body of the paper. To make this explicit at the abstract level, we will insert a brief clause referencing the calibration and transfer performance documented in the case studies, ensuring readers can immediately distinguish the contribution of the dual-loop pre-evaluation. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's chain proceeds from problem statement to introduction of the DLCF architecture (physical system + digital twin + policy reservoir), dual-loop interaction mechanisms, theoretical analysis of improvements in sample efficiency/generalization/safety/optimality, platform implementation, and empirical validation via real-world case studies reporting 4.09% energy savings. No equations or steps reduce by construction to inputs; no fitted parameters are relabeled as predictions; no load-bearing self-citations or uniqueness theorems imported from prior author work appear in the abstract or framework description. The reported results are tied to external case studies on a physical cooling system rather than internal self-referential constructions, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Ledger compiled from the abstract only; the central claim rests on the unverified assumption that the digital twin is accurate enough for pre-evaluation.

axioms (1)

domain assumption The digital twin accurately simulates the physical data center dynamics for the purpose of policy pre-evaluation and training.
This assumption is required for the dual-loop mechanism to deliver the claimed improvements in safety, sample efficiency, and optimality.

invented entities (2)

Dual-Loop Control Framework (DLCF) no independent evidence
purpose: Architecture that couples the physical system, digital twin, and policy reservoir through real-time data acquisition, assimilation, training, pre-evaluation, and expert verification.
Core novel contribution introduced by the paper.
Policy reservoir no independent evidence
purpose: Collection of diverse DRL agents intended to improve generalization, safety, and optimality.
Component of the proposed DLCF.

pith-pipeline@v0.9.0 · 5541 in / 1484 out tokens · 100629 ms · 2026-05-10T17:37:17.103442+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

[1]

Ai power: Expanding data center capacity to meet growing demand.McKinsey & Company, October 2024

McKinsey & Company. Ai power: Expanding data center capacity to meet growing demand.McKinsey & Company, October 2024

work page 2024
[2]

Electricity 2024 analysis and forecast to 2026, 2024

International Energy Agency. Electricity 2024 analysis and forecast to 2026, 2024. Available: https://iea.blob.core.windows.net/assets/18f3ed24-4b26-4c83-a3d2-8a1be51c8cc8/Electricity2024-Analysisandforecastto2026.pdf

work page 2024
[3]

Data center evolution: A tutorial on state of the art, issues, and challenges.Computer Networks, 53(17):2939–2965, 2009

Krishna Kant. Data center evolution: A tutorial on state of the art, issues, and challenges.Computer Networks, 53(17):2939–2965, 2009

work page 2009
[4]

2024 global data center survey

Uptime Institute. 2024 global data center survey. Technical report, Uptime Institute, 2024

work page 2024
[5]

Uptime’s 13th Annual Global Data Center Survey Shows Widening Range of Challenges, 2023

Uptime Institute. Uptime’s 13th Annual Global Data Center Survey Shows Widening Range of Challenges, 2023. Available: https://uptimeinstitute.com/about-ui/press-releases/uptimes-13th-annual-global-data-center-survey-shows-widening-range-of-challenges

work page 2023
[6]

The ai disruption: Challenges and guidance for data center design.Artificial Intelligence in Medicine, 138, 2023

Victor Avelar, Patrick Donovan, Paul Lin, and Wendy Torell. The ai disruption: Challenges and guidance for data center design.Artificial Intelligence in Medicine, 138, 2023

work page 2023
[7]

A survey on data center cooling systems: Technology, power consumption modeling and control strategy optimization.Journal of Systems Architecture, 119:102253, 2021

Qingxia Zhang, Zihao Meng, Xianwen Hong, Yuhao Zhan, Jia Liu, Jiabao Dong, Tian Bai, Junyu Niu, and M Jamal Deen. A survey on data center cooling systems: Technology, power consumption modeling and control strategy optimization.Journal of Systems Architecture, 119:102253, 2021

work page 2021
[8]

HussainKahil,ShivaSharma,PetriVälisuo,andMohammedElmusrati.Reinforcementlearningfordatacenterenergyefficiencyoptimization: A systematic literature review and research roadmap.Applied Energy, 389:125734, 2025

work page 2025
[9]

QingangZhang,WeiZeng,QinjieLin,Chin-BoonChng,Chee-KongChui,andPoh-SengLee.Deepreinforcementlearningtowardsreal-world dynamic thermal management of data centers.Applied Energy, 333:120561, 2023

work page 2023
[10]

Deep reinforcement learning for tropical air free-cooled data center control.ACM Transactions on Sensor Networks (TOSN), 17(3):1–28, 2021

Duc Van Le, Rongrong Wang, Yingbo Liu, Rui Tan, Yew-Wah Wong, and Yonggang Wen. Deep reinforcement learning for tropical air free-cooled data center control.ACM Transactions on Sensor Networks (TOSN), 17(3):1–28, 2021

work page 2021
[11]

Transformingcoolingoptimizationforgreendatacenterviadeepreinforcement learning

YuanlongLi,YonggangWen,DachengTao,andKyleGuan. Transformingcoolingoptimizationforgreendatacenterviadeepreinforcement learning. IEEE transactions on cybernetics, 50(5):2002–2013, 2019

work page 2002
[12]

Optimizing energy efficiency for data center via parameterized deep reinforcement learning

Yongyi Ran, Han Hu, Yonggang Wen, and Xin Zhou. Optimizing energy efficiency for data center via parameterized deep reinforcement learning. IEEE Transactions on Services Computing, 16(2):1310–1323, 2022

work page 2022
[13]

Intelligent Dynamic Thermal Control Using Deep Learning and Reinforcement Learning

Zhang Qingang. Intelligent Dynamic Thermal Control Using Deep Learning and Reinforcement Learning. PhD thesis, National University of Singapore (Singapore), 2023

work page 2023
[14]

Safecool:safeandenergy-efficientcoolingmanagementin datacenterswithmodel-basedreinforcementlearning

JianxiongWan,YanduoDuan,XiangGui,ChuyiLiu,LeixiaoLi,andZhiqiangMa. Safecool:safeandenergy-efficientcoolingmanagementin datacenterswithmodel-basedreinforcementlearning. IEEETransactionsonEmergingTopicsinComputationalIntelligence ,7(6):1621–1635, 2023

work page 2023
[15]

Efficientcompute-intensivejoballocationindatacentersviadeepreinforcementlearning

DeliangYi,XinZhou,YonggangWen,andRuiTan. Efficientcompute-intensivejoballocationindatacentersviadeepreinforcementlearning. IEEE Transactions on Parallel and Distributed Systems, 31(6):1474–1485, 2020

work page 2020
[16]

ArezooGhasemiandAminKeshavarzi.Energy-efficientvirtualmachineplacementinheterogeneousclouddatacenters:aclustering-enhanced multi-objective, multi-reward reinforcement learning approach.Cluster Computing, 27(10):14149–14166, 2024

work page 2024
[17]

Arezoo Ghasemi, Abolfazl Toroghi Haghighat, and Amin Keshavarzi. Enhancing virtual machine placement efficiency in cloud data centers: a hybrid approach using multi-objective reinforcement learning and clustering strategies.Computing, 106(9):2897–2922, 2024. 24

work page 2024
[18]

Joint it-facility optimization for green data centers via deep reinforcement learning

Xin Zhou, Ruihang Wang, Yonggang Wen, and Rui Tan. Joint it-facility optimization for green data centers via deep reinforcement learning. IEEE Network, 35(6):255–262, 2021

work page 2021
[19]

Artificial intelligence in orthopaedics: false hope or not? a narrative review along the line of gartner’s hype cycle.EFORT open reviews, 5(10):593–603, 2020

Jacobien HF Oosterhoff and Job N Doornberg. Artificial intelligence in orthopaedics: false hope or not? a narrative review along the line of gartner’s hype cycle.EFORT open reviews, 5(10):593–603, 2020

work page 2020
[20]

Energysavingevaluationofanenergyefficientdata center using a model-free reinforcement learning approach.Applied Energy, 322:119392, 2022

MuhammadHaiqalBinMahbod,ChinBoonChng,PohSengLee,andCheeKongChui. Energysavingevaluationofanenergyefficientdata center using a model-free reinforcement learning approach.Applied Energy, 322:119392, 2022

work page 2022
[21]

Residual physics and post-posed shielding for safe deep reinforcement learning method.IEEE transactions on cybernetics, 54(2):865–876, 2022

Qingang Zhang, Muhammad Haiqal Bin Mahbod, Chin-Boon Chng, Poh-Seng Lee, and Chee-Kong Chui. Residual physics and post-posed shielding for safe deep reinforcement learning method.IEEE transactions on cybernetics, 54(2):865–876, 2022

work page 2022
[22]

Drl-s: Toward safe real-world learning of dynamic thermal management in data center.Expert Systems with Applications, 214:119146, 2023

Qingang Zhang, Chin-Boon Chng, Kaiqi Chen, Poh-Seng Lee, and Chee-Kong Chui. Drl-s: Toward safe real-world learning of dynamic thermal management in data center.Expert Systems with Applications, 214:119146, 2023

work page 2023
[23]

Greendatacentercoolingcontrolviaphysics-guidedsafereinforcement learning

RuihangWang,ZhiweiCao,XinZhou,YonggangWen,andRuiTan. Greendatacentercoolingcontrolviaphysics-guidedsafereinforcement learning. ACM Transactions on Cyber-Physical Systems, 8(2):1–26, 2024

work page 2024
[24]

Towardmodel-assistedsafereinforcementlearningfordatacentercoolingcontrol: A lyapunov-based approach

ZhiweiCao,RuihangWang,XinZhou,andYonggangWen. Towardmodel-assistedsafereinforcementlearningfordatacentercoolingcontrol: A lyapunov-based approach. InProceedings of the 14th ACM International Conference on Future Energy Systems, pages 333–346, 2023

work page 2023
[25]

Uncertainty-aware online learning of dynamic thermal control in data center with imperfect pretrained models.Expert Systems with Applications, 249:123767, 2024

Qingang Zhang, Chin-Boon Chng, Chee-Kong Chui, and Poh-Seng Lee. Uncertainty-aware online learning of dynamic thermal control in data center with imperfect pretrained models.Expert Systems with Applications, 249:123767, 2024

work page 2024
[26]

Data center cooling using model-predictive control

Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, and Greg Imwalle. Data center cooling using model-predictive control. Advances in Neural Information Processing Systems, 31, 2018

work page 2018
[27]

Large-scale data center cooling control via sample-efficient reinforcement learning

Ni Mu, Xiao Hu, Qing-Shan Jia, Xu Zhu, and Xiao He. Large-scale data center cooling control via sample-efficient reinforcement learning. In 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE), pages 2780–2785. IEEE, 2024

work page 2024
[28]

Data center cooling system optimization using offline reinforcement learning.arXiv preprint arXiv:2501.15085, 2025

Xianyuan Zhan, Xiangyu Zhu, Peng Cheng, Xiao Hu, Ziteng He, Hanfei Geng, Jichao Leng, Huiwen Zheng, Chenhui Liu, Tianshun Hong, et al. Data center cooling system optimization using offline reinforcement learning.arXiv preprint arXiv:2501.15085, 2025

work page arXiv 2025
[29]

A survey on offline reinforcement learning: Taxonomy, review, and open problems.IEEE Transactions on Neural Networks and Learning Systems, 2023

Rafael Figueiredo Prudencio, Marcos ROA Maximo, and Esther Luna Colombini. A survey on offline reinforcement learning: Taxonomy, review, and open problems.IEEE Transactions on Neural Networks and Learning Systems, 2023

work page 2023
[30]

The National Academies Press, 2023

National Academies of Sciences, Engineering, and Medicine.Foundational Research Gaps and Future Directions for Digital Twins. The National Academies Press, 2023

work page 2023
[31]

Digital twin modeling.Journal of Manufacturing Systems, 64:372–389, 2022

Fei Tao, Bin Xiao, Qinglin Qi, Jiangfeng Cheng, and Ping Ji. Digital twin modeling.Journal of Manufacturing Systems, 64:372–389, 2022

work page 2022
[32]

Digital transformation: Lights and shadows

Paolo Faraboschi, Eitan Frachtenberg, Phil Laplante, Dejan Milojicic, and Roberto Saracco. Digital transformation: Lights and shadows. Computer, 56(4):123–130, 2023

work page 2023
[33]

Caper:Dual-levelphysics-data fusion with modular metamodels for reliable generalization in predictive digital twins.Applied Energy, 398:126393, 2025

QingangZhang,WenjunLong,RuihangWang,ZhiweiCao,ZhaoyangWang,YuejunYan,andYonggangWen. Caper:Dual-levelphysics-data fusion with modular metamodels for reliable generalization in predictive digital twins.Applied Energy, 398:126393, 2025

work page 2025
[34]

Reinforcement twinning: From digital twins to model-based reinforcement learning.Journal of Computational Science, 82:102421, 2024

Lorenzo Schena, Pedro A Marques, Romain Poletti, Samuel Ahizi, Jan Van den Berghe, and Miguel A Mendez. Reinforcement twinning: From digital twins to model-based reinforcement learning.Journal of Computational Science, 82:102421, 2024

work page 2024
[35]

Digital twin-enhanced deep reinforcement learning for resource management in networks slicing.IEEE Transactions on Communications, 2024

Zhengming Zhang, Yongming Huang, Cheng Zhang, Qingbi Zheng, Luxi Yang, and Xiaohu You. Digital twin-enhanced deep reinforcement learning for resource management in networks slicing.IEEE Transactions on Communications, 2024

work page 2024
[36]

Toward enhanced reinforcement learning-based resource management via digital twin: Opportunities, applications, and challenges.IEEE Network, 2024

Nan Cheng, Xiucheng Wang, Zan Li, Zhisheng Yin, Tom Luan, and Xuemin Sherman Shen. Toward enhanced reinforcement learning-based resource management via digital twin: Opportunities, applications, and challenges.IEEE Network, 2024. 25

work page 2024
[37]

Kaishu Xia, Christopher Sacco, Max Kirkpatrick, Clint Saidy, Lam Nguyen, Anil Kircaliali, and Ramy Harik. A digital twin to train deep reinforcement learning agent for smart manufacturing plants: Environment, interfaces and intelligence.Journal of Manufacturing Systems, 58:210–230, 2021

work page 2021
[38]

Digital twin application for reinforcement learning based optimal scheduling and reliability management enhancement of systems.Solar Energy, 252:29–38, 2023

Jun Zhou, Mei Yang, Yong Zhan, and Li Xu. Digital twin application for reinforcement learning based optimal scheduling and reliability management enhancement of systems.Solar Energy, 252:29–38, 2023

work page 2023
[39]

Deep reinforcement learning-based energy management system enhancement using digital twin for electric vehicles.Energy, 312:133384, 2024

Yiming Ye, Bin Xu, Hanchen Wang, Jiangfeng Zhang, Benjamin Lawler, and Beshah Ayalew. Deep reinforcement learning-based energy management system enhancement using digital twin for electric vehicles.Energy, 312:133384, 2024

work page 2024
[40]

Digital twins for data centers.Computer, 57(10):151–158, 2024

Jyotika Athavale, Cullen Bash, Wesley Brewer, Matthias Maiterth, Dejan Milojicic, Harry Petty, and Soumyendu Sarkar. Digital twins for data centers.Computer, 57(10):151–158, 2024

work page 2024
[41]

Onthesamplecomplexityofreinforcementlearning

ShamMachandranathKakade. Onthesamplecomplexityofreinforcementlearning . UniversityofLondon,UniversityCollegeLondon(United Kingdom), 2003

work page 2003
[42]

Model-basedrlincontextualdecisionprocesses:Pacbounds and exponential improvements over model-free approaches

WenSun,NanJiang,AkshayKrishnamurthy,AlekhAgarwal,andJohnLangford. Model-basedrlincontextualdecisionprocesses:Pacbounds and exponential improvements over model-free approaches. InConference on learning theory, pages 2898–2933. PMLR, 2019

work page 2019
[43]

Mopo: Model-based offline policy optimization.Advances in Neural Information Processing Systems, 33:14129–14142, 2020

Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Y Zou, Sergey Levine, Chelsea Finn, and Tengyu Ma. Mopo: Model-based offline policy optimization.Advances in Neural Information Processing Systems, 33:14129–14142, 2020

work page 2020
[44]

Morel: Model-based offline reinforcement learning

Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, and Thorsten Joachims. Morel: Model-based offline reinforcement learning. Advances in neural information processing systems, 33:21810–21823, 2020

work page 2020
[45]

Red dot analytics help data centres be cool

IMDA. Red dot analytics help data centres be cool. https://www.imda.gov.sg/resources/blog/blog-articles/2025/02/ red-dot-analytics-help-data-centres-be-cool . Accessed: May 14, 2025

work page 2025
[46]

Energyplus: creating a new-generation building energy simulation program.Energy and buildings, 33(4):319–331, 2001

Drury B Crawley, Linda K Lawrie, Frederick C Winkelmann, Walter F Buhl, Y Joe Huang, Curtis O Pedersen, Richard K Strand, Richard J Liesen, Daniel E Fisher, Michael J Witte, et al. Energyplus: creating a new-generation building energy simulation program.Energy and buildings, 33(4):319–331, 2001

work page 2001
[47]

Tianshou: A highly modularized deep reinforcement learning library.Journal of Machine Learning Research, 23(267):1–6, 2022

Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, and Jun Zhu. Tianshou: A highly modularized deep reinforcement learning library.Journal of Machine Learning Research, 23(267):1–6, 2022. 26

work page 2022

[1] [1]

Ai power: Expanding data center capacity to meet growing demand.McKinsey & Company, October 2024

McKinsey & Company. Ai power: Expanding data center capacity to meet growing demand.McKinsey & Company, October 2024

work page 2024

[2] [2]

Electricity 2024 analysis and forecast to 2026, 2024

International Energy Agency. Electricity 2024 analysis and forecast to 2026, 2024. Available: https://iea.blob.core.windows.net/assets/18f3ed24-4b26-4c83-a3d2-8a1be51c8cc8/Electricity2024-Analysisandforecastto2026.pdf

work page 2024

[3] [3]

Data center evolution: A tutorial on state of the art, issues, and challenges.Computer Networks, 53(17):2939–2965, 2009

Krishna Kant. Data center evolution: A tutorial on state of the art, issues, and challenges.Computer Networks, 53(17):2939–2965, 2009

work page 2009

[4] [4]

2024 global data center survey

Uptime Institute. 2024 global data center survey. Technical report, Uptime Institute, 2024

work page 2024

[5] [5]

Uptime’s 13th Annual Global Data Center Survey Shows Widening Range of Challenges, 2023

Uptime Institute. Uptime’s 13th Annual Global Data Center Survey Shows Widening Range of Challenges, 2023. Available: https://uptimeinstitute.com/about-ui/press-releases/uptimes-13th-annual-global-data-center-survey-shows-widening-range-of-challenges

work page 2023

[6] [6]

The ai disruption: Challenges and guidance for data center design.Artificial Intelligence in Medicine, 138, 2023

Victor Avelar, Patrick Donovan, Paul Lin, and Wendy Torell. The ai disruption: Challenges and guidance for data center design.Artificial Intelligence in Medicine, 138, 2023

work page 2023

[7] [7]

A survey on data center cooling systems: Technology, power consumption modeling and control strategy optimization.Journal of Systems Architecture, 119:102253, 2021

Qingxia Zhang, Zihao Meng, Xianwen Hong, Yuhao Zhan, Jia Liu, Jiabao Dong, Tian Bai, Junyu Niu, and M Jamal Deen. A survey on data center cooling systems: Technology, power consumption modeling and control strategy optimization.Journal of Systems Architecture, 119:102253, 2021

work page 2021

[8] [8]

HussainKahil,ShivaSharma,PetriVälisuo,andMohammedElmusrati.Reinforcementlearningfordatacenterenergyefficiencyoptimization: A systematic literature review and research roadmap.Applied Energy, 389:125734, 2025

work page 2025

[9] [9]

QingangZhang,WeiZeng,QinjieLin,Chin-BoonChng,Chee-KongChui,andPoh-SengLee.Deepreinforcementlearningtowardsreal-world dynamic thermal management of data centers.Applied Energy, 333:120561, 2023

work page 2023

[10] [10]

Deep reinforcement learning for tropical air free-cooled data center control.ACM Transactions on Sensor Networks (TOSN), 17(3):1–28, 2021

Duc Van Le, Rongrong Wang, Yingbo Liu, Rui Tan, Yew-Wah Wong, and Yonggang Wen. Deep reinforcement learning for tropical air free-cooled data center control.ACM Transactions on Sensor Networks (TOSN), 17(3):1–28, 2021

work page 2021

[11] [11]

Transformingcoolingoptimizationforgreendatacenterviadeepreinforcement learning

YuanlongLi,YonggangWen,DachengTao,andKyleGuan. Transformingcoolingoptimizationforgreendatacenterviadeepreinforcement learning. IEEE transactions on cybernetics, 50(5):2002–2013, 2019

work page 2002

[12] [12]

Optimizing energy efficiency for data center via parameterized deep reinforcement learning

Yongyi Ran, Han Hu, Yonggang Wen, and Xin Zhou. Optimizing energy efficiency for data center via parameterized deep reinforcement learning. IEEE Transactions on Services Computing, 16(2):1310–1323, 2022

work page 2022

[13] [13]

Intelligent Dynamic Thermal Control Using Deep Learning and Reinforcement Learning

Zhang Qingang. Intelligent Dynamic Thermal Control Using Deep Learning and Reinforcement Learning. PhD thesis, National University of Singapore (Singapore), 2023

work page 2023

[14] [14]

Safecool:safeandenergy-efficientcoolingmanagementin datacenterswithmodel-basedreinforcementlearning

JianxiongWan,YanduoDuan,XiangGui,ChuyiLiu,LeixiaoLi,andZhiqiangMa. Safecool:safeandenergy-efficientcoolingmanagementin datacenterswithmodel-basedreinforcementlearning. IEEETransactionsonEmergingTopicsinComputationalIntelligence ,7(6):1621–1635, 2023

work page 2023

[15] [15]

Efficientcompute-intensivejoballocationindatacentersviadeepreinforcementlearning

DeliangYi,XinZhou,YonggangWen,andRuiTan. Efficientcompute-intensivejoballocationindatacentersviadeepreinforcementlearning. IEEE Transactions on Parallel and Distributed Systems, 31(6):1474–1485, 2020

work page 2020

[16] [16]

ArezooGhasemiandAminKeshavarzi.Energy-efficientvirtualmachineplacementinheterogeneousclouddatacenters:aclustering-enhanced multi-objective, multi-reward reinforcement learning approach.Cluster Computing, 27(10):14149–14166, 2024

work page 2024

[17] [17]

Arezoo Ghasemi, Abolfazl Toroghi Haghighat, and Amin Keshavarzi. Enhancing virtual machine placement efficiency in cloud data centers: a hybrid approach using multi-objective reinforcement learning and clustering strategies.Computing, 106(9):2897–2922, 2024. 24

work page 2024

[18] [18]

Joint it-facility optimization for green data centers via deep reinforcement learning

Xin Zhou, Ruihang Wang, Yonggang Wen, and Rui Tan. Joint it-facility optimization for green data centers via deep reinforcement learning. IEEE Network, 35(6):255–262, 2021

work page 2021

[19] [19]

Artificial intelligence in orthopaedics: false hope or not? a narrative review along the line of gartner’s hype cycle.EFORT open reviews, 5(10):593–603, 2020

Jacobien HF Oosterhoff and Job N Doornberg. Artificial intelligence in orthopaedics: false hope or not? a narrative review along the line of gartner’s hype cycle.EFORT open reviews, 5(10):593–603, 2020

work page 2020

[20] [20]

Energysavingevaluationofanenergyefficientdata center using a model-free reinforcement learning approach.Applied Energy, 322:119392, 2022

MuhammadHaiqalBinMahbod,ChinBoonChng,PohSengLee,andCheeKongChui. Energysavingevaluationofanenergyefficientdata center using a model-free reinforcement learning approach.Applied Energy, 322:119392, 2022

work page 2022

[21] [21]

Residual physics and post-posed shielding for safe deep reinforcement learning method.IEEE transactions on cybernetics, 54(2):865–876, 2022

Qingang Zhang, Muhammad Haiqal Bin Mahbod, Chin-Boon Chng, Poh-Seng Lee, and Chee-Kong Chui. Residual physics and post-posed shielding for safe deep reinforcement learning method.IEEE transactions on cybernetics, 54(2):865–876, 2022

work page 2022

[22] [22]

Drl-s: Toward safe real-world learning of dynamic thermal management in data center.Expert Systems with Applications, 214:119146, 2023

Qingang Zhang, Chin-Boon Chng, Kaiqi Chen, Poh-Seng Lee, and Chee-Kong Chui. Drl-s: Toward safe real-world learning of dynamic thermal management in data center.Expert Systems with Applications, 214:119146, 2023

work page 2023

[23] [23]

Greendatacentercoolingcontrolviaphysics-guidedsafereinforcement learning

RuihangWang,ZhiweiCao,XinZhou,YonggangWen,andRuiTan. Greendatacentercoolingcontrolviaphysics-guidedsafereinforcement learning. ACM Transactions on Cyber-Physical Systems, 8(2):1–26, 2024

work page 2024

[24] [24]

Towardmodel-assistedsafereinforcementlearningfordatacentercoolingcontrol: A lyapunov-based approach

ZhiweiCao,RuihangWang,XinZhou,andYonggangWen. Towardmodel-assistedsafereinforcementlearningfordatacentercoolingcontrol: A lyapunov-based approach. InProceedings of the 14th ACM International Conference on Future Energy Systems, pages 333–346, 2023

work page 2023

[25] [25]

Uncertainty-aware online learning of dynamic thermal control in data center with imperfect pretrained models.Expert Systems with Applications, 249:123767, 2024

Qingang Zhang, Chin-Boon Chng, Chee-Kong Chui, and Poh-Seng Lee. Uncertainty-aware online learning of dynamic thermal control in data center with imperfect pretrained models.Expert Systems with Applications, 249:123767, 2024

work page 2024

[26] [26]

Data center cooling using model-predictive control

Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, and Greg Imwalle. Data center cooling using model-predictive control. Advances in Neural Information Processing Systems, 31, 2018

work page 2018

[27] [27]

Large-scale data center cooling control via sample-efficient reinforcement learning

Ni Mu, Xiao Hu, Qing-Shan Jia, Xu Zhu, and Xiao He. Large-scale data center cooling control via sample-efficient reinforcement learning. In 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE), pages 2780–2785. IEEE, 2024

work page 2024

[28] [28]

Data center cooling system optimization using offline reinforcement learning.arXiv preprint arXiv:2501.15085, 2025

Xianyuan Zhan, Xiangyu Zhu, Peng Cheng, Xiao Hu, Ziteng He, Hanfei Geng, Jichao Leng, Huiwen Zheng, Chenhui Liu, Tianshun Hong, et al. Data center cooling system optimization using offline reinforcement learning.arXiv preprint arXiv:2501.15085, 2025

work page arXiv 2025

[29] [29]

A survey on offline reinforcement learning: Taxonomy, review, and open problems.IEEE Transactions on Neural Networks and Learning Systems, 2023

Rafael Figueiredo Prudencio, Marcos ROA Maximo, and Esther Luna Colombini. A survey on offline reinforcement learning: Taxonomy, review, and open problems.IEEE Transactions on Neural Networks and Learning Systems, 2023

work page 2023

[30] [30]

The National Academies Press, 2023

National Academies of Sciences, Engineering, and Medicine.Foundational Research Gaps and Future Directions for Digital Twins. The National Academies Press, 2023

work page 2023

[31] [31]

Digital twin modeling.Journal of Manufacturing Systems, 64:372–389, 2022

Fei Tao, Bin Xiao, Qinglin Qi, Jiangfeng Cheng, and Ping Ji. Digital twin modeling.Journal of Manufacturing Systems, 64:372–389, 2022

work page 2022

[32] [32]

Digital transformation: Lights and shadows

Paolo Faraboschi, Eitan Frachtenberg, Phil Laplante, Dejan Milojicic, and Roberto Saracco. Digital transformation: Lights and shadows. Computer, 56(4):123–130, 2023

work page 2023

[33] [33]

Caper:Dual-levelphysics-data fusion with modular metamodels for reliable generalization in predictive digital twins.Applied Energy, 398:126393, 2025

QingangZhang,WenjunLong,RuihangWang,ZhiweiCao,ZhaoyangWang,YuejunYan,andYonggangWen. Caper:Dual-levelphysics-data fusion with modular metamodels for reliable generalization in predictive digital twins.Applied Energy, 398:126393, 2025

work page 2025

[34] [34]

Reinforcement twinning: From digital twins to model-based reinforcement learning.Journal of Computational Science, 82:102421, 2024

Lorenzo Schena, Pedro A Marques, Romain Poletti, Samuel Ahizi, Jan Van den Berghe, and Miguel A Mendez. Reinforcement twinning: From digital twins to model-based reinforcement learning.Journal of Computational Science, 82:102421, 2024

work page 2024

[35] [35]

Digital twin-enhanced deep reinforcement learning for resource management in networks slicing.IEEE Transactions on Communications, 2024

Zhengming Zhang, Yongming Huang, Cheng Zhang, Qingbi Zheng, Luxi Yang, and Xiaohu You. Digital twin-enhanced deep reinforcement learning for resource management in networks slicing.IEEE Transactions on Communications, 2024

work page 2024

[36] [36]

Toward enhanced reinforcement learning-based resource management via digital twin: Opportunities, applications, and challenges.IEEE Network, 2024

Nan Cheng, Xiucheng Wang, Zan Li, Zhisheng Yin, Tom Luan, and Xuemin Sherman Shen. Toward enhanced reinforcement learning-based resource management via digital twin: Opportunities, applications, and challenges.IEEE Network, 2024. 25

work page 2024

[37] [37]

Kaishu Xia, Christopher Sacco, Max Kirkpatrick, Clint Saidy, Lam Nguyen, Anil Kircaliali, and Ramy Harik. A digital twin to train deep reinforcement learning agent for smart manufacturing plants: Environment, interfaces and intelligence.Journal of Manufacturing Systems, 58:210–230, 2021

work page 2021

[38] [38]

Digital twin application for reinforcement learning based optimal scheduling and reliability management enhancement of systems.Solar Energy, 252:29–38, 2023

Jun Zhou, Mei Yang, Yong Zhan, and Li Xu. Digital twin application for reinforcement learning based optimal scheduling and reliability management enhancement of systems.Solar Energy, 252:29–38, 2023

work page 2023

[39] [39]

Deep reinforcement learning-based energy management system enhancement using digital twin for electric vehicles.Energy, 312:133384, 2024

Yiming Ye, Bin Xu, Hanchen Wang, Jiangfeng Zhang, Benjamin Lawler, and Beshah Ayalew. Deep reinforcement learning-based energy management system enhancement using digital twin for electric vehicles.Energy, 312:133384, 2024

work page 2024

[40] [40]

Digital twins for data centers.Computer, 57(10):151–158, 2024

Jyotika Athavale, Cullen Bash, Wesley Brewer, Matthias Maiterth, Dejan Milojicic, Harry Petty, and Soumyendu Sarkar. Digital twins for data centers.Computer, 57(10):151–158, 2024

work page 2024

[41] [41]

Onthesamplecomplexityofreinforcementlearning

ShamMachandranathKakade. Onthesamplecomplexityofreinforcementlearning . UniversityofLondon,UniversityCollegeLondon(United Kingdom), 2003

work page 2003

[42] [42]

Model-basedrlincontextualdecisionprocesses:Pacbounds and exponential improvements over model-free approaches

WenSun,NanJiang,AkshayKrishnamurthy,AlekhAgarwal,andJohnLangford. Model-basedrlincontextualdecisionprocesses:Pacbounds and exponential improvements over model-free approaches. InConference on learning theory, pages 2898–2933. PMLR, 2019

work page 2019

[43] [43]

Mopo: Model-based offline policy optimization.Advances in Neural Information Processing Systems, 33:14129–14142, 2020

Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Y Zou, Sergey Levine, Chelsea Finn, and Tengyu Ma. Mopo: Model-based offline policy optimization.Advances in Neural Information Processing Systems, 33:14129–14142, 2020

work page 2020

[44] [44]

Morel: Model-based offline reinforcement learning

Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, and Thorsten Joachims. Morel: Model-based offline reinforcement learning. Advances in neural information processing systems, 33:21810–21823, 2020

work page 2020

[45] [45]

Red dot analytics help data centres be cool

IMDA. Red dot analytics help data centres be cool. https://www.imda.gov.sg/resources/blog/blog-articles/2025/02/ red-dot-analytics-help-data-centres-be-cool . Accessed: May 14, 2025

work page 2025

[46] [46]

Energyplus: creating a new-generation building energy simulation program.Energy and buildings, 33(4):319–331, 2001

Drury B Crawley, Linda K Lawrie, Frederick C Winkelmann, Walter F Buhl, Y Joe Huang, Curtis O Pedersen, Richard K Strand, Richard J Liesen, Daniel E Fisher, Michael J Witte, et al. Energyplus: creating a new-generation building energy simulation program.Energy and buildings, 33(4):319–331, 2001

work page 2001

[47] [47]

Tianshou: A highly modularized deep reinforcement learning library.Journal of Machine Learning Research, 23(267):1–6, 2022

Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, and Jun Zhu. Tianshou: A highly modularized deep reinforcement learning library.Journal of Machine Learning Research, 23(267):1–6, 2022. 26

work page 2022