Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/157736
Title: Toward physics-guided safe deep reinforcement learning for green data center cooling control
Authors: Wang, Ruihang
Zhang, Xinyi
Zhou, Xin
Wen, Yonggang
Tan, Rui
Keywords: Engineering::Computer science and engineering::Computer applications::Physical sciences and engineering
Issue Date: 2022
Source: Wang, R., Zhang, X., Zhou, X., Wen, Y. & Tan, R. (2022). Toward physics-guided safe deep reinforcement learning for green data center cooling control. 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS), 159-169. https://dx.doi.org/10.1109/ICCPS54341.2022.00021
Project: NRF2020NRF-CG001-027 
Conference: 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS)
Abstract: Deep reinforcement learning (DRL) has shown good performance in tackling Markov decision process (MDP) problems. As DRL optimizes a long-term reward, it is a promising approach to improving the energy efficiency of data center cooling. However, enforcement of thermal safety constraints during DRL's state exploration is a main challenge. The widely adopted reward shaping approach adds negative reward when the exploratory action results in unsafety. Thus, it needs to experience sufficient unsafe states before it learns how to prevent unsafety. In this paper, we propose a safety-aware DRL framework for single-hall data center cooling control. It applies offline imitation learning and online post-hoc rectification to holistically prevent thermal unsafety during online DRL. In particular, the post-hoc rectification searches for the minimum modification to the DRL-recommended action such that the rectified action will not result in unsafety. The rectification is designed based on a thermal state transition model that is fitted using historical safe operation traces and is able to extrapolate the transitions to unsafe states explored by DRL. Extensive evaluation for chilled water and direct expansion cooled data centers in two climate conditions shows that our approach saves 22.7% to 26.6% of total data center power compared with conventional control and reduces safety violations by 94.5% to 99% compared with reward shaping.
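The post-hoc rectification described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear `toy_transition` model, its coefficients, the scalar setpoint action, the candidate grid, and the grid-search strategy are all assumptions made for illustration. The paper fits its transition model from historical safe operation traces, and the abstract does not specify how the minimum modification is found.

```python
def toy_transition(zone_temp, setpoint):
    """Hypothetical stand-in for a fitted thermal state transition model.

    Predicts the next zone temperature (deg C) from the current zone
    temperature and a cooling setpoint. Coefficients are made up.
    """
    return 0.75 * zone_temp + 0.25 * setpoint


def rectify_action(zone_temp, action, temp_limit, candidates, model=toy_transition):
    """Return the candidate closest to `action` that the model predicts is safe.

    If the recommended action is already predicted safe, it is returned
    unchanged. If no candidate is predicted safe, fall back to the candidate
    with the lowest predicted temperature (an assumption of this sketch).
    """
    if model(zone_temp, action) <= temp_limit:
        return action
    safe = [c for c in candidates if model(zone_temp, c) <= temp_limit]
    if not safe:
        return min(candidates, key=lambda c: model(zone_temp, c))
    # Minimum modification: the safe candidate nearest to the recommended action.
    return min(safe, key=lambda c: abs(c - action))


# Candidate setpoints from 14.0 to 26.0 deg C in 0.5 deg C steps (assumed range).
candidates = [14.0 + 0.5 * k for k in range(25)]

# A recommended setpoint of 22.0 is predicted to push the zone to 28.0 deg C,
# above the assumed 27.0 deg C limit, so it is lowered to the nearest safe value.
rectified = rectify_action(zone_temp=30.0, action=22.0, temp_limit=27.0,
                           candidates=candidates)
print(rectified)
```

In this toy setting the rectifier leaves safe actions untouched and only intervenes when the model predicts a violation, which is the property the abstract attributes to post-hoc rectification.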
URI: https://hdl.handle.net/10356/157736
DOI: 10.1109/ICCPS54341.2022.00021
Schools: School of Computer Science and Engineering 
Research Centres: Data Management and Analytics Lab
Rights: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/ICCPS54341.2022.00021.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Conference Papers

Scopus citations: 3 (updated on Sep 25, 2023)
Web of Science citations: 1 (updated on Sep 22, 2023)
Page views: 82 (updated on Sep 30, 2023)
Downloads: 109 (updated on Sep 30, 2023)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.