Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/178253
Title: Deep learning workload scheduling in GPU datacenters: a survey
Authors: Ye, Zhisheng
Gao, Wei
Hu, Qinghao
Sun, Peng
Wang, Xiaolin
Luo, Yingwei
Zhang, Tianwei
Wen, Yonggang
Keywords: Computer and Information Science
Issue Date: 2024
Source: Ye, Z., Gao, W., Hu, Q., Sun, P., Wang, X., Luo, Y., Zhang, T. & Wen, Y. (2024). Deep learning workload scheduling in GPU datacenters: a survey. ACM Computing Surveys, 56(6), Article 146. https://dx.doi.org/10.1145/3638757
Project: IAF-ICP 
Journal: ACM Computing Surveys
Abstract: Deep learning (DL) has demonstrated remarkable success in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU accelerators have been collectively constructed into GPU datacenters. An efficient scheduler design for a GPU datacenter is crucial for reducing operational cost and improving resource utilization. However, traditional approaches designed for big data or high-performance computing workloads cannot support DL workloads in fully utilizing the GPU resources. Recently, many schedulers tailored to DL workloads in GPU datacenters have been proposed. This article surveys existing research efforts for both training and inference workloads. We primarily present how existing schedulers facilitate the respective workloads in terms of their scheduling objectives and resource utilization manner. Finally, we discuss several promising future research directions, including emerging DL workloads, advanced scheduling decision making, and underlying hardware resources. A more detailed summary of the surveyed papers and code links can be found at our project website: https://github.com/S-Lab-System-Group/Awesome-DL-Scheduling-Papers
URI: https://hdl.handle.net/10356/178253
ISSN: 0360-0300
DOI: 10.1145/3638757
Schools: College of Computing and Data Science 
Research Centres: S-Lab
Rights: © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections: CCDS Journal Articles

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.