Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/178253
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ye, Zhisheng | en_US
dc.contributor.author | Gao, Wei | en_US
dc.contributor.author | Hu, Qinghao | en_US
dc.contributor.author | Sun, Peng | en_US
dc.contributor.author | Wang, Xiaolin | en_US
dc.contributor.author | Luo, Yingwei | en_US
dc.contributor.author | Zhang, Tianwei | en_US
dc.contributor.author | Wen, Yonggang | en_US
dc.date.accessioned | 2024-06-10T00:51:10Z | -
dc.date.available | 2024-06-10T00:51:10Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Ye, Z., Gao, W., Hu, Q., Sun, P., Wang, X., Luo, Y., Zhang, T. & Wen, Y. (2024). Deep learning workload scheduling in GPU datacenters: a survey. ACM Computing Surveys, 56(6), 146. https://dx.doi.org/10.1145/3638757 | en_US
dc.identifier.issn | 0360-0300 | en_US
dc.identifier.uri | https://hdl.handle.net/10356/178253 | -
dc.description.abstract | Deep learning (DL) has demonstrated remarkable success in a wide variety of fields. Developing a DL model is a time-consuming and resource-intensive procedure; hence, dedicated GPU accelerators have been collectively constructed into GPU datacenters. An efficient scheduler design for a GPU datacenter is crucial to reducing operational cost and improving resource utilization. However, traditional approaches designed for big data or high-performance computing workloads cannot enable DL workloads to fully utilize GPU resources. Recently, many schedulers have been proposed that are tailored to DL workloads in GPU datacenters. This article surveys existing research efforts for both training and inference workloads. We primarily present how existing schedulers facilitate the respective workloads in terms of scheduling objectives and resource utilization. Finally, we discuss several promising future research directions, including emerging DL workloads, advanced scheduling decision making, and underlying hardware resources. A more detailed summary of the surveyed papers and code links can be found at our project website: https://github.com/S-Lab-System-Group/Awesome-DL-Scheduling-Papers | en_US
dc.description.sponsorship | Agency for Science, Technology and Research (A*STAR) | en_US
dc.language.iso | en | en_US
dc.relation | IAF-ICP | en_US
dc.relation.ispartof | ACM Computing Surveys | en_US
dc.rights | © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM. All rights reserved. | en_US
dc.subject | Computer and Information Science | en_US
dc.title | Deep learning workload scheduling in GPU datacenters: a survey | en_US
dc.type | Journal Article | en
dc.contributor.school | College of Computing and Data Science | en_US
dc.contributor.research | S-Lab | en_US
dc.identifier.doi | 10.1145/3638757 | -
dc.identifier.scopus | 2-s2.0-85188808018 | -
dc.identifier.issue | 6 | en_US
dc.identifier.volume | 56 | en_US
dc.identifier.spage | 146 | en_US
dc.subject.keywords | Deep learning systems | en_US
dc.subject.keywords | Datacenter scheduling | en_US
dc.description.acknowledgement | The research is supported under the National Key R&D Program of China under Grant No. 2022YFB4500701 and the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s). It is also supported by the National Science Foundation of China (Nos. 62032001, 62032008, 62372011). | en_US
item.grantfulltext | none | -
item.fulltext | No Fulltext | -
Appears in Collections: CCDS Journal Articles

SCOPUS™ Citations 50: 4 (updated on Dec 5, 2024)
Page view(s): 193 (updated on Dec 9, 2024)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.