Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/156566
Title: Zeus: interpretable ML-based job scheduling in GPU datacentres
Authors: Amrita, Ravishankar
Keywords: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Simulation and modeling
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Amrita, R. (2022). Zeus: interpretable ML-based job scheduling in GPU datacentres. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156566
Abstract: Hardware accelerators such as GPUs are essential for developing Deep Learning (DL) models because training them is compute-intensive. A growing number of organisations have deployed expensive multi-tenant GPU clusters to run distributed DL training jobs, and efficient job schedulers are needed to maximise GPU cluster utilisation while minimising job completion time and operating cost. In this study, we develop Zeus, an interpretable, non-intrusive, Machine Learning (ML)-based job scheduler that ensures resource fairness and thus provides a better user experience. Zeus addresses concerns about the unreliability of black-box ML models by being 100% interpretable, which avoids the related deployment concerns in practical scenarios. The interpretability of our model also reveals interesting dependencies between a training job's details and its expected duration, along with associated trends. Further, our scheduler does not require users to modify their source code or the underlying DL framework, making it completely non-intrusive and therefore more practical. Finally, we use a GPU datacentre simulator to evaluate the efficiency of our scheduler in terms of two metrics: (1) Average Job Completion Time and (2) Average Queueing Time.
URI: https://hdl.handle.net/10356/156566
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
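The abstract evaluates Zeus with two metrics, Average Job Completion Time and Average Queueing Time. As a minimal sketch, assuming each job record carries a submission time, a first-start time and a finish time (the `Job` class and its field names below are hypothetical and not taken from the report), the two averages could be computed as shown. The report's simulator may define them differently, for example by also counting time spent waiting after a preemption as queueing time.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    # Hypothetical job record; all times are in seconds from simulation start.
    job_id: str
    submit_time: float  # when the job arrives at the scheduler
    start_time: float   # when the job is first placed on GPUs (assumed no preemption)
    end_time: float     # when the job finishes


def average_jct(jobs):
    """Average Job Completion Time: mean of (end_time - submit_time) over all jobs."""
    return mean(j.end_time - j.submit_time for j in jobs)


def average_queueing_time(jobs):
    """Average Queueing Time: mean of (start_time - submit_time) over all jobs."""
    return mean(j.start_time - j.submit_time for j in jobs)


if __name__ == "__main__":
    sample = [
        Job("a", submit_time=0.0, start_time=0.0, end_time=100.0),
        Job("b", submit_time=10.0, start_time=50.0, end_time=90.0),
        Job("c", submit_time=20.0, start_time=100.0, end_time=140.0),
    ]
    print(average_jct(sample))            # 100.0
    print(average_queueing_time(sample))  # 40.0
```

Both helpers raise `statistics.StatisticsError` on an empty job list, which is usually preferable to silently reporting zero for a run in which no job completed.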
Files in This Item:
File | Description | Size | Format
---|---|---|---
Ravishankar_Amrita_FYP_Report.pdf (restricted access) | Final Year Project Report | 1.66 MB | Adobe PDF
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.