Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/184249
Title: | Evaluating job scheduling algorithms in cloud: from heuristics and stratus to domain-enhanced deep learning |
Authors: | Xu, Yinfeng |
Keywords: | Computer and Information Science |
Issue Date: | 2025 |
Publisher: | Nanyang Technological University |
Source: | Xu, Y. (2025). Evaluating job scheduling algorithms in cloud: from heuristics and stratus to domain-enhanced deep learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184249 |
Project: | CCDS24-0121 |
Abstract: | This research presents a comprehensive investigation of cloud job scheduling algorithms, transitioning from conventional heuristic approaches to advanced machine learning (ML) techniques. The objective is to optimize resource allocation in heterogeneous cloud environments, where scheduling efficiency critically influences operational costs, resource utilization, and service quality. The study adopts a progressive evaluation framework, beginning with baseline heuristic methods (First-Fit and Best-Fit) and the Stratus scheduler, then advancing to novel implementations incorporating deep neural networks (DNN) and deep reinforcement learning (DRL). Experiments are conducted using the Google Cluster Trace dataset, enabling realistic workload simulations and multi-dimensional performance benchmarking. Results indicate that the Stratus scheduler, which leverages elasticity-aware task placement, runtime binning, and cost-aware instance scaling, reduces costs by 15-30% compared to traditional heuristics, while increasing memory utilization from 71.46% to 85.73%. Further gains are realized by integrating domain knowledge into the learning-based schedulers: DRL methods augmented with a Cost-Capacity Graph (CCG) reduce operational expenses by approximately 50% relative to Stratus, and a DNN approach with CCG achieves the highest CPU utilization (72%) alongside a 16% cost saving over Stratus. These findings underscore the value of hybrid approaches that combine adaptive learning with explicit domain heuristics, which consistently outperform both purely heuristic and purely learned policies. A principal contribution of this work is the development and assessment of domain-knowledge-enhanced learning frameworks, notably the CCG and a Runtime Category Manager, which refine instance selection decisions through cost-efficiency relationships and runtime similarity groupings. The study acknowledges certain limitations, including assumptions inherent in simulation-based evaluations, the substantial training data required for ML models, and non-trivial inference overheads. Future research directions include investigating transformer-based architectures, multi-agent reinforcement learning, multi-objective optimization for energy efficiency and fairness, and extending the framework to serverless and edge-cloud workloads. Overall, this research offers valuable insights into cloud resource management, demonstrating that the integration of domain-specific knowledge and machine learning can significantly enhance resource utilization and cost efficiency in dynamic cloud environments. |
URI: | https://hdl.handle.net/10356/184249 |
Schools: | College of Computing and Data Science |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | CCDS Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Final_Report_Xu_Yinfeng.pdf (Restricted Access) | | 3.29 MB | Adobe PDF | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.