Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/184249
Title: Evaluating job scheduling algorithms in cloud: from heuristics and stratus to domain-enhanced deep learning
Authors: Xu, Yinfeng
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Xu, Y. (2025). Evaluating job scheduling algorithms in cloud: from heuristics and stratus to domain-enhanced deep learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184249
Project: CCDS24-0121
Abstract: This research presents a comprehensive investigation of cloud job scheduling algorithms, progressing from conventional heuristic approaches to advanced machine learning (ML) techniques. The objective is to optimize resource allocation in heterogeneous cloud environments, where scheduling efficiency critically influences operational costs, resource utilization, and service quality. The study adopts a progressive evaluation framework, beginning with baseline heuristic methods (First-Fit and Best-Fit) and the Stratus scheduler, then advancing to novel implementations incorporating deep neural networks (DNN) and deep reinforcement learning (DRL). Experiments are conducted using the Google Cluster Trace dataset, enabling realistic workload simulations and multi-dimensional performance benchmarking. Results indicate that the Stratus scheduler, which leverages elasticity-aware task placement, runtime binning, and cost-aware instance scaling, reduces costs by 15-30% compared to traditional heuristics while increasing memory utilization from 71.46% to 85.73%. Further gains are realized by integrating domain knowledge into the learning-based schedulers: DRL methods augmented with a Cost-Capacity Graph (CCG) reduce operational expenses by approximately 50% relative to Stratus, and a DNN approach with CCG achieves the highest CPU utilization among the evaluated schedulers (72%) alongside a 16% cost saving over Stratus. These findings indicate that hybrid approaches, which combine adaptive learning with explicit domain heuristics, consistently outperform both purely heuristic and purely learned policies. A principal contribution of this work is the development and assessment of domain-knowledge-enhanced learning frameworks, notably the CCG and a Runtime Category Manager, which refine instance selection decisions through cost-efficiency relationships and runtime similarity groupings.
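The First-Fit and Best-Fit baselines named in the abstract can be illustrated with a minimal Python sketch. It assumes each machine exposes free CPU and memory in normalized units and that a task fits only if both requests are satisfied. The names `Machine`, `first_fit`, `best_fit`, and `place` are hypothetical and are not taken from the report, and the Best-Fit slack score (leftover CPU plus leftover memory) is only one of several reasonable multi-resource definitions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    """A machine (or cloud instance) with free CPU and memory in normalized units (assumed)."""
    name: str
    free_cpu: float
    free_mem: float

    def fits(self, cpu: float, mem: float) -> bool:
        # A task fits only if both its CPU and memory requests are satisfied.
        return cpu <= self.free_cpu and mem <= self.free_mem


def first_fit(machines: List[Machine], cpu: float, mem: float) -> Optional[Machine]:
    """Return the first machine, in list order, with enough free CPU and memory."""
    for m in machines:
        if m.fits(cpu, mem):
            return m
    return None


def best_fit(machines: List[Machine], cpu: float, mem: float) -> Optional[Machine]:
    """Return the feasible machine left with the least slack after placement.

    Assumption: slack = leftover CPU + leftover memory; multi-resource
    Best-Fit can be scored in other ways.
    """
    feasible = [m for m in machines if m.fits(cpu, mem)]
    if not feasible:
        return None
    return min(feasible, key=lambda m: (m.free_cpu - cpu) + (m.free_mem - mem))


def place(machine: Optional[Machine], cpu: float, mem: float) -> Optional[str]:
    """Reserve resources on the chosen machine; None means no machine fits."""
    if machine is None:
        return None
    machine.free_cpu -= cpu
    machine.free_mem -= mem
    return machine.name


if __name__ == "__main__":
    cluster = [Machine("m1", 0.5, 0.5), Machine("m2", 0.2, 0.3)]
    print(first_fit(cluster, 0.2, 0.2).name)  # m1
    print(best_fit(cluster, 0.2, 0.2).name)   # m2
```

First-Fit takes the first feasible machine in list order, while Best-Fit packs the task into the tightest feasible machine, which tends to keep roomier machines available for later large requests. A `None` result stands for the case where a cloud scheduler would have to provision a new instance.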
The study acknowledges certain limitations, including assumptions inherent in simulation-based evaluation, the substantial training data required by ML models, and non-trivial inference overheads. Future research directions include investigating transformer-based architectures, multi-agent reinforcement learning, and multi-objective optimization for energy efficiency and fairness, as well as extending the framework to serverless and edge-cloud workloads. Overall, the results show that integrating domain-specific knowledge with machine learning can significantly improve resource utilization and cost efficiency in dynamic cloud environments.
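The runtime binning attributed to the Stratus scheduler in the abstract groups tasks with similar expected remaining runtimes, so that co-located tasks tend to finish together and an instance can be released once it empties. The Python sketch below illustrates the idea under stated assumptions: power-of-two bin boundaries, runtimes measured in hours, and a simple closest-bin placement rule. `runtime_bin`, `group_by_bin`, and `pick_instance` are hypothetical names, capacity checks are omitted, and the actual Stratus placement logic is more involved than this.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def runtime_bin(remaining_runtime: float, base: float = 1.0) -> int:
    """Map an estimated remaining runtime to a power-of-two bin index.

    Assumption: bin 0 holds runtimes up to `base`, and bin i (i >= 1) holds
    runtimes in (base * 2**(i - 1), base * 2**i].
    """
    if remaining_runtime <= base:
        return 0
    return math.ceil(math.log2(remaining_runtime / base))


def group_by_bin(tasks: List[Tuple[str, float]]) -> Dict[int, List[str]]:
    """Group (task_id, estimated_remaining_runtime) pairs by runtime bin."""
    bins: Dict[int, List[str]] = defaultdict(list)
    for task_id, runtime in tasks:
        bins[runtime_bin(runtime)].append(task_id)
    return dict(bins)


def pick_instance(task_runtime: float, instance_bins: Dict[str, int]) -> Optional[str]:
    """Choose the instance whose runtime bin is closest to the task's bin.

    An exact bin match always wins (distance 0); ties go to the first instance
    in dictionary order. Simplified rule for illustration; capacity is ignored.
    """
    if not instance_bins:
        return None
    target = runtime_bin(task_runtime)
    return min(instance_bins, key=lambda name: abs(instance_bins[name] - target))


if __name__ == "__main__":
    # Runtimes in hours (assumed unit).
    tasks = [("a", 0.5), ("b", 3.0), ("c", 4.0), ("d", 30.0)]
    print(group_by_bin(tasks))  # {0: ['a'], 2: ['b', 'c'], 5: ['d']}
    print(pick_instance(3.5, {"i1": 0, "i2": 2, "i3": 5}))  # i2
```

Exponential bins keep the number of groups small while still separating short tasks from long-running ones, which is why a 3-hour and a 4-hour task share a bin here but a 30-hour task does not.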
URI: https://hdl.handle.net/10356/184249
Schools: College of Computing and Data Science 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)

Files in This Item:
Final_Report_Xu_Yinfeng.pdf (Restricted Access), 3.29 MB, Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.