Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/64312
Title: Optimization techniques on job scheduling and resource allocation for MapReduce system
Authors: Tang, Shanjiang
Keywords: DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems
DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Issue Date: 2015
Source: Tang, S. (2015). Optimization techniques on job scheduling and resource allocation for MapReduce system. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: MapReduce has become a popular high-performance computing paradigm for large-scale data processing. Hadoop, an open-source implementation of MapReduce, has been widely deployed in large clusters containing thousands of machines by companies such as Yahoo! and Facebook to support batch processing for large jobs submitted from multiple users (i.e., MapReduce workloads). However, there is still considerable room to improve the performance and fairness of Hadoop. In this thesis, we focus on optimization techniques for job scheduling and resource allocation to improve the performance and fairness of the Hadoop system. First, we focus on performance optimization for MapReduce workloads under the FIFO scheduler, using a job re-ordering approach that does not require changing the source code of Hadoop. We consider two different kinds of production workloads, i.e., offline MapReduce workloads and online MapReduce workloads. The performance metrics used are makespan and total completion time. For offline workloads, we propose several job ordering algorithms. Based on the offline approaches, we further propose a prototype system called MROrder to optimize the performance of online MapReduce workloads. The experimental results show that our job ordering methods can significantly improve the performance of Hadoop for both offline and online workloads. Second, instead of keeping the default static MapReduce resource allocation model, where the numbers of map slots and reduce slots are pre-configured and not fungible, we relax the model constraint by modifying the source code of Hadoop so that slots can be reallocated to either map or reduce tasks depending on their needs. A dynamic fair resource allocation and scheduling system called DynamicMR is proposed and implemented in Hadoop. It can improve the performance of MapReduce workloads while ensuring fairness, without any information about the MapReduce jobs. The experimental results validate the effectiveness of DynamicMR. Moreover, it can also be applied to the FIFO scheduler. Finally, besides the optimization for Hadoop MRv1, we also optimize fair resource allocation for YARN (i.e., Hadoop MRv2). Specifically, we consider pay-as-you-use computing (e.g., cloud computing) and find that the traditional fair policy is not suitable for such computing systems. To address this, we propose Long-Term Resource Fairness (LTRF) and implement it in YARN by developing LTYARN, a long-term YARN fair scheduler. Our experimental results show that it leads to better resource fairness than the existing fair scheduler. Thus, in this thesis, we have addressed some of the optimization problems in job scheduling and resource allocation for MapReduce systems under different scenarios. We have proposed new algorithms and frameworks to improve the performance and fairness of the Hadoop system. The proposed algorithms and frameworks give users options for optimizing the performance of their MapReduce workloads or ensuring fairness, according to their needs and conditions.
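Illustrative note: the abstract's point that job order alone can change makespan and total completion time under FIFO can be seen in a toy model. The sketch below is not the thesis's algorithm. It assumes each job is a two-stage pipeline (map, then reduce), that each stage processes one job at a time, and it swaps in Johnson's rule, a classic ordering heuristic for two-stage flow shops, purely for illustration. The job names and durations are hypothetical.

```python
from typing import List, Tuple

# (name, map_time, reduce_time); hypothetical jobs and time units.
Job = Tuple[str, float, float]


def schedule_metrics(order: List[Job]) -> Tuple[float, float]:
    """Return (makespan, total_completion_time) for jobs run in the given order.

    Simplified model (an assumption, not Hadoop's real slot model): one map
    stage and one reduce stage, each handling one job at a time. A job's
    reduce starts once its own map is done and the previous reduce is done.
    """
    map_end = 0.0
    reduce_end = 0.0
    total_completion = 0.0
    for _name, map_time, reduce_time in order:
        map_end += map_time
        reduce_end = max(reduce_end, map_end) + reduce_time
        total_completion += reduce_end
    return reduce_end, total_completion


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by Johnson's rule for a two-stage flow shop.

    Jobs whose map time is at most their reduce time go first, by increasing
    map time; the rest go last, by decreasing reduce time.
    """
    front = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    back = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return front + back


jobs: List[Job] = [("A", 8, 2), ("B", 2, 6), ("C", 4, 4)]

submitted = schedule_metrics(jobs)                 # (20.0, 46.0)
reordered = schedule_metrics(johnson_order(jobs))  # (16.0, 36.0), order B, C, A

print("submission order:", submitted)
print("re-ordered:      ", reordered)
```

In this toy instance, re-ordering the same three jobs lowers the makespan from 20 to 16 and the total completion time from 46 to 36 without touching the scheduler itself. Johnson's rule targets makespan only, so the improvement in total completion time here is specific to these inputs.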
URI: https://hdl.handle.net/10356/64312
DOI: 10.32657/10356/64312
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: Thesis_ShanjiangTang.pdf
Description: Main article
Size: 13.39 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.