Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/46718
Title: | MapReduce and its applications in heterogeneous environment |
Authors: | Tan, Yu Shyang |
Keywords: | DRNTU::Engineering::Computer science and engineering::Computer systems organization::Computer system implementation |
Issue Date: | 2011 |
Source: | Tan, Y. S. (2011). MapReduce and its applications in heterogeneous environment. Master’s thesis, Nanyang Technological University, Singapore. |
Abstract: | As the growth rate of data outpaces that of CPU processing capability, with datasets reaching petascale, technologies and tools that can effectively process such huge datasets become increasingly important. Two major approaches are currently adopted to address this issue: using specialized hardware accelerators such as GPGPUs, and developing new data-intensive processing tools. In the case of the former, the trend shows an increasing number of GPGPU clusters being used in high-performance computing. In the latter, Google introduced MapReduce, a framework coupled with a programming model, for massively distributed parallel processing. In this thesis, I investigated the possibility of leveraging these two technologies to create an environment where users can harness the potential of hardware accelerators to process huge datasets in a distributed and parallel manner. Hadoop, an open-source implementation of MapReduce, is first analysed. This initial study looks into the performance of Hadoop when processing small datasets, something for which Hadoop is not designed. The study uses several metrics, such as the input file size, the size of the dataset, and the locality of data, and looks into some of the parameters that can affect the performance of the MapReduce flow with respect to the dataset. The study provided insight into MapReduce and how data can be decomposed into data sub-partitions so that the data can be managed by the accelerators with minimal negative impact on performance. |
URI: | https://hdl.handle.net/10356/46718 |
DOI: | 10.32657/10356/46718 |
Schools: | School of Computer Engineering |
Research Centres: | Parallel and Distributed Computing Centre |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | SCSE Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Master thesis hardbound.docx | | 1.5 MB | Microsoft Word | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.