Title: High performance data processing systems in cloud
Authors: Tan, Xuan Min
Keywords: DRNTU::Engineering::Computer science and engineering::Data::Data structures
DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Issue Date: 2015
Abstract: Whenever the term “Big Data” was mentioned, it was often closely associated with technologies like Apache Hadoop and the “NoSQL” class of databases such as MongoDB and Neo4j. These technologies made it possible to run streaming, real-time data analytics with ease, and such analytics were usually completed in 20 minutes or less. In recent years, many such open source technologies have emerged in the market, but how many of them are really efficient and suitable for processing iterative data such as graphs? Some graph processing systems, such as GraphLab and Apache Giraph, were inspired by the Bulk Synchronous Parallel (BSP) model, while others, like Hadoop, follow Google’s MapReduce framework. In this project, both the BSP model and the MapReduce framework were studied intensively using two prominent open source projects, Hadoop and Giraph. A series of large graph processing jobs was executed on both systems and the results were analyzed. The experiments show that Giraph performs better than Hadoop at processing iterative data.
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: Final Year Report (Restricted Access)
Size: 1.23 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.