Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/183943
Title: Parallel processing framework for analyzing taxi mobility data
Authors: Ho, Guo Liang Ken
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Ho, G. L. K. (2025). Parallel processing framework for analyzing taxi mobility data. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183943
Project: CCDS24-0722
Abstract: The unprecedented growth of data generation has created a need for efficient data processing frameworks that can handle large volumes of raw, structured and unstructured data. Such data must then be transformed and preprocessed into a format suited to data analysis and machine learning. This report examines the limitations of using pandas with multiprocessing for large-scale data processing and explores parallel processing frameworks such as Hadoop and Apache Spark. A 253 GB mobility data set of taxi trips in Singapore from January to December 2022, provided by the Land Transport Authority (LTA), was used to study the performance of different data processing frameworks. A Spark cluster was set up to process the raw mobility data, and profiling techniques were used to identify and resolve logical and hardware bottlenecks to ensure optimal utilization of the compute resources Spark ran on. A performance comparison revealed that Apache Spark consistently outperforms pandas with multiprocessing when processing large mobility datasets across multiple months. Subsequently, the processed mobility data was structured using different data models, namely One Big Table and Fact-Dimension, and their query performance was evaluated. Finally, these data models were hosted on Google BigQuery, where curated multi-layered data models were implemented to optimize data retrieval, improve accessibility and support different analytics and machine learning applications.
URI: https://hdl.handle.net/10356/183943
Schools: College of Computing and Data Science 
Research Centres: Singapore-ETH Centre
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: CCDS24-0722-FinalSubmission.pdf (Restricted Access)
Size: 1.67 MB
Format: Adobe PDF

Page view(s): 92 (updated on May 7, 2025)
Download(s): 1 (updated on May 7, 2025)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.