Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/70109
Title: Tools for analysis of large-scale networks (I) algorithms, analytics and visualization
Authors: Chua, Chee Ann
Keywords: DRNTU::Engineering::Computer science and engineering
Issue Date: 2017
Abstract: Data is a valuable asset, but only for people who have adequate data mining skills and apply them to reveal the trends or patterns hidden in otherwise unstructured data. This project aimed to create a tool that helps the user gain insights from a large-scale dataset by applying multiple data mining processes to the data and visualizing the results. Among the social media sites, Twitter was chosen, and 500 million raw tweets were used as the dataset. Only part of the information in each tweet was extracted for analysis: the geo-coordinates, the timestamp, and the tweet content itself. To cleanse the data, preprocessing was performed to filter out records with missing attributes. The analysis consisted of two data mining techniques: cluster analysis of the geo-coordinates and topic modeling of the tweet content. These two techniques were not only applied separately but also integrated to build further features, such as a tracking system that could reveal a user’s mobility and active places from the big data. With these features, the developed tool was able to turn the raw data into useful and valuable information.
URI: http://hdl.handle.net/10356/70109
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Chua Chee Ann FYP Report.pdf (Restricted Access)
Size: 2.82 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.