Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/46456
Title: Twitter archive system
Authors: Ong, Ann Aik.
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Issue Date: 2011
Abstract: Twitter is a popular source of text data for mining and analysis because it offers a large amount of free, easily accessible data. However, before data can be mined from Twitter, it has to be collected. The purpose of this project is to design and develop a reliable data collector that uses a scheduler to periodically collect data from selected Twitter users, based on each user's pattern of tweeting, and to analyze the collected data. The entire data collection and analysis process is fully automated and is expected to run 24/7/365. The Java desktop application was developed using NetBeans IDE 6.7.1, with MySQL Server 5.0.91 as the data store and Twitter4J as the Java library for communicating with the Twitter API. The data collector was tested over a period of 3 days, from 20th to 23rd September 2011. Within these 3 days of data collection, 56,961 users were captured: 20,779 of them are Singapore users and 36,182 are non-Singapore users. In addition, 244,192 tweets were downloaded and 144,042 follow relationships were found. The objective of the project has been met, as the data collector successfully collected a large amount of data from Twitter within the 3 days of data collection. For optimal performance of the data collector, implementing a multithreaded scheduler is highly recommended as a future improvement.
URI: http://hdl.handle.net/10356/46456
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
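
The abstract describes a scheduler that polls each user according to that user's pattern of tweeting, but this record does not say how the interval is derived. The following is a minimal sketch of one possible rule, not the report's implementation: poll roughly once per expected new tweet, clamped to fixed bounds. The class name, method name, and the 15-minute and 24-hour bounds are all assumptions made for illustration, and no Twitter4J calls are shown.

```java
import java.time.Duration;

public class PollIntervalSketch {
    // Hypothetical bounds; the record does not state the values the project used.
    static final Duration MIN_INTERVAL = Duration.ofMinutes(15);
    static final Duration MAX_INTERVAL = Duration.ofHours(24);

    /**
     * Returns how long to wait before polling a user again, given the
     * average number of tweets per day observed for that user.
     * The delay is one day divided by the tweet rate, clamped to the bounds.
     */
    static Duration nextPollDelay(double tweetsPerDay) {
        if (tweetsPerDay <= 0) {
            return MAX_INTERVAL; // inactive user: poll at the slowest rate
        }
        long seconds = Math.round(86_400 / tweetsPerDay);
        Duration delay = Duration.ofSeconds(seconds);
        if (delay.compareTo(MIN_INTERVAL) < 0) {
            return MIN_INTERVAL; // very active user: cap the polling rate
        }
        if (delay.compareTo(MAX_INTERVAL) > 0) {
            return MAX_INTERVAL; // rarely active user: still poll once a day
        }
        return delay;
    }

    public static void main(String[] args) {
        // Print the delay chosen for an occasional, a heavy, and an inactive user.
        System.out.println(nextPollDelay(4));
        System.out.println(nextPollDelay(500));
        System.out.println(nextPollDelay(0));
    }
}
```

A delay computed this way could be handed to a standard `java.util.concurrent.ScheduledExecutorService`. Giving that executor more than one worker thread would be one way to approach the multithreaded scheduler the abstract recommends as future work.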

Files in This Item:
File: SCE10-0464.pdf
Description: Restricted Access
Size: 1.35 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.