Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/77003
Title: Development of a distributed crawler to collect online game playing traces
Authors: Zhang, Yuance
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Issue Date: 2019
Abstract: Over the years, the form of computer games has evolved. Where players once had to play alone or face to face, players from across the world can now easily meet in the same game over the internet. As more and more players become involved in a game, an accompanying problem has emerged: because a multi-player game keeps developing, its balance, tuned through various game parameters, must be updated from time to time. This is where the demand for data collection arises. To provide a better platform for gamers, the data generated by these online games has become an intriguing source for analysis. One method of extracting such data is a web crawler. However, given the enormous amount of data stored in cloud databases, an ordinary web crawler falls short in one or more areas such as scalability, portability, performance, monitoring and fault tolerance. This FYP project therefore focused on developing a distributed crawler to collect online game playing traces, so that the corresponding data research and analysis can be carried out. A general-purpose, high-performance API crawler is a good solution. In this project, a distributed system built from Python Scrapy, a MongoDB cluster, Redis and Docker was designed and implemented from scratch. Its core innovation is extending the Scrapy framework from a single-server crawler into a distributed crawler by using a Redis server as a shared message queue. The master-slave pattern, data clustering and Docker Swarm are all part of the project's tech stack. Finally, system tests covering operation evaluation, fault tolerance and load were carried out to verify the system.
As further exploration of this project, the goal is a general-purpose distributed API crawler framework that lets users define their own crawling logic while retaining all the features of this system: high portability, automatic failover, load balancing, high availability and scalability.
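The Redis-as-shared-queue idea described in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not code from the report: it stands in a plain `collections.deque` and `set` for the Redis list and de-duplication set (which a real deployment would drive through redis-py `LPUSH`/`RPOP`/`SADD`), and the names `SharedQueue` and `crawl_worker` are made up for the sketch.

```python
from collections import deque

class SharedQueue:
    """In-memory stand-in for the Redis-backed shared queue (hypothetical)."""
    def __init__(self):
        self.pending = deque()  # a Redis list (LPUSH/RPOP) in the real system
        self.seen = set()       # a Redis set (SADD) for URL de-duplication

    def push(self, url):
        # Only enqueue URLs no worker has seen, so slaves never duplicate work.
        if url not in self.seen:
            self.seen.add(url)
            self.pending.append(url)

    def pop(self):
        return self.pending.popleft() if self.pending else None

def crawl_worker(queue, fetch):
    """One slave node: pop a URL, fetch it, enqueue newly discovered URLs."""
    crawled = []
    while (url := queue.pop()) is not None:
        crawled.append(url)
        for link in fetch(url):  # fetch() returns outgoing links/endpoints
            queue.push(link)
    return crawled

# Toy link graph standing in for the game-trace API endpoints.
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
q = SharedQueue()
q.push("a")
order = crawl_worker(q, lambda u: graph.get(u, []))
```

Because all slaves share one queue and one seen-set, any number of `crawl_worker` processes can run in parallel without re-crawling a URL; in the real system this state lives in Redis, which is what makes the overridden Scrapy scheduler distributed.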
URI: http://hdl.handle.net/10356/77003
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Amended FYP Report.pdf (Restricted Access)
Description: FYP Final Report
Size: 3.26 MB
Format: Adobe PDF



Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.