Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/77003
Title: Development of a distributed crawler to collect online game playing traces
Authors: Zhang, Yuance
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval; DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Issue Date: 2019
Abstract: Over the years, the form of computer games has been evolving. From playing alone or face to face, players from across the world can nowadays easily join the same game over the internet. As the number of players in a game grows, an accompanying problem has emerged: by its nature, a multi-player game under active development needs its balance, tuned through various game parameters, to be updated from time to time. This is where the demand for data collection arises. In order to offer players a better platform, the data generated by these online games has become an intriguing source for analysis. One method of extracting such data is a web crawler. However, given the enormous amount of data stored in cloud databases, an ordinary web crawler falls short in one or more areas such as scalability, portability, performance, monitoring, and fault tolerance. This FYP project therefore focused on developing a distributed crawler for collecting online game playing traces, so that the corresponding data research and analysis jobs can be carried out. A high-performance, general-purpose API crawler is a suitable solution. In this project, a distributed system comprising Python Scrapy, a MongoDB cluster, Redis and Docker was designed and implemented from scratch. The key innovation is extending the Scrapy framework from a single-server crawler into a distributed crawler by using a Redis server as a shared message queue. The master-slave architecture, data clustering, and Docker Swarm are all part of the project's tech stack. Finally, system tests, including an operational evaluation, fault-tolerance tests, and load tests, were carried out to verify the system.
A direction for further exploration of this project is a general-purpose distributed API crawler framework that lets users define their own crawling logic while retaining all features of this system, including high portability, automatic failover, load balancing, high availability, and scalability.
URI: http://hdl.handle.net/10356/77003
Schools: School of Computer Science and Engineering
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
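The abstract's central idea, turning single-server Scrapy into a distributed crawler by routing requests through a Redis list shared by master and slave nodes, can be sketched in miniature. The sketch below is not the project's actual code: the class names, URLs, and worker logic are hypothetical, and a plain in-process deque stands in for the Redis list (in the real system, `lpush`/`rpop` would be Redis `LPUSH`/`RPOP`, so workers on different hosts see the same queue).

```python
from collections import deque

class SharedQueue:
    """Stand-in for a Redis list used as a shared request queue.
    Real deployments would replace this with a Redis client so that
    Scrapy workers on separate hosts share one queue and one dupefilter."""
    def __init__(self):
        self._items = deque()
        self._seen = set()           # dupefilter: URLs queued once, ever

    def lpush(self, url):
        if url not in self._seen:    # deduplicate across all workers
            self._seen.add(url)
            self._items.appendleft(url)

    def rpop(self):
        return self._items.pop() if self._items else None

def master_seed(queue, start_urls):
    """Master node pushes seed URLs; slave nodes consume them."""
    for url in start_urls:
        queue.lpush(url)

def slave_crawl(queue, worker_id, results):
    """Each slave pops URLs until the shared queue is empty."""
    while (url := queue.rpop()) is not None:
        results.append((worker_id, url))   # placeholder for fetch + parse

queue = SharedQueue()
master_seed(queue, ["https://api.example/match/1",
                    "https://api.example/match/2",
                    "https://api.example/match/1"])  # duplicate is filtered
results = []
slave_crawl(queue, "worker-A", results)
print(len(results))   # 2 unique URLs crawled
```

Because the queue and the seen-set live in one shared store, adding more slave containers (e.g. via Docker Swarm, as the abstract describes) scales throughput without re-crawling the same pages.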
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format
---|---|---|---
Amended FYP Report.pdf (Restricted Access) | FYP Final Report | 3.26 MB | Adobe PDF
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.