Title: Online shopping sites crawler
Authors: Leong, Letitia Justina Si En
Keywords: DRNTU::Engineering
Issue Date: 2016
Abstract: The advancement of technology has brought a wide range of benefits to society but has also inevitably contributed to fast-paced lifestyles. Increasingly, people prefer to shop online and, at the same time, look for new ways to obtain the best bargain. The aim of this project is therefore to design a way to collect merchants' data from multiple shopping sites and display it on a platform that enables shoppers to compare products. Firstly, a shopping site crawler was developed using the Scrapy framework to crawl and scrape different shopping sites. As every website is structured differently, scraping is more complicated than crawling alone: for the web crawler to extract specific data from a website, the XPaths of the target elements must be specified, which would otherwise mean reworking code for each site. A Tkinter program was therefore created to reduce this code rework and to make it convenient to configure new and existing web spiders. Secondly, the collected merchants' data undergoes text mining, which consists of preprocessing, clustering and topic modelling. Clustering and topic modelling were used to detect interesting patterns for grouping similar products together and to discover attractive topics. These results are presented to shoppers in a way that allows them to search for their desired products easily and efficiently. Thirdly, a frontend web application was built to display recommended products, appealing product themes and all merchants that offer the same product or the same kind of product. In addition, filters were implemented so that users can search according to their preferences. Lastly, a backend web application was set up to manage product-related data within the database. By the end of the project, all objectives had been accomplished. Some limitations of the developed system remained unresolved owing to time constraints and limited manpower. However, these limitations, along with the suggestions for further enhancement, can be addressed in future work.
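Illustrative note: the abstract's point that each site needs its own XPaths, kept in configuration rather than in code, can be sketched as follows. This is a minimal sketch under stated assumptions and not the report's implementation: it uses Python's standard-library `xml.etree.ElementTree` (which supports only a subset of XPath) as a stand-in for Scrapy selectors, and the site names, field names, XPaths and sample markup are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site configuration: "item" locates one product element,
# "fields" maps each output field to an XPath relative to that element.
SITE_CONFIGS = {
    "shop_a": {
        "item": ".//div[@class='product']",
        "fields": {"name": "./h2", "price": "./span[@class='price']"},
    },
    "shop_b": {
        "item": ".//li[@class='item']",
        "fields": {"name": "./a", "price": "./em"},
    },
}


def scrape(site, markup):
    """Extract product records from markup using the site's configured XPaths."""
    config = SITE_CONFIGS[site]
    root = ET.fromstring(markup)  # assumes well-formed markup for this sketch
    records = []
    for node in root.findall(config["item"]):
        record = {"site": site}
        for field, xpath in config["fields"].items():
            found = node.find(xpath)
            has_text = found is not None and found.text
            record[field] = found.text.strip() if has_text else None
        records.append(record)
    return records


# Hypothetical sample pages: the same product marked up differently per site.
PAGE_A = (
    "<html><body><div class='product'><h2>USB cable</h2>"
    "<span class='price'>4.90</span></div></body></html>"
)
PAGE_B = (
    "<html><body><ul><li class='item'><a>USB cable</a>"
    "<em>5.20</em></li></ul></body></html>"
)

products = scrape("shop_a", PAGE_A) + scrape("shop_b", PAGE_B)
```

Supporting a further site in this sketch means adding one entry to `SITE_CONFIGS` and leaving `scrape` unchanged, which is the kind of code rework the abstract says the configuration program was meant to avoid.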
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Restricted Access
Size: 3.8 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.