Title: Efficient Singapore room rental search with data mining
Authors: Koh, Fabian
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Issue Date: 2014
Abstract: This study asks how data mining techniques can be used to improve the efficiency of room rental search. The first objective is to develop a clustering method for Singapore room rental listing retrieval, called Relevance-based Clustering, which adds geographical relationships to search results ranked by textual relevance. The second objective is to develop a Rental Property Search Engine that demonstrates how Relevance-based Clustering can make room rental search in Singapore more efficient. An essential part of this process is the ability to extract geographical information from webpages. The author narrows the scope of the study to Singapore property websites, where geographical information can be easily extracted from the map latitude and longitude available on all of the major property websites in Singapore. The rental property search engine is custom-coded by the author in Python 2.7 and deployed on the Google App Engine (GAE) cloud hosting platform. The search engine includes a property content web crawler that crawls the rental sections of Singapore property websites and downloads the content of each URL into the Listing table. Next, the Data Pre-processing process cleanses and tokenizes the downloaded content to create and update the Inverted Index. Processed URLs are recorded in the Done-Process table to avoid duplicate work. When a user query is received, the Query Parsing process cleanses and tokenizes the query text and passes it to the Scoring and Ranking process, which converts it into vector form for Cosine Similarity score computation. The scores are ranked, and the K highest-scoring listings form the Top K List.
The URL Spherical Distance Matrix is computed from the Top K List, and clustering is performed on this matrix to discover geographical relationships among the top K textually relevant listings. The clustered result is converted into HTML and returned to the user. The Information Retrieval (IR) effectiveness of the search engine is low at K = 100, with an average F-Measure of 26%, and better at K = 20, with an average F-Measure of 78%.
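The scoring and distance-matrix steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's code. The function names (`cosine_similarity`, `top_k`, `haversine_km`, `distance_matrix`), the dict-based term-weight vectors, the choice of the haversine formula, and the 6371 km Earth radius are all assumptions made for illustration; the abstract does not state which term weighting or spherical distance formula the thesis uses.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from the thesis


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_u * norm_v)


def top_k(query_vec, listing_vecs, k):
    """Return the URLs of the k listings most similar to the query."""
    scored = ((cosine_similarity(query_vec, vec), url)
              for url, vec in listing_vecs.items())
    return [url for _, url in heapq.nlargest(k, scored)]


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def distance_matrix(points):
    """Symmetric matrix of pairwise spherical distances between points."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = haversine_km(points[i], points[j])
    return matrix
```

In this sketch, the coordinates of the URLs returned by `top_k` would be passed to `distance_matrix`, and the resulting matrix would be the input to whichever clustering algorithm is applied.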
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: WKWSCI Theses

Files in This Item:
Description: Restricted Access
Size: 1.3 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.