A study of geographical neighborhood influence to business rating prediction
Date of Issue: 2014
School of Computer Engineering
Rating prediction aims to predict the rating that a user would give to an item that she has not rated before, and it is one of the most popular and fundamental problems in recommendation systems. Using the business review data from Yelp, we study the problem of business rating prediction in this thesis. A business here can be a restaurant, a shopping mall, a nightlife club, or another kind of business. Unlike most other types of items that have been studied in recommender systems (e.g., movies, songs, books), a business in Yelp physically exists at a geographical location, and most businesses have geographical neighbors within walking distance. When a user visits a business, there is a good chance that she walks by its neighbors. Through data analysis on Yelp, we find that there is a weak positive correlation between a business’s ratings and its neighbors’ ratings, and that this correlation is independent of the categories of the business and/or its neighbors. Based on this observation, we assume that a user’s rating of a given business is determined by both the intrinsic characteristics of the business and the extrinsic characteristics of its geographical neighbors. Building on the widely adopted latent factor model for rating prediction, our proposed solution uses two kinds of latent factors to model a business: one for its intrinsic characteristics and the other for its extrinsic characteristics. More specifically, the former encodes the intrinsic characteristics of a business (e.g., taste of food and quality of service), which are observable by users who have interacted with the business. The latter encodes the extrinsic characteristics of a business (e.g., hygiene standard), which influence its geographical neighbors and are observable by “pass-by” visitors. We conduct extensive experiments on the Yelp dataset to evaluate the proposed models and compare them with state-of-the-art baseline methods.
We show that incorporating geographical neighborhood influences yields much lower prediction error than the baseline models, including Biased MF, SVD++, and Social MF. Incorporating influences from business category and review content reduces the prediction error further.
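The idea of modeling a business with two kinds of latent factors can be illustrated with a minimal sketch. The abstract does not give the exact prediction formula, so the form below is an assumption: a biased matrix factorization score in which the business's intrinsic factor is combined with the average extrinsic factor of its geographical neighbors. The function name `predict_rating`, the weight `alpha`, and the use of a simple average are all hypothetical choices made for illustration, not the thesis's definitive model.

```python
import numpy as np


def predict_rating(mu, b_u, b_i, p_u, q_i, v_neighbors, alpha=1.0):
    """Hypothetical rating predictor with intrinsic and extrinsic factors.

    mu          : global mean rating
    b_u, b_i    : user bias and business bias
    p_u         : latent factor vector of the user
    q_i         : intrinsic latent factor vector of the business
    v_neighbors : extrinsic latent factor vectors of the business's
                  geographical neighbors, one row per neighbor
    alpha       : assumed weight on the neighborhood influence
    """
    p_u = np.asarray(p_u, dtype=float)
    q_i = np.asarray(q_i, dtype=float)
    v_neighbors = np.asarray(v_neighbors, dtype=float)

    if v_neighbors.size == 0:
        # No neighbors: fall back to a plain biased MF prediction.
        neighborhood = np.zeros_like(q_i)
    else:
        # Assumption: neighbors contribute equally (simple average).
        neighborhood = v_neighbors.mean(axis=0)

    return mu + b_u + b_i + p_u @ (q_i + alpha * neighborhood)


# Toy example with two latent dimensions and two neighbors.
with_neighbors = predict_rating(
    mu=3.5, b_u=0.1, b_i=-0.2,
    p_u=[1.0, 0.0], q_i=[0.5, 0.5],
    v_neighbors=[[0.2, 0.0], [0.4, 0.0]],
)
without_neighbors = predict_rating(
    mu=3.5, b_u=0.1, b_i=-0.2,
    p_u=[1.0, 0.0], q_i=[0.5, 0.5],
    v_neighbors=[],
)
```

In this toy setting the neighbors' extrinsic factors shift the predicted rating relative to the plain biased MF score. In a trained model, all factors and biases would be learned from observed ratings rather than set by hand.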
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval