Please use this identifier to cite or link to this item: http://hdl.handle.net/10356/62824
|Title:||Lexical knowledge-based machine learning method for sentiment analysis|
|Authors:||Heng, Lai Xiang|
|Keywords:||DRNTU::Engineering::Computer science and engineering::Information systems|
|Issue Date:||2015|
|Abstract:||Sentiment analysis and classification require labelled reviews (each marked as positive or negative) before any further data mining or natural language processing can be done. Reviews are usually labelled manually, which is time-consuming and demanding. In this paper, we propose a new learning algorithm that combines supervised learning with pre-compiled opinion lexicons. Because the algorithm does not require manual labelling of reviews, it greatly reduces the manpower and time needed. For this project, customers’ restaurant reviews are drawn from the Yelp dataset. The algorithm has five steps: 1) Build two pseudo-documents, one positive and one negative. 2) Compute the pairwise document similarity between each review document and the positive and negative documents, using either cosine similarity or Euclidean distance. 3) Label each review as positive or negative based on the similarity results. 4) Rank the reviews. 5) Select the top 2,000 reviews, 1,000 each from the positive- and negative-labelled documents, to build the sentiment classification model. In this experiment, we examine both Naïve Bayes and Support Vector Machine (SVM) classifiers. Three feature extraction methods are used to train the classifiers: the bag-of-words model, the bag-of-words model with stopwords removed, and significant bigrams. For the Naïve Bayes classifier, significant bigrams perform best, achieving 67% accuracy, whereas the bag-of-words model performs worst. The SVM classifier, on the other hand, performs well with both the bag-of-words model and the bag-of-words model with stopwords removed, achieving an accuracy of about 99%. However, this may indicate overfitting due to the large, sparse feature space. Nevertheless, the experiment shows that automated labelling of reviews is possible and brings that goal one step closer.|
|URI:||http://hdl.handle.net/10356/62824|
|Rights:||Nanyang Technological University|
|Fulltext Permission:||restricted|
|Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Student Reports (FYP/IA/PA/PI)|
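
The five labelling steps described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the two tiny word lists stand in for a pre-compiled opinion lexicon, reviews are represented as raw term counts, only the cosine similarity variant is shown, and ranking by the gap between the two similarity scores is an assumption, since the abstract does not state the ranking criterion. The names `tokenize`, `cosine_similarity` and `pseudo_label` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicons; a real run would load a pre-compiled opinion lexicon.
POSITIVE_WORDS = ["good", "great", "delicious", "friendly", "amazing", "fresh"]
NEGATIVE_WORDS = ["bad", "terrible", "rude", "bland", "slow", "dirty"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pseudo_label(reviews, top_n):
    """Label reviews from lexicon similarity and keep the top_n per class."""
    # Step 1: build the pseudo positive and negative documents.
    pos_doc = Counter(POSITIVE_WORDS)
    neg_doc = Counter(NEGATIVE_WORDS)

    scored = []
    for review in reviews:
        vec = Counter(tokenize(review))
        # Step 2: similarity of the review to each pseudo document.
        pos_sim = cosine_similarity(vec, pos_doc)
        neg_sim = cosine_similarity(vec, neg_doc)
        if pos_sim == neg_sim:
            # Assumption: skip reviews with no evidence either way.
            continue
        # Step 3: assign the label of the more similar pseudo document.
        label = "pos" if pos_sim > neg_sim else "neg"
        scored.append((abs(pos_sim - neg_sim), label, review))

    # Step 4: rank by the similarity gap (assumed ranking criterion).
    scored.sort(key=lambda item: item[0], reverse=True)

    # Step 5: keep the top_n reviews of each class as training data.
    positives = [r for _, lab, r in scored if lab == "pos"][:top_n]
    negatives = [r for _, lab, r in scored if lab == "neg"][:top_n]
    return positives, negatives
```

Calling `pseudo_label(reviews, 1000)` on a list of review strings would return the two 1,000-review training sets mentioned in the abstract, which could then be passed to a Naïve Bayes or SVM classifier.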
Updated on Apr 15, 2021
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.