Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/179725
Title: Sentiment analysis based on statistical machine learning and deep learning
Authors: Zhu, Wen
Keywords: Computer and Information Science; Engineering
Issue Date: 2024
Publisher: Nanyang Technological University
Source: Zhu, W. (2024). Sentiment analysis based on statistical machine learning and deep learning. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/179725
Project: ISM-DISS-03925 
Abstract: Sentiment analysis is a fundamental yet challenging task in the field of natural language processing, aiming to identify and extract subjective information from text data. It has wide applications in areas such as business intelligence and social media monitoring. This dissertation primarily explores text sentiment analysis methods, focusing on effectively capturing and classifying subjective emotions in short text data from social media. Through a systematic literature review, the study traces the evolution of sentiment analysis from rule-based methods using sentiment lexicons to modern deep learning techniques. It compares the performance of statistical machine learning methods (logistic regression and Naive Bayes) with deep learning techniques (LSTM and BERT) in text sentiment recognition tasks. This dissertation discusses key steps in text preprocessing, including data cleaning, tokenization, stopword removal, and stemming, as well as the datasets chosen for each type of experiment. It explores feature extraction techniques, particularly frequency-based methods and TF-IDF. Detailed descriptions of model implementation, training, and testing processes are provided, with a focus on how different text preprocessing techniques and model parameter settings affect sentiment classification performance. Finally, experiments compare the performance of statistical machine learning methods and deep learning methods in understanding text sentiment, with a detailed analysis of the results. The experiments show that TF-IDF feature extraction has significant advantages over traditional frequency-based feature extraction in statistical machine learning models. The results highlight the effectiveness of TF-IDF in evaluating the importance of keywords in texts, particularly when handling large volumes of text data.
Moreover, the BERT model demonstrated superior performance across all test datasets, significantly surpassing the LSTM model in accuracy, especially when dealing with complex and noisy datasets. BERT's robustness is particularly notable and is attributed to its deep bidirectional contextual understanding, which enables it to capture more nuanced emotional expressions in text. Pre-trained on large-scale text data, BERT learns rich language patterns, providing a solid foundation for fine-tuning on specific tasks. This dissertation not only provides a theoretical foundation and practical guidance for selecting sentiment analysis techniques but also offers insights into choosing the most suitable model based on data characteristics and task requirements. Additionally, the research discusses hyperparameter tuning, strategies for handling imbalanced datasets, and how transfer learning can enhance model generalization. These findings and recommendations will help advance the application and development of sentiment analysis technology in business intelligence, social media monitoring, and other related fields.
URI: https://hdl.handle.net/10356/179725
Schools: School of Electrical and Electronic Engineering 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses
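
Illustrative note: the abstract contrasts frequency-based features with TF-IDF. The sketch below is not taken from the dissertation, whose full text is restricted. It is a minimal standard-library illustration of the general idea, assuming whitespace tokenization, a three-document toy corpus, and the idf = ln(N / df) variant (one of several in common use). The function names `term_counts` and `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def term_counts(docs):
    """Frequency-based features: one raw term-count mapping per tokenized document."""
    return [Counter(doc) for doc in docs]


def tf_idf(docs):
    """TF-IDF features, assuming tf = count / doc length and idf = ln(N / df)."""
    n_docs = len(docs)
    counts = term_counts(docs)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for c in counts for term in c)
    weighted = []
    for c in counts:
        length = sum(c.values())
        weighted.append({
            term: (freq / length) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        })
    return weighted


# Toy corpus (an assumption for illustration, not data from the dissertation).
docs = [
    "the movie was great".split(),
    "the movie was awful".split(),
    "the plot was great fun".split(),
]

counts = term_counts(docs)
weights = tf_idf(docs)

# Raw counts treat "the" and "awful" alike in the second document,
# while TF-IDF gives zero weight to "the" because it occurs in every document.
print(counts[1]["the"], counts[1]["awful"])
print(weights[1]["the"], weights[1]["awful"])
```

With raw counts, a word that appears in every document is indistinguishable from a sentiment-bearing word that appears once. The idf factor is what separates the two, which is the property the abstract credits for TF-IDF's advantage in the statistical models.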

Files in This Item:
File: Zhu Wen-Dissertation.pdf
Description: Restricted Access
Size: 2.32 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.