Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/179725
Title: | Sentiment analysis based on statistical machine learning and deep learning |
Authors: | Zhu, Wen |
Keywords: | Computer and Information Science Engineering |
Issue Date: | 2024 |
Publisher: | Nanyang Technological University |
Source: | Zhu, W. (2024). Sentiment analysis based on statistical machine learning and deep learning. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/179725 |
Project: | ISM-DISS-03925 |
Abstract: | Sentiment analysis is a fundamental yet challenging task in the field of natural language processing, aiming to identify and extract subjective information from text data. It has wide applications in areas such as business intelligence and social media monitoring. This dissertation primarily explores text sentiment analysis methods, focusing on effectively capturing and classifying subjective emotions in short text data from social media. Through a systematic literature review, the study traces the evolution of sentiment analysis from rule-based methods using sentiment lexicons to modern deep learning techniques. It compares the performance of statistical machine learning methods (logistic regression and Naive Bayes) with deep learning techniques (LSTM and BERT) in text sentiment recognition tasks. This dissertation discusses key steps in text preprocessing, including data cleaning, tokenization, stopword removal, and stemming, as well as the datasets chosen for each type of experiment. It explores feature extraction techniques, particularly frequency-based methods and TF-IDF. Detailed descriptions of model implementation, training, and testing processes are provided, with a focus on how different text preprocessing techniques and model parameter settings affect sentiment classification performance. Finally, experiments compare the performance of statistical machine learning methods and deep learning methods in understanding text sentiment, with a detailed analysis of the results.
Our experiments validated that the TF-IDF feature extraction method has significant advantages over traditional frequency-based feature extraction methods in statistical machine learning models. The results highlight the effectiveness of TF-IDF in evaluating the importance of keywords in texts, particularly when handling large volumes of text data. Moreover, the BERT model demonstrated superior performance across all test datasets, significantly surpassing the LSTM model in accuracy, especially when dealing with complex and noisy datasets. BERT's robustness is particularly notable, attributed to its deep bidirectional contextual understanding, enabling it to capture more nuanced emotional expressions in text. Pre-trained on large-scale text data, BERT learns rich language patterns, providing a solid foundation for fine-tuning on specific tasks. This dissertation not only provides a theoretical foundation and practical guidance for selecting sentiment analysis techniques but also offers insights into choosing the most suitable model based on data characteristics and task requirements. Additionally, the research discusses hyperparameter tuning, strategies for handling imbalanced datasets, and how transfer learning can enhance model generalization. These findings and recommendations will help advance the application and development of sentiment analysis technology in business intelligence, social media monitoring, and other related fields. |
URI: | https://hdl.handle.net/10356/179725 |
Schools: | School of Electrical and Electronic Engineering |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | EEE Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Zhu Wen-Dissertation.pdf (Restricted Access) | | 2.32 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.