Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/156503
Title: Sentic computing for social good: sentiment analysis on toxic comment
Authors: Wang Jingtan
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Wang Jingtan (2022). Sentic computing for social good: sentiment analysis on toxic comment. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156503
Project: SCSE21-0238
Abstract: With the neural network revolution and increased computational power, artificial intelligence has been applied in various fields to improve life, for example in concept-level sentiment analysis. We focused on one sentiment analysis application: toxic comment detection. These inappropriate messages, hidden in massive amounts of data, inflict verbal violence on the receiver. We therefore aimed to detect the toxicity of content from raw textual input, outputting whether it is toxic or not. We selected an open-source multilabel dataset with around 150k samples, in which each sample carries labels for 6 categories of toxic behavior, and we intended to predict which of these 6 labels a text belongs to. To achieve this, we reviewed and experimented with the state-of-the-art methods in this field, namely pre-trained models. We then improved the models to address the issue we noticed during the experiments: imbalanced multilabel data. We reviewed various approaches discussed in papers and journals, such as external knowledge of minority labels, cost-sensitive metrics, and resampling, and compared them to find an effective way to address the imbalance. Note that due to resource constraints, we sampled only ten percent of the original data for our experiments. Overall, we found that BERT was the best-fitting pre-trained model, and we improved its performance on imbalanced multilabel classification by using focal loss and random oversampling. We hope the reviews, the experiments, and the results can contribute to the toxic comment challenge. We also pointed out the limitations of this project, namely the lack of resources and some unexpected behaviors, as well as possible future directions: active learning and resampling supported by data augmentation.
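The abstract names focal loss and random oversampling as the two changes that helped BERT on the imbalanced multilabel data. The full report is restricted, so the sketch below is not the project's code; it only illustrates the two techniques in plain Python. It assumes one sigmoid probability per label, uses the defaults gamma = 2 and alpha = 0.25 suggested in the original focal loss paper, and applies a simple per-label duplication rule driven by a hypothetical `min_positive` threshold. The function names `multilabel_focal_loss` and `random_oversample` are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def multilabel_focal_loss(
    probs: Sequence[Sequence[float]],
    targets: Sequence[Sequence[int]],
    gamma: float = 2.0,
    alpha: float = 0.25,
    eps: float = 1e-7,
) -> float:
    """Mean binary focal loss over every (sample, label) pair.

    Each label is treated as an independent sigmoid output, which is the
    usual setup for multilabel classification (assumption, not from the report).
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
            alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
            # (1 - p_t) ** gamma shrinks the loss of easy, well-classified pairs
            total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
            count += 1
    return total / count


def random_oversample(
    texts: List[str],
    labels: List[List[int]],
    min_positive: int,
    seed: int = 0,
) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical per-label random oversampling.

    For each label, randomly duplicate original samples that carry it until
    the label has at least `min_positive` positives. Copies also count toward
    any co-occurring labels. Inputs are not modified.
    """
    rng = random.Random(seed)
    out_texts = list(texts)
    out_labels = [list(row) for row in labels]
    num_labels = len(labels[0])
    for j in range(num_labels):
        pool = [i for i, row in enumerate(labels) if row[j] == 1]
        if not pool:
            continue  # no positive example to copy for this label
        positives = sum(row[j] for row in out_labels)
        while positives < min_positive:
            i = rng.choice(pool)
            out_texts.append(texts[i])
            out_labels.append(list(labels[i]))
            positives += 1  # the copied sample is positive for label j
    return out_texts, out_labels
```

With `gamma = 0` and `alpha = 0.5` the focal loss reduces to half the ordinary binary cross-entropy, which is a quick sanity check. Oversampling should be applied only to the training split, so that duplicated comments do not leak into the validation or test data.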
URI: https://hdl.handle.net/10356/156503
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Sentic Computing for Social Good-Sentiment Analysis on Toxic Comment.pdf (Restricted Access; 1.92 MB; Adobe PDF)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.