Title: Detection of hate speech on social media
Authors: Zou, Yunting
Keywords: Engineering::Computer science and engineering::Computing methodologies
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Zou, Y. (2022). Detection of hate speech on social media. Master's thesis, Nanyang Technological University, Singapore.
Abstract: Social media has become the most popular platform on the internet because of how quickly information spreads on it and how much information it offers users. However, content appears on social media so quickly that hate speech cannot be regulated effectively, and extracting and identifying hate speech, together with the users who spread it, from the vast volume of social media content is a difficult task. Text classification is one of the most widely used natural language processing (NLP) techniques. To classify and detect hate speech, an NLP-based text classification model extracts text features and is trained to assign texts with different semantics to the correct class. In this dissertation, several text classification models are implemented and their performance is compared on specific tasks using different experimental data. First, a machine learning-based approach to social media hate speech detection is proposed, using Twitter as the object of study: the TF-IDF representation of each text is extracted and combined with a support vector machine (SVM) to perform binary classification of hate speech versus normal speech. Second, a deep learning-based detection method is proposed that applies a convolutional neural network (CNN) to hate speech recognition; texts are represented by word vectors constructed with word2vec weighting, and the trained CNN performs the classification. Finally, a text classification model based on BERT is proposed and implemented, and BERT's internal principles, training method and Transformer architecture are studied.
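The TF-IDF + SVM pipeline described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the tokenizer is a naive whitespace split, the IDF uses a smoothed variant, and the SVM is a linear model without a bias term trained by Pegasos-style stochastic gradient descent on the hinge loss rather than by a library solver. All function names and the toy "documents" (built from placeholder tokens such as `insult_a`) are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, not the thesis's preprocessing.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (raw term count x IDF), L2-normalised.

    Terms not seen when fitting the IDF table are ignored.
    """
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else vec


def dot(w, x):
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Linear SVM (no bias term) trained by Pegasos-style SGD on the hinge loss.

    ys must be +1 (hate speech) or -1 (normal speech).
    """
    rng = random.Random(seed)
    w = {}
    step = 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            x, y = xs[i], ys[i]
            violated = y * dot(w, x) < 1  # hinge loss is active for this sample
            for tok in w:  # gradient of the L2 regulariser shrinks every weight
                w[tok] *= 1.0 - eta * lam
            if violated:
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * y * v
    return w


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


if __name__ == "__main__":
    # Toy data with placeholder tokens, not the thesis's Twitter data.
    docs = ["insult_a group_x get out", "insult_b insult_a group_x",
            "lovely weather today", "great match last night"]
    labels = [1, 1, -1, -1]
    idf = fit_idf(docs)
    w = train_linear_svm([tfidf(d, idf) for d in docs], labels)
    print(predict(w, tfidf("insult_b group_x", idf)))  # prints 1
```

The L2 normalisation keeps long and short posts on a comparable scale before the SVM sees them; in practice a library implementation (with a bias term and a tuned regularisation strength) would replace the hand-written trainer.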
In the experiments, more than 100,000 social media texts from different sources are collected, then screened, annotated and divided into training sets of different sizes. With a dataset of 24,000 texts, the CNN-based hate speech detection model outperforms the SVM model, reaching an F1 score of 0.827763 versus 0.755817. With a dataset of 15,000 texts, the BERT-based model outperforms the CNN-based model, reaching an F1 score of 0.735808 versus 0.677092.
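The comparison above rests on the F1 score, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The abstract does not state whether the reported values are positive-class, macro-averaged or weighted F1, so the sketch below assumes positive-class F1 with hate speech as the positive label; the function name `f1_binary` and the example labels are hypothetical.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 score for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 1 = hate speech, 0 = normal speech.
    print(f1_binary([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In this example there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3 and F1 is 2/3. Because F1 ignores true negatives, it is more informative than accuracy when hate speech is a minority class.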
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
File: final dissertation.pdf (Restricted Access)
Description: Detection of Hate Speech on Social media
Size: 2.6 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.