Please use this identifier to cite or link to this item:
Title: Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques
Authors: Ma, Shuting
Keywords: Engineering::Electrical and electronic engineering
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Ma, S. (2022). Detecting novel and interested topics from open sources based on deep neural network and natural language processing techniques. Master's thesis, Nanyang Technological University, Singapore.
Abstract: Piracy is one of the factors threatening the security of coastal countries. During the COVID-19 pandemic, piracy incidents have become more frequent than usual, posing a challenge to the safety of residents and to social stability. At the same time, news reports on piracy incidents published in open sources are a valuable resource for piracy research. With the maturing of artificial intelligence technology and the continuous development of Natural Language Processing (NLP), how to make reasonable use of these openly available text materials for analysis has become an important research direction. This project first introduces the possible applications of NLP to piracy news materials. Relevant piracy news materials were collected from open sources, then labeled and cleaned to form a new dataset on this topic. Four mainstream text classification models (TextCNN, Bi-LSTM, Transformer, and BERT) are introduced theoretically and tested in practice, and BERT is finally selected as the base model. To address the imbalanced data classification problem, this project proposes and explores a variety of methods that combine deep learning and machine learning. On the one hand, data resampling is used to improve the balance of the dataset. On the other hand, with BERT chosen as the classifier, Costive-SVM is constructed in a fully connected layer with Triplet Loss to separate the labels of positive and negative samples. After fine-tuning, the performance of the model improves, and the over-fitting problem encountered during optimization is also resolved. Finally, the F1 score improved from 0.46 to 0.87.
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
Description: Restricted Access
Size: 4.86 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.