Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/149304
Title: Deep learning-based automatic document categorization and organization
Authors: Foo, Shawn Nicholas Say Yan
Keywords: Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Foo, S. N. S. Y. (2021). Deep learning-based automatic document categorization and organization. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/149304
Abstract: Given the vast improvement in information technology today, document classification has become a major research area of Natural Language Processing. Previously, online documents were categorized using traditional Machine Learning algorithms. However, traditional Machine Learning algorithms have been shown to be unable to cope with the massive amount of online information generated daily. Deep Learning algorithms, on the other hand, perform better as the amount of data increases. Therefore, we introduce Deep Learning models to perform the document classification task, using the large amount of data being generated daily. This project aims to build an AI system that performs document classification using Deep Learning-based methods. In my work, five Deep Learning-based models are compared and evaluated. In the coarse-grained classification task, the models classify news articles into five entry-level categories: Economy, Fuel Price, Illegal Fishing, Weather and Climate, and Others. In the fine-grained classification task, news articles in the Fuel Price category are further classified into two subcategories: Price Increase and Price Decrease. The model that combines a TF-IDF word representation with a Feedforward Artificial Neural Network outperformed all the other models, with classification accuracies of 98% and 88.25% for the coarse-grained and fine-grained classification tasks, respectively. News classification allows us to detect the occurrence of certain events. In particular, the news classification done in this project contributes to detecting piracy in the Straits of Malacca. The project has successfully identified the Deep Learning-based model best suited to document classification of news articles, and the results can be used to analyze trends in piracy in the Straits of Malacca.
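The best-performing combination named in the abstract, TF-IDF features fed to a feedforward neural network, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the project's implementation: the toy sentences, the smoothed IDF variant, the single ReLU hidden layer of 8 units, the learning rate, and the names `fit_idf`, `tfidf`, `FeedforwardNet` and `classify` are all hypothetical. Only the five category names come from the abstract.

```python
import math
import random
from collections import Counter

# Toy corpus, invented for illustration; this is not the project's dataset.
DOCS = [
    ("central bank raises interest rates as inflation grows", "Economy"),
    ("petrol price increase announced at fuel stations", "Fuel Price"),
    ("patrol detains foreign trawler for illegal fishing", "Illegal Fishing"),
    ("heavy rain and flooding forecast as monsoon arrives", "Weather and Climate"),
    ("local football team wins the national cup final", "Others"),
]
LABELS = sorted({label for _, label in DOCS})

def fit_idf(texts):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    n = len(texts)
    df = Counter(term for text in texts for term in set(text.split()))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(text, idf, vocab):
    """L2-normalised TF-IDF vector; out-of-vocabulary terms are ignored."""
    tf = Counter(text.split())
    vec = [tf[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class FeedforwardNet:
    """One ReLU hidden layer followed by a softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * v for w, v in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def train_step(self, x, target, lr=0.1):
        """One SGD step on the cross-entropy loss for a single example."""
        h, p = self.forward(x)
        d_out = [p[k] - (1.0 if k == target else 0.0) for k in range(len(p))]
        # Backpropagate to the hidden layer before w2 is updated.
        d_hid = [sum(d_out[k] * self.w2[k][j] for k in range(len(p))) if h[j] > 0 else 0.0
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            self.w2[k] = [w - lr * dk * hj for w, hj in zip(self.w2[k], h)]
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * dj

texts = [text for text, _ in DOCS]
idf = fit_idf(texts)
vocab = sorted(idf)
X = [tfidf(text, idf, vocab) for text in texts]
y = [LABELS.index(label) for _, label in DOCS]

net = FeedforwardNet(n_in=len(vocab), n_hidden=8, n_out=len(LABELS))
for _ in range(200):  # epoch count chosen arbitrarily for the toy data
    for x, target in zip(X, y):
        net.train_step(x, target)

def classify(text):
    """Return the category with the highest predicted probability."""
    _, probs = net.forward(tfidf(text, idf, vocab))
    return LABELS[probs.index(max(probs))]

print(classify("fuel price increase at petrol stations"))
```

A fine-grained classifier for the Price Increase and Price Decrease subcategories could presumably reuse the same structure with two output units, trained only on articles from the Fuel Price category. With five training sentences the sketch only demonstrates the data flow; it says nothing about the accuracies reported in the abstract.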
URI: https://hdl.handle.net/10356/149304
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Shawn_Nicholas_Foo_FYP_FINAL_REPORT.pdf (Restricted Access)
Description: Deep Learning-based Automatic Document Categorization and Organization
Size: 1.6 MB
Format: Adobe PDF

Page view(s): 51 (updated on Jan 21, 2022)
Download(s): 7 (updated on Jan 21, 2022)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.