Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/71212
Title: Automatic document classification
Authors: Zhao, Zinian
Keywords: DRNTU::Engineering::Electrical and electronic engineering
Issue Date: 2017
Abstract: Sentiment analysis is increasingly viewed as a major research area of Natural Language Processing from both academic and industrial standpoints. Automatic classification of natural language units has become a major target of sentiment analysis. Current document classification models, however, are limited to short text spans and cannot classify news and articles accurately. This project aims to present an effective solution for classifying real news and articles: a pipelined system consisting of representation learning followed by classification. Various document representation learning methods and classification techniques were investigated. In total, 9 models were created and evaluated on classifying real news and articles into three categories: Positive, Neutral and Negative. In our experiments, the model combining Word Embeddings (WE) and Random Forests (RF) outperformed all pre-existing models, with a classification accuracy as high as 60%. Furthermore, this report presents an automatic news analytics system built on the WE and RF model. Given a keyword, the system classifies related news extracted from online sources. This project has designed models that perform accurate document classification of news and that can be used in a wide range of applications, such as analyzing market trends and making investment decisions.
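The two-stage pipeline described in the abstract (document representation, then classification) can be illustrated with a minimal sketch. This is not the report's implementation: the abstract does not say how word vectors are combined into a document vector, so averaging is an assumption here, and the tiny embedding table, the training documents, and the function names `embed_document`, `train_forest` and `classify` are hypothetical. The forest is a simplified pure-Python version (bootstrap samples, one random feature per split, majority vote) rather than a library implementation.

```python
import random
from collections import Counter

# Hypothetical 2-dimensional toy embeddings; a real system would load trained vectors.
EMBEDDINGS = {
    "gain": (1.0, 0.0), "profit": (0.9, 0.1), "growth": (0.8, 0.0),
    "loss": (-1.0, 0.0), "decline": (-0.9, 0.1), "lawsuit": (-0.8, 0.0),
    "meeting": (0.0, 1.0), "report": (0.0, 0.9), "schedule": (0.1, 0.8),
}

# Hypothetical labelled training documents, three per category.
DOCS = ["profit gain", "growth profit", "gain growth",
        "loss decline", "lawsuit loss", "decline lawsuit",
        "meeting report", "schedule meeting", "report schedule"]
LABELS = ["Positive"] * 3 + ["Negative"] * 3 + ["Neutral"] * 3


def embed_document(text, dim=2):
    """Represent a document as the mean of its known word vectors (assumed scheme)."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def majority(labels):
    """Return the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow one tree; each split uses a single randomly chosen feature."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf: a plain label string
    feature = rng.randrange(len(rows[0]))
    values = sorted({row[feature] for row in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, threshold)
    if best is None:  # no usable split on this feature
        return majority(labels)
    threshold = best[1]
    left_idx = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right_idx = [i for i, row in enumerate(rows) if row[feature] > threshold]
    left_tree = build_tree([rows[i] for i in left_idx],
                           [labels[i] for i in left_idx], rng, depth + 1, max_depth)
    right_tree = build_tree([rows[i] for i in right_idx],
                            [labels[i] for i in right_idx], rng, depth + 1, max_depth)
    return (feature, threshold, left_tree, right_tree)


def train_forest(docs, labels, n_trees=15, seed=0):
    """Train each tree on a bootstrap sample of the embedded documents."""
    rng = random.Random(seed)
    rows = [embed_document(doc) for doc in docs]
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng))
    return forest


def classify(forest, text):
    """Classify a document by majority vote over all trees."""
    row = embed_document(text)
    votes = []
    for node in forest:
        while isinstance(node, tuple):
            feature, threshold, left, right = node
            node = left if row[feature] <= threshold else right
        votes.append(node)
    return majority(votes)
```

A call such as `classify(train_forest(DOCS, LABELS), "profit growth")` returns one of the three category labels. With real data, the toy table would be replaced by trained word vectors and the forest by a tuned library implementation.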
URI: http://hdl.handle.net/10356/71212
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP_Final_Report_ZHAOZINIAN.pdf
Description: Restricted Access
Size: 2.04 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.