Title: Automatic document summarization
Authors: Xu, Hengjie
Keywords: DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2017
Abstract: Text summarization, an important branch of Natural Language Processing (NLP), has attracted an increasing amount of research and engineering interest due to the explosion of available information. Currently, most summarization applications are devoted to social media and structured reports, with little attention paid to news-article analytics. This project aims to summarize a large number of news articles automatically using a few key sentences. The system is a pipeline consisting of text representation models and clustering algorithms (with cluster centroids as key sentences). Eight summarization techniques were evaluated at both the article level and the sentence level. We chose Bag of Words (BoW) with Latent Semantic Analysis (LSA) and Spherical K-Means, as this combination stood out among the eight. In particular, at the article level, the combination produces a score of 0.94, a 17.5% improvement over our baseline from the literature. This suggests that our proposed clustering technique is fairly robust and accurate. The project is consolidated into a single web application. The user interface allows users to obtain relevant news articles based on their input, such as subject names, date range and sources. For subsequent analysis of these news articles, a Named Entity Recognition (NER) algorithm is refined and applied to extract major entities, such as places, persons and organizations, as a preliminary analysis. Finally, the news articles are summarized with key sentences using our optimal summarization model.
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)
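
The clustering stage described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation, which is not available in this record. It assumes whitespace tokenization, omits the LSA step entirely, uses a simple deterministic initialization, and picks the sentence closest to each cluster centroid as the key sentence. All function names (`bow_vectors`, `spherical_kmeans`, `key_sentences`) are hypothetical, and `k` is assumed to be no larger than the number of sentences.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def dot(a, b):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def bow_vectors(sentences):
    """Build unit-length bag-of-words count vectors (whitespace tokens)."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for word, count in Counter(toks).items():
            vec[index[word]] = float(count)
        vectors.append(normalize(vec))
    return vectors, vocab


def spherical_kmeans(vectors, k, iterations=20):
    """K-means on the unit sphere: assign by cosine similarity and
    re-normalize each centroid after averaging its members."""
    n = len(vectors)
    # Assumption: evenly spaced sentences as initial centroids (deterministic).
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


def key_sentences(sentences, k):
    """Return, in original order, the sentence nearest each centroid."""
    vectors, _ = bow_vectors(sentences)
    labels, centroids = spherical_kmeans(vectors, k)
    picked = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            picked.append(max(members,
                              key=lambda i: dot(vectors[i], centroids[j])))
    return [sentences[i] for i in sorted(picked)]
```

Calling `key_sentences(sentences, k)` on a list of sentence strings returns up to `k` representative sentences. A fuller version would project the count vectors through a truncated SVD (the LSA step) before clustering.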

Files in This Item:
File: FYP Final Report_Xu Hengjie.pdf (Restricted Access)
Description: FYP Report
Size: 1.91 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.