Title: Automatic summarizer for web documents
Authors: Chia, Pei Qi
Keywords: DRNTU::Engineering
Issue Date: 2014
Abstract: As the world globalizes, the internet is used around the globe, and the volume of text documents on the web is growing exponentially. It is impractical to read through all of the text online just to find and sift out what one needs. Using unsupervised clustering algorithms, the author has created an automatic summarizer that condenses long documents into short summaries. This thesis discusses the natural language processing techniques and data mining concepts used within the software, with a primary focus on lemmatization, which allows words of similar meaning to be gathered together, and on the clustering algorithms Hierarchical Agglomerative Clustering and K-means. The methodology uses a top-down, incremental approach to design and build a reliable and functional summarizer. The thesis also explains the functionalities of the summarizer, with various tests implemented for greater confidence. The summarizer is then observed and evaluated on its flexibility with different text inputs and on the logical coherence of its output summaries. The thesis concludes by suggesting greater use of natural language processing to aid computers in 'understanding' text information, and by raising the possibility of using a soft clustering approach. All in all, the objective of the project is met, and the thesis provides the reader with the knowledge necessary to develop a summarizer using the clustering process described.
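As a rough illustration of the kind of pipeline the abstract describes, the sketch below clusters sentences with a small K-means loop and keeps the sentence closest to each cluster centre. It is not the thesis's implementation, whose full text is restricted. Everything named here is an assumption made for illustration: the functions `normalize`, `vectorize`, `cosine`, `centroid`, and `summarize`, the stopword list, the crude suffix stripping that stands in for real lemmatization, the cosine-similarity measure, and the deterministic seeding. Hierarchical Agglomerative Clustering is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "on", "for"}


def normalize(word):
    # Crude suffix stripping: a stand-in for real lemmatization, not a lemmatizer.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vectorize(sentence):
    # Bag-of-words term counts over normalized, non-stopword tokens.
    words = re.findall(r"[a-z]+", sentence.lower())
    return Counter(normalize(w) for w in words if w not in STOPWORDS)


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    # Mean term-count vector of a cluster.
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def summarize(text, k=2, iterations=10):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= k:
        return sentences
    vectors = [vectorize(s) for s in sentences]
    # Deterministic seeding (an assumption): evenly spaced sentences as initial centres.
    centers = [vectors[i * len(vectors) // k] for i in range(k)]
    assign = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each sentence joins its most similar centre.
        assign = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        # Update step: recompute each non-empty cluster's centre.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = centroid(members)
    # Keep the sentence closest to each centre, in original document order.
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            chosen.append(max(members, key=lambda i: cosine(vectors[i], centers[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(text, k=2)` on a short passage that covers two topics should return at most two sentences, one per cluster. A fuller version would likely use a dictionary-based lemmatizer and TF-IDF weighting in place of the raw counts used here.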
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: Main Article for automatic summarizing
Size: 3.31 MB
Format: Adobe PDF
Access: Restricted Access

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.