Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/65902
Title: Machine learning techniques for knowledge extraction from text
Authors: Wang, Zhaochun
Keywords: DRNTU::Engineering::Electrical and electronic engineering
Issue Date: 2016
Abstract: Advances in machine learning techniques have opened up more opportunities to use computers to simulate a person’s attitude towards, and evaluation of, a text. Given the increasing amount of online information, having humans summarize huge numbers of documents would be very time-consuming, if not impossible. It is therefore meaningful to conduct research on automatic document summarization (ADS). This paper proposes two automatic document summarization methods, based on the latent semantic analysis (LSA) and nonnegative matrix factorization (NMF) algorithms, which select sentences or words that retain the main points of the original documents to form a brief summary. Both methods aim to learn semantic features for each sentence and to select the important sentences based on the learned representation. In detail, programs assist users in decomposing each sentence into a collection of semantic features, where each semantic feature can be regarded as a high-level feature composed from the whole vocabulary. Sentences are selected with a clustering method that can find the latent structure at the sentence level. The methods were evaluated on DUC 2001, a public and widely used document summarization dataset. The experimental results demonstrate that the LSA and NMF methods are able to achieve high accuracy and precision. In addition, the differences between LSA and NMF are compared, and the sensitivity of the methods to their parameters, including the reduced dimension and the length of the input summary, is analyzed. Keywords: Automatic document summarization, Latent semantic analysis, Nonnegative matrix factorization, Semantic features, Document Understanding Conference
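Illustrative note: the abstract describes factorizing a sentence-level representation with LSA or NMF and then selecting sentences. The sketch below is not the thesis's implementation (the full text is restricted); it is a minimal example under stated assumptions. The whitespace tokenizer, the raw term-frequency weighting, the function names, and the score-based selection (used here in place of the clustering step the abstract mentions) are all assumptions made for illustration.

```python
import numpy as np

def term_sentence_matrix(sentences):
    """Build a raw term-frequency matrix A with shape (terms, sentences)."""
    tokenized = [s.lower().split() for s in sentences]  # assumed tokenizer
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0
    return A

def lsa_scores(A, k):
    """Score each sentence by the length of its vector in the top-k SVD space."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Row i of Vt holds every sentence's weight on latent feature i.
    return np.sqrt(((s[:k, None] * Vt[:k]) ** 2).sum(axis=0))

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor A ~= W @ H with W, H >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    H = rng.random((k, A.shape[1]))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nmf_scores(A, k):
    """Score each sentence by its total weight over the k semantic features.

    Assumed scoring rule, chosen for simplicity.
    """
    _, H = nmf(A, k)
    return H.sum(axis=0)

def select_top(sentences, scores, n):
    """Return the n highest-scoring sentences in their original order."""
    top = sorted(np.argsort(-scores)[:n].tolist())
    return [sentences[i] for i in top]

if __name__ == "__main__":
    docs = [
        "latent semantic analysis finds latent topics in text",
        "the weather was pleasant on tuesday",
        "nonnegative matrix factorization learns additive semantic features",
        "semantic features help select important sentences for a summary",
    ]
    A = term_sentence_matrix(docs)
    print(select_top(docs, lsa_scores(A, k=2), n=2))
    print(select_top(docs, nmf_scores(A, k=2), n=2))
```

Here `k` plays the role of the reduced dimension and `n` the summary length, the two parameters whose sensitivity the abstract says was analyzed.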
URI: http://hdl.handle.net/10356/65902
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:EEE Theses

Files in This Item:
File: Wang Zhaochun.pdf
Description: Restricted Access
Size: 1.95 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.