|Title:||Machine learning techniques for knowledge extraction from text|
|Authors:||Wang, Zhaochun|
|Keywords:||DRNTU::Engineering::Electrical and electronic engineering|
|Issue Date:||2016|
|Abstract:||The development of machine learning techniques has opened up new opportunities to use computers to simulate a person’s attitude toward, and evaluation of, a text. Given the growing volume of online information, having humans summarize such large numbers of documents is very time-consuming and, in practice, infeasible. Research on automatic document summarization (ADS) is therefore worthwhile. This work proposes two ADS methods, based on latent semantic analysis (LSA) and nonnegative matrix factorization (NMF) respectively, that select the sentences or words which retain the main points of the original documents and combine them into a brief summary. Both methods learn semantic features for each sentence and select the important sentences based on the learned representation. Specifically, each sentence is decomposed into a collection of semantic features, and each semantic feature can be regarded as a high-level feature composed over the whole vocabulary. Sentences are then selected with a clustering method that finds latent structure at the sentence level. Both methods were evaluated on DUC 2001, a public and widely used document summarization dataset. The experimental results show that the LSA and NMF methods achieve high accuracy and precision. In addition, LSA and NMF are compared, and the sensitivity of both methods to their parameters, including the reduced dimension and the length of the input summary, is analyzed.
Keywords: Automatic document summarization, Latent semantic analysis, Nonnegative matrix factorization, Semantic features, Document Understanding Conference|
|URI:||http://hdl.handle.net/10356/65902|
|Fulltext Permission:||restricted|
|Fulltext Availability:||With Fulltext|
|Appears in Collections:||EEE Theses|
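The abstract describes the pipeline only at a high level: build a sentence representation, learn semantic features with LSA or NMF, then select sentences by clustering. The following is a minimal NumPy sketch of that idea, not the author's implementation. It assumes a raw term-frequency term-by-sentence matrix, truncated SVD for LSA (taking absolute values to sidestep SVD sign ambiguity), Lee and Seung multiplicative updates for NMF, and a deliberately simple "clustering" step that assigns each sentence to its dominant semantic feature and keeps the strongest sentence per cluster. The reduced dimension `k` therefore also bounds the summary length. All function names (`term_sentence_matrix`, `lsa_features`, `nmf_features`, `summarize`) are hypothetical.

```python
import re
import numpy as np


def term_sentence_matrix(sentences):
    """Raw term-frequency matrix A (terms x sentences); weighting scheme is an assumption."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0
    return A, vocab


def lsa_features(A, k):
    """LSA: truncated SVD A ~ U_k S_k V_k^T; column j of S_k V_k^T describes sentence j."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    # SVD signs are arbitrary, so magnitudes are used as feature weights (a heuristic).
    return np.abs(s[:k, None] * Vt[:k, :])


def nmf_features(A, k, iters=200, seed=0):
    """NMF via Lee and Seung multiplicative updates for min ||A - WH||_F with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    eps = 1e-9  # avoids division by zero
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    # Column j of H holds the nonnegative semantic-feature weights of sentence j.
    return H


def summarize(sentences, k=2, method="lsa"):
    """Pick at most k sentences: one per latent-feature cluster (hypothetical selection rule)."""
    A, _ = term_sentence_matrix(sentences)
    F = lsa_features(A, k) if method == "lsa" else nmf_features(A, k)
    labels = F.argmax(axis=0)  # cluster = dominant semantic feature of each sentence
    chosen = []
    for r in range(F.shape[0]):
        members = np.flatnonzero(labels == r)
        if members.size:
            # Keep the member sentence with the largest weight on feature r.
            chosen.append(int(members[np.argmax(F[r, members])]))
    # Return the selected sentences in their original document order.
    return [sentences[j] for j in sorted(chosen)]
```

On a toy input with two unrelated topics (two sentences about a cat, two about stock markets) and `k=2`, the LSA variant keeps one sentence per topic. Treating each latent dimension as a cluster is the simplest possible choice; running k-means on the columns of the feature matrix would be a natural alternative, and the abstract does not state which clustering method the thesis actually uses.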
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.