Full metadata record
DC Field | Value | Language
dc.contributor.author | Kho, William. |
dc.description.abstract | Knowledge lies in various sources and can be found in different formats and shapes. One of the greatest sources of knowledge we rely on in our daily lives is none other than the internet. There, information is mostly encoded in unstructured text or documents. In addition, knowledge extraction from text and documents these days relies on manual entry, which is often time-consuming and laborious. To address this problem, a human-like intelligent agent capable of reasoning and decision making is built. The objective of this project is to integrate web documents from multiple sources and classify them using the LSA (Latent Semantic Analysis) technique. A number of websites returned by a Google query go through several processing steps: text parsing, HTML tag removal, TF-IDF term weighting and normalization, and cosine similarity grouping through SVD (Singular Value Decomposition). The system is built as a Java application and is able to filter and group closely related documents by building a vector space model. | en_US
dc.format.extent | 62 p. | en_US
dc.rights | Nanyang Technological University |
dc.subject | DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems | en_US
dc.title | Integration and classification of documents from multiple sources | en_US
dc.type | Final Year Project (FYP) | en_US
dc.contributor.supervisor | Mao Kezhi | en_US
dc.contributor.school | School of Electrical and Electronic Engineering | en_US
dc.description.degree | Bachelor of Engineering | en_US
item.fulltext | With Fulltext | -
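
The processing steps named in the abstract (HTML tag removal, TF-IDF term weighting and normalization, cosine similarity) can be illustrated with a minimal Java sketch. This is not the report's code: the class name TfIdfCosineSketch, the regex-based tag stripping, the letters-only tokenizer, and the log(N/df) form of IDF are all assumptions chosen for brevity. The SVD step of LSA is deliberately left out, so similarity here is computed in the raw TF-IDF space rather than in a reduced latent space.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; names and design choices are assumptions.
class TfIdfCosineSketch {

    // Crude HTML tag removal with a regex, then lowercase and split on non-letters.
    static List<String> tokenize(String html) {
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase();
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns one L2-normalized TF-IDF row per document over a shared, sorted vocabulary.
    // IDF is assumed to be log(N / df); other variants (smoothed, log-scaled TF) are common.
    static double[][] tfidf(List<List<String>> docs) {
        TreeMap<String, Integer> df = new TreeMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        List<String> vocab = new ArrayList<>(df.keySet()); // sorted by TreeMap order
        int n = docs.size();
        double[][] matrix = new double[n][vocab.size()];
        for (int i = 0; i < n; i++) {
            for (String term : docs.get(i)) {
                matrix[i][Collections.binarySearch(vocab, term)] += 1.0; // raw term count
            }
            double norm = 0.0;
            for (int j = 0; j < vocab.size(); j++) {
                matrix[i][j] *= Math.log((double) n / df.get(vocab.get(j)));
                norm += matrix[i][j] * matrix[i][j];
            }
            norm = Math.sqrt(norm);
            for (int j = 0; norm > 0 && j < vocab.size(); j++) {
                matrix[i][j] /= norm;
            }
        }
        return matrix;
    }

    // Cosine similarity; defined as 0 when either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<>();
        docs.add(tokenize("<p>cats chase mice</p>"));
        docs.add(tokenize("<div>cats chase birds</div>"));
        docs.add(tokenize("<p>stocks and bonds</p>"));
        double[][] vectors = tfidf(docs);
        System.out.println("doc0 vs doc1: " + cosine(vectors[0], vectors[1]));
        System.out.println("doc0 vs doc2: " + cosine(vectors[0], vectors[2]));
    }
}
```

In this toy input the two documents that share terms score higher than the pair with no terms in common, which is the property a grouping step would rely on. A full LSA pipeline would additionally factor the term-document matrix with SVD and compare documents in the truncated space; that would normally be delegated to a linear algebra library and is outside this sketch.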
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format
Restricted Access | FYP Report | 2.76 MB | Adobe PDF




Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.