Title: Toolkit development for high-dimensional data pre-processing, clustering and analysis
Authors: Hu, Yao.
Keywords: DRNTU::Engineering::Electrical and electronic engineering
Issue Date: 2012
Abstract: In this report, the author documents a software project that designs and implements a toolkit for processing high-dimensional data. The toolkit, called WordTagger, automatically labels a vocabulary of computer science words with categorical information about the word space, using the ACM taxonomy as a reference [1]. These word categories can serve as an additional source of prior knowledge, to be combined with prior knowledge from the document space in existing semi-supervised co-clustering algorithms. The author has successfully implemented WordTagger and tested it to evaluate its effectiveness and efficiency. Preliminary experiments have also been conducted to show that words labeled by WordTagger could be used as an additional source of word-space prior knowledge. To do this, an existing semi-supervised approach, SS-HFCR, was modified to accept prior knowledge from both the document space and the word space; the modified approach is referred to as dual SS-HFCR. However, the report shows that dual SS-HFCR does not perform as well as expected when given the word-space categorical information provided by WordTagger. The limitations of the current integration of WordTagger and dual SS-HFCR are identified and discussed, and directions for future work are summarized at the end of the report.
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
Main Report (Adobe PDF, 2.18 MB; restricted access)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.