Title: Semi-supervised clustering algorithms for web documents
Authors: Hua, Yunke.
Keywords: DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2013
Abstract: Clustering is one of the most popular data mining techniques for finding user-desired patterns accurately and efficiently in huge amounts of data. However, due to the curse of dimensionality, clustering high-dimensional data such as web documents and biological data can be challenging, because cluster patterns are difficult to find in a high-dimensional space. In this project, a new semi-supervised fuzzy co-clustering algorithm called SSFCR is proposed, based on the original fuzzy co-clustering with Ruspini’s condition (FCR) algorithm. Fuzzy clustering is used because real-world data is overlapping in nature. Co-clustering is adopted because it simultaneously clusters the features, dynamically reducing the dimensionality of the object clustering space, which makes it suitable for high-dimensional data such as web documents. For the semi-supervised method, prior knowledge in the form of two sets of pair-wise constraints is introduced into the clustering process to improve accuracy and efficiency. Each constraint specifies whether a pair of documents “must-link” (must be in the same cluster) or “cannot-link” (must be in different clusters). The categorical labels for the pair-wise constraints can be taken from either ground-truth label information or user-assigned categorical values. The whole clustering process is treated as a maximization problem over an aggregation cost function with semi-supervised terms. By applying the Lagrange multiplier method, the membership update rules for SSFCR are derived. An extensive experimental study is then carried out on several large benchmark datasets under various parameter settings to show the improvement in accuracy, stability and efficiency of the new SSFCR algorithm.
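Illustration (not taken from the report): the abstract's "must-link" and "cannot-link" pair-wise constraints can be pictured with a small Python sketch. It is only an assumption-laden illustration of the constraint idea, not the SSFCR algorithm or its update rules. The function names constraints_from_labels and count_violations, the dictionary of known labels, and the list-of-lists membership matrix are all hypothetical choices made here.

```python
from itertools import combinations


def constraints_from_labels(labels):
    """Build must-link and cannot-link pair sets from known labels.

    `labels` (hypothetical input) maps a document index to its category;
    documents absent from the mapping stay unconstrained.
    """
    must_link, cannot_link = set(), set()
    for i, j in combinations(sorted(labels), 2):
        if labels[i] == labels[j]:
            must_link.add((i, j))      # same category: same cluster expected
        else:
            cannot_link.add((i, j))    # different category: different clusters
    return must_link, cannot_link


def count_violations(memberships, must_link, cannot_link):
    """Count constraints broken once fuzzy memberships are hardened.

    `memberships[i][c]` is the degree to which document i belongs to
    cluster c; each document is assigned to its highest-membership cluster
    (ties go to the lowest cluster index).
    """
    hard = [max(range(len(row)), key=row.__getitem__) for row in memberships]
    broken_ml = sum(1 for i, j in must_link if hard[i] != hard[j])
    broken_cl = sum(1 for i, j in cannot_link if hard[i] == hard[j])
    return broken_ml + broken_cl
```

In this sketch, labelling only a few documents yields both constraint sets, and the violation count gives a rough check of how well a clustering result respects them. How SSFCR itself folds the constraints into its cost function is described only in the full report.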
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.