Title: Information theoretic feature selection clustering
Authors: Quan, Yu Teng.
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Database management
Issue Date: 2012
Abstract: Clustering is a part of data mining, the process of analyzing data from various angles to discover new patterns in large data sets and to find correlations that turn the information into reliable, tangible knowledge. However, data mining usually deals with large, high-dimensional data, and most current algorithms are sensitive to scale, to high dimensionality, or to both. The type of feature plays an important role in data mining: some features are crucial for clustering, while others merely obstruct the process. One way to overcome this problem is to select a subset of key features. To further improve clustering accuracy, a nonparametric estimate of the average class entropies can be used to search for a clustering that maximizes the estimated mutual information between clusters and data points. Several methods have been proposed and implemented, such as the k-Means algorithm, which relies on the properties of distance measures and reduces computing cost but lacks accuracy. To counter this lack of accuracy while maintaining efficiency, the Nonparametric Information Clustering (NIC) algorithm is used to divide a set of objects into groups, processing and moving the data points using a specific distance from each point to a cluster center. Because it is nonparametric, it is applicable in situations where the input parameters are unknown. NIC is tested against k-Means on different data sets, and the results show that NIC is more accurate. To examine the accuracy further, an error rate function is implemented to check the correctness of each cluster.
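The abstract mentions an error rate function for checking cluster correctness but does not specify it. As a minimal sketch only, assuming the error rate is the fraction of points left misassigned under the best one-to-one mapping of cluster ids to known class labels, it might look like the Python below. The name clustering_error_rate and its arguments are hypothetical and are not taken from the report.

```python
from itertools import permutations


def clustering_error_rate(true_labels, cluster_ids):
    """Return the fraction of points misassigned under the best
    one-to-one mapping of cluster ids to class labels.

    Assumption: this is one common definition of clustering error,
    not necessarily the one used in the report. The search is brute
    force, so it is only practical for a small number of clusters.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have equal length")
    if len(true_labels) == 0:
        raise ValueError("need at least one data point")

    # Distinct labels in first-seen order (no sorting, so mixed types work).
    classes = list(dict.fromkeys(true_labels))
    clusters = list(dict.fromkeys(cluster_ids))

    # If there are more clusters than classes, pad with None so the
    # surplus clusters stay unmatched and count as errors.
    targets = classes + [None] * max(0, len(clusters) - len(classes))

    best_correct = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(
            mapping[c] == t for t, c in zip(true_labels, cluster_ids)
        )
        best_correct = max(best_correct, correct)

    return 1.0 - best_correct / len(true_labels)
```

Because cluster ids are arbitrary, the function compares a clustering with the reference labels only after choosing the most favorable relabeling, so a clustering that is correct up to a renaming of its clusters scores zero. The same function can be applied to the output of k-Means and of NIC on a labeled data set to compare the two.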
Schools: School of Computer Engineering 
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Restricted Access
Size: 906.59 kB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.