Robust models and novel similarity measures for high-dimensional data clustering
Nguyen, Duc Thang
Date of Issue: 2012
School of Electrical and Electronic Engineering
This thesis presents our research on several fundamental issues in high-dimensional data clustering. From a study of the current literature, we identify a few important problems that remain open in the field and propose solutions to them. We investigate how techniques from statistics, machine learning and meta-heuristics can be used to improve existing methods or to develop novel models for unsupervised learning of high-dimensional data. Our goal is to develop efficient clustering algorithms that reflect the natural properties of high-dimensional data, are robust to outliers and are less sensitive to initialization; algorithms that are simple, fast and easily applicable, yet still produce good clustering quality. The main contributions of this thesis are a robust model-based clustering algorithm capable of handling noisy data, a novel similarity measure and the resulting algorithms for clustering text documents, and other related studies that help improve existing clustering algorithms.
DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications