Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/142941
Title: High dimensional clustering for mixture models
Authors: Liu, Yiming
Keywords: Science::Mathematics
Issue Date: 2020
Publisher: Nanyang Technological University
Source: Liu, Y. (2020). High dimensional clustering for mixture models. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: Clustering is an essential topic in unsupervised learning and a common technique in many fields, including machine learning, statistics, bioinformatics, and computer graphics. Samples can be classified into homogeneous groups according to different criteria. In this thesis, we focus on clusters characterized by different parameters (i.e., means and covariances) and study clustering methods for high-dimensional mixture data. In this setting, we propose two new methods, the Covariance clustering method and the Two-step method. We also investigate and develop the Mean clustering method from both theoretical and practical perspectives using random matrix theory. Specifically, the first part focuses on clustering when the data are drawn from a mixture distribution with distinct covariance matrices. We provide a new algorithm to address this problem and derive its misclustering rate theoretically. In the second part, for data with different means, we provide noncentered and centered versions of the Mean clustering method. Moreover, to justify these two versions theoretically, we prove that the results on no eigenvalues outside the support of the limiting spectral distribution and on exact separation of eigenvalues of large-dimensional sample covariance matrices can be extended to low-rank information plus general noise models. In the third part, when either the means or the covariances are distinct, we propose a Two-step method for clustering. Both theoretical and numerical properties of the Two-step method are discussed. Simulation studies and real data analysis also demonstrate that the Two-step method outperforms the other methods under a variety of settings.
URI: https://hdl.handle.net/10356/142941
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SPMS Theses

Files in This Item:
File: Thesis_Liu YIMING.pdf
Size: 1.8 MB
Format: Adobe PDF

Page view(s): 48 (checked on Sep 30, 2020)
Download(s): 20 (checked on Sep 30, 2020)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.