Please use this identifier to cite or link to this item:
|Title:||Efficient and rich graph clustering||Authors:||Xu, Zhiqiang||Keywords:||DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles||Issue Date:||2015||Source:||Xu, Z. (2015). Efficient and rich graph clustering. Doctoral thesis, Nanyang Technological University, Singapore.||Abstract:||Graph clustering is an important task in data mining and pattern recognition. With the rapid development of modern technology nowadays, a lot of challenges have been raised on different aspects of graph clustering. In this thesis, we propose new algorithms to address the issues of efficiency and richness in clustering with graphs. Due to the fast growth of graph size at the age of information, the efficiency issue of graph clustering algorithms is a critical concern in their real applications. By virtue of technology advancements, meanwhile, the graph data we can acquire gets increasingly rich due to the availability of abundant additional information. This gives rise to two types of rich graphs of our interest, namely, attributed graphs and multiple graphs. These new characteristics of graph data pose great challenges for existing algorithms, making them less efficient and/or less effective. Therefore, it is demanding to develop new algorithms to improve the situation. Specifically, we study three problems on efficient and rich graph clustering as follows. Most of existing algorithms for attributed graph clustering are distance-based. Distance-based methods often require a lossy data fusion process for constructing a unified measure to cover both structural and attribute information, which is prone to suffering from information loss. Alternatively, we propose a model-based approach, which seamlessly casts the two types of information into a joint probabilistic framework. The framework is not only effective and flexible for various kinds of attributed graphs, but also efficient by performing variational inference. Stochastic blockmodel (SBM) is a popular probabilistic tool for graph modeling and also a building block for advanced graph models. However, its applicability to graph clustering has gradually become limited because the existing inference methods are either computationally expensive or not guaranteed to converge. Naturally, the algorithm for attributed graph clustering built upon SBM suffers from this weakness as well. To tackle these issues, we propose to exploit the geometric structure of the underlying Riemannian space with the inference problem in deploying the nonlinear conjugate gradient method. The resultant algorithm is not only efficient and convergent, but also robust, for the community detection on graphs with/without node attributes. There also exists a lot of work on clustering with multiple graphs. They achieve the same goal of finding a consensus clustering across different graphs in different ways. Whatever the ways they choose, however, the consensus information is often coupled with domain-specific information implicitly in the modeling. With domain-specific fraction unfiltered, the quality of consensus clustering can be degraded. To reduce the coupling of domain-specific information, we extend the idea of consensus and domain-specific subspace decomposition from flat data to graph data for explicitly modeling and distinguishing the two types of information, consensus and domain-specific, in multiple graphs. And the model is solved by a novel spectral clustering algorithm with better accuracy and efficiency.||URI:||http://hdl.handle.net/10356/65745||Fulltext Permission:||restricted||Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Theses|
checked on Oct 1, 2020
checked on Oct 1, 2020
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.