Please use this identifier to cite or link to this item:
Title: A split-merge framework for comparing clusterings
Authors: Xiang, Qiaoliang
Keywords: DRNTU::Engineering::Computer science and engineering
Issue Date: 2013
Source: Xiang, Q. (2013). A split-merge framework for comparing clusterings. Master's thesis, Nanyang Technological University, Singapore.
Abstract: External clustering evaluation measures are often used to evaluate the performance of different clustering algorithms on a collection of data sets. Traditional normalization property is no longer suitable for this task and a conditional normalization property is proposed based on the fact that one clustering is the ground-truth. Even existing measures have been proposed from different points of view, we study them from the normalization point of view. Besides, we propose a new category of cluster counting measures and further group set matching measures into two subcategories according to how the matching is performed. Furthermore, we propose a generative model to study how exist- ing measures are generated as well as producing new measures according to application requirements. In order to understand the intrinsic properties of a measure, a graph-based model is presented to model two clusterings as a directed bipartite graph, which can be decomposed into weakly connected components. A measure can be expressed as a conic combination of scores on components, and different weights are assigned to components when aggregating their scores. Based on the graph-based model, we propose a split-merge framework by breaking components into subcomponents and combining the scores of any two related subcomponents. It is conditionally normalized while existing measures are not. It also has many nice properties compared to other existing frameworks. We give some examples of the framework and compare one example with a few representative measures theoretically and empirically on a coreference resolution data set.
DOI: 10.32657/10356/55194
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Theses

Files in This Item:
File Description SizeFormat 
main.pdfMain article559.44 kBAdobe PDFThumbnail

Page view(s) 50

Updated on Jun 17, 2021

Download(s) 20

Updated on Jun 17, 2021

Google ScholarTM




Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.