Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/52422
Title: Challenging issues in classification problems : sparsity control, key instance detection, and imbalanced data
Authors: Liu, Guoqing.
Keywords: DRNTU::Engineering::Computer science and engineering::Computer applications::Computers in other systems
Issue Date: 2013
Abstract: This thesis deals with the difficulties in classification problems caused by three types of sparsity characteristics - feature, label, and instance sparsity. First, feature sparsity is usually used as prior knowledge by inducing parameter sparsity in the learned model. We show that only an appropriate degree of parameter sparsity is beneficial, and that both over-sparsity and under-sparsity are harmful for classification. Second, label sparsity means that only a fraction of training instances are labeled, which causes classic classification methods to fail in these cases. Third, instance sparsity is caused by an imbalanced composition of different categories, in which instances from one category significantly outnumber those from the other. This always makes the classification boundary biased towards the majority category. Consequently, three contributions - sparsity control, key instance detection, and imbalanced classification - are presented to address these challenges. Sparsity control aims to regularize the sparsity of the model parameters at an appropriate level according to the intrinsic feature sparsity in the data. It is proposed based on the observation that this sparsity is not always desirable in real problems, and only a proper degree of sparsity is beneficial. To address this issue, we propose a novel probit classifier using generalized Gaussian scale mixture (GGSM) priors that can adjust the induced sparsity by tuning the shape parameter of the GGSM, and consequently provide either a sparse or non-sparse solution based on the intrinsic feature sparsity. Model learning is carried out by an efficient modified maximum a posteriori estimation. We show the relationships of the proposed approach to previous methods. We also study different types of likelihood working with the GGSM priors in a kernel-based setup, based on which an improved kernel-based approach is presented. Experiments demonstrate that the proposed method has better or comparable performance in both linear and non-linear classification.
URI: http://hdl.handle.net/10356/52422
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: TsceG0800451J.pdf (Restricted Access)
Description: thesis
Size: 2.94 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.