| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Tan, Ming Kui | |
| dc.date.accessioned | 2014-12-04T09:02:55Z | |
| dc.date.accessioned | 2017-07-23T08:30:16Z | |
| dc.date.available | 2014-12-04T09:02:55Z | |
| dc.date.available | 2017-07-23T08:30:16Z | |
| dc.date.copyright | 2014 | en_US |
| dc.date.issued | 2014 | |
| dc.identifier.citation | Tan, M. K. (2014). Towards efficient large-scale learning by exploiting sparsity. Doctoral thesis, Nanyang Technological University, Singapore. | |
| dc.identifier.uri | http://hdl.handle.net/10356/61881 | |
| dc.description.abstract | The last decade has witnessed explosive growth in data. Ultrahigh-dimensional, large-volume data have brought many critical issues, such as the storage disaster and scalability issues for data analysis. To enable efficient and effective big data analysis, this thesis exploits the sparsity constraints of learning tasks and investigates large-scale learning in three directions, namely feature selection for classification tasks, sparse recovery for signal processing, and matrix recovery. A Feature Generating Machine (FGM) is proposed to address large-scale and ultrahigh-dimensional feature selection for classification tasks (e.g. $O(10^{12})$ features). Unlike traditional gradient-based approaches that conduct optimization on all features, FGM iteratively activates a group of features and solves a sequence of subproblems w.r.t. the activated features only. As a result, it effectively avoids the storage disaster and scales well on *big data*. A Matching Pursuit LASSO (MPL) algorithm is developed to address the large-scale sparse recovery problem. MPL is guaranteed to converge to a global solution and greatly reduces the computational cost under a *big dictionary* (e.g. with 1 million atoms). In particular, by taking advantage of its optimization scheme, a batch-mode MPL is developed to vastly speed up the optimization with many signals. A Riemannian Pursuit (RP) algorithm is proposed to address the low-rank matrix recovery problem. RP consists of a sequence of fixed-rank optimization problems, each solved by a nonlinear Riemannian conjugate gradient method. Compared to existing methods, RP does not require rank estimation and performs stably on ill-conditioned big matrices. Extensive experiments on both synthetic and real-world problems demonstrate that the proposed methods achieve superior scalability and comparable or even better performance than the considered state-of-the-art baselines. | en_US |
| dc.format.extent | 237 p. | en_US |
| dc.language.iso | en | en_US |
| dc.subject | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence | en_US |
| dc.title | Towards efficient large-scale learning by exploiting sparsity | en_US |
| dc.type | Thesis | |
| dc.contributor.supervisor2 | Ivor W. Tsang | en_US |
| dc.contributor.research | Centre for Computational Intelligence | en_US |
| dc.contributor.school | School of Computer Engineering | en_US |
| dc.description.degree | DOCTOR OF PHILOSOPHY (SCE) | en_US |

## Files in this item

| Files | Size | Format |
| --- | --- | --- |
| main_thesis.pdf | 1.539Mb | application/pdf |