Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/105805
Title: Towards ultrahigh dimensional feature selection for big data
Authors: Tan, Mingkui
Tsang, Ivor W.
Wang, Li
Keywords: DRNTU::Engineering::Computer science and engineering::Data
Issue Date: 2014
Source: Tan, M., Tsang, I. W., & Wang, L. (2014). Towards ultrahigh dimensional feature selection for big data. Journal of machine learning research, 15, 1371-1429.
Series/Report no.: Journal of machine learning research
Abstract: In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Different from traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features, and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up the training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Due to such optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets of tens of million data points with O(10^14) features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.
URI: https://hdl.handle.net/10356/105805
http://hdl.handle.net/10220/20902
URL: http://www.jmlr.org/papers/v15/tan14a.html
Rights: © 2014 The Authors (Journal of Machine Learning Research). This paper was published in Journal of Machine Learning Research and is made available as an electronic reprint (preprint) with permission of The Authors (Journal of Machine Learning Research). The paper can be found at the following official URL: [http://jmlr.org/papers/volume15/tan14a/tan14a.pdf]. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Journal Articles

Files in This Item:
File: Towards ultrahigh dimensional feature selection for big data.pdf
Size: 1 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.