|Title:||Towards ultrahigh dimensional feature selection for big data||Authors:||Tan, Mingkui
Tsang, Ivor W.
Wang, Li
|Keywords:||DRNTU::Engineering::Computer science and engineering::Data||Issue Date:||2014||Source:||Tan, M., Tsang, I. W., & Wang, L. (2014). Towards ultrahigh dimensional feature selection for big data. Journal of machine learning research, 15, 1371-1429.||Series/Report no.:||Journal of machine learning research||Abstract:||In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Different from traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features, and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up the training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Due to such optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets of tens of million data points with O(10^14) features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.||URI:||https://hdl.handle.net/10356/105805
|URL:||http://www.jmlr.org/papers/v15/tan14a.html||Rights:||© 2014 The Authors (Journal of Machine Learning Research). This paper was published in Journal of Machine Learning Research and is made available as an electronic reprint (preprint) with permission of The Authors (Journal of Machine Learning Research). The paper can be found at the following official URL: [http://jmlr.org/papers/volume15/tan14a/tan14a.pdf]. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.||Fulltext Permission:||open||Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Journal Articles|
Files in This Item:
|Towards ultrahigh dimensional feature selection for big data.pdf||1 MB||Adobe PDF|
Page view(s) 10720
Updated on Jan 27, 2023
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.