Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/98983
Title: | Online feature selection for mining big data | Authors: | Hoi, Steven C. H. Wang, Jialei. Zhao, Peilin. Jin, Rong. |
Keywords: | DRNTU::Engineering::Computer science and engineering | Issue Date: | 2012 | Source: | Hoi, S. C. H., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining Algorithms, Systems, Programming Models and Applications - BigMine '12, 93-100. | Conference: | International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (1st : 2012 : Beijing, China) | Abstract: | Most studies of online learning require accessing all the attributes/ features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or the access to it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS) in which the online learner is only allowed to maintain a classifier involved a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate prediction using a small and fixed number of active features. This is in contrast to the classical setup of online learning where all the features are active and can be used for prediction. We address this challenge by studying sparsity regularization and truncation techniques. Specifically, we present an effective algorithm to solve the problem, give the theoretical analysis, and evaluate the empirical performance of the proposed algorithms for online feature selection on several public datasets. We also demonstrate the application of our online feature selection technique to tackle real-world problems of big data mining, which is significantly more scalable than some well-known batch feature selection algorithms. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques for large-scale applications. | URI: | https://hdl.handle.net/10356/98983 http://hdl.handle.net/10220/12629 |
DOI: | 10.1145/2351316.2351329 | Schools: | School of Computer Engineering | Fulltext Permission: | none | Fulltext Availability: | No Fulltext |
Appears in Collections: | SCSE Conference Papers |
SCOPUSTM
Citations
5
93
Updated on Mar 24, 2024
Page view(s) 20
652
Updated on Mar 27, 2024
Google ScholarTM
Check
Altmetric
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.