Enhancing Naive Bayes with various smoothing methods for short text classification
Thalmann, Nadia Magnenat
Date of Issue: 2012
International conference companion on World Wide Web (21st : 2012 : Lyon, France)
School of Computer Engineering
Partly due to the proliferation of microblogging, short texts are becoming prominent. A huge number of short texts are generated every day, which calls for a method that can efficiently accommodate new data and incrementally adjust classification models. Naive Bayes meets this need. We apply several smoothing models to Naive Bayes for question topic classification, as an example of short text classification, and study their performance. Experimental results on a large real-world question dataset show that the smoothing methods significantly improve the question classification performance of Naive Bayes. We also study the effect of training data size and question length on performance.
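As an illustration of the idea in the abstract, the following is a minimal sketch of a multinomial Naive Bayes classifier that stores only counts, so new labelled texts can be folded in without retraining. This record does not name the smoothing methods studied, so the two shown here (Laplace and Jelinek-Mercer interpolation) are assumed, commonly used choices rather than the authors' confirmed setup. The class name `SmoothedNaiveBayes`, the parameter `lam`, and the toy questions are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class SmoothedNaiveBayes:
    """Count-based multinomial Naive Bayes (hypothetical sketch)."""

    def __init__(self, smoothing="laplace", lam=0.5):
        self.smoothing = smoothing   # "laplace" or "jm" (Jelinek-Mercer); assumed options
        self.lam = lam               # weight of the corpus model when smoothing == "jm"
        self.class_docs = Counter()               # texts seen per class
        self.class_words = defaultdict(Counter)   # word counts per class
        self.class_total = Counter()              # total tokens per class
        self.corpus_words = Counter()             # word counts over all classes
        self.corpus_total = 0                     # total tokens over all classes

    def update(self, tokens, label):
        """Fold one labelled text into the counts (incremental update)."""
        self.class_docs[label] += 1
        for t in tokens:
            self.class_words[label][t] += 1
            self.class_total[label] += 1
            self.corpus_words[t] += 1
            self.corpus_total += 1

    def _word_prob(self, word, label):
        # One extra vocabulary slot is reserved for words never seen in training.
        vocab = len(self.corpus_words) + 1
        count = self.class_words[label][word]
        total = self.class_total[label]
        if self.smoothing == "laplace":
            return (count + 1) / (total + vocab)
        # Jelinek-Mercer: interpolate the class estimate with a corpus-wide
        # estimate (itself add-one smoothed so it is never zero).
        p_class = count / total if total else 0.0
        p_corpus = (self.corpus_words[word] + 1) / (self.corpus_total + vocab)
        return (1 - self.lam) * p_class + self.lam * p_corpus

    def predict(self, tokens):
        """Return the class with the highest log posterior score."""
        n_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label in self.class_docs:
            score = math.log(self.class_docs[label] / n_docs)
            for t in tokens:
                score += math.log(self._word_prob(t, label))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with made-up questions and topics.
toy_data = [
    (["how", "to", "install", "python"], "tech"),
    (["best", "python", "library"], "tech"),
    (["how", "to", "cook", "pasta"], "food"),
    (["best", "pasta", "recipe"], "food"),
]
model = SmoothedNaiveBayes(smoothing="jm", lam=0.5)
for tokens, label in toy_data:
    model.update(tokens, label)
prediction = model.predict(["python", "install"])
```

Because the model keeps only counts, calling `update` again with newly arrived texts changes later predictions directly, which is the incremental property the abstract relies on. The smoothing step matters most for short texts, where many query words have a zero count in at least one class.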
DRNTU::Engineering::Computer science and engineering
© 2012 The Authors. This paper was published in WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion and is made available as an electronic reprint (preprint) with permission of The Authors. The paper can be found at the following official DOI: [http://dx.doi.org/10.1145/2187980.2188169]. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.