One seed to find them all : mining opinion features via association
Date of Issue2012
International conference on Information and knowledge management (21st : 2012 : Maui, USA)
School of Computer Engineering
Feature-based opinion analysis has attracted extensive attention recently. Identifying features associated with opinions expressed in reviews is essential for fine-grained opinion mining. One approach is to exploit the dependency relations that occur naturally between features and opinion words, and among features (or opinion words) themselves. In this paper, we propose a generalized approach to opinion feature extraction by incorporating robust statistical association analysis in a bootstrapping framework. The new approach starts with a small set of feature seeds, on which it iteratively enlarges by mining feature-opinion, feature-feature, and opinion-opinion dependency relations. Two association model types, namely likelihood ratio tests (LRT) and latent semantic analysis (LSA), are proposed for computing the pair-wise associations between terms (features or opinions). We accordingly propose two robust bootstrapping approaches, LRTBOOT and LSABOOT, both of which need just a handful of initial feature seeds to bootstrap opinion feature extraction. We benchmarked LRTBOOT and LSABOOT against existing approaches on a large number of real-life reviews crawled from the cellphone and hotel domains. Experimental results using varying number of feature seeds show that the proposed association-based bootstrapping approach significantly outperforms the competitors. In fact, one seed feature is all that is needed for LRTBOOT to significantly outperform the other methods. This seed feature can simply be the domain feature, e.g., "cellphone" or "hotel". The consequence of our discovery is far reaching: starting with just one feature seed, typically just the domain concept word, LRTBOOT can automatically extract a large set of high-quality opinion features from the corpus without any supervision or labeled features. This means that the automatic creation of a set of domain features is no longer a pipe dream!
© 2012 ACM.