Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/98037
Title: One seed to find them all : mining opinion features via association
Authors: Hai, Zhen
Chang, Kuiyu
Cong, Gao
Issue Date: 2012
Source: Hai, Z., Chang, K., & Cong, G. (2012). One seed to find them all: mining opinion features via association. Proceedings of the 21st ACM international conference on Information and knowledge management.
Abstract: Feature-based opinion analysis has attracted extensive attention recently. Identifying features associated with opinions expressed in reviews is essential for fine-grained opinion mining. One approach is to exploit the dependency relations that occur naturally between features and opinion words, and among features (or opinion words) themselves. In this paper, we propose a generalized approach to opinion feature extraction by incorporating robust statistical association analysis in a bootstrapping framework. The new approach starts with a small set of feature seeds, which it iteratively enlarges by mining feature-opinion, feature-feature, and opinion-opinion dependency relations. Two association model types, namely likelihood ratio tests (LRT) and latent semantic analysis (LSA), are proposed for computing the pair-wise associations between terms (features or opinions). We accordingly propose two robust bootstrapping approaches, LRTBOOT and LSABOOT, both of which need just a handful of initial feature seeds to bootstrap opinion feature extraction. We benchmarked LRTBOOT and LSABOOT against existing approaches on a large number of real-life reviews crawled from the cellphone and hotel domains. Experimental results using varying numbers of feature seeds show that the proposed association-based bootstrapping approach significantly outperforms the competitors. In fact, one seed feature is all that is needed for LRTBOOT to significantly outperform the other methods. This seed feature can simply be the domain feature, e.g., "cellphone" or "hotel". The consequence of our discovery is far-reaching: starting with just one feature seed, typically just the domain concept word, LRTBOOT can automatically extract a large set of high-quality opinion features from the corpus without any supervision or labeled features. This means that the automatic creation of a set of domain features is no longer a pipe dream!
URI: https://hdl.handle.net/10356/98037
http://hdl.handle.net/10220/12292
DOI: 10.1145/2396761.2396797
Rights: © 2012 ACM.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections: SCSE Conference Papers

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.