Linguistic-inspired Chinese sentiment analysis: from characters to radicals and phonetics
Date of Issue: 2019-05-13
School of Computer Science and Engineering
Sentiment analysis, or opinion mining, is the task of identifying, extracting and quantifying sentiment orientations or affective states. It draws on a combination of techniques from natural language processing, computational linguistics, text mining and related fields. Under this broad umbrella sit various sub-tasks, such as subjectivity detection, sentiment classification, named entity recognition and sarcasm detection. A large body of research on these tasks has been conducted on English, owing to the international prominence of English and, consequently, its abundance of language resources. Although this research can be applied to other Indo-European languages, it performs poorly on many Asian languages, especially Chinese, because of the specific characteristics of the Chinese language. Inspired by linguistics, this thesis discusses the situations and features that make Chinese different from English and proposes corresponding approaches that exploit these differences. We first reviewed the literature on Chinese sentiment analysis and noticed that existing Chinese sentiment resources are relatively scarce compared with those for other languages. This scarcity shows in two respects: the resources lack semantic connections between words, and they lack fine-grained sentiment intensity measures. We therefore proposed an unsupervised method to construct a semantically connected Chinese sentiment resource with valence (intensity) scores. This mapping-based method leverages multiple multilingual and sentiment resources, such as WordNet. Next, we found that Chinese word segmentation can be a source of errors in sentiment analysis, especially in non-general domains such as finance or medicine. In addition, we observed that the intra-character components (radicals) of Chinese characters carry semantics, owing to the pictographic (or ideographic) origin of the script.
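The claim that radicals carry semantics can be made concrete with a toy sketch. Everything in it is hypothetical: the seven-character radical table, the one-hot vectors standing in for learned embeddings, and the choice to combine the character and radical levels by concatenation are assumptions for illustration, not the thesis's actual embedding. Characters sharing the heart radical 忄 (情 "emotion", 怕 "fear", 恨 "hate", 悔 "regret") become partially similar once a radical component is added, even though their character-level vectors are orthogonal.

```python
import math

# Hypothetical toy radical table (illustration only; a real system would load a
# full character-decomposition resource). 忄 = "heart" radical, 氵 = "water" radical.
RADICALS = {
    "情": "忄",  # emotion
    "怕": "忄",  # fear
    "恨": "忄",  # hate
    "悔": "忄",  # regret
    "河": "氵",  # river
    "湖": "氵",  # lake
    "海": "氵",  # sea
}

CHARS = sorted(RADICALS)                       # fixed order for character slots
RADICAL_LIST = sorted(set(RADICALS.values()))  # fixed order for radical slots


def one_hot(index, size):
    """Return a one-hot list of floats; stands in for a learned embedding."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def char_vector(ch):
    """Character-level part only: distinct characters are orthogonal."""
    return one_hot(CHARS.index(ch), len(CHARS))


def radical_aware_vector(ch):
    """Character part concatenated with its radical part (assumed combination)."""
    radical_part = one_hot(RADICAL_LIST.index(RADICALS[ch]), len(RADICAL_LIST))
    return char_vector(ch) + radical_part


def cosine(u, v):
    """Cosine similarity of two equal-length float lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(cosine(char_vector("情"), char_vector("怕")))                    # orthogonal: 0.0
    print(cosine(radical_aware_vector("情"), radical_aware_vector("怕")))  # share 忄: about 0.5
    print(cosine(radical_aware_vector("情"), radical_aware_vector("河")))  # different radicals: 0.0
```

With learned rather than one-hot vectors the numbers would differ, but the point carries over: a radical-level component lets semantically related characters share information that a purely character-level (or word-level) representation would keep apart.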
To avoid segmentation errors and to exploit the semantics carried by radicals, we proposed a radical-based hierarchical character embedding that bypasses word segmentation and injects intra-character semantics into the text representation. This representation outperformed word-level representations by a considerable margin on the sentiment classification task. When extending the hierarchical embedding to aspect-based sentiment analysis (ABSA), we realized that existing methods tend to represent a multi-word aspect target by averaging the embeddings of its words. This simplification works in English as long as the proportion of multi-word aspect targets is relatively low; however, almost all Chinese aspect targets consist of multiple characters. We therefore introduced an aspect target sequence modeling (ATSM) network, which learns an adaptive aspect target representation conditioned on the sentence context, and an ATSM-Fusion network, which additionally accounts for the multi-granularity of Chinese text. ATSM alone achieved state-of-the-art performance on English ABSA, and ATSM-Fusion further improved performance on Chinese ABSA. Beyond the textual modality, we proposed incorporating phonetic information into textual sentiment analysis. We introduced two effective features to encode phonetic information and then developed a Disambiguate Intonation for Sentiment Analysis (DISA) network based on reinforcement learning. DISA disambiguates the intonation (tone) of the pinyin of each Chinese character, so that a precise phonetic representation of Chinese is learned. Furthermore, we fused phonetic features with textual and visual features to mimic the way humans read and understand Chinese text.
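Why tone disambiguation matters for a phonetic feature can be shown with a polyphonic character: 好 is read hǎo ("good") in 好吃 ("delicious") but hào ("to be fond of") in 爱好 ("hobby"), and tone-less pinyin collapses both to "hao". The following sketch is only an assumption-laden illustration of the inputs and outputs: DISA learns the reading choice with reinforcement learning, whereas here a greedy dictionary lookup stands in for it. The tables `READINGS` and `WORD_READINGS` and the functions `to_pinyin` and `strip_tones` are hypothetical, and the numbered-tone spelling (hao3, hao4) is just one common way to write pinyin tones.

```python
# Hypothetical reading tables (illustration only). Tones are written as a
# trailing digit: hao3 = hǎo, hao4 = hào.
READINGS = {
    "好": ["hao3", "hao4"],    # hǎo "good" / hào "to be fond of"
    "行": ["xing2", "hang2"],  # xíng "to go, OK" / háng "row, trade"
    "吃": ["chi1"],
    "爱": ["ai4"],
    "学": ["xue2"],
    "银": ["yin2"],
}

# Hypothetical two-character words whose readings resolve a polyphonic character.
WORD_READINGS = {
    "爱好": ["ai4", "hao4"],    # àihào "hobby"
    "好学": ["hao4", "xue2"],   # hàoxué "eager to learn"
    "银行": ["yin2", "hang2"],  # yínháng "bank"
}


def to_pinyin(text):
    """Greedy left-to-right lookup: prefer a known two-character word,
    otherwise fall back to the first listed reading of the character.
    This rule-based choice is a stand-in for DISA, which learns it."""
    syllables, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in WORD_READINGS:
            syllables.extend(WORD_READINGS[pair])
            i += 2
        else:
            # Characters missing from the toy table are passed through unchanged.
            syllables.append(READINGS.get(text[i], [text[i]])[0])
            i += 1
    return syllables


def strip_tones(syllables):
    """Drop the tone digits, giving a tone-less phonetic feature."""
    return [s.rstrip("12345") for s in syllables]


if __name__ == "__main__":
    for word in ["好吃", "爱好", "好学", "银行"]:
        toned = to_pinyin(word)
        print(word, toned, strip_tones(toned))
```

Running it shows 好 as hao3 in 好吃 but hao4 in 爱好 and 好学, while the tone-less versions are identical; a toned feature therefore keeps a distinction that a tone-less one loses.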
Experimental results show that including phonetic features significantly and consistently improves the performance of textual and visual representations. In summary, this thesis introduces several approaches to Chinese sentiment analysis that address and exploit the linguistic characteristics (e.g., compositionality, multi-granularity and phonology) distinguishing Chinese from other languages.
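The multi-character aspect-target argument can be illustrated with a toy comparison between averaging the character vectors of a target and pooling them with weights that depend on the sentence context. This is a simplified stand-in using dot-product attention, not the ATSM architecture; the 2-dimensional vectors for the target 电池 ("battery"), the two context vectors, and the names `mean_pool` and `context_pool` are invented for illustration.

```python
import math

# Hypothetical 2-d vectors for the characters of the aspect target 电池 ("battery").
TARGET_CHARS = [[1.0, 0.0],   # 电
                [0.0, 1.0]]   # 池


def mean_pool(vectors):
    """Average the character vectors: a context-free target representation."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max before exponentiating
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_pool(vectors, context):
    """Weight each character by its dot-product affinity with a context vector
    (simple attention; a stand-in, not the ATSM architecture)."""
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dims = range(len(vectors[0]))
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in dims]


if __name__ == "__main__":
    context_a = [1.0, 0.0]  # hypothetical sentence-context vectors
    context_b = [0.0, 1.0]
    print(mean_pool(TARGET_CHARS))                # [0.5, 0.5] for any context
    print(context_pool(TARGET_CHARS, context_a))  # leans toward 电
    print(context_pool(TARGET_CHARS, context_b))  # leans toward 池
```

Averaging yields the same target vector whatever the sentence says, whereas the context-weighted version shifts toward whichever character the context emphasizes. According to the abstract, ATSM learns such a context-adaptive target representation; its actual architecture is not reproduced here.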
DRNTU::Engineering::Computer science and engineering