Please use this identifier to cite or link to this item:
Title: Sentiment analysis using image, text and video
Authors: Chen, Qian
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Chen, Q. (2022). Sentiment analysis using image, text and video. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: Emotions and sentiments play a pivotal role in modern society. In most human-centric environments, they are essential for decision-making, communication, and situational awareness. With the explosive growth of social media content (text, images and videos) carrying sentiment polarities toward specific subjects (e.g., product reviews, political views and depressive emotions), sentiment analysis has increasingly evolved into a component technology in many industries. People can present their experiences and feelings through images, and there is a growing trend of preferring images over plain text. Compared with text, images provide more cues that better reflect people's sentiments and offer a more perceptual intuition of sentiment. In particular, for the depression recognition problem in the healthcare field, images containing human faces present emotions more intuitively through facial expressions. Hence, predicting sentiment from visual cues is complementary to textual sentiment analysis. In this dissertation, studies are conducted to explore sentiment analysis on media data ranging from images and image-text pairs to videos. We start with sentiment analysis on image data to explore sentiment polarities. We then investigate sentiment analysis on images together with their tags/captions, as these two data modalities provide more cues for improved sentiment analysis. Last, we explore the mystery of human emotions and dive into the problem of depression analysis on face videos. The main contributions of this thesis can be summarized as follows. Firstly, a single image may contain several concepts. To model the sequence of sentiments across such concepts, we employ a Recurrent Neural Network (RNN) in addition to a Convolutional Neural Network (CNN). The proposed Convolutional Recurrent Image Sentiment Classification (CRISC) model analyzes the sentiments of the context in an image without using labels for the visual concepts. Secondly, to explore the benefit of text data for image sentiment analysis, we propose extracting visual features by fine-tuning a 2D-CNN pre-trained on a large-scale image dataset and extracting textual features using AffectiveSpace of English concepts. We propose a novel sentiment score to combine the image and text predictions and evaluate our model on a dataset of images with corresponding labels and captions. We show that merging the scores of the text and image models achieves higher accuracy than either system alone. Finally, we investigate multimodal facial depression representations using facial dynamics and facial appearance. To mine the correlated and complementary depression patterns in multimodal learning, we adopt a chained-fusion mechanism that jointly learns facial appearance and dynamics in a unified framework. In summary, this dissertation presents our studies on image sentiment analysis, with a particular focus on facial depression recognition.
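The abstract describes merging scores from the text and image models into a combined prediction. A minimal sketch of such late fusion is shown below; the convex weight `alpha`, the threshold, and the function names are illustrative assumptions, not the thesis's actual scoring formulation.

```python
# Hedged sketch of late-fusion sentiment scoring. Assumes each modality's
# model outputs a positive-class probability in [0, 1]; alpha and the
# decision threshold are illustrative, not taken from the thesis.

def fuse_scores(image_score: float, text_score: float, alpha: float = 0.5) -> float:
    """Convex combination of the image and text sentiment probabilities."""
    return alpha * image_score + (1.0 - alpha) * text_score

def predict_sentiment(image_score: float, text_score: float,
                      threshold: float = 0.5) -> str:
    """Classify the fused score as positive or negative sentiment."""
    fused = fuse_scores(image_score, text_score)
    return "positive" if fused >= threshold else "negative"
```

For example, a confident positive image score can outweigh a weakly negative caption score, which is one way combining modalities can beat either model alone.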
DOI: 10.32657/10356/161285
Schools: School of Computer Science and Engineering 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Theses

Files in This Item:
File: main_thesis.pdf (3.27 MB, Adobe PDF)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.