Please use this identifier to cite or link to this item:
Title: Multimodal audio-visual emotion detection
Authors: Chaudhary, Nitesh Kumar
Keywords: Engineering::Computer science and engineering
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Chaudhary, N. K. (2021). Multimodal audio-visual emotion detection. Master's thesis, Nanyang Technological University, Singapore.
Abstract: Audio and visual utterances in video are temporally and semantically dependent on each other, so modeling temporal and contextual characteristics plays a vital role in understanding conflicting or supporting emotional cues in audio-visual emotion recognition (AVER). We introduce a novel temporal modelling approach with contextual features for the audio and video hierarchies in AVER. To extract abstract temporal information, we first build temporal audio and visual sequences that are then fed into large Convolutional Neural Network (CNN) embeddings. Using this abstract temporal information, we train a recurrent network to capture contextual semantics from the temporal interdependencies of the audio and video streams. The resulting AVER approach is end-to-end trainable and improves on state-of-the-art accuracies by a large margin.
DOI: 10.32657/10356/153490
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: Revised_Nitesh Kumar Chaudhary_Thesis.pdf
Description: Final Thesis - CHAUDHARY NITESH KUMAR, G1802997E, M.ENG. (SCSE)
Size: 3.49 MB
Format: Adobe PDF

Download(s) 50

Updated on May 20, 2022

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.