Title: Emotion analysis from speech
Authors: Mus'ifah Amran
Keywords: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Mus'ifah Amran (2021). Emotion analysis from speech. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/153198
Abstract: Speech is the first form of communication that humans instinctively use, and much of the time our emotions are expressed through it. Emotion in speech helps us form interpersonal connections. Emotions in speech are produced through specific acoustic patterns, and speech emotion recognition systems extract these acoustic features from utterances to identify emotions and analyse the link between the features and their respective emotions. There are different techniques for speech emotion recognition, such as deep neural networks and hidden Markov models. In this report, we focus on deep learning techniques that infer emotion from speech, building on models from an existing work that approaches the task as an image classification problem. We examine three networks: AlexNet, a Fully Convolutional Network with Global Average Pooling, and a Residual Network (ResNet). As the first two networks had already been trained on the IEMOCAP corpus, ResNet is trained on it as well so that the models' performance can be compared. The three models are then retrained on a down-sampled IEMOCAP corpus and on the THAI SER corpus. The models were evaluated using k-fold cross-validation, in line with publications that use the same approach. The models from Ng [1] serve as a benchmark for the ResNet model implemented here. From the experiments conducted, no single model achieved high accuracy across the different corpora. The Stability Training scheme implemented in [1] was updated by tuning the α parameter and adding environment noise. Of the three models, the Fully Convolutional Network achieved a 0.9% increase in accuracy over its result in [1], surpassing the benchmark accuracy of AlexNet by 0.2%.
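
To make the image-classification framing and the Stability Training update concrete, the following is a minimal sketch, not the report's actual code: the libraries (librosa, PyTorch), the layer sizes, the four-emotion label set, the α and noise values, and the use of Gaussian noise as a stand-in for the environment noise mentioned above are all illustrative assumptions. It shows an utterance converted to a log-mel spectrogram "image", a small fully convolutional classifier with global average pooling, and a Stability Training loss with a tunable α weight.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import librosa

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # a common 4-class IEMOCAP setup

def utterance_to_logmel(path, sr=16000, n_mels=128):
    """Load an utterance and return a log-mel spectrogram shaped like a
    single-channel image: (1, n_mels, n_frames)."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel, ref=np.max)
    return torch.from_numpy(logmel).float().unsqueeze(0)

class FCNWithGAP(nn.Module):
    """A small fully convolutional classifier ending in global average
    pooling, so variable-length spectrograms need no cropping or padding."""
    def __init__(self, n_classes=len(EMOTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, n_classes, kernel_size=1),  # 1x1 conv -> per-class maps
        )

    def forward(self, x):                          # x: (batch, 1, n_mels, n_frames)
        return self.features(x).mean(dim=(2, 3))   # GAP over freq/time -> logits

def stability_training_loss(model, x, y, alpha=0.01, noise_std=0.05):
    """Stability Training: cross-entropy on the clean input plus an
    alpha-weighted KL term that keeps predictions on a perturbed copy close
    to the clean predictions. Gaussian noise stands in here for the added
    environment noise; alpha is the weight tuned in the report."""
    clean_logits = model(x)
    task_loss = F.cross_entropy(clean_logits, y)
    noisy_logits = model(x + noise_std * torch.randn_like(x))
    stability = F.kl_div(F.log_softmax(noisy_logits, dim=1),
                         F.softmax(clean_logits, dim=1),
                         reduction="batchmean")
    return task_loss + alpha * stability
```

In use, each spectrogram would be batched (e.g. x = utterance_to_logmel("utt.wav").unsqueeze(0)) before the forward pass; because spectrogram lengths vary across utterances, batching in practice requires padding or bucketing by length. The global average pooling head is what lets the same network evaluate under k-fold cross-validation without fixing an input width.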
URI: https://hdl.handle.net/10356/153198
Schools: School of Computer Science and Engineering 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Emotion Analysis from Speech FYP Report.pdf (Restricted Access)
Size: 1.19 MB
Format: Adobe PDF

Page view(s): 127 (updated on Jun 6, 2023)
Download(s): 6 (updated on Jun 6, 2023)
