Title: Generative models for speech emotion synthesis
Authors: Raj, Nathanael S.
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2019
Abstract: Several attempts have been made to synthesize speech from text. However, existing methods tend to generate speech that sounds artificial and lacks emotional content. In this project, we investigate the use of Generative Adversarial Networks (GANs) to generate emotional speech. WaveGAN (2019) was a first attempt at generating audio directly from raw waveforms; it produced natural-sounding audio, including speech, bird chirps and drums. We applied WaveGAN to emotional speech data from The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), using all 8 categories of emotion. We modified WaveGAN with advanced conditioning strategies, namely Sparse Vector Conditioning and the introduction of Auxiliary Classifiers. In experiments with human listeners, we found that these methods greatly aided subjects in correctly identifying the generated emotions, and improved the intelligibility and quality of the generated samples.
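Note: The sparse vector conditioning described in the abstract can be illustrated with a minimal sketch. This assumes the common formulation in which a one-hot (sparse) emotion code is concatenated to the generator's latent noise vector; the dimensions and function names below are illustrative, not taken from the report itself.

```python
import numpy as np

NUM_EMOTIONS = 8   # RAVDESS defines 8 emotion categories
LATENT_DIM = 100   # illustrative latent size, not confirmed by the report

def one_hot(label, num_classes=NUM_EMOTIONS):
    """Encode an emotion index as a sparse (one-hot) vector."""
    v = np.zeros(num_classes, dtype=np.float32)
    v[label] = 1.0
    return v

def conditioned_latent(z, label):
    """Sparse vector conditioning: append the one-hot emotion code
    to the latent noise vector fed to the generator."""
    return np.concatenate([z, one_hot(label)])

# Example: condition a random latent vector on emotion index 3
z = np.random.randn(LATENT_DIM).astype(np.float32)
g_input = conditioned_latent(z, label=3)  # shape (108,)
```

An auxiliary classifier, by contrast, would add a second output head to the discriminator that predicts the emotion label, giving the generator an extra loss term that rewards emotionally identifiable samples.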
Schools: School of Computer Science and Engineering 
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Final Report Submission.pdf (Restricted Access)
Description: Final Report
Size: 2.53 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.