Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/162646
Title: Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition
Authors: Guo, Lili
Wang, Longbiao
Dang, Jianwu
Chng, Eng Siong
Nakagawa, Seiichi
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Source: Guo, L., Wang, L., Dang, J., Chng, E. S. & Nakagawa, S. (2022). Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition. Speech Communication, 136, 118-127. https://dx.doi.org/10.1016/j.specom.2021.11.005
Journal: Speech Communication
Abstract: Complete acoustic features include both magnitude and phase information. However, traditional speech emotion recognition methods focus only on the magnitude information and ignore the phase, inevitably losing some information. This study explores the accurate extraction and effective use of phase features for speech emotion recognition. First, the reflection of speech emotion in the phase spectrum is analyzed, and a quantitative analysis shows that phase data contain information that can be used to distinguish emotions. A dynamic relative phase (DRP) feature extraction method is then proposed to solve the problem that the original relative phase (RP) has difficulty determining the base frequency, and to further alleviate the dependence of the phase on the clipping position of the frame. Finally, a single-channel model (SCM) and a multi-channel model with an attention mechanism (MCMA) are constructed to effectively integrate the phase and magnitude information. By introducing phase information, more complete acoustic features are captured, which enriches the emotional representations. The experiments were conducted on the Emo-DB and IEMOCAP databases. The results demonstrate the effectiveness of the proposed DRP for speech emotion recognition, as well as the complementarity between phase and magnitude information.
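As background to the abstract, the conventional relative-phase (RP) normalization that DRP builds on can be sketched as follows: the phase at a chosen base frequency bin is fixed, and every other bin's phase is shifted proportionally to its frequency, removing the dependence on where the analysis frame was cut. This is a minimal NumPy illustration of that idea, not the paper's DRP method (the full text is unavailable); the frame length, hop size, and fixed `base_bin` are illustrative assumptions, and the difficulty of choosing the base frequency is exactly what DRP is proposed to address.

```python
import numpy as np

def stft_frames(signal, frame_len=512, hop=256):
    """Split a signal into overlapping Hamming-windowed frames and FFT them."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def magnitude_and_relative_phase(spectrum, base_bin=1):
    """Return the magnitude spectrum and a relative-phase spectrum.

    The relative phase sets the phase at the base frequency bin to zero and
    shifts the other bins proportionally to their frequency ratio, which
    removes the dependence on the frame's clipping position.
    """
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    bins = np.arange(spectrum.shape[1])
    # Per-bin phase offset, scaled by each bin's frequency ratio to the base bin
    offset = (bins / base_bin)[None, :] * phase[:, base_bin][:, None]
    # Wrap the shifted phase back into (-pi, pi]
    relative_phase = np.angle(np.exp(1j * (phase - offset)))
    return magnitude, relative_phase

# Example on a synthetic signal
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
spec = stft_frames(x)
mag, rp = magnitude_and_relative_phase(spec)
```

Unlike the raw phase, `rp` is invariant to a circular shift of the frame start, which is why phase features normalized this way become usable for recognition tasks.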
URI: https://hdl.handle.net/10356/162646
ISSN: 0167-6393
DOI: 10.1016/j.specom.2021.11.005
Rights: © 2021 Elsevier B.V. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections:SCSE Journal Articles


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.