Please use this identifier to cite or link to this item:
Full metadata record
DC FieldValueLanguage
dc.contributor.authorFang, Xuhuien_US
dc.identifier.citationFang, X. (2023). Non-reference speech quality assessment based on deep learning. Master's thesis, Nanyang Technological University, Singapore.
dc.description.abstractIn the field of speech processing, speech quality assessment is an important technique that is widely used in mobile communications, the Internet, public safety, digital entertainment, consumer electronics, and other fields. Early speech quality assessment was purely subjective, requiring substantial human resources, annotated data, and time; objective assessment methods therefore gradually became popular. Reference-based speech quality assessment models require clean reference speech signals, which are often difficult to obtain in practice. As a result, non-reference speech quality assessment has received increasing attention, especially in recent years. Many researchers have applied deep learning to non-reference speech quality assessment, leading to major breakthroughs in this field. However, existing deep learning-based speech quality assessment methods still have limitations, such as insufficient accuracy and large numbers of parameters. To address these limitations, this dissertation studies non-reference speech quality assessment methods based on deep learning; the main research is summarized below: (1) To improve the accuracy of existing speech quality assessment, this dissertation proposes improvements from multiple perspectives. A Bidirectional Long Short-Term Memory (BiLSTM) network is used to model temporal dependencies, exploiting its ability to learn speech context information effectively. On this basis, a Squeeze-and-Excitation (SE) module is added, which learns the correlations between channels in the feature map to produce channel-wise attention weights and thereby recalibrates the features.
In addition, a custom loss function based on the signal loss ratio is used to improve model fitting, which further improves the evaluation performance of the model. Experimental results demonstrate the effectiveness of this method. (2) To address the large number of parameters in existing speech quality evaluation models, we propose a low-complexity method based on depthwise residual convolution and a Bidirectional Gated Recurrent Unit (BiGRU): the SE-DSResBGRU-NRSQA model. The main goal of this model is to reduce the number of parameters by using a BiGRU and depthwise separable convolutions, organizing the convolutional part around the main structure of a residual network (ResNet), and using shallow feature information through direct mappings to improve evaluation performance. On this basis, SE modules are added to learn the importance of different channels, so as to exploit the input information effectively and improve the evaluation performance of the system. Experimental results show that the proposed method achieves good speech quality evaluation with a relatively small number of parameters.en_US
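The channel recalibration described in the abstract can be illustrated with a minimal Squeeze-and-Excitation sketch. This is not the thesis's implementation; the function name, weight shapes, and reduction ratio are illustrative assumptions, written in plain NumPy to show the squeeze (global pooling), excitation (bottleneck gating), and scale steps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Recalibrate a (C, H, W) feature map channel-wise (illustrative SE sketch).

    w1: (C // r, C) bottleneck weights, w2: (C, C // r) expansion weights,
    where r is the reduction ratio (hypothetical shapes for this example).
    """
    # Squeeze: global average pooling per channel -> vector of length C
    z = feature_map.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Scale: reweight each channel of the input feature map
    return feature_map * s[:, None, None]
```

Because the sigmoid gate lies in (0, 1), each channel of the output is an attenuated copy of the corresponding input channel, with the attenuation learned from cross-channel statistics.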
dc.publisherNanyang Technological Universityen_US
dc.subjectEngineering::Electrical and electronic engineeringen_US
dc.titleNon-reference speech quality assessment based on deep learningen_US
dc.typeThesis-Master by Courseworken_US
dc.contributor.supervisorTan Yap Pengen_US
dc.contributor.schoolSchool of Electrical and Electronic Engineeringen_US
dc.description.degreeMaster of Science (Communications Engineering)en_US
item.fulltextWith Fulltext-
Appears in Collections:EEE Theses
Files in This Item:
FANG XUHUI-dissertation.pdf (Restricted Access, 2.01 MB, Adobe PDF)

Updated on Feb 26, 2024
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.