Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/72102
Full metadata record
DC Field: Value (Language)
dc.contributor.author: Nguyen, Quy Hy
dc.date.accessioned: 2017-05-25T08:57:53Z
dc.date.available: 2017-05-25T08:57:53Z
dc.date.issued: 2017
dc.identifier.citation: Nguyen, Q. H. (2017). Voice conversion using deep neural networks. Master's thesis, Nanyang Technological University, Singapore.
dc.identifier.uri: http://hdl.handle.net/10356/72102
dc.description.abstract: This thesis focuses on techniques to improve the performance of voice conversion. Voice conversion modifies the recorded speech of a source speaker towards a given target speaker: the resulting speech should sound like the target speaker while the language content remains unchanged. This technology has been applied to create personalized voices for text-to-speech or virtual avatars, for speech-to-singing synthesis, and for spoofing attacks on speaker verification systems. The usual approach to voice conversion is to create a conversion function that is applied to the source speaker's speech features, such as timbre and prosodic features, to generate the corresponding target features. In the past decade, most voice conversion research has focused on spectral mapping, i.e. conversion of the features representing timbre characteristics in a frame-by-frame manner. In Chapter 3, we investigate a comprehensive approach that trains the conversion function using a DNN, considering both timbre and prosodic features simultaneously. For better modelling, we use high-dimensional spectral features; however, this further increases the difficulty of robustly training a DNN, which typically requires a large amount of training data. To overcome the issue of limited training data, we propose a new pretraining process using an autoencoder. The experimental results show that the proposed comprehensive framework with pretraining performs better than conventional voice conversion systems, including the state-of-the-art GMM-based system. The technique introduced in Chapter 3 learns a DNN that converts between only one pair of speakers. To reduce the need for parallel training data for each new speaker pair, in Chapter 4 we examine a novel DNN adaptation technique for voice conversion that includes two bias vectors representing the source and the target speaker. With this configuration, conversion between new speaker pairs becomes possible. Our preliminary results show that conversion to new target speakers' voices could be achieved. (en_US)
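The abstract describes a frame-by-frame DNN mapping from source to target spectral features, with per-speaker bias vectors (Chapter 4) enabling adaptation to new speaker pairs. A minimal illustrative sketch of that idea follows; all dimensions, names, and the placement of the bias vectors are hypothetical, and this is not the thesis implementation:

```python
# Illustrative sketch (NOT the thesis code): a feedforward network maps
# source-speaker spectral frames to target-speaker frames. Speaker-dependent
# bias vectors are added at the hidden layer, so a shared network can be
# adapted to new speaker pairs by learning only the bias vectors.
# All dimensions and speaker names below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 40    # hypothetical spectral feature dimension per frame
HIDDEN_DIM = 64  # hypothetical hidden layer width

# Shared conversion weights (in practice, trained on parallel data).
W1 = rng.standard_normal((FEAT_DIM, HIDDEN_DIM)) * 0.1
W2 = rng.standard_normal((HIDDEN_DIM, FEAT_DIM)) * 0.1

# One bias vector per source speaker and per target speaker.
bias_src = {"spk_A": rng.standard_normal(HIDDEN_DIM) * 0.1}
bias_tgt = {"spk_B": rng.standard_normal(HIDDEN_DIM) * 0.1}

def convert(frames, src, tgt):
    """Map source frames (T, FEAT_DIM) to target-speaker frames, frame by frame."""
    h = np.tanh(frames @ W1 + bias_src[src] + bias_tgt[tgt])
    return h @ W2

frames = rng.standard_normal((100, FEAT_DIM))  # 100 frames of dummy features
converted = convert(frames, "spk_A", "spk_B")
print(converted.shape)  # (100, 40)
```

Adapting to a new speaker pair would then only require estimating two new bias vectors rather than retraining the whole network, which is the motivation for the Chapter 4 configuration.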
dc.format.extent: 56 p. (en_US)
dc.language.iso: en (en_US)
dc.subject: DRNTU::Science (en_US)
dc.subject: DRNTU::Engineering::Computer science and engineering (en_US)
dc.title: Voice conversion using deep neural networks (en_US)
dc.type: Thesis
dc.contributor.supervisor: Chng Eng Siong (en_US)
dc.contributor.school: School of Computer Science and Engineering (en_US)
dc.description.degree: Master of Engineering (SCE) (en_US)
dc.identifier.doi: 10.32657/10356/72102
item.fulltext: With Fulltext
item.grantfulltext: open
Appears in Collections:SCSE Theses
Files in This Item:
File: Thesis_Submission.pdf (4.72 MB, Adobe PDF)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.