Title: Voice conversion by speech synthesis
Authors: Lee, Ming Hui.
Keywords: DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
Issue Date: 2009
Abstract: A speech signal carries two kinds of information: (i) the message the speaker wants to convey to the listener and (ii) the characteristics of the speaker. This project focuses on the analysis and manipulation of the speaker characteristics embedded in the speech signal for voice conversion. Voice conversion transforms the speaker characteristics in speech uttered by one speaker (the source speaker) so as to generate speech with the voice characteristics of a desired speaker (the target speaker). Voice characteristics lie at the linguistic, suprasegmental and segmental levels. Speaker characteristics at the linguistic and suprasegmental levels are learned features and are therefore difficult to derive from data and to model. Speaker characteristics at the segmental level, on the other hand, can be attributed to the speech production mechanism and are reflected in the source and system characteristics of the physical system. The model patterned after human speech production is known as the source-filter model, and the two models examined are the linear prediction (LP) model and the formant model. Research has shown that the quality of synthesis with the LP synthesizer is superior to that with the formant synthesizer, and because linear prediction is the most primitive methodology, it serves as an appropriate baseline for beginners in speech processing. It therefore forms the central idea of this project. To start, given little prior knowledge of speech signal processing and the specialized nature of speech data, it is necessary to gain an understanding of the acoustic features and properties of speech data before moving on to speech analysis and synthesis. Routines and functions with graphical user interface support are implemented in Matlab so that the user can step through the program's execution with ease. The programs closely reference and build on existing toolboxes.
Finally, the performance of the system in converting speech from one voice to another is summarized, tabulated and discussed, and its drawbacks and shortcomings are identified and examined. Methods for evaluating the transformations produced by the voice conversion system are studied, and subjective testing is the method used to evaluate the results obtained in this project. The report concludes with a brief look at speech-to-speech translation, an application in which voice conversion has served as an invaluable tool.
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: Restricted Access
Size: 2.15 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.