|Title:||Correlation-based frequency warping for voice conversion|
|Authors:||Tian, Xiaohai; Chng, Eng Siong|
|Keywords:||DRNTU::Engineering::Computer science and engineering|
|Issue Date:||2014|
|Source:||Tian, X., Wu, Z., Lee, S.-W., & Chng, E. S. (2014). Correlation-based frequency warping for voice conversion. The 9th International Symposium on Chinese Spoken Language Processing, 211-215. doi:10.1109/ISCSLP.2014.6936725|
|Abstract:||Frequency warping (FW) based voice conversion aims to modify the frequency axis of source spectra towards that of the target. In previous work, the optimal warping function was calculated by minimizing the spectral distance between the converted and target spectra, without considering the spectral shape. However, speaker timbre and identity depend greatly on the vocal-tract peaks and valleys of the spectrum. In this paper, we propose a method that defines the warping function by maximizing the correlation between the converted and target spectra. Unlike conventional warping methods, the correlation-based optimization is not determined by the magnitude of the spectra. Instead, both spectral peaks and valleys are considered in the optimization process, which also improves the performance of amplitude scaling. Experiments were conducted on the VOICES database, and the results show that, after amplitude scaling, the proposed method reduced the mel-spectral distortion from 5.85 dB to 5.60 dB. Subjective listening tests also confirmed the effectiveness of the proposed method.|
|URI:||https://hdl.handle.net/10356/89598|
|DOI:||http://dx.doi.org/10.1109/ISCSLP.2014.6936725|
|Rights:||© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/ISCSLP.2014.6936725].|
|Full-text access:||open|
|Full text:||With Fulltext|
|Appears in Collections:||SCSE Conference Papers|
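
The abstract's central point, that a correlation criterion depends on spectral shape rather than magnitude, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the single-parameter linear warp, the grid search over candidate factors, the synthetic two-peak spectrum, and the helper names `warp_spectrum`, `pearson`, and `best_alpha`.

```python
import numpy as np

def warp_spectrum(spec, alpha):
    """Stretch a spectrum along the frequency axis by a factor alpha.

    Hypothetical single-parameter linear warp, used only for illustration.
    """
    n = len(spec)
    bins = np.arange(n)
    # Output bin k reads the source at position k / alpha (clipped to range).
    src_pos = np.clip(bins / alpha, 0, n - 1)
    return np.interp(src_pos, bins, spec)

def pearson(x, y):
    """Pearson correlation; unchanged by positive scaling or offset of x or y."""
    x = x - x.mean()
    y = y - y.mean()
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def best_alpha(source, target, candidates):
    """Pick the warping factor whose warped source correlates best with target."""
    scores = [pearson(warp_spectrum(source, a), target) for a in candidates]
    return float(candidates[int(np.argmax(scores))])

# Synthetic source spectrum with two formant-like peaks (made-up data).
bins = np.arange(256)
source = (np.exp(-0.5 * ((bins - 60) / 8.0) ** 2)
          + 0.6 * np.exp(-0.5 * ((bins - 150) / 12.0) ** 2))

# Target: the source warped by 1.10, then rescaled and offset in magnitude.
target = 2.0 * warp_spectrum(source, 1.10) + 3.0

candidates = np.round(np.linspace(0.80, 1.20, 41), 2)
alpha = best_alpha(source, target, candidates)
print(alpha)
```

Because the correlation ignores the scale and offset applied to the target, the search is driven only by how well the peaks and valleys line up; the remaining magnitude mismatch is what a separate amplitude-scaling step would then address.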
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.