Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/144854
Title: Speaker and phoneme-aware speech bandwidth extension with residual dual-path network
Authors: Hou, Nana
Xu, Chenglin
Pham, Van Tung
Zhou, Joey Tianyi
Chng, Eng Siong
Li, Haizhou
Keywords: Engineering::Computer science and engineering
Issue Date: 2020
Source: Hou, N., Xu, C., Pham, V. T., Zhou, J. T., Chng, E. S., & Li, H. (2020). Speaker and phoneme-aware speech bandwidth extension with residual dual-path network. Interspeech 2020, 4064-4068.
Conference: Interspeech 2020
Abstract: Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. It is believed that the general knowledge about the speaker and phonetic content strengthens the prediction. In this paper, we propose to augment the low-band acoustic features with i-vector and phonetic posteriorgram (PPG), which represent speaker and phonetic content of the speech, respectively. We also propose a residual dual-path network (RDPN) as the core module to process the augmented features, which fully utilizes the utterance-level temporal continuity information and avoids gradient vanishing. Experiments show that the proposed method achieves 20.2% and 7.0% relative improvements over the best baseline in terms of log-spectral distortion (LSD) and signal-to-noise ratio (SNR), respectively. Furthermore, our method is 16 times more compact than the best baseline in terms of the number of parameters.
URI: https://hdl.handle.net/10356/144854
Schools: School of Computer Science and Engineering 
Research Centres: Air Traffic Management Research Institute 
Rights: © 2020 International Speech Communication Association (ISCA). All rights reserved. This paper was published in Interspeech 2020 and is made available with permission of International Speech Communication Association (ISCA).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: ATMRI Conference Papers

Files in This Item:
File: Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network.pdf
Size: 2.41 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.