Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/144854
Title: | Speaker and phoneme-aware speech bandwidth extension with residual dual-path network |
Authors: | Hou, Nana; Xu, Chenglin; Pham, Van Tung; Zhou, Joey Tianyi; Chng, Eng Siong; Li, Haizhou |
Keywords: | Engineering::Computer science and engineering |
Issue Date: | 2020 |
Source: | Hou, N., Xu, C., Pham, V. T., Zhou, J. T., Chng, E. S., & Li, H. (2020). Speaker and phoneme-aware speech bandwidth extension with residual dual-path network. Interspeech 2020, 4064-4068. |
Conference: | Interspeech 2020 |
Abstract: | Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. It is believed that the general knowledge about the speaker and phonetic content strengthens the prediction. In this paper, we propose to augment the low-band acoustic features with i-vector and phonetic posteriorgram (PPG), which represent speaker and phonetic content of the speech, respectively. We also propose a residual dual-path network (RDPN) as the core module to process the augmented features, which fully utilizes the utterance-level temporal continuity information and avoids gradient vanishing. Experiments show that the proposed method achieves 20.2% and 7.0% relative improvements over the best baseline in terms of log-spectral distortion (LSD) and signal-to-noise ratio (SNR), respectively. Furthermore, our method is 16 times more compact than the best baseline in terms of the number of parameters. |
URI: | https://hdl.handle.net/10356/144854 |
Schools: | School of Computer Science and Engineering |
Research Centres: | Air Traffic Management Research Institute |
Rights: | © 2020 International Speech Communication Association (ISCA). All rights reserved. This paper was published in Interspeech 2020 and is made available with permission of International Speech Communication Association (ISCA). |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | ATMRI Conference Papers |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network.pdf | | 2.41 MB | Adobe PDF | |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.