Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/86625
Title: A transfer learning approach to goodness of pronunciation based automatic mispronunciation detection
Authors: Huang, Hao; Xu, Haihua; Hu, Ying; Zhou, Gang
Keywords: Acoustic Analysis; Speech Recognition
Issue Date: 2017
Source: Huang, H., Xu, H., Hu, Y., & Zhou, G. (2017). A transfer learning approach to goodness of pronunciation based automatic mispronunciation detection. The Journal of the Acoustical Society of America, 142(5), 3165-3177.
Series/Report no.: The Journal of the Acoustical Society of America
Abstract: Goodness of pronunciation (GOP) is the most widely used method for automatic mispronunciation detection. In this paper, a transfer learning approach to GOP based mispronunciation detection when applying maximum F1-score criterion (MFC) training to deep neural network (DNN)-hidden Markov model based acoustic models is proposed. Rather than train the whole network using MFC, a DNN is used, whose hidden layers are borrowed from native speech recognition with only the softmax layer trained according to the MFC objective function. As a result, significant mispronunciation detection improvement is obtained. In light of this, the two-stage transfer learning based GOP is investigated in depth. The first stage exploits the hidden layer(s) to extract phonetic-discriminating features. The second stage uses a trainable softmax layer to learn the human standard for judgment. The validation is carried out by experimenting with different mispronunciation detection architectures using acoustic models trained by different criteria. It is found that it is preferable to use frame-level cross-entropy to train the hidden layer parameters. Classifier based mispronunciation detection is further experimented with using features computed by transfer learning based GOP and it is shown that it also helps to achieve better results.
URI: https://hdl.handle.net/10356/86625; http://hdl.handle.net/10220/44162
ISSN: 0001-4966
DOI: 10.1121/1.5011159
Research Centres: Temasek Laboratories
Rights: © 2017 Acoustical Society of America. This paper was published in Journal of the Acoustical Society of America and is made available as an electronic reprint (preprint) with permission of Acoustical Society of America. The published version is available at: [http://dx.doi.org/10.1121/1.5011159]. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: TL Journal Articles
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
A transfer learning approach to goodness of pronunciation based automatic mispronunciation detection.pdf | | 383.97 kB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.