dc.contributor.author: Do, Van Hai
dc.date.accessioned: 2015-09-08T07:24:56Z
dc.date.accessioned: 2017-07-23T08:30:30Z
dc.date.available: 2015-09-08T07:24:56Z
dc.date.available: 2017-07-23T08:30:30Z
dc.date.copyright: 2015 [en_US]
dc.date.issued: 2015
dc.identifier.citation: Do, V. H. (2015). Acoustic modeling for speech recognition under limited training data conditions. Doctoral thesis, Nanyang Technological University, Singapore.
dc.identifier.uri: http://hdl.handle.net/10356/65409
dc.description.abstract: The development of a speech recognition system requires at least three resources: a large labeled speech corpus to build the acoustic model, a pronunciation lexicon to map words to phone sequences, and a large text corpus to build the language model. For many languages, such as dialects or minority languages, these resources are limited or even unavailable; we label these languages as under-resourced. The focus of this thesis is to develop reliable acoustic models for under-resourced languages, and three pieces of work are proposed. In the first work, reliable acoustic models are built by transferring acoustic information from well-resourced (source) languages to under-resourced (target) languages. Specifically, the phone models of the source language are reused to form the phone models of the target language. This is motivated by the fact that all human languages share a similar acoustic space, so some acoustic units of two languages, e.g. phones, may correspond closely, which allows phones to be mapped between languages. Unlike previous studies, which examined only context-independent phone mapping, the thesis uses context-dependent triphone states as the mapping units to achieve higher acoustic resolution. In addition, linear and nonlinear mapping models with different training algorithms are investigated. The results show that nonlinear mapping with a discriminative training criterion achieves the best performance in the proposed work. In the second work, rather than increasing the mapping resolution, the focus is on improving the quality of the cross-lingual features used for mapping. Two approaches based on deep neural networks (DNNs) are examined. First, DNNs are used as the source language acoustic model to generate posterior features for phone mapping. Second, DNNs replace multilayer perceptrons (MLPs) to realize the phone mapping. Experimental results show that better phone posteriors generated from the source DNNs result in a significant improvement in cross-lingual phone mapping, while deep structures for phone mapping are only useful when sufficient target language training data are available. The third work focuses on building a robust acoustic model using the exemplar-based modeling technique. An exemplar-based model is non-parametric and uses the training samples directly during recognition, without training model parameters. This study uses a specific exemplar-based model, called kernel density, to estimate the likelihood of target language triphone states. To improve performance for under-resourced languages, cross-lingual bottleneck features are used. In the exemplar-based technique, the major design consideration is the choice of distance function used to measure the similarity between a test sample and a training sample. This work proposes a Mahalanobis-distance-based metric optimized by minimizing the classification error rate on the training data. Results show that the proposed distance produces better results than the Euclidean distance. In addition, a discriminative score tuning network, based on the same principle of minimizing training classification error, is also proposed. [en_US]
dc.format.extent: 144 p. [en_US]
dc.language.iso: en [en_US]
dc.subject: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition [en_US]
dc.title: Acoustic modeling for speech recognition under limited training data conditions [en_US]
dc.type: Thesis
dc.contributor.research: Emerging Research Lab [en_US]
dc.contributor.school: School of Computer Engineering [en_US]
dc.contributor.supervisor: Chng Eng Siong [en_US]
dc.contributor.supervisor: Li Haizhou [en_US]
dc.description.degree: DOCTOR OF PHILOSOPHY (SCE) [en_US]
dc.identifier.doi: https://doi.org/10.32657/10356/65409


Files in this item

Files                Size     Format
main_thesis16_3.pdf  1.918Mb  application/pdf
