Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/46279
Title: Unified framework for speaker-aware isolated word recognition
Authors: George Rosario Dhinesh
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Issue Date: 2011
Source: George, R. D. (2011). Unified framework for speaker-aware isolated word recognition. Master’s thesis, Nanyang Technological University, Singapore.
Abstract: The explosive growth of various kinds of personal electronic devices in recent years has spawned substantial interest in personalized voice-based human-device interaction. There exists a need for robust and computationally efficient techniques to help realize mobile and embedded computing applications that are capable of recognizing spoken words and the speaker who uttered them. Although spoken word recognition and speaker recognition are closely related problems with a number of commonalities, separate and different techniques are employed for solving them in the current state of the art. This thesis presents the research, development and prototyping of a speaker-aware isolated word recognition system based on a single, low-complexity technique suitable for resource-constrained mobile and embedded devices. A comprehensive literature survey has been carried out to study and evaluate the suitability of several existing techniques for embedded speaker-and-word recognition. Based on qualitative and performance analyses available in the literature, a framework based on Mel Frequency Cepstral Coefficients (MFCC) and Gaussian Mixture Models (GMM) has been chosen as the base for our work. An evaluation platform that can be rapidly configured with the desired values of the parameters involved in the GMM process has been developed in order to expedite experimentation. The challenging problem of recognizing a speaker based on a single utterance of very short duration has been examined in detail. The effectiveness of GMM-based text-dependent and text-constrained speaker recognition approaches has been evaluated on the TI46 speech corpus, resulting in recognition accuracies of 99.28% and 96.6%, respectively. We have proposed and evaluated a method of grouping similar sub-word units in text-constrained speaker recognition and obtained a recognition rate of 96.62%.
A novel technique has been proposed in order to overcome the inability of GMM to retain the temporal information of the speech in word recognition. This technique relies on modeling a word as a time-ordered sequence of GMMs, where each GMM corresponds to a sub-word unit, so that the sequence of the sub-words is maintained.
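The idea of scoring a word with a time-ordered sequence of GMMs can be illustrated with a minimal sketch. This is not the thesis implementation. The equal-length segmentation, the diagonal-covariance form, the one-dimensional toy features standing in for MFCC vectors, and all function and variable names (gmm_log_likelihood, split_into_segments, score_word, recognize) are assumptions made for illustration only.

```python
import math


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    component_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))


def split_into_segments(frames, n_segments):
    """Split frames into contiguous, roughly equal chunks (assumed scheme)."""
    n = len(frames)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def score_word(frames, word_model):
    """Score each segment with the GMM at the same position in the sequence."""
    segments = split_into_segments(frames, len(word_model))
    return sum(
        gmm_log_likelihood(frame, gmm)
        for segment, gmm in zip(segments, word_model)
        for frame in segment
    )


def recognize(frames, word_models):
    """Return the word whose GMM sequence gives the highest total score."""
    return max(word_models, key=lambda word: score_word(frames, word_models[word]))


# Toy data: two hypothetical sub-word units, each a single-component GMM.
low = [(1.0, [0.0], [1.0])]
high = [(1.0, [5.0], [1.0])]
# The two words contain the same units in opposite order.
word_models = {"ab": [low, high], "ba": [high, low]}
utterance = [[0.1], [-0.2], [5.1], [4.9]]
print(recognize(utterance, word_models))  # prints "ab"
```

In this toy case, pooling all four frames under one model would discard the order that separates the two words. Keeping the sequence of sub-word models preserves it.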
URI: https://hdl.handle.net/10356/46279
DOI: 10.32657/10356/46279
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: SCEG0802965L.pdf
Size: 8.5 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.