Combined articulatory and auditory processing for improved speech recognition
Er, Meng Joo
Date of Issue: 2011
IEEE Conference on Industrial Electronics and Applications (7th : 2012 : Singapore)
School of Electrical and Electronic Engineering
In this paper, we examined the feasibility of articulatory phonetic inversion (API) conditioned on auditory qualities for improved speech recognition. We introduced an efficient data-driven heuristic learning algorithm to capture the articulatory-phonetic features (APFs) of English speech, and we reported the performance of the combined auditory and articulatory processing methods in inversion and recognition experiments. First, at the front end, the auditory-based bark-frequency cepstral coefficients (BFCCs) achieved accuracy equivalent to or higher than the mel-frequency cepstral coefficients (MFCCs). Second, the use of APFs significantly altered the phoneme error patterns relative to purely acoustic features, and APFs showed advantages over the canonical pseudo-articulatory features (PAFs), which are manually derived from phonological rules. These observations support our view that the combined use of auditory and articulatory cues is beneficial for speech pattern classification, and that the proposed neural-based API model is a competitive candidate for phoneme recognition, with salient features such as generality and portability.
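The BFCC front end mentioned in the abstract follows the same pipeline as MFCC extraction, with the mel filterbank replaced by one spaced on the Bark scale. A minimal illustrative sketch is given below; the filter count, frame length, and the Zwicker-style Bark approximation are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def hz_to_bark(f):
    # Zwicker-style approximation of the Bark scale (assumed variant)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced uniformly on the Bark scale
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    barks = hz_to_bark(freqs)
    edges = np.linspace(barks[0], barks[-1], n_filters + 2)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (barks - lo) / (mid - lo)
        falling = (hi - barks) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def bfcc(frame, sr=16000, n_filters=24, n_ceps=13):
    # Power spectrum -> Bark filterbank energies -> log -> DCT-II
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame)) ** 2
    fb = bark_filterbank(n_filters, n_fft, sr)
    energies = np.maximum(fb @ spec, 1e-10)  # floor to avoid log(0)
    log_e = np.log(energies)
    # DCT-II basis to decorrelate log energies; keep first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return basis @ log_e

# Example: one windowed 512-sample frame of a synthetic tone
frame = np.sin(2 * np.pi * 300 * np.arange(512) / 16000) * np.hanning(512)
coeffs = bfcc(frame)
print(coeffs.shape)  # (13,)
```

Swapping `hz_to_bark` for a mel mapping recovers the conventional MFCC pipeline, which is what makes the two front ends directly comparable in the recognition experiments.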
DRNTU::Engineering::Electrical and electronic engineering