Title: Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech
Authors: Xiao, Xiong; Chng, Eng Siong; Li, Haizhou
Keywords: DRNTU::Engineering::Computer science and engineering
Issue Date: 2012
Source: Xiao, X., Chng, E. S., & Li, H. (2012). Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4325-4328.
Abstract: In this paper, we propose a framework for joint normalization of spectral and temporal statistics of speech features for robust speech recognition. Current feature normalization approaches normalize the spectral and temporal aspects of feature statistics separately to overcome noise and reverberation. As a result, the interaction between the spectral normalization (e.g. mean and variance normalization, MVN) and temporal normalization (e.g. temporal structure normalization, TSN) is ignored. We propose a joint spectral and temporal normalization (JSTN) framework to simultaneously normalize these two aspects of feature statistics. In JSTN, feature trajectories are filtered by linear filters and the filters' coefficients are optimized by maximizing a likelihood-based objective function. Experimental results on the Aurora-5 benchmark task show that JSTN consistently outperforms the cascade of MVN and TSN on test data corrupted by both additive noise and reverberation, which validates our proposal. Specifically, JSTN reduces the average word error rate by 8-9% relative to the cascade of MVN and TSN for both artificial and real noisy data.
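To make the baseline in the abstract concrete, the following is a minimal sketch of the separate "MVN then temporal filtering" cascade that JSTN is compared against. It is not JSTN itself: it does not implement TSN's filter design or JSTN's likelihood-based optimization. Assumptions: features are a T x D list of frames, MVN is computed per dimension over the whole utterance, and the trajectory filter is a centered (non-causal) FIR filter with edge replication. The function names `mvn`, `filter_trajectories` and `cascade_mvn_filter` and the example filter taps are hypothetical and for illustration only.

```python
from math import sqrt
from typing import List, Sequence

Matrix = List[List[float]]  # T frames x D feature dimensions


def mvn(features: Matrix, eps: float = 1e-8) -> Matrix:
    """Per-dimension mean and variance normalization over one utterance."""
    if not features:
        raise ValueError("features must contain at least one frame")
    num_frames, dims = len(features), len(features[0])
    out = [[0.0] * dims for _ in range(num_frames)]
    for d in range(dims):
        column = [frame[d] for frame in features]
        mean = sum(column) / num_frames
        var = sum((x - mean) ** 2 for x in column) / num_frames
        std = sqrt(var) + eps  # eps avoids division by zero on flat trajectories
        for t in range(num_frames):
            out[t][d] = (column[t] - mean) / std
    return out


def filter_trajectories(features: Matrix, taps: Sequence[float]) -> Matrix:
    """Filter every feature trajectory (one dimension over time) with the same FIR filter.

    Computes y[t] = sum_k taps[k] * x[t + half - k], a convolution centered on
    frame t. Frames beyond the utterance edges are replaced by the first/last
    frame (edge replication), which is an assumption of this sketch.
    """
    if len(taps) % 2 == 0:
        raise ValueError("use an odd number of taps so the filter can be centered")
    half = len(taps) // 2
    num_frames, dims = len(features), len(features[0])
    out = [[0.0] * dims for _ in range(num_frames)]
    for t in range(num_frames):
        for d in range(dims):
            acc = 0.0
            for k, h in enumerate(taps):
                idx = min(max(t + half - k, 0), num_frames - 1)
                acc += h * features[idx][d]
            out[t][d] = acc
    return out


def cascade_mvn_filter(features: Matrix, taps: Sequence[float]) -> Matrix:
    """Spectral normalization followed by temporal filtering, applied separately."""
    return filter_trajectories(mvn(features), taps)


if __name__ == "__main__":
    feats = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    # Hypothetical smoothing taps, chosen only for illustration.
    for row in cascade_mvn_filter(feats, [0.25, 0.5, 0.25]):
        print([round(x, 3) for x in row])
```

Because the two stages run one after the other with independently chosen parameters, the filter cannot account for what MVN has already done to the trajectories; this separation is the interaction the abstract says JSTN addresses by optimizing the filter coefficients jointly.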
DOI: 10.1109/ICASSP.2012.6288876
Rights: © 2012 IEEE.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections: TL Conference Papers

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.