      Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech

      Author
      Xiao, Xiong
      Chng, Eng Siong
      Li, Haizhou
      Date of Issue
      2012
      Conference Name
      IEEE International Conference on Acoustics, Speech and Signal Processing (2012 : Kyoto, Japan)
      School
      School of Computer Engineering
      Research Centre
      Temasek Laboratories
      Abstract
      In this paper, we propose a framework for joint normalization of spectral and temporal statistics of speech features for robust speech recognition. Current feature normalization approaches normalize the spectral and temporal aspects of feature statistics separately to overcome noise and reverberation. As a result, the interaction between spectral normalization (e.g. mean and variance normalization, MVN) and temporal normalization (e.g. temporal structure normalization, TSN) is ignored. We propose a joint spectral and temporal normalization (JSTN) framework to normalize these two aspects of feature statistics simultaneously. In JSTN, feature trajectories are filtered by linear filters, and the filter coefficients are optimized by maximizing a likelihood-based objective function. Experimental results on the Aurora-5 benchmark task show that JSTN consistently outperforms the cascade of MVN and TSN on test data corrupted by both additive noise and reverberation, which validates our proposal. Specifically, JSTN reduces the average word error rate by a relative 8-9% compared with the cascade of MVN and TSN, for both artificial and real noisy data.
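      To make the two kinds of normalization named in the abstract concrete, here is a minimal Python sketch of the cascade style of processing: each feature trajectory is mean and variance normalized, then passed through a linear (FIR) filter. It is an illustration under stated assumptions, not the authors' implementation. The function names (`mvn`, `fir_filter`, `normalize_features`) and the fixed filter coefficients are hypothetical, and the likelihood-based optimization of the coefficients that defines JSTN is not reproduced.

```python
from statistics import fmean, pstdev


def mvn(trajectory):
    """Mean and variance normalization of one feature trajectory (list of floats)."""
    mu = fmean(trajectory)
    sigma = pstdev(trajectory)  # population standard deviation
    if sigma == 0.0:
        # A constant trajectory carries no variation; map it to zeros.
        return [0.0 for _ in trajectory]
    return [(x - mu) / sigma for x in trajectory]


def fir_filter(trajectory, coeffs):
    """Causal FIR filter: y[t] = sum_k coeffs[k] * x[t - k], with x[t] = 0 for t < 0."""
    out = []
    for t in range(len(trajectory)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if t - k >= 0:
                acc += h * trajectory[t - k]
        out.append(acc)
    return out


def normalize_features(features, coeffs):
    """Apply MVN, then the same FIR filter, to every feature dimension.

    features: list of frames, each frame a list of feature values.
    coeffs:   hypothetical fixed filter coefficients (assumption; JSTN would
              instead optimize them with a likelihood-based objective).
    """
    n_dims = len(features[0])
    columns = []
    for d in range(n_dims):
        trajectory = [frame[d] for frame in features]
        columns.append(fir_filter(mvn(trajectory), coeffs))
    # Transpose back to frame-major order.
    return [[columns[d][t] for d in range(n_dims)] for t in range(len(features))]
```

      For example, `normalize_features(frames, [0.5, 0.5])` would normalize each dimension and then smooth it with a two-tap moving average. Because the filter here is fixed in advance and applied after MVN, the sketch corresponds to the separate, cascaded treatment that the abstract contrasts with joint optimization.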
      Subject
      DRNTU::Engineering::Computer science and engineering
      Type
      Conference Paper
      Rights
      © 2012 IEEE.
      Collections
      • TL Conference Papers
      http://dx.doi.org/10.1109/ICASSP.2012.6288876
      NTU Library, Nanyang Avenue, Singapore 639798 © 2011 Nanyang Technological University. All rights reserved.
      DSpace software copyright © 2002-2015  DuraSpace