Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/84664
Title: Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition
Authors: Nguyen, Duc Hoang Ha
Xiao, Xiong
Chng, Eng Siong
Li, Haizhou
Keywords: Feature adaptation
Temporal filtering
Issue Date: 2016
Source: Nguyen, D. H. H., Xiao, X., Chng, E. S., & Li, H. (2016). Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(6), 1006-1019.
Series/Report no.: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Abstract: Spectral information represents short-term speech information within a frame of a few tens of milliseconds, while temporal information captures the evolution of speech statistics over consecutive frames. Motivated by the findings that human speech comprehension relies on the integrity of both the spectral content and temporal envelope of the speech signal, we study a spectro-temporal transform framework that adapts run-time speech features to minimize the mismatch between run-time and training data, and its implementation that includes cross transform and cascaded transform. A Kullback-Leibler divergence based cost function is proposed to estimate the transform parameters. We conducted experiments on the REVERB Challenge 2014 task, where clean and multi-condition trained acoustic models are tested with real reverberant and noisy speech. We found that temporal information is important for reverberant speech recognition and the simultaneous use of spectral and temporal information for feature adaptation is effective. We also investigate the combination of the cross transform with fMLLR, the combination of batch, utterance and speaker mode adaptation, and multi-condition adaptive training using the proposed transforms. All experiments consistently report significant word error rate reductions.
URI: https://hdl.handle.net/10356/84664
http://hdl.handle.net/10220/41916
ISSN: 2329-9290
DOI: 10.1109/TASLP.2016.2522646
Schools: School of Computer Science and Engineering 
Rights: © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/TASLP.2016.2522646].
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Journal Articles

Files in This Item:
File: Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition.pdf
Size: 924.42 kB
Format: Adobe PDF

Scopus citations: 7 (updated on May 7, 2025)
Web of Science citations: 5 (updated on Oct 27, 2023)
Page views: 526 (updated on May 7, 2025)
Downloads: 241 (updated on May 7, 2025)
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.