Title: Image processing techniques for speech signal processing
Authors: Leow, Su Jun
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Issue Date: 2018
Source: Leow, S. J. (2018). Image processing techniques for speech signal processing. Master's thesis, Nanyang Technological University, Singapore.
Abstract: The purpose of this research is to examine the use of visual representation and image processing techniques for speech processing applications. This is inspired by the fact that human spectrogram readers can rely solely on visual cues in the spectrogram to recognize words, even in the presence of differing channels and speakers. This suggests that features in the spectrogram image carry important information that can be harnessed for speech processing applications. It is postulated that the image representation better embeds the contextual information that is required for human understanding of the speech context. Unfortunately, commonly used speech features, such as the Mel Frequency Cepstral Coefficients (MFCC), are largely frame-based. Therefore, the image representation of speech can serve to complement existing speech features and improve performance on existing speech tasks. In this work, we developed and applied a solely visual approach to solve speech problems. Two concrete examples of its application are given: unsupervised speech segmentation, and the detection of unit-selection-based synthesized speech for anti-spoofing. We first provide the necessary background by introducing common speech and image processing problems, and then draw parallels between them. Next, we introduce an image representation of speech to enable the application of image processing techniques. We then conclude the background with a survey of past attempts that use image processing techniques on speech and acoustic processing tasks. Next, in the experiments section, we give an in-depth discussion of solving the two example speech tasks using a solely image-based solution. Finally, we wrap up the thesis with conclusions and future work.
DOI: 10.32657/10356/73231
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: masterThesisPrinting.pdf
Description: Master Thesis report
Size: 6.24 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.