dc.contributor.author: Wang, Lei
dc.date.accessioned: 2013-04-11T04:33:30Z
dc.date.accessioned: 2017-07-23T08:29:44Z
dc.date.available: 2013-04-11T04:33:30Z
dc.date.available: 2017-07-23T08:29:44Z
dc.date.copyright: 2012
dc.date.issued: 2012
dc.identifier.citation: Wang, L. (2012). Audio pattern discovery and retrieval. Doctoral thesis, Nanyang Technological University, Singapore.
dc.identifier.uri: http://hdl.handle.net/10356/51781
dc.description.abstract: This thesis explores unsupervised algorithms for pattern discovery and retrieval in audio and speech data. In this work, an audio pattern is defined as repeating audio content, such as repeating music segments or words/short phrases in speech recordings. The meaning of “pattern” is defined separately for each type of data: repeating-pattern discovery in music extracts segments with similar melody from a music piece; in human speech, the same words/short phrases spoken by one or more speakers are defined as speech patterns; and in broadcast audio, repeated commercials and logo music are also considered patterns. Previous work on audio pattern discovery either symbolizes the audio signal into token sequences followed by text-based search, or uses brute-force search techniques such as the self-similarity matrix and Dynamic Time Warping. The symbolization process, which relies on Vector Quantization or other modelling techniques, may suffer from misclassification errors, while exhaustive search incurs high computation cost and can be affected by channel distortion and speaker variation in the audio data. These limitations motivate the exploration of more efficient and robust approaches to automatically detect repeating information in audio data. In this thesis, different unsupervised techniques are examined to analyze music and speech separately. For music, an efficient approach that extends Ukkonen's suffix tree construction algorithm is proposed to detect repeating segments. For speech data, an iterative merging approach based on the Acoustic Segment Model (ASM) is proposed to discover recurrent phrases/words. This thesis also explores techniques for searching for audio patterns in broadcast audio, which consists of diverse content such as speech, music/songs, commercials, sound effects and background noise.
Existing audio pattern retrieval techniques focus only on specific audio types, so their applications are limited and cannot be generalized. In this work, a robust query-by-example framework is proposed for retrieving mixed speech and music patterns, where the ASM is examined for modelling music data. To verify the research, the proposed techniques are applied to public-domain audio databases, namely the TIDIGITS corpus and the TRECVID database, as well as a self-collected set of 30 English pop songs. The experimental results show that the proposed work is robust and outperforms existing techniques.
dc.format.extent: 137 p.
dc.language.iso: en
dc.subject: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
dc.title: Audio pattern discovery and retrieval
dc.type: Thesis
dc.contributor.research: Emerging Research Lab
dc.contributor.school: School of Computer Engineering
dc.contributor.supervisor: Chng Eng Siong
dc.contributor.supervisor: Li Haizhou
dc.description.degree: DOCTOR OF PHILOSOPHY (SCE)
dc.identifier.doi: https://doi.org/10.32657/10356/51781


Files in this item

TsceG0500768D.pdf (2.217 MB, application/pdf)
