Title: Speaker diarization of news broadcasts and meeting recordings
Authors: Koh, Eugene Chin Wei
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Issue Date: 2009
Source: Koh, E. C. W. (2009). Speaker diarization of news broadcasts and meeting recordings. Master’s thesis, Nanyang Technological University, Singapore.
Abstract: Given an audio recording, the task of speaker diarization can be summarized as answering the question “Who spoke when?”. This thesis reviews the techniques and issues involved in performing speaker diarization on broadcast news recordings and on meeting recordings. The broadcast news domain is generally regarded as the simpler of the two because turn-taking between speakers is better controlled and audio quality tends to be higher. The typical approach for this domain consists of two steps: speaker segmentation followed by speaker clustering. The Bayesian Information Criterion (BIC) has been a very popular distance measure for both steps, and experiments were conducted that confirmed its effectiveness for segmentation and clustering. Further speaker segmentation experiments were performed using Hotelling’s T² statistic to augment the BIC. It was observed that while this does speed up processing, the segmentation F-score obtained does not match that reported in the literature. A novel speaker clustering approach was also introduced in which polynomial-expanded feature vectors were used to compute the distance between clusters. This approach was found to produce results comparable to those obtained with the BIC. To address speaker diarization in the meeting domain, a diarization system was developed and submitted to the NIST Rich Transcription 2007 (RT-07) evaluation. This system exploited the diversity of meeting recording channels by performing Time Delay of Arrival (TDOA) estimation using a Normalized Least Mean Squares (NLMS) filter. Subsequent performance enhancements were delivered by adding a cluster purification module and a Non-Speech & Silence Removal (NS&SR) module. An overall Diarization Error Rate (DER) of 15.32% was obtained on the RT-07 corpus. This score was found to be competitive with those of the other entrants in the evaluation.
DOI: 10.32657/10356/15707
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: KohEugeneChinWei09.pdf
Description: Main report
Size: 1.19 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.