Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/15707
Title: | Speaker diarization of news broadcasts and meeting recordings |
Authors: | Koh, Eugene Chin Wei |
Keywords: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
Issue Date: | 2009 |
Source: | Koh, E. C. W. (2009). Speaker diarization of news broadcasts and meeting recordings. Master’s thesis, Nanyang Technological University, Singapore. |
Abstract: | Given an audio recording, the task of speaker diarization can be summarized as answering the question “Who spoke when?”. This thesis reviews the techniques and issues involved in performing speaker diarization on broadcast news recordings and on meeting recordings. The broadcast news domain is generally regarded as simpler because turn taking between speakers is better controlled and audio quality tends to be higher. The typical approach for this domain consists of two steps: speaker segmentation followed by speaker clustering. The Bayesian Information Criterion (BIC) has been a very popular distance measure for both speaker segmentation and clustering. Experiments were conducted that confirmed the effectiveness of this distance measure for both tasks. Further speaker segmentation experiments were performed using Hotelling’s T² statistic to augment the BIC. It was observed that while this does speed up processing, the segmentation F-score obtained falls short of those reported in the literature. A novel speaker clustering approach was also introduced in which polynomial-expanded feature vectors were used to compute the distance between clusters. It was found that this approach could produce results comparable to those obtained with the BIC. To address the problem of speaker diarization in the meeting domain, a diarization system was developed and submitted for the NIST Rich Transcription 2007 (RT-07) evaluation. This diarization system exploited the diversity of meeting recording channels by performing Time Delay of Arrival (TDOA) estimation using a Normalized Least Mean Squares (NLMS) filter. Subsequent performance enhancements were delivered by adding a cluster purification module and a Non-Speech & Silence Removal (NS&SR) module. An overall Diarization Error Rate (DER) of 15.32% was obtained on the RT-07 corpus. This score was found to be competitive with those of the other entrants in the evaluation exercise. |
URI: | https://hdl.handle.net/10356/15707 |
DOI: | 10.32657/10356/15707 |
Schools: | School of Computer Engineering |
Research Centres: | Emerging Research Lab |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
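
Illustrative note (not taken from the thesis): the abstract names the BIC as the distance measure used for speaker segmentation. A minimal sketch of the commonly used ΔBIC test for a single candidate change point is given below, assuming each segment is modelled by one full-covariance Gaussian over its feature frames. The function names `logdet_cov` and `delta_bic`, the `penalty_weight` parameter, and the sign convention (positive values favour a speaker change) are assumptions made for this example and may differ from the formulation used in the thesis.

```python
import numpy as np

def logdet_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames (rows)."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` (n x d) into frames[:split] and frames[split:].

    Compares one Gaussian over the whole window against two Gaussians,
    one per side. Positive values favour a speaker change at `split`.
    """
    n, d = frames.shape
    left, right = frames[:split], frames[split:]
    # Extra free parameters of the two-Gaussian model: one mean and one covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    gain = 0.5 * (
        n * logdet_cov(frames)
        - len(left) * logdet_cov(left)
        - len(right) * logdet_cov(right)
    )
    return gain - penalty
```

In a segmentation pass, a function of this kind would typically be evaluated at many candidate split points inside a sliding window, with a change hypothesised where the value peaks above zero. The same quantity can serve as a merge criterion between two clusters, which is how the BIC is usually applied to clustering.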
Appears in Collections: | SCSE Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
KohEugeneChinWei09.pdf | Main report | 1.19 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.