Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/183960
Field | Value |
---|---|
Title | Analyzing effects of speech enhancement models on voice activity detection |
Authors | Lim, Sui Kiat |
Keywords | Computer and Information Science |
Issue Date | 2025 |
Publisher | Nanyang Technological University |
Source | Lim, S. K. (2025). Analyzing effects of speech enhancement models on voice activity detection. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183960 |
Abstract | Voice Activity Detection (VAD) plays a critical role in modern speech processing applications, from telecommunications to automatic speech recognition (ASR) and assistive technologies. Despite significant advances, the performance of VAD systems remains susceptible to environmental noise and other distortions present in real-world audio. Speech enhancement (SE) models offer a potential solution by improving the clarity and intelligibility of speech signals, yet their impact on VAD performance across diverse scenarios remains underexplored. Therefore, this thesis systematically investigates the effects of applying various SE models to preprocessed audio datasets, namely DIHARD3, followed by an evaluation using a range of state-of-the-art (SOTA) VAD models. We explore how different enhancement techniques influence the detection accuracy, robustness, and scalability of VAD systems under varying acoustic conditions. Different state-of-the-art VAD systems are integrated under a common interface to consistently evaluate performance across different SE methods. Comprehensive experiments are then conducted to measure key performance metrics such as accuracy and detection error rate, providing a holistic view of SE and VAD interaction. By benchmarking the performance of multiple SE and VAD model combinations and conducting a DIHARD3 domain-wise analysis to evaluate it further, this work highlights the trade-offs between varying SE strategies and VAD system robustness. The findings offer valuable insights into optimizing VAD pipelines for real-world applications, advancing the development of more robust and adaptive speech processing systems. |
URI | https://hdl.handle.net/10356/183960 |
Schools | College of Computing and Data Science |
Fulltext Permission | restricted |
Fulltext Availability | With Fulltext |
Appears in Collections | CCDS Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
CCDS24-0016_LimSuiKiat_FinalReport.pdf (Restricted Access) | | 17.38 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.