Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/175226
Title: | Robust voice activity detection using DNN approaches |
Authors: | Parashar Kshitij |
Keywords: | Computer and Information Science |
Issue Date: | 2024 |
Publisher: | Nanyang Technological University |
Source: | Parashar Kshitij (2024). Robust voice activity detection using DNN approaches. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175226 |
Project: | SCSE23-0748 |
Abstract: | Voice activity detection (VAD) is a pivotal component in various speech processing applications, playing a crucial role in tasks such as speech recognition, speaker diarization, and noise suppression. Recognizing its significance, this thesis explores advancements in single-channel VAD systems that leverage deep learning techniques. Through experimentation and analysis, we undertake comprehensive evaluations of three prominent VAD models, Pyannote, Silero, and MarbleNet, across a spectrum of conditions and scenarios. Our investigations examine varying parameters such as chunk sizes, strides, and prediction thresholds, aiming to discern their impact on model performance. From our findings, we identify Pyannote as the standout performer, exhibiting superior accuracy compared to Silero by approximately 16.87% and MarbleNet by approximately 25.97% on the DIHARD III dataset. Consequently, we pivot our focus towards enhancing Pyannote’s capabilities. In the process, we examined how different parameters affect the performance of Pyannote and trained models with varying chunk sizes and strides to measure their effect. From this, we conclude that models trained on small chunk sizes and strides do not necessarily perform well during inference with small chunks and strides. Additionally, we examine scalability and production readiness, exploring strategies facilitated by the Open Neural Network Exchange (ONNX) framework. These efforts provide important insights that can enhance the field of VAD, leading to the development of more robust and efficient voice activity detection systems capable of meeting the needs of modern speech processing applications. |
URI: | https://hdl.handle.net/10356/175226 |
Schools: | School of Computer Science and Engineering |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | SCSE Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Parashar_Kshitij_FYP.pdf (Restricted Access) | | 2.08 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.