Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/183950
Title: | Pathological voice detection based on multi-vowel multi-modal model |
Authors: | Loh, Zhi Shen |
Keywords: | Computer and Information Science |
Issue Date: | 2025 |
Publisher: | Nanyang Technological University |
Source: | Loh, Z. S. (2025). Pathological voice detection based on multi-vowel multi-modal model. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183950 |
Abstract: | Pathological voice detection aims to determine whether a given voice sample originates from an individual with a voice pathology or from a healthy individual. While previous studies have demonstrated that incorporating multiple voice recordings improves classification accuracy in traditional machine learning methods [1], the impact of using multiple voice recordings in deep learning-based approaches remains unexplored. This thesis investigates the effectiveness of utilizing multiple voice recordings and leveraging within-vowel and between-vowel feature relationships for pathological voice detection. Additionally, it examines the simultaneous use of multiple input types: audio waveform, Mel-spectrogram, and the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS). Experiments conducted on the Saarbruecken Voice Database (SVD) reveal that incorporating multiple vowels and input types enhances detection performance. Based on these findings, this thesis proposes a simple model that integrates multiple vowels and input types, achieving an accuracy of 74.5%. The results suggest that combining this approach with more complex model architectures that have demonstrated high detection accuracy could further improve their performance. In addition, this thesis proposes a method, based on integrated gradients, for assigning attributions to groups of model inputs, enhancing the explainability of the model. |
URI: | https://hdl.handle.net/10356/183950 |
Schools: | College of Computing and Data Science |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | CCDS Student Reports (FYP/IA/PA/PI) |
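The abstract mentions assigning attributions to groups of model inputs via integrated gradients. The full text is restricted, so the following is only a minimal sketch of one plausible reading: compute per-feature integrated gradients, then sum them within each input group. The toy logistic-regression model, the feature dimensions, the all-zeros baseline, and the group names (`waveform`, `mel_spectrogram`, `egemaps`) are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy stand-in classifier (assumption): logistic regression on concatenated features."""
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients along the straight path from baseline to x
    using a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack(
        [model_grad(baseline + a * (x - baseline), w, b) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)

def group_attributions(attr, groups):
    """Sum per-feature attributions within each named group of input indices."""
    return {name: float(attr[idx].sum()) for name, idx in groups.items()}

# Hypothetical layout: three input types, three features each.
rng = np.random.default_rng(0)
w = rng.normal(size=9)
b = 0.1
x = rng.normal(size=9)
baseline = np.zeros(9)  # all-zeros baseline is an assumption

groups = {
    "waveform": slice(0, 3),
    "mel_spectrogram": slice(3, 6),
    "egemaps": slice(6, 9),
}

attr = integrated_gradients(x, baseline, w, b)
per_group = group_attributions(attr, groups)
print(per_group)
```

Because integrated gradients satisfies the completeness property, the group totals should sum, up to numerical integration error, to the difference between the model's output on the input and on the baseline. That makes summation within a group a natural aggregation choice, although the thesis may aggregate differently.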
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Final FYP - Zhi Shen (final).pdf | Restricted Access | 2.68 MB | Adobe PDF | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.