Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/165777
Title: Classification of sound using machine learning
Authors: Tan, Ki In
Keywords: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Issue Date: 2023
Publisher: Nanyang Technological University
Source: Tan, K. I. (2023). Classification of sound using machine learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165777
Project: SCSE22-0634 
Abstract: The soundscape of urban parks and cities is composed of a variety of natural and man-made sounds. The benefits brought by urban parks and the health ailments caused by noise pollution make soundscape analysis a valuable research topic in urban cities such as Singapore. However, collecting sound data for research and analysis is hampered by the difficulty of recruiting volunteers and of manually annotating samples accurately. Machine learning can assist data collection by predicting and annotating environmental sounds. In my previous work, sound spectra were used to mitigate the privacy risk of storing audio recordings, which may expose personal information of the volunteers collecting the data. Existing works demonstrated that a deep learning model trained on sound spectra can predict with high accuracy; however, the model displayed signs of overfitting. This project aims to explore ways to improve the accuracy of the existing sound spectrum data pipeline while limiting the overfitting observed in the pipeline's models. An investigation of the pipeline's limitations found that the Convolutional Neural Network (CNN) architecture constrained model performance because of its inability to capture global relationships among sound spectrum features. A literature review and evaluation of transformer models for sound classification found the Audio Spectrogram Transformer (AST) most suitable to replace the CNN, owing to its stable performance even when its batch size was reduced for training in a resource-constrained environment. Training AST on sound spectra yielded an accuracy of 62.30%, compared with the CNN's 46.50%, but the performance improvement was limited.
Class reduction and data augmentation techniques were then experimented with; both improved the accuracy of AST, with class reduction raising accuracy to 91.25%, albeit at the cost of fewer and less suitable class labels for evaluation. Ultimately, applying state-of-the-art models and other performance-improving techniques to the sound spectrum data pipeline successfully improved overall model accuracy, and future work will likely investigate fine-tuning the data pipeline for deployment.
URI: https://hdl.handle.net/10356/165777
Schools: School of Computer Science and Engineering 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Tan_Ki_In_Final_Report.pdf (Restricted Access), 1.53 MB, Adobe PDF

Page view(s): 194 (updated on Mar 23, 2025)
Download(s): 20 (updated on Mar 23, 2025)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.