Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/149109
Title: Using AI for music source separation
Authors: Lee, Jasline Jie Yu
Keywords: Engineering::Electrical and electronic engineering
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Lee, J. J. Y. (2021). Using AI for music source separation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/149109
Project: A1109-201
Abstract: This report summarises the research, methodologies, and experimental implementation of Music Source Separation (MSS): the task of isolating the individual instrument source signals of interest from a music piece. In recent years, supervised deep learning methods have become the state of the art in source separation and can be categorised as spectrogram-based or waveform-based. These models typically demand large computational resources, training across multiple GPUs over long hours. Despite the success of integrating machine learning into the separation process, many systems lack a visual representation that lets users appreciate the connection between the input, the model, and the output. This project focuses on separating four sources (bass, drums, vocals, and other accompaniment) from an input song mixture. The objective is to analyse the impact of the different components present in both spectrogram-based and waveform-based systems through fine-tuning, data handling, and ablation testing. This allows us to understand each component's contribution to the overall system and to make informed choices that maximise model performance within the limits of a single GPU and the available dataset. The experimental results demonstrate three key points. First, RNN architectures such as the BiLSTM and BiGRU are important for music separation. Second, the quality of the dataset and the type of data augmentation have a larger impact on model performance than the quantity of data. Third, the computational efficiency of the model improves when an uncompressed dataset and a BiGRU are used. In addition to the experimental results, a graphical interface is introduced so that end users have a clear conception of the relationship between the input, the model, and the output.
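To illustrate the spectrogram-based approach the abstract refers to, the sketch below shows soft ratio masking, a common post-processing step in such systems: each source estimate is obtained by weighting the mixture's time-frequency magnitudes by that source's share of the total energy. This is a minimal, generic illustration with toy NumPy arrays and a hypothetical `soft_mask_separate` helper, not code from the report's model.

```python
import numpy as np

def soft_mask_separate(mix_mag, source_mags, eps=1e-8):
    """Apply soft ratio masks: each estimate is the mixture magnitude
    weighted by that source's share of the summed source magnitudes."""
    total = sum(source_mags) + eps  # eps avoids division by zero
    return [mix_mag * (s / total) for s in source_mags]

# Toy magnitude spectrograms (freq bins x time frames) for two "sources"
rng = np.random.default_rng(0)
vocals = rng.random((5, 4))
bass = rng.random((5, 4))
mixture = vocals + bass  # idealised additive mixture

est_vocals, est_bass = soft_mask_separate(mixture, [vocals, bass])
```

Because the masks sum to (almost) one at every time-frequency bin, the source estimates add back up to the mixture magnitude, which is one reason masking-based systems are easy to sanity-check.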
URI: https://hdl.handle.net/10356/149109
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Final FYP Report - A1109-201.pdf (Restricted Access)
Description: Using AI for music source separation
Size: 2.74 MB
Format: Adobe PDF



Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.