Title: Prediction of neutralising antibodies for novel coronavirus with machine learning
Authors: Kho, Jordon Junyang
Keywords: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2023
Publisher: Nanyang Technological University
Source: Kho, J. J. (2023). Prediction of neutralising antibodies for novel coronavirus with machine learning. Final Year Project (FYP), Nanyang Technological University, Singapore.
Project: SCSE22-0982 
Abstract: Coronaviruses have been responsible for three major viral outbreaks since the beginning of the 21st century, the most recent being the coronavirus disease 2019 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Coronavirus infections are known to cause severe respiratory disease and even death. Unfortunately, there is no effective drug or treatment to prevent and treat the infection. While neutralising antibodies have the potential to prevent future infections, traditional lab-based methods for identifying them are often too time-consuming and expensive. Hence, machine learning approaches have become increasingly popular for expediting and complementing lab-based methods in the search for potential antibody candidates. This project investigated the utility of graph features for the discovery of potential neutralising SARS-CoV-2 antibodies. Tree-based models and other traditional classifiers were trained on mean pooling and max pooling graph features, and their predictive performance was compared with that of baseline Extended Connectivity Fingerprint (ECFP) models. As the data set suffered from class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) and the Synthetic Minority Oversampling Technique – Nominal (SMOTE-N) were applied to oversample minority data points. The best performing models were mean pooling models trained using SMOTE-N, with accuracies of up to 82% and F1 scores of up to 84% after hyper-parameter tuning. Mean pooling could capture sequence information more accurately than max pooling, and SMOTE-N was found to be more compatible with graph features than SMOTE, as the latter was more susceptible to noise generation. Furthermore, graph features were more interpretable and more compatible with oversampling techniques than molecular fingerprints. However, the models were poor at correctly classifying the non-neutralising sequences and had false positive rates as high as 41%. Therefore, the exploration of other oversampling techniques in combination with undersampling techniques, and experimentation with different pooling approaches to capture atomic information more accurately, could serve as new directions in future work.
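The difference between the mean pooling and max pooling graph features named in the abstract can be illustrated with a minimal sketch. It assumes each graph is represented as a list of per-node feature vectors and is reduced to one fixed-length vector by pooling each feature column across nodes. The function names and the toy three-node graph are hypothetical and are not taken from the report, which may compute its features differently.

```python
from statistics import fmean


def mean_pool(node_features):
    """Average each feature column over all nodes of one graph."""
    return [fmean(column) for column in zip(*node_features)]


def max_pool(node_features):
    """Keep only the largest value of each feature column over all nodes."""
    return [max(column) for column in zip(*node_features)]


# Hypothetical graph: 3 nodes, 2 features per node (illustrative values only).
nodes = [
    [1.0, 0.0],
    [3.0, 0.0],
    [2.0, 6.0],
]

mean_vec = mean_pool(nodes)  # every node contributes to each entry
max_vec = max_pool(nodes)    # a single extreme node determines each entry

print(mean_vec)  # [2.0, 2.0]
print(max_vec)   # [3.0, 6.0]
```

In this toy case the second feature is zero for two of the three nodes; max pooling reports only the single large value, while mean pooling reflects all three nodes.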
Schools: School of Computer Science and Engineering 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Final Report.pdf
Description: Restricted Access
Size: 1.92 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.