Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/166555
Title: Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts
Authors: Cao, Shuwen
Keywords: Engineering::Computer science and engineering
Issue Date: 2023
Publisher: Nanyang Technological University
Source: Cao, S. (2023). Discovery of novel biomarkers using machine learning based methods in cross disorder psychiatry cohorts. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166555
Abstract: Psychiatric disorders (PD) are gaining more attention due to their profound negative impact on individuals and society. Genomic psychiatry is therefore also gaining interest, as it holds much promise for biomarker discovery in PD. However, genomic datasets collected in a psychiatric outpatient clinic setting usually consist of high-dimensional data with small sample sizes, which poses a major challenge for accurate and significant clinical analysis of transcriptomic data. In this project, we address this issue by proposing a pipeline of state-of-the-art machine learning based methods to extract the salient set of genes (the features of the genomic data) as potential biomarkers for future biological analysis. By using machine learning techniques, we aim to narrow down the genes to those potential biomarkers that have a significant impact in identifying bipolar disorder (BD). To better simulate a psychiatric outpatient clinic setting, we carried out the investigation on transcriptomic data of lithium / non-lithium treated bipolar patients (n=240) and healthy controls (n=240). After a series of data pre-processing steps, univariate filtering using the F-test was applied to the genomic data, followed by Principal Component Analysis (PCA) for dimensionality reduction. Lastly, we implemented multivariate feature selection by recursive feature elimination, using various machine learning models with nested cross-validation, to select the set of genes giving the best prediction accuracy in distinguishing BD patients from healthy controls. The results indicate that the genes selected by our proposed pipeline achieve higher predictive accuracy in classifying BD patients, and BD patients treated with lithium, from healthy controls.
We conclude that our proposed feature selection pipeline combining univariate filtering, PCA and multivariate feature selection with machine learning based methods is capable of overcoming the challenges of high dimensionality of gene expression data, and is able to select relevant salient features for further biological analysis.
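The stages named in the abstract (F-test filtering, PCA, recursive feature elimination, nested cross-validation) can be illustrated with a minimal sketch. This is not the author's implementation: it assumes scikit-learn, uses synthetic data in place of the transcriptomic cohort, and every size (`k=50`, `n_components=10`, the candidate values for `n_features_to_select`, the fold counts) is an arbitrary illustrative choice. The abstract does not say how PCA and recursive feature elimination were combined; in this arrangement the elimination step ranks principal components rather than individual genes, and mapping components back to genes is left out.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix: few samples, many features.
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=10, random_state=0
)

# All steps live inside one Pipeline so that each is refit on the training
# part of every fold; this keeps the held-out samples out of feature selection.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("filter", SelectKBest(score_func=f_classif, k=50)),   # univariate F-test
    ("pca", PCA(n_components=10, random_state=0)),         # dimensionality reduction
    ("rfe", RFE(LogisticRegression(max_iter=1000),         # recursive elimination
                n_features_to_select=5)),
])

# Inner loop: choose how many components the elimination step keeps.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    pipeline,
    param_grid={"rfe__n_features_to_select": [3, 5]},
    cv=inner_cv,
    scoring="accuracy",
)

# Outer loop: estimate accuracy of the whole tuned procedure on unseen folds.
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print("outer-fold accuracy:", outer_scores)

# Refit on all data to inspect which columns passed the F-test filter.
search.fit(X, y)
best = search.best_estimator_
kept_columns = best.named_steps["filter"].get_support(indices=True)
n_components_kept = int(best.named_steps["rfe"].support_.sum())
print("columns passing the filter:", len(kept_columns))
print("components kept by elimination:", n_components_kept)
```

Wrapping the search in `cross_val_score` is what makes the cross-validation nested: the outer folds never influence the filter, the projection, or the tuned parameter, which matters when features far outnumber samples.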
URI: https://hdl.handle.net/10356/166555
Schools: School of Computer Science and Engineering 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP.pdf
Description: Restricted Access
Size: 836.14 kB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.