Title: Feature selection on transcriptome data for identification of novel biomarkers for bipolar disorder
Authors: Zeng, Yanxi
Keywords: Engineering::Computer science and engineering
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Zeng, Y. (2021). Feature selection on transcriptome data for identification of novel biomarkers for bipolar disorder. Final Year Project (FYP), Nanyang Technological University, Singapore.
Project: SCSE20-0258
Abstract: Genomic psychiatry is an emerging field that holds much promise for biomarker discovery in psychiatric disorders. However, the high dimensionality of genomic data and the relatively small cohort sizes available at psychiatric outpatient clinics pose a significant challenge for clinically significant analysis of transcriptomic data. We approach this problem using state-of-the-art machine-learning methods to extract the salient features of genomic data for potential use as biomarkers. To simulate application in psychiatric outpatient clinics, we apply these methods to transcriptomic data from lithium-treated bipolar patients (n=240) and healthy controls (n=240). After a series of preliminary univariate feature selection methods, we apply multivariate methods such as recursive feature elimination (RFE) with various machine learning models, using nested cross-validation, to select the set of genes that gives the best predictive accuracy for diagnosis. Our results indicate that the genes selected by this process achieve higher classification accuracy in predicting clinical outcomes and lithium treatment use. Furthermore, gene set enrichment analysis and gene ontology analysis were carried out on the candidate biomarkers to investigate the underlying biological and pathogenic processes. We conclude that a feature selection pipeline combining univariate filtering with machine-learning-based feature selection can overcome the challenges of high dimensionality in genomic data and extract salient features that highlight related biological pathways for downstream analysis.
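The abstract describes a two-stage pipeline (univariate filtering, then RFE wrapped around a classifier) evaluated with nested cross-validation. The sketch below illustrates that general structure only; it is not the report's code. The use of scikit-learn, the ANOVA F-test filter with k=100, logistic regression as the RFE estimator, the candidate gene counts, and the synthetic data standing in for the expression matrix are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for an expression matrix, 480 samples
# (roughly 240 per class) and 1000 "genes", only 20 of which carry signal.
# This is NOT real transcriptome data.
X, y = make_classification(
    n_samples=480, n_features=1000, n_informative=20, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)

pipeline = Pipeline([
    # Scaling is fitted inside each training fold, so no test-fold leakage.
    ("scale", StandardScaler()),
    # Univariate filter: keep the 100 genes with the highest ANOVA F-statistic.
    ("filter", SelectKBest(score_func=f_classif, k=100)),
    # Multivariate step: RFE repeatedly drops the genes with the smallest
    # logistic-regression coefficients (10% of the filtered set per round).
    ("rfe", RFE(LogisticRegression(max_iter=1000), step=0.1)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Inner loop: choose how many genes RFE should retain (hypothetical grid).
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(
    pipeline, {"rfe__n_features_to_select": [10, 20, 40]}, cv=inner_cv
)

# Outer loop: estimate the accuracy of the whole selection + classification
# procedure on folds never seen during feature selection or tuning.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Refit on all samples to obtain one final candidate gene list
# (column indices into X).
search.fit(X, y)
best = search.best_estimator_
kept_after_filter = best.named_steps["filter"].get_support(indices=True)
selected = kept_after_filter[best.named_steps["rfe"].support_]
print("selected feature indices:", selected)
```

The outer score estimates how well the procedure generalizes, while the final refit produces the gene list that would be passed to enrichment or ontology analysis. Keeping the filter inside the pipeline matters: running the univariate step once on all samples before cross-validation would leak test information into feature selection and inflate the reported accuracy.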
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: Restricted Access; Size: 1.49 MB; Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.