Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/170179
Title: Large scale transcriptomics analyses for gene function annotation and regulation
Authors: Tan, Qiao Wen
Keywords: Science::Biological sciences::Molecular biology
Issue Date: 2023
Publisher: Nanyang Technological University
Source: Tan, Q. W. (2023). Large scale transcriptomics analyses for gene function annotation and regulation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/170179
Project: 04INS000396C220 
Abstract: Advances in methods for generating genome-wide gene expression data are reflected in the exponential growth of RNA-sequencing data deposited in sequence read archives over the past decade. While existing methods such as forward and reverse genetics and protein structure determination remain the gold standard for validating gene function, they cannot be applied to every gene in every organism studied. Even in the most well-studied model organisms, such as Arabidopsis, only 42.85% of protein-coding genes are experimentally validated. Co-expression is not a new method for predicting gene function in bioinformatics, but its power is proportional to the amount of data used in the analysis. The sheer amount and robustness of RNA-sequencing data enable us to apply co-expression analysis to more organisms and at higher resolution. Despite the vast amount of data available, new data still need to be generated to provide context-specific gene expression data, especially for biological processes involving genes that have multiple functions or are differentially co-expressed. Moreover, the gap between data accumulation and the bioinformatic skill level of researchers remains to be closed. Although co-expression databases exist for this purpose, they may be outdated, limited to commonly studied organisms, and offer little customisation of the dataset used to generate the co-expression network. Thus, tools that enable biologists who are not trained in computational biology to construct their own condition-dependent and condition-independent datasets and to perform co-expression analysis from raw RNA-sequencing data, without excessive hardware requirements, would be highly beneficial.
The use of co-expression for gene function discovery using publicly available data is demonstrated in chapters 2 to 4, on organisms ranging from Plasmodium, a disease-causing parasite with many unique genes, to Artemisia annua, a plant that synthesises an important secondary metabolite used in the treatment of malaria, which is caused by the Plasmodium parasite, and Nicotiana tabacum, the plant that produces the nicotine used in tobacco products. Due to the importance of Plasmodium and the lack of an existing co-expression database dedicated to it, the Plasmodium data were downloaded and used to populate a co-expression database so that the wider community could benefit from the co-expression network generated. We show how the database can be used to identify genes that may be interesting for further characterisation based on their association with a biological function, their association with a gene module containing many characterised virulence genes, and their organelle specificity. In chapters 3 and 4, we demonstrate the use of pipelines that we have designed to let biologists with little to no training in computational biology perform co-expression analyses. Through the analyses of the biosynthesis pathways of the secondary metabolites artemisinin and nicotine, we highlight how the co-expression neighbourhoods of genes known to be involved in secondary metabolite biosynthesis can reveal other biosynthetic genes, potential transcriptional regulators, and components such as transporters involved in the process. The final chapter illustrates the importance of generating condition-specific data, despite the large amount of transcriptomic data available, for situations such as the study of the plant stress response.
Through enrichment of biological processes, reconstruction of stress-specific gene regulatory networks, and comparison of stress-specific transcription factors of Marchantia, we observe a hierarchy in stress response in which certain stresses are more dominant; superior performance of stress-specific networks, indicative of interactions that are masked when all experiments are aggregated; and disagreement between the involvement of transcription factor orthologs in Arabidopsis and Marchantia during stress. Finally, we investigated the predictability of gene expression during combined stress through three-dimensional linear regression of single-stress and combined-stress gene expression, and observed well-supported linear relationships in which the magnitude of the coefficients corresponded to the dominance of the stress.
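The regression described above can be illustrated with a minimal sketch. It assumes the model takes the form combined = a * stress_A + b * stress_B + c, fitted per gene by ordinary least squares; the abstract does not give the exact formulation used in the thesis. The stress names ("heat", "salt"), the function names, and the toy values below are hypothetical and chosen only for illustration; they are not data from the thesis.

```python
# Hypothetical sketch: model combined-stress expression as a linear function
# of two single-stress responses, combined = a*stress_a + b*stress_b + c.
# Function names, stress names, and values are illustrative assumptions.

def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    rows = [list(map(float, row)) + [float(value)] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        # Pick the row with the largest pivot to keep the elimination stable.
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def fit_combined_stress(stress_a, stress_b, combined):
    """Return (coef_a, coef_b, intercept, r_squared) from an ordinary least-squares fit."""
    design = [(a, b, 1.0) for a, b in zip(stress_a, stress_b)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, combined)) for i in range(3)]
    coef_a, coef_b, intercept = solve_3x3(xtx, xty)
    predicted = [coef_a * a + coef_b * b + intercept for a, b in zip(stress_a, stress_b)]
    mean_y = sum(combined) / len(combined)
    ss_res = sum((y - p) ** 2 for y, p in zip(combined, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in combined)
    return coef_a, coef_b, intercept, 1.0 - ss_res / ss_tot


# Toy log2 fold changes for six genes (made-up values, not thesis data).
heat = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0]
salt = [0.5, 1.5, -1.0, 0.0, 2.0, -0.5]
# Combined response constructed so that "heat" dominates (0.8 versus 0.2).
heat_salt = [0.8 * h + 0.2 * s + 0.1 for h, s in zip(heat, salt)]

coef_heat, coef_salt, intercept, r_squared = fit_combined_stress(heat, salt, heat_salt)
dominant = "heat" if abs(coef_heat) > abs(coef_salt) else "salt"
```

In this toy case the fit recovers the coefficients used to construct the data, and the larger coefficient marks the stress whose single-stress response contributes more to the combined response, which is the sense in which coefficient magnitude is read as dominance.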
URI: https://hdl.handle.net/10356/170179
DOI: 10.32657/10356/170179
Schools: School of Biological Sciences 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SBS Theses

Files in This Item:
File: Thesis_revised_final_lib.pdf
Description: PhD Thesis
Size: 13.44 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.