Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/179692
Title: Making sense of unstructured biological metadata
Authors: Sea, Bao Yi
Keywords: Computer and Information Science; Medicine, Health and Life Sciences
Issue Date: 2024
Publisher: Nanyang Technological University
Source: Sea, B. Y. (2024). Making sense of unstructured biological metadata. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/179692
Abstract: Ribonucleic Acid sequencing (RNA-seq) has become a cornerstone of modern biological research, offering deep insights into gene expression and function. However, RNA-seq analysis is often challenged by the presence of unlabelled or unstructured tissue-type information in the metadata, making the annotation process both laborious and computationally intensive. Accurate tissue annotations are crucial for interpreting gene expression profiles, as they provide essential context for understanding the biological significance of the data. This paper aims to enhance the efficiency of metadata processing by introducing an improved annotation pipeline that leverages Generative Pre-trained Transformer (GPT) technology, specifically GPT-4o, to automate the annotation of tissue types in metadata. Preliminary results indicate that the automated approach significantly reduces the time and computational resources required while maintaining high accuracy (based on F1 score) in tissue-type annotations. This automated approach addresses key bottlenecks in metadata annotation, highlighting the potential of Natural Language Processing (NLP) tools in enhancing the process of RNA-seq analysis. It offers a solution for managing large volumes of RNA-seq metadata and serves as a proof-of-concept for large-scale annotation efforts in other types of biological data. This advancement not only streamlines the annotation process but also facilitates accelerated and more efficient biological research, paving the way for deeper insights into gene functions and their implications across various fields.
URI: https://hdl.handle.net/10356/179692
Schools: School of Biological Sciences 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SBS Student Reports (FYP/IA/PA/PI)
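
Note on the F1 score mentioned in the abstract: the abstract reports annotation accuracy "based on F1 score" but does not say how the score was averaged or which labels were used. The sketch below is only an illustration of one common choice, a macro-averaged F1 over tissue labels, written in plain Python. The function names (`per_label_f1`, `macro_f1`) and the example labels are hypothetical and are not taken from the report.

```python
from collections import Counter


def per_label_f1(gold, predicted):
    """Return {label: F1} for every label seen in gold or predicted.

    gold      -- list of reference tissue labels (assumed manually curated)
    predicted -- list of labels produced by an automated annotator
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1      # correct annotation for label g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the true label but was missed

    scores = {}
    for label in set(gold) | set(predicted):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-label F1 scores (an assumed averaging choice)."""
    scores = per_label_f1(gold, predicted)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

For instance, with hypothetical gold labels `["liver", "liver", "brain", "lung"]` and predictions `["liver", "brain", "brain", "lung"]`, the per-label scores are 2/3 for liver, 2/3 for brain and 1.0 for lung, so `macro_f1` returns 7/9. Macro averaging weights rare tissues equally with common ones; a micro-averaged or weighted score would be an equally plausible reading of the abstract.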

Files in This Item:
File: FYP report_Sea Bao Yi.pdf (Restricted Access)
Size: 2.89 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.