Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/179692
Full metadata record
DC Field | Value | Language
dc.contributor.author | Sea, Bao Yi | en_US
dc.date.accessioned | 2024-08-19T00:46:41Z | -
dc.date.available | 2024-08-19T00:46:41Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Sea, B. Y. (2024). Making sense of unstructured biological metadata. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/179692 | en_US
dc.identifier.uri | https://hdl.handle.net/10356/179692 | -
dc.description.abstract | Ribonucleic Acid sequencing (RNA-seq) has become a cornerstone of modern biological research, offering deep insights into gene expression and function. However, RNA-seq analysis is often challenged by the presence of unlabelled or unstructured tissue-type information in the metadata, making the annotation process both laborious and computationally intensive. Accurate tissue annotations are crucial for interpreting gene expression profiles, as they provide essential context for understanding the biological significance of the data. This paper aims to enhance the efficiency of metadata processing by introducing an improved annotation pipeline that leverages Generative Pre-trained Transformer (GPT) technology, specifically GPT-4o, to automate the annotation of tissue types in metadata. Preliminary results indicate that the automated approach significantly reduces the time and computational resources required while maintaining high accuracy (based on F1 score) in tissue-type annotations. This automated approach addresses key bottlenecks in metadata annotation, highlighting the potential of Natural Language Processing (NLP) tools in enhancing the process of RNA-seq analysis. It offers a solution for managing large volumes of RNA-seq metadata and serves as a proof of concept for large-scale annotation efforts in other types of biological data. This advancement not only streamlines the annotation process but also facilitates accelerated and more efficient biological research, paving the way for deeper insights into gene functions and their implications across various fields. | en_US
dc.language.iso | en | en_US
dc.publisher | Nanyang Technological University | en_US
dc.subject | Computer and Information Science | en_US
dc.subject | Medicine, Health and Life Sciences | en_US
dc.title | Making sense of unstructured biological metadata | en_US
dc.type | Final Year Project (FYP) | en_US
dc.contributor.supervisor | Marek Mutwil | en_US
dc.contributor.school | School of Biological Sciences | en_US
dc.description.degree | Bachelor's degree | en_US
dc.contributor.supervisor2 | Peng Ken Lim | en_US
dc.contributor.supervisoremail | mutwil@ntu.edu.sg | en_US
dc.subject.keywords | Bioinformatics | en_US
item.grantfulltext | restricted | -
item.fulltext | With Fulltext | -
Appears in Collections: SBS Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format
FYP report_Sea Bao Yi.pdf (Restricted Access) | None | 2.89 MB | Adobe PDF

Page view(s): 42 (updated on Oct 8, 2024)
Download(s): 4 (updated on Oct 8, 2024)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.