Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/151921
Title: On the design of capacity-approaching error-correction codes for multi constrained systems
Authors: Zhang, Jiayu
Keywords: Engineering::Computer science and engineering::Data::Data storage representations
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Zhang, J. (2021). On the design of capacity-approaching error-correction codes for multi constrained systems. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/151921
Project: A3078-201
Abstract: Common storage media have limited capacity to keep pace with the current data explosion, which is a dominant motivator for developing novel storage technologies. Technological advancement in the biological sciences is not new, and DNA data storage benefits from breakthroughs in bioinformatics and from innovations arising from cross-disciplinary collaboration. Because it can potentially store data for centuries at high density, DNA is considered a promising solution to the enormous demands of data generation and storage. DNA sequencing, which is part of the DNA data storage process, is error-prone. When analysing DNA nucleotide sequences, clustering plays a vital role in reducing redundancy and correcting errors. Most currently available software tools cluster sequences with greedy approaches, which do not always produce optimal results: they are very sensitive to a single parameter that determines the similarity among DNA sequences within one cluster. In general, the appropriate similarity is not known in advance, so the sequence clusters generated by these greedy algorithms tend not to match the actual clusters if an imperfect parameter is used. As an unsupervised learning method, the mean shift algorithm has been used in several fields, such as descriptive statistics, audio processing, and computer vision. The mean shift algorithm is guaranteed to converge to a local optimum, which overcomes the limitations of greedy algorithms. MeShClust is an alignment-free clustering tool that applies the mean shift approach and a machine learning algorithm to cluster DNA sequences. In this project, the MeShClust tool is implemented and its results are compared with those produced by the SlideSort algorithm on the same DNA sequence dataset.
URI: https://hdl.handle.net/10356/151921
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Zhang Jiayu FYP Report.pdf (Restricted Access)
Size: 4.56 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.