|Title:||Algorithm design and code optimization to speed-up bioinformatics software|
|Authors:||Ritika Jain|
|Keywords:||DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences|
|Issue Date:||2012|
|Abstract:||LDhat is a Linux-based package written in C, developed at Oxford University in 2004, that analyses and estimates recombination rates in large-scale population genetic data using the Hudson likelihood method. It consists of several interlinked programs that estimate recombination rates in phased and unphased data with missing information. Estimating these rates allows scientists to experiment with methods such as gene targeting, to understand mutations, and to predict the presence of certain disease-causing genes. LDhat is used by many bioinformatics researchers, with the National Institutes of Health, United States of America, being a major user. At present, several parts of the program can take up to several days to generate results, which makes it resource-intensive. The purpose of this project was to optimise LDhat in order to reduce the time it takes to process input files and generate results. Since the program is used in major bioinformatics studies, it was imperative that the optimisation techniques not affect the results generated. The basic method used for speed-up within the scope of this project was parallel programming with OpenMPI, applied to the existing code on multi-core processors provided by the Bioinformatics lab. The output was tested against that of the original code to verify the validity of the results and to measure the speed-up achieved. Several approaches to parallelisation were employed, and the report explains the reasons for the success or failure of each. The distributed-memory parallel implementation achieved almost linear speed-up in output generation by LDhat. The report compares various output graphs and speeds obtained through this approach and makes recommendations that can be similarly applied to other parts of the program.|
|URI:||http://hdl.handle.net/10356/48453|
|Rights:||Nanyang Technological University|
|Fulltext Permission:||restricted|
|Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Student Reports (FYP/IA/PA/PI)|
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.