Title: Mining, annotating and visualizing evolutionary networks of influenza virus
Authors: Deshpande Akhila Sameer
Keywords: DRNTU::Engineering::Computer science and engineering
Issue Date: 2018
Abstract: Influenza viruses are hosted by both avian and mammalian species. They evolve rapidly through genetic shift and drift to escape antibody binding, which can cause seasonal epidemics and devastating pandemics. The classification of influenza genes into lineages is an important part of the analysis of viral sequence data. The WHO/FAO/OIE H5N1 Evolution Working Group has specified criteria for defining clades from a phylogenetic tree of HA sequences that have evolved from the A/goose/Guangdong/1996 H5N1 virus. Independent studies have classified subtypes such as H1N1 and H9N2 into clades to establish a common nomenclature. Gene sequences can be classified by similarity to pre-defined lineages when the lineages are known, but there is a lack of tools that automatically produce clade information from input gene sequences, and manual inspection is tedious. This research presents a novel approach, MAVEN (Mining and Annotating Evolutionary Network), to determine the clade information of influenza viruses of a particular subtype, which has emerged as a consequence of selective genetic bottlenecks during transmission. MAVEN uses a combination of phylogenetic trees and unsupervised machine learning algorithms to find the clades. In this approach, phylogenetic trees are constructed from a fixed number of HA sequences drawn at random from the input sequences, and for each tree the sequences of its internal nodes are inferred using the Fitch algorithm. Each node in the tree is tested for non-significance using Student's t-test on the within and between distances of the leaf node sequences present in its two child nodes. A clustering algorithm run on the selected nodes groups them into a set of clusters. A tree is constructed for each cluster and a representative node (bottleneck sequence) is found. All the sequences are then assigned to these representatives based on their distance from the bottleneck sequence, forming the clades.
This solution not only groups sequences into clades based on lineage but also expresses the lineage relationships between the clusters in the form of a tree. We discuss a case study in which MAVEN is applied to 7052 H1N1 HA sequences that were examined in a previous publication and have already been classified into clades. We then compare the two classifications using cluster validation indexes such as Entropy, Silhouette Coefficient, Dunn index and Pearson Gamma, and note that MAVEN performs better on all indexes. While influenza HA sequences were used for the purpose of this study, the approach could be applied to any gene for lineage assignment.
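The node test and the final assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the use of Hamming distance on aligned sequences, the single-column Fitch pass, and the toy sequences are all assumptions made for illustration, and the abstract does not state the distance measure or the significance threshold that MAVEN uses.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def hamming(a, b):
    """Count differing positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def fitch_site(tree, leaf_states):
    """Fitch bottom-up pass for one alignment column.

    tree is a nested 2-tuple of leaf names, e.g. (("a", "b"), "c").
    Returns (candidate states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], leaf_states)
    right_set, right_cost = fitch_site(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def child_distances(left_seqs, right_seqs):
    """Pairwise distances within each child and between the two children."""
    within = [hamming(a, b)
              for group in (left_seqs, right_seqs)
              for a, b in combinations(group, 2)]
    between = [hamming(a, b) for a in left_seqs for b in right_seqs]
    return within, between


def student_t(within, between):
    """Pooled-variance (Student) t statistic for between vs. within distances."""
    n_w, n_b = len(within), len(between)
    pooled = ((n_w - 1) * variance(within) + (n_b - 1) * variance(between)) \
        / (n_w + n_b - 2)
    return (mean(between) - mean(within)) / sqrt(pooled * (1 / n_w + 1 / n_b))


def assign_to_clades(seqs, representatives):
    """Map each sequence to the name of its nearest representative sequence."""
    return {s: min(representatives, key=lambda name: hamming(s, representatives[name]))
            for s in seqs}


# Toy data (hypothetical, not taken from the thesis).
left = ["AAAA", "AAAT", "AATA"]
right = ["GGGG", "GGGA", "GGAG"]
within, between = child_distances(left, right)
t_stat = student_t(within, between)
root_states, changes = fitch_site((("a", "b"), ("c", "d")),
                                  {"a": "A", "b": "A", "c": "G", "d": "A"})
clades = assign_to_clades(left + right, {"clade1": "AAAA", "clade2": "GGGG"})
```

A large positive `t_stat` indicates that sequences under the two children are farther from each other than from their own siblings. How that statistic is turned into a node selection rule is left open here, because the abstract does not give the criterion. Pairwise distances are also not independent samples, so the statistic is best read as a heuristic score.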
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: WKWSCI Theses

Files in This Item:
File: Restricted Access
Description: MSc Dissertation
Size: 23.98 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.