Gene and sample selection for cancer identification
Mundra Piyushkumar Arjunlal
Date of Issue: 2011
School of Computer Engineering
Bioinformatics Research Centre
Gene-expression data gathered with microarrays play an important role in the detection, classification, and understanding of many diseases, including cancer. However, the number of samples gathered in an experiment typically remains in the hundreds, whereas the expression of thousands of genes is measured. One way to handle this problem is to identify the relevant genes that contribute to the disease and then infer the underlying mechanisms of their functions. This thesis focuses on the identification of relevant genes, which is hindered for several reasons. For example, relevant genes may be correlated with other genes that are biologically relevant but redundant for classifying the disease. When ranking genes by relevance, it is also important to consider the quality of the samples, because microarray samples are highly heterogeneous and multimodal in nature. This in turn raises the issue of stability: a gene selection method should be repeatable and reproducible, so that the selected genes can be reported with high confidence. For multiclass classification, the distribution of samples across classes may also play an important role in gene selection. By incorporating these aspects into the gene selection criteria, this research introduces several novel gene ranking algorithms.
DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences