Effective selection of informative SNPs and classification on the HapMap genotype data.
Date of Issue: 2007
School of Electrical and Electronic Engineering
Background: Single nucleotide polymorphisms (SNPs) are genetic variations that account for the differences between any two unrelated individuals, so they can be used to identify the source population of an individual. For efficient population identification with the HapMap genotype data, as few informative SNPs as possible should be selected from the original 4 million SNPs. Recently, Park et al. (2006) adopted the nearest shrunken centroid method to classify the three populations, i.e., Utah residents with ancestry from Northern and Western Europe (CEU), Yoruba in Ibadan, Nigeria in West Africa (YRI), and Han Chinese in Beijing together with Japanese in Tokyo (CHB+JPT). That analysis yielded 100,736 SNPs, of which the top 82 could completely classify the three populations.
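The nearest shrunken centroid idea mentioned above can be illustrated with a minimal sketch. This is not the implementation used by Park et al. or by the authors; it is a simplified, standard-library-only version of the general technique. The function names (`fit_shrunken_centroids`, `predict`), the threshold `delta`, the population labels, and the tiny genotype matrix (genotypes coded 0/1/2) are all hypothetical and made up for illustration. Each class centroid is shrunk toward the overall centroid by soft-thresholding; SNPs whose shrunken differences are zero for every class drop out and are treated as uninformative.

```python
import math
from statistics import median


def fit_shrunken_centroids(X, y, delta):
    """Fit a simplified nearest shrunken centroid model (illustrative only)."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    groups = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    centroid = {k: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
                for k, rows in groups.items()}
    # Pooled within-class standard deviation of each SNP.
    s = [math.sqrt(sum((row[j] - centroid[lab][j]) ** 2
                       for row, lab in zip(X, y)) / (n - len(classes)))
         for j in range(p)]
    s0 = median(s)  # offset that keeps low-variance SNPs from dominating
    scale = [sj + s0 for sj in s]  # assumed > 0 for this toy data
    shrunken, active = {}, set()
    for k in classes:
        m_k = math.sqrt(1 / len(groups[k]) - 1 / n)
        row = []
        for j in range(p):
            d = (centroid[k][j] - overall[j]) / (m_k * scale[j])
            # Soft-threshold: move d toward zero by delta, clipping at zero.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_shrunk != 0.0:
                active.add(j)
            row.append(overall[j] + m_k * scale[j] * d_shrunk)
        shrunken[k] = row
    priors = {k: len(groups[k]) / n for k in classes}
    return {"centroids": shrunken, "scale": scale, "priors": priors,
            "selected": sorted(active)}


def predict(model, x):
    """Assign x to the class with the smallest standardized distance."""
    def score(k):
        dist = sum(((xj - cj) / sj) ** 2 for xj, cj, sj in
                   zip(x, model["centroids"][k], model["scale"]))
        return dist - 2 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)


# Made-up toy genotypes: 12 individuals x 4 SNPs, three hypothetical populations.
X = [
    [2, 0, 0, 1], [2, 0, 1, 0], [2, 1, 0, 1], [1, 0, 0, 0],  # POP_A
    [0, 2, 0, 1], [0, 2, 1, 0], [1, 2, 0, 1], [0, 1, 0, 0],  # POP_B
    [0, 0, 2, 1], [0, 1, 2, 0], [1, 0, 2, 1], [0, 0, 1, 0],  # POP_C
]
y = ["POP_A"] * 4 + ["POP_B"] * 4 + ["POP_C"] * 4

model = fit_shrunken_centroids(X, y, delta=1.0)
print(model["selected"])              # SNP indices that survive shrinkage
print(predict(model, [2, 0, 0, 0]))   # predicted population for a new sample
```

In this toy example the fourth SNP has the same mean in every population, so its shrunken difference is zero and it is not selected. Raising `delta` removes more SNPs; in practice the threshold would be chosen by cross-validation rather than fixed by hand.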
DRNTU::Engineering::Electrical and electronic engineering
© 2007 The Authors; BioMed Central. This paper was published in BMC Bioinformatics and is made available as an electronic reprint (preprint) with permission of BioMed Central. The paper can be found at the following official URL: http://dx.doi.org/10.1186/1471-2105-8-484 . One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.