Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/82112
Title: A topological approach for protein classification
Authors: Cang, Zixuan
Mu, Lin
Wu, Kedi
Opron, Kristopher
Xia, Kelin
Wei, Guo-Wei
Keywords: persistent homology
machine learning
Issue Date: 2015
Source: Cang, Z., Mu, L., Wu, K., Opron, K., Xia, K., & Wei, G.-W. (2015). A topological approach for protein classification. Molecular Based Mathematical Biology, 3(1), 140-162.
Series/Report no.: Molecular Based Mathematical Biology
Abstract: Protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found its success in the topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding by using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug bound and unbound M2 channels. Secondly, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. Thirdly, the identification of all alpha, all beta, and alpha-beta protein domains is carried out using 900 proteins. We have found an 85% success in this identification. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples and 246 tasks over 11944 samples. Average accuracies of 82% and 73% are attained. The present study establishes computational topology as an independent and effective alternative for protein classification.
URI: https://hdl.handle.net/10356/82112
http://hdl.handle.net/10220/41120
URL: http://www.degruyter.com/view/j/mlbmb.2015.3.issue-1/mlbmb-2015-0009/mlbmb-2015-0009.xml?format=INT
Schools: School of Physical and Mathematical Sciences 
Rights: © 2015 Zixuan Cang et al., licensee De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SPMS Journal Articles

Files in This Item:
File: 26-A topological approach to protein classification.pdf
Size: 2.21 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.