Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/150972
Full metadata record
DC Field | Value | Language
dc.contributor.author | Nguyen, Trinh-Trung-Duong | en_US
dc.contributor.author | Le, Nguyen Quoc Khanh | en_US
dc.contributor.author | Ho, Quang-Thai | en_US
dc.contributor.author | Phan, Dinh-Van | en_US
dc.contributor.author | Ou, Yu-Yen | en_US
dc.date.accessioned | 2021-05-31T08:34:44Z | -
dc.date.available | 2021-05-31T08:34:44Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Nguyen, T., Le, N. Q. K., Ho, Q., Phan, D. & Ou, Y. (2019). Using word embedding technique to efficiently represent protein sequences for identifying substrate specificities of transporters. Analytical Biochemistry, 577, 73-81. https://dx.doi.org/10.1016/j.ab.2019.04.011 | en_US
dc.identifier.issn | 0003-2697 | en_US
dc.identifier.uri | https://hdl.handle.net/10356/150972 | -
dc.description.abstract | Membrane transport proteins and their substrate specificities play crucial roles in various cellular functions. Identifying the substrate specificities of membrane transport proteins is closely related to protein-target interaction prediction, drug design, membrane recruitment, and dysregulation analysis, and is therefore an important problem for bioinformatics researchers. In this study, we applied the word embedding approach, a main driver of the recent breakthroughs in natural language processing, to protein sequences of transporters. We defined each protein sequence based on the word embeddings and frequencies of its biological words. The protein features were then fed into machine learning models for prediction. We also varied the length of the biological words that make up each protein sequence to find the optimal length, which generated the most discriminative feature set. Compared to four other feature types created from protein sequences, our proposed features help prediction models yield superior performance. Our best models reach an average area under the curve of 0.96 on 5-fold cross-validation and 0.99 on the independent test. With this result, our study can help biologists identify transporters based on substrate specificities and provides a basis for further research that enriches the field of applying natural language processing techniques in bioinformatics. | en_US
dc.language.iso | en | en_US
dc.relation.ispartof | Analytical Biochemistry | en_US
dc.rights | © 2019 Elsevier Inc. All rights reserved. | en_US
dc.subject | Science::Biological sciences | en_US
dc.title | Using word embedding technique to efficiently represent protein sequences for identifying substrate specificities of transporters | en_US
dc.type | Journal Article | en
dc.contributor.school | School of Humanities | en_US
dc.identifier.doi | 10.1016/j.ab.2019.04.011 | -
dc.identifier.pmid | 31022378 | -
dc.identifier.scopus | 2-s2.0-85064809652 | -
dc.identifier.volume | 577 | en_US
dc.identifier.spage | 73 | en_US
dc.identifier.epage | 81 | en_US
dc.subject.keywords | Word Embeddings | en_US
dc.subject.keywords | Feature Extraction | en_US
dc.description.acknowledgement | The authors acknowledge support from the Ministry of Science and Technology, Taiwan, R.O.C. under Grant no. MOST 106-2221-E-155-068. | en_US
item.grantfulltext | none | -
item.fulltext | No Fulltext | -
Appears in Collections: SoH Journal Articles

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.