Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/155991
Full metadata record
DC Field | Value | Language
dc.contributor.author | Wang, Li Rong | en_US
dc.contributor.author | Wong, Limsoon | en_US
dc.contributor.author | Goh, Wilson Wen Bin | en_US
dc.date.accessioned | 2022-03-30T01:52:31Z | -
dc.date.available | 2022-03-30T01:52:31Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Wang, L. R., Wong, L. & Goh, W. W. B. (2022). How doppelgänger effects in biomedical data confound machine learning. Drug Discovery Today, 27(3), 678-685. https://dx.doi.org/10.1016/j.drudis.2021.10.017 | en_US
dc.identifier.issn | 1359-6446 | en_US
dc.identifier.uri | https://hdl.handle.net/10356/155991 | -
dc.description.abstract | Machine learning (ML) models have been increasingly adopted in drug development for faster identification of potential targets. Cross-validation techniques are commonly used to evaluate these models. However, the reliability of such validation methods can be affected by the presence of data doppelgängers. Data doppelgängers occur when independently derived data are very similar to each other, causing models to perform well regardless of how they are trained (i.e., the doppelgänger effect). Despite the abundance of data doppelgängers in biomedical data and their inflationary effects, they remain uncharacterized. We show their prevalence in biomedical data, demonstrate how doppelgängers arise, and provide proof of their confounding effects. To mitigate the doppelgänger effect, we recommend identifying data doppelgängers before the training-validation split. | en_US
dc.description.sponsorship | Ministry of Education (MOE) | en_US
dc.description.sponsorship | National Research Foundation (NRF) | en_US
dc.language.iso | en | en_US
dc.relation | RG35/20 | en_US
dc.relation.ispartof | Drug Discovery Today | en_US
dc.rights | © 2021 Elsevier Ltd. All rights reserved. This paper was published in Drug Discovery Today and is made available with permission of Elsevier Ltd. | en_US
dc.subject | Engineering::Computer science and engineering | en_US
dc.title | How doppelgänger effects in biomedical data confound machine learning | en_US
dc.type | Journal Article | en
dc.contributor.school | Lee Kong Chian School of Medicine (LKCMedicine) | en_US
dc.contributor.school | School of Computer Science and Engineering | en_US
dc.contributor.school | School of Biological Sciences | en_US
dc.identifier.doi | 10.1016/j.drudis.2021.10.017 | -
dc.description.version | Submitted/Accepted version | en_US
dc.identifier.pmid | 34743902 | -
dc.identifier.scopus | 2-s2.0-85118879305 | -
dc.identifier.issue | 3 | en_US
dc.identifier.volume | 27 | en_US
dc.identifier.spage | 678 | en_US
dc.identifier.epage | 685 | en_US
dc.subject.keywords | Computational Biology | en_US
dc.subject.keywords | Data Science | en_US
dc.description.acknowledgement | This research/project is supported by the National Research Foundation, Singapore under its Industry Alignment Fund – Prepositioning (IAF-PP) Funding Initiative. W.W.B.G. also acknowledges support from a Ministry of Education (MOE), Singapore Tier 1 grant (Grant No. RG35/20). | en_US
item.grantfulltext | open | -
item.fulltext | With Fulltext | -
Appears in Collections: LKCMedicine Journal Articles
SBS Journal Articles
SCSE Journal Articles
Files in This Item:
File | Description | Size | Format
ddt_manuscript.pdf | | 330.29 kB | Adobe PDF
ddt_supplementary.pdf | | 543.57 kB | Adobe PDF

Scopus, Citations 50: 5 (updated on Feb 21, 2024)
Web of Science, Citations 50: 4 (updated on Oct 26, 2023)
Page view(s): 206 (updated on Feb 28, 2024)
Download(s): 13 (updated on Feb 28, 2024)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.