Title: How doppelgänger effects in biomedical data confound machine learning
Authors: Wang, Li Rong; Wong, Limsoon; Goh, Wilson Wen Bin
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Source: Wang, L. R., Wong, L. & Goh, W. W. B. (2022). How doppelgänger effects in biomedical data confound machine learning. Drug Discovery Today, 27(3), 678-685.
Project: RG35/20 
Journal: Drug Discovery Today 
Abstract: Machine learning (ML) models have been increasingly adopted in drug development for faster identification of potential targets. Cross-validation techniques are commonly used to evaluate these models. However, the reliability of such validation methods can be affected by the presence of data doppelgängers. Data doppelgängers occur when independently derived data are very similar to each other, causing models to perform well regardless of how they are trained (i.e., the doppelgänger effect). Despite the abundance of data doppelgängers in biomedical data and their inflationary effects, they remain uncharacterized. We show their prevalence in biomedical data, demonstrate how doppelgängers arise, and provide proof of their confounding effects. To mitigate the doppelgänger effect, we recommend identifying data doppelgängers before the training-validation split.
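The recommendation to identify data doppelgängers before the training-validation split can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes that pairwise Pearson correlation between sample vectors is an adequate similarity measure and that 0.95 is a suitable cutoff (both hypothetical choices). Flagged pairs are merged into groups with union-find, and whole groups are then assigned to either training or validation, so that no doppelgänger pair straddles the split. The function names `pearson`, `doppelganger_groups` and `group_aware_split` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: treat constant vectors as uncorrelated instead of failing.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def doppelganger_groups(samples, threshold=0.95):
    """Group sample indices whose pairwise correlation meets a (hypothetical) threshold."""
    parent = list(range(len(samples)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; merge the groups of highly similar samples.
    for i, j in combinations(range(len(samples)), 2):
        if pearson(samples[i], samples[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(samples)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def group_aware_split(groups, n_val):
    """Fill validation with whole groups until it holds at least n_val samples."""
    train, val = [], []
    for g in groups:
        # A group is never divided, so doppelgängers stay on one side.
        (val if len(val) < n_val else train).extend(g)
    return sorted(train), sorted(val)


samples = [[1, 2, 3, 4], [2, 4, 6, 8.1], [4, 3, 2, 1], [1, 3, 2, 4]]
groups = doppelganger_groups(samples)
train, val = group_aware_split(groups, n_val=1)
print(groups, train, val)
```

Here samples 0 and 1 are near-duplicates and end up together in validation. Keeping each group on one side is the point of the exercise: a validation sample whose doppelgänger sits in the training set would inflate the measured performance.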
ISSN: 1359-6446
DOI: 10.1016/j.drudis.2021.10.017
Schools: Lee Kong Chian School of Medicine (LKCMedicine); School of Computer Science and Engineering; School of Biological Sciences
Rights: © 2021 Elsevier Ltd. All rights reserved. This paper was published in Drug Discovery Today and is made available with permission of Elsevier Ltd.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: LKCMedicine Journal Articles; SBS Journal Articles; SCSE Journal Articles

Files in This Item:
ddt_manuscript.pdf: 330.29 kB, Adobe PDF
ddt_supplementary.pdf: 543.57 kB, Adobe PDF

Citations: 50 (updated on Feb 21, 2024)
Web of Science citations: 50 (updated on Oct 26, 2023)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.