Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/155991
Field | Value |
---|---|
Title | How doppelgänger effects in biomedical data confound machine learning |
Authors | Wang, Li Rong; Wong, Limsoon; Goh, Wilson Wen Bin |
Keywords | Engineering::Computer science and engineering |
Issue Date | 2022 |
Source | Wang, L. R., Wong, L. & Goh, W. W. B. (2022). How doppelgänger effects in biomedical data confound machine learning. Drug Discovery Today, 27(3), 678-685. https://dx.doi.org/10.1016/j.drudis.2021.10.017 |
Project | RG35/20 |
Journal | Drug Discovery Today |
Abstract | Machine learning (ML) models have been increasingly adopted in drug development for faster identification of potential targets. Cross-validation techniques are commonly used to evaluate these models. However, the reliability of such validation methods can be affected by the presence of data doppelgängers. Data doppelgängers occur when independently derived data are very similar to each other, causing models to perform well regardless of how they are trained (i.e., the doppelgänger effect). Despite the abundance of data doppelgängers in biomedical data and their inflationary effects, they remain uncharacterized. We show their prevalence in biomedical data, demonstrate how doppelgängers arise, and provide proof of their confounding effects. To mitigate the doppelgänger effect, we recommend identifying data doppelgängers before the training-validation split. |
URI | https://hdl.handle.net/10356/155991 |
ISSN | 1359-6446 |
DOI | 10.1016/j.drudis.2021.10.017 |
Schools | Lee Kong Chian School of Medicine (LKCMedicine); School of Computer Science and Engineering; School of Biological Sciences |
Rights | © 2021 Elsevier Ltd. All rights reserved. This paper was published in Drug Discovery Today and is made available with permission of Elsevier Ltd. |
Fulltext Permission | open |
Fulltext Availability | With Fulltext |
Appears in Collections | LKCMedicine Journal Articles; SBS Journal Articles; SCSE Journal Articles |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ddt_manuscript.pdf | | 330.29 kB | Adobe PDF |
ddt_supplementary.pdf | | 543.57 kB | Adobe PDF |
Page views: 179 (updated on Nov 30, 2023)
Downloads: 5 (updated on Nov 30, 2023)
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.