Title: Doppelgänger spotting in biomedical gene expression data
Authors: Wang, Li Rong
Choy, Xin Yun
Goh, Wilson Wen Bin
Keywords: Science::Biological sciences
Engineering::Computer science and engineering
Issue Date: 2022
Source: Wang, L. R., Choy, X. Y. & Goh, W. W. B. (2022). Doppelgänger spotting in biomedical gene expression data. iScience, 25(8), 104788.
Journal: iScience
Abstract: Doppelgänger effects (DEs) occur when samples exhibit chance similarities such that, when they are split across training and validation sets, the performance of the trained machine learning (ML) model is inflated. This inflationary effect creates misleading confidence in the deployability of the model. Thus far, there are no tools for doppelgänger identification or standard practices to manage their confounding implications. We present doppelgangerIdentifier, a software suite for doppelgänger identification and verification. Applying doppelgangerIdentifier across a multitude of diseases and data types, we show the pervasive nature of DEs in biomedical gene expression data. We also provide guidelines toward proper doppelgänger identification by exploring the ramifications of lingering batch effects from batch imbalances on the sensitivity of our doppelgänger identification algorithm. We suggest doppelgänger verification as a useful procedure to establish baselines for model evaluation that may indicate whether feature selection and ML on the data set can yield meaningful insights.
ISSN: 2589-0042
DOI: 10.1016/j.isci.2022.104788
Schools: School of Computer Science and Engineering 
School of Biological Sciences 
Lee Kong Chian School of Medicine (LKCMedicine) 
Research Centres: Centre for Biomedical Informatics, NTU
Rights: © 2022 The Author(s). This is an open access article under the CC BY-NC-ND license.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: LKCMedicine Journal Articles
SBS Journal Articles
SCSE Journal Articles

Files in This Item:
File: 1-s2.0-S2589004222010604-main.pdf
Size: 5.62 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.