Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/164208
Title: | Doppelgänger spotting in biomedical gene expression data |
Authors: | Wang, Li Rong; Choy, Xin Yun; Goh, Wilson Wen Bin |
Keywords: | Science::Biological sciences; Engineering::Computer science and engineering |
Issue Date: | 2022 |
Source: | Wang, L. R., Choy, X. Y. & Goh, W. W. B. (2022). Doppelgänger spotting in biomedical gene expression data. iScience, 25(8), 104788. https://dx.doi.org/10.1016/j.isci.2022.104788 |
Journal: | iScience |
Abstract: | Doppelgänger effects (DEs) occur when samples exhibit chance similarities such that, when split across training and validation sets, they inflate the performance of the trained machine learning (ML) model. This inflationary effect creates misleading confidence in the deployability of the model. Thus far, there are no tools for doppelgänger identification or standard practices to manage their confounding implications. We present doppelgangerIdentifier, a software suite for doppelgänger identification and verification. Applying doppelgangerIdentifier across a multitude of diseases and data types, we show the pervasive nature of DEs in biomedical gene expression data. We also provide guidelines toward proper doppelgänger identification by exploring how lingering batch effects from batch imbalances affect the sensitivity of our doppelgänger identification algorithm. We suggest doppelgänger verification as a useful procedure for establishing model evaluation baselines that may indicate whether feature selection and ML on the data set can yield meaningful insights. |
URI: | https://hdl.handle.net/10356/164208 |
ISSN: | 2589-0042 |
DOI: | 10.1016/j.isci.2022.104788 |
Rights: | © 2022 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | LKCMedicine Journal Articles; SBS Journal Articles; SCSE Journal Articles |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
1-s2.0-S2589004222010604-main.pdf | | 5.62 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.