Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/180519
Title: | A comparative analysis of ENCODE and Cistrome in the context of TF binding signal |
Authors: | Perna, Stefano; Pinoli, Pietro; Ceri, Stefano; Wong, Limsoon |
Keywords: | Medicine, Health and Life Sciences |
Issue Date: | 2024 |
Source: | Perna, S., Pinoli, P., Ceri, S. & Wong, L. (2024). A comparative analysis of ENCODE and Cistrome in the context of TF binding signal. BMC Genomics, 25(Suppl 3), 817-. https://dx.doi.org/10.1186/s12864-024-10668-6 |
Project: | SBPP3 MOE T1 251RES1725 |
Journal: | BMC Genomics |
Abstract: | Background: With the rise of publicly available genomic data repositories, it is now common for scientists to rely on computational models and preprocessed data, either as controls or to discover new knowledge. However, different repositories adhere to different principles and guidelines, and data processing plays a significant role in the quality of the resulting datasets. Two popular repositories for transcription factor binding site data, ENCODE and Cistrome, process the same biological samples in alternative ways, and their results are not always consistent. Moreover, the output format of the processing (BED narrowPeak) exposes a feature, the signalValue, which is seldom used in consistency checks but can offer valuable insight into the quality of the data. Results: We provide evidence that data points with high signalValue(s) (top 25% of values) are more likely to be consistent between ENCODE and Cistrome in human cell lines K562, GM12878, and HepG2. In addition, we show that filtering according to said high values improves the quality of predictions for a machine learning algorithm that detects transcription factor interactions based only on positional information. Finally, we provide a set of practices and guidelines, based on the signalValue feature, for scientists who wish to compare and merge narrowPeaks from ENCODE and Cistrome. Conclusions: The signalValue feature is an informative feature that can be effectively used to highlight consistent areas of overlap between different sources of TF binding sites that expose it. Its applicability extends downstream to positional machine learning algorithms, making it a powerful tool for performance tweaking and data aggregation. |
URI: | https://hdl.handle.net/10356/180519 |
ISSN: | 1471-2164 |
DOI: | 10.1186/s12864-024-10668-6 |
Schools: | Lee Kong Chian School of Medicine (LKCMedicine) |
Rights: | © 2024 The Author(s). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | LKCMedicine Journal Articles |
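The abstract describes keeping only narrowPeak entries whose signalValue falls in the top 25% of values and then checking how consistently they overlap between sources. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes the standard 10-column, tab-separated narrowPeak layout with signalValue in the seventh column. The names `Peak`, `parse_narrowpeak`, `top_quartile`, and `fraction_overlapping` are hypothetical, and the nearest-rank 75th-percentile cutoff is an assumption, since the record does not state how the threshold is computed.

```python
import math
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    signal_value: float


def parse_narrowpeak(lines):
    """Parse tab-separated narrowPeak lines; signalValue is column 7."""
    peaks = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        peaks.append(
            Peak(fields[0], int(fields[1]), int(fields[2]), float(fields[6]))
        )
    return peaks


def top_quartile(peaks):
    """Keep peaks whose signalValue is strictly above the 75th percentile.

    Assumption: nearest-rank percentile. With many tied values this can
    keep fewer than 25% of the peaks.
    """
    if not peaks:
        return []
    values = sorted(p.signal_value for p in peaks)
    threshold = values[math.ceil(0.75 * len(values)) - 1]
    return [p for p in peaks if p.signal_value > threshold]


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def fraction_overlapping(source, other):
    """Fraction of peaks in `source` that overlap at least one peak in `other`.

    Quadratic in the number of peaks; fine for a sketch, too slow for
    genome-scale files.
    """
    if not source:
        return 0.0
    hits = sum(1 for p in source if any(overlaps(p, q) for q in other))
    return hits / len(source)
```

In use, one would parse the ENCODE and Cistrome files for the same sample, apply `top_quartile` to each, and compare `fraction_overlapping` before and after filtering. For real data an interval tree or a sorted sweep would replace the quadratic overlap check.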
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
s12864-024-10668-6.pdf | | 2.97 MB | Adobe PDF |
Page view(s): 52 (updated on Mar 17, 2025)
Download(s): 5 (updated on Mar 17, 2025)
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.