Title: DL-CRISPR : a deep learning method for off-target activity prediction in CRISPR/Cas9 with data augmentation
Authors: Zhang, Yu; Long, Yahui; Yin, Rui; Kwoh, Chee Keong
Keywords: Engineering::Computer science and engineering
Issue Date: 2020
Source: Zhang, Y., Long, Y., Yin, R., & Kwoh, C. K. (2020). DL-CRISPR : a deep learning method for off-target activity prediction in CRISPR/Cas9 with data augmentation. IEEE Access, 8, 76610-76617. doi:10.1109/access.2020.2989454
Project: RGANS1905 
Journal: IEEE Access 
Abstract: The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) system is a popular and easy-to-use gene-editing technique, but it carries off-target risk. Cutting at off-target sites will severely harm the cells, so in silico methods are needed to help avoid it. Most existing in silico approaches rely on a relatively small positive dataset, and the data imbalance issue remains. In addition, some samples once considered negative were later shown to be positive. Hence, it is essential to refresh the dataset and develop more accurate off-target activity prediction programs. In this work, we first extended the current positive dataset and explored the potential differences between positive and negative data based on the new dataset. We then adopted a new data augmentation method to address the data imbalance issue, and used an ensemble approach to take more negative data into consideration, bringing the model closer to the real scenario while keeping it balanced. Finally, we developed DL-CRISPR, a deep learning framework to predict off-target activity in CRISPR/Cas9. DL-CRISPR is evaluated and compared with other state-of-the-art methods on three kinds of datasets: 5-fold cross-validation test datasets, putative off-target datasets related to specific single guide RNAs (sgRNAs), and putative off-target datasets related to unseen sgRNAs. DL-CRISPR achieves the best average accuracy, 98.57%, on the 5-fold cross-validation datasets and correctly detects more off-targets on datasets related to both seen and unseen sgRNAs.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2989454
Schools: School of Computer Science and Engineering 
Rights: © 2020 IEEE. This journal is 100% open access, which means that all content is freely available without charge to users or their institutions. All articles accepted after 12 June 2019 are published under a CC BY 4.0 license, and the author retains copyright. Users are allowed to read, download, copy, distribute, print, search, or link to the full texts of the articles, or use them for any other lawful purpose, as long as proper attribution is given.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Journal Articles

Files in This Item:
File: 09076075.pdf
Size: 5.67 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.