Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/183844
Title: Mechanism decoding and de novo design of drug molecules with deep learning for targeted therapy
Authors: Wang, Conghao
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Wang, C. (2025). Mechanism decoding and de novo design of drug molecules with deep learning for targeted therapy. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183844
Abstract: Molecular design is a central task in the drug discovery process. Successful drug design strategies depend on two tasks, both of which remain open challenges: decoding the mechanism of action of drugs, and developing algorithms for the targeted design of drug molecules. Intuitively, decoding a drug's mechanism involves understanding how the biological features of its targets and the chemical features of the drug molecule each contribute to the interaction between them. A targeted drug design algorithm then learns to generate molecular structures accordingly, based on features extracted from potential biological targets. Therefore, in this thesis, we divide our research on the two tasks into three chapters: (1) identification of salient multi-omics features for predicting drug responses; (2) investigation of salient chemical structures and their interactions; and (3) development of deep generative models for molecular design.
In Chapter 2, to elucidate the biological activities induced by drugs, we characterize the biological profiles of cells with multi-omics data and use them to predict drug responses. Complex diseases, such as cancer and neurodegenerative disorders, are often caused by factors at various biological levels (e.g., gene mutation and protein misfolding), which makes it difficult to uncover the pathological causes of disease progression. Multi-omics technology enables researchers to investigate the causes of such diseases at multiple levels.
However, multi-omics data are intrinsically heterogeneous and high-dimensional. To overcome these difficulties, we propose an attention-based deep neural network to predict cancer drug responses. We first reduce the dimensionality of the multi-omics data with embedding layers, and then use an attention mechanism to integrate the latent multi-omics features. By inspecting the attention weights assigned by the model, we found that gene mutation and proteomics features contributed substantially to the predictions. In addition to integrating multi-omics features with the attention mechanism, we explore a network-based approach for predicting the vital status of neuroblastoma patients, which converts the high-dimensional data into patient similarity networks (PSNs). PSNs constructed from different omics data are merged by the similarity network fusion algorithm, and topological features, including centrality features and modularity features, are then extracted from the fused PSN. Centrality features reflect a node's importance in the network, while modularity features indicate the membership of each node in the modules identified by a clustering algorithm. We predict the vital status of neuroblastoma patients with deep neural networks and explain feature importance with integrated gradients. Our results show that modularity features generally contribute more to the prediction than centrality features. In Chapter 3, we investigate the association between drug molecular structures and drug responses. We represent drugs as molecular graphs, which preserve more structural information than 1D representations such as simplified molecular-input line-entry system (SMILES) strings or molecular fingerprints, and use graph neural networks (GNNs) to model the molecular structures comprehensively.
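The abstract does not say which GNN architecture Chapter 3 uses. As a minimal illustration of how a GNN turns a molecular graph into a fixed-size vector, the sketch below assumes a plain graph convolutional network (GCN) with symmetric adjacency normalization and mean pooling over atoms. The function names, the toy ethanol graph (heavy atoms only), the one-hot atom typing and the random, untrained weights are all hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution step: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(adj_norm @ h @ w, 0.0)


def embed_molecule(adj: np.ndarray, atom_features: np.ndarray, weights) -> np.ndarray:
    """Stack GCN layers, then mean-pool atom embeddings into one molecule vector."""
    adj_norm = normalized_adjacency(adj)
    h = atom_features
    for w in weights:
        h = gcn_layer(adj_norm, h, w)
    return h.mean(axis=0)


# Toy molecular graph: ethanol heavy atoms C-C-O (hypothetical example input)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
# One-hot atom types, columns = [C, O]
x = np.array([[1, 0],
              [1, 0],
              [0, 1]], dtype=float)

rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 16)), rng.standard_normal((16, 16))]  # untrained, hypothetical
mol_vec = embed_molecule(adj, x, weights)
print(mol_vec.shape)  # (16,)
```

Mean pooling makes the molecule vector independent of atom ordering, which is why graphs avoid the canonical-ordering issue of string encodings; a cross-attention model like the one described next would more likely consume the per-atom embeddings `h` before pooling.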
We also incorporate the gene expression and gene mutation profiles of cancer cell lines for drug response prediction, and integrate the learned latent drug and cell line features with a cross-attention mechanism. We further analyze the model's interpretability with GNNExplainer and integrated gradients, and conduct pathway analysis based on the gene saliency scores. Our model successfully captured the drugs' mechanisms of action and predicted drug responses accurately.
In Chapter 4, we turn to the design of potential drug candidates with desired biological and chemical properties. The rapid development of deep generative models has created an unprecedented opportunity to generate molecular structures from scratch, known as de novo design, rather than screening vast chemical libraries. Building on these models, we develop a constrained graph latent diffusion model that generates hit candidates under the guidance of transcriptomic profiles. We first train a variational autoencoder to model the latent space of drug-like molecules, and then train a diffusion model to generate latent representations. This avoids running the diffusion model directly on the topological structures of molecules, which is difficult and costly. To generate drug candidates for specific biological targets, we use as constraints the gene expression changes that the generated molecules are expected to induce. Our method achieved strong performance in a series of constrained and unconstrained generation tasks. In addition to drug design guided by biological targets, we explore pharmacophore-guided drug design, which steers generation toward molecules with characteristic chemical structures that can form specific types of interactions with other biomolecules. Specifically, we devise a geometric diffusion bridge to align the distribution of 3D pharmacophore arrangements with that of molecular structures.
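The latent diffusion model of Chapter 4 runs diffusion in a VAE latent space instead of on molecular graphs. The abstract does not give the noise schedule, the denoiser, or the exact constraint mechanism, so the sketch below assumes a standard DDPM-style forward process with a linear beta schedule and an epsilon-prediction loss, with the constraint (an expected gene-expression change) simply passed to the denoiser as a conditioning input. The latents, the conditioning vectors and `zero_denoiser` are hypothetical stand-ins for a trained VAE encoder, real transcriptomic profiles and a trained network.

```python
import numpy as np


def linear_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(z0, t, alpha_bar, eps):
    """Forward noising of a latent: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def diffusion_loss(denoiser, z0, cond, alpha_bar, rng):
    """Epsilon-prediction loss at one random timestep; `cond` carries the constraint."""
    t = int(rng.integers(len(alpha_bar)))
    eps = rng.standard_normal(z0.shape)
    z_t = q_sample(z0, t, alpha_bar, eps)
    eps_hat = denoiser(z_t, t, cond)
    return float(np.mean((eps_hat - eps) ** 2))


def zero_denoiser(z_t, t, c):
    """Placeholder for a trained conditional network; it ignores all inputs."""
    return np.zeros_like(z_t)


rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
z0 = rng.standard_normal((64, 128))   # hypothetical VAE latents for 64 molecules
cond = rng.standard_normal((64, 50))  # hypothetical expression-change constraints

loss = diffusion_loss(zero_denoiser, z0, cond, alpha_bar, rng)
print(round(loss, 2))  # near 1.0: a zero predictor misses the full unit-variance noise
```

In training, the placeholder would be replaced by a network that reads `z_t`, `t` and `cond`; at generation time the reverse process would be run from Gaussian noise and the final latent decoded by the VAE decoder into a molecular graph.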
A diffusion bridge generalizes the standard diffusion model, which maps the data distribution to a prior distribution, so that it can align two arbitrary distributions. We equip the diffusion bridge with SE(3)-equivariant dynamics to transform between the point-cloud representations of a molecule and its associated pharmacophores. The generated molecules exhibit high recovery rates of the desired pharmacophores and high binding affinity to potential target proteins, demonstrating the model's ability to transform essential chemical structures into potential drug candidates.
In sum, this thesis explores the decoding of drug mechanisms via explainable deep learning and the de novo design of drugs with generative models. Our approaches tackle the difficulty of interpreting deep neural networks in pharmaceutical research and provide a feasible way to generate hit candidates with desired biological activities, as specified by the identified salient features. We believe our research can substantially accelerate the drug discovery process.
URI: https://hdl.handle.net/10356/183844
DOI: 10.32657/10356/183844
Schools: College of Computing and Data Science
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Theses
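The geometric diffusion bridge in the abstract above transports samples between two arbitrary distributions rather than between data and a Gaussian prior. The abstract does not specify the bridge's formulation, so the sketch below assumes the simplest case: a Brownian bridge between paired, equally sized 3D point clouds (a molecule `x0` and a pharmacophore arrangement `x1`), with noise projected to zero center of mass. Its drift uses the known endpoint, i.e. the reference process that a learned SE(3)-equivariant network would be trained to imitate, and it checks that the construction commutes with rotations. All names, sizes and the one-to-one pairing of points are hypothetical.

```python
import numpy as np


def center(x: np.ndarray) -> np.ndarray:
    """Subtract the center of mass so point clouds live in a translation-free subspace."""
    return x - x.mean(axis=0, keepdims=True)


def bridge_marginal(x0, x1, t, sigma, eps):
    """Brownian-bridge sample at time t: mean interpolates x0 -> x1, noise scale sigma*sqrt(t(1-t))."""
    return (1.0 - t) * x0 + t * x1 + sigma * np.sqrt(t * (1.0 - t)) * center(eps)


def simulate_bridge(x0, x1, sigma, n_steps, rng):
    """Euler-Maruyama on dx = (x1 - x) / (1 - t) dt + sigma dW, pinned to reach x1 at t = 1."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + (x1 - x) / (1.0 - t) * dt
        if k < n_steps - 1:  # skip noise on the final step so the path ends at x1
            x = x + sigma * np.sqrt(dt) * center(rng.standard_normal(x.shape))
    return x


def random_rotation(rng) -> np.ndarray:
    """Random proper rotation matrix (det = +1) built from a QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
x0 = center(rng.standard_normal((8, 3)))  # hypothetical molecule coordinates
x1 = center(rng.standard_normal((8, 3)))  # hypothetical paired pharmacophore points
eps = rng.standard_normal((8, 3))
R = random_rotation(rng)

x_t = bridge_marginal(x0, x1, 0.3, 0.5, eps)
x_t_rot = bridge_marginal(x0 @ R.T, x1 @ R.T, 0.3, 0.5, eps @ R.T)
print(np.allclose(x_t_rot, x_t @ R.T))                          # True: rotation equivariance
print(np.allclose(simulate_bridge(x0, x1, 0.5, 200, rng), x1))  # True: path ends at x1
```

In a learned model, the known-endpoint drift `(x1 - x) / (1 - t)` would be replaced by an SE(3)-equivariant network conditioned on the pharmacophores; the rotation check above is the kind of property such a network is built to preserve.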
Files in This Item:
File | Description | Size | Format
---|---|---|---
Thesis_Conghao.pdf | thesis pdf | 19.62 MB | Adobe PDF
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.