Bayesian framework for building gene regulatory networks including delays
Date of Issue: 2013
School of Computer Engineering
Bioinformatics Research Centre
Any biological activity is due to several genes regulating one another in a gene regulatory network (GRN). GRN are causal models in which activation of one gene can cause another gene to be activated or inhibited. Dynamic Bayesian networks (DBN) are widely used to model GRN because of their ability to capture complex regulatory interactions among genes. DBN are usually assumed to be first-order, such that gene expression at a particular time point depends only on the previous time point. However, regulatory events are inherently asynchronous, transpire at different times, and involve delays. In order to account for delays in GRN, we propose higher-order DBN. Chapter 1 gives an introduction to GRN and describes earlier work on building GRN, which is broadly classified into (i) Boolean networks, (ii) differential equations, and (iii) stochastic models. The limitations of these approaches in representing delays in regulatory networks and the motivation for this research are then highlighted. Four sets of benchmark data used to demonstrate the research in the thesis are described next: synthetic data, two sets of cell-cycle data, and DNA repair data collected from M. tuberculosis. The pre-processing techniques used and the rationale for using them are presented next. The first chapter ends with a summary of the contributions made in this thesis, which are detailed in Chapters 2 to 5. Chapter 6 concludes the thesis with a summary and potential future work. Chapter 2 describes the DBN approach for building GRN and introduces novel ways of learning their structure and evaluating their confidence and robustness. Bayesian networks are acyclic, while DBN are able to handle feedback and loops.
We demonstrate how Dirichlet priors improve the estimation of GRN parameters; we then compare structure learning by a Markov chain Monte Carlo (MCMC) technique and by a genetic algorithm (GA); and lastly we show how bootstrapping helps determine the confidence of structures and connections. The MCMC technique generates more accurate structures for small sets of genes, while GA-based structure learning is more suitable for a large number of genes. We overcome the errors due to generalization and instabilities with bootstrapping. A probabilistic approach to bootstrapping GRN is proposed, and we compare it with sieve bootstrapping for evaluating the confidence of GRN structures and parameters. Pathways are inherently asynchronous, with delays in gene regulation and in the assembly of gene products. Therefore, the first-order assumption in DBN is extended to higher orders, where the expression of a gene depends on the expression of its parents from r>1 previous time points. Only a few studies have considered higher-order DBN (HDBN) to represent delays in regulatory networks, and in those the order, or maximum delay, of the network has been fixed. To overcome this limitation, in Chapter 3, we introduce variable-order DBN (VDBN). The order is automatically determined by using a novel variable-order MCMC technique. This approach yielded more accurate GRN structures because of its ability to recover correct delays, and thereby more biologically plausible networks. A method for validating estimated networks by using protein-protein interaction (PPI) data is presented. The results with the benchmark data show improved accuracy of the networks built. We investigate the biological relevance of the genes, connections, and delays represented in the network. The higher-order and variable-order DBN models are computationally intractable when the delays or orders of the networks are large. In Chapter 4, a skip-chain Markov model is introduced to handle long delays in regulatory networks.
It automatically determines the optimal delays by using a hidden Markov model (HMM) and hence does not need to search the high-dimensional solution space heuristically. In this approach, we propose novel techniques to identify time-delayed interaction features and to determine the optimal structure by computing Viterbi scores and using MCMC. In conclusion, knowledge of regulatory networks gives a comprehensive picture of how genes function and interact with one another to execute a specific biological function. Two approaches to building higher-order gene regulatory networks were introduced in this research, VDBN and skip-chain DBN, which build more accurate networks than earlier methods by accurately representing delays. The gene networks derived for the benchmark datasets, the core genes and core networks, and their biological relevance were investigated. In addition, we proposed methods to validate the derived networks and to determine their robustness and significance. This research should lead to building more accurate pathways in a data-driven manner by combining various sources of bioinformatics data. When regulatory delays are large, the number of parameters involved becomes intractably large. Our algorithms not only addressed the trade-off between optimality and complexity of solution landscapes but also yielded biologically relevant networks. This work can be extended to building more complex pathways by using multiple sources of bioinformatics data.
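As an illustration of two ideas named in the abstract (a regulatory delay of more than one time point, and parameter estimation with a Dirichlet prior), the following is a minimal Python sketch. It is not the implementation used in the thesis. It assumes expression levels have already been discretized into binary states, considers a single parent gene, and uses a symmetric Dirichlet prior; the function name `estimate_cpt` and the parameters `delay` and `alpha` are hypothetical.

```python
from collections import Counter


def estimate_cpt(parent, child, delay, alpha=1.0, states=(0, 1)):
    """Estimate P(child[t] = c | parent[t - delay] = p) for one parent gene.

    Assumptions (not taken from the thesis): discretized expression states,
    a single parent, and a symmetric Dirichlet prior with pseudo-count alpha.
    Returns a dict mapping (p, c) to the posterior-mean probability.
    """
    if len(parent) != len(child):
        raise ValueError("parent and child series must have equal length")
    if delay < 1 or delay >= len(child):
        raise ValueError("delay must be between 1 and len(child) - 1")

    # Pair the parent state at time t - delay with the child state at time t.
    counts = Counter(zip(parent[:-delay], child[delay:]))

    cpt = {}
    for p in states:
        # Row total: observed counts plus one pseudo-count per child state.
        total = sum(counts[(p, c)] for c in states) + alpha * len(states)
        for c in states:
            cpt[(p, c)] = (counts[(p, c)] + alpha) / total
    return cpt


# Toy series in which the child copies the parent two time points later.
parent_series = [1, 0, 1, 1, 0, 0, 1, 0]
child_series = [0, 0, 1, 0, 1, 1, 0, 0]

for d in (1, 2):
    table = estimate_cpt(parent_series, child_series, delay=d)
    print(d, round(table[(1, 1)], 3))
```

In this toy example the estimated dependence is sharper at a delay of two time points than at a delay of one, which is the kind of evidence a higher-order model can exploit. The pseudo-count keeps every probability strictly positive even when a parent-child combination is never observed in a short time series.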
DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles