Title: Detecting robust time-delayed regulation in Mycobacterium tuberculosis
Iti Chaturvedi and Jagath C Rajapakse
INCOB 2009
2. Gene Regulatory Networks (GRN)
- A GRN represents the regulatory (causal) effects among genes involved in a particular pathway.
- Signal transduction is transient and dynamic, and delayed regulations seem to exist.
- The distributed nature causes intense cross-talk.
- M. tuberculosis is a bacterium that causes TB in humans and has a very slow growth rate in vitro.
- The DNA repair pathway is activated when damage to DNA occurs.
- The system consists of LexA and RecA and up to 40 other genes that are regulated by these two proteins.
[Figure labels: Rv2719c, linB, dnaB, lexA, ruvC, recA, fadd21, dnaE2, fadd23]
3. Methods of Building GRN
- Bayesian networks (BN): graphical models with regulations denoted by conditional probabilities (Heckerman et al. 1995).
- Dynamic BN (DBN): a transition network over time that can model cyclic events (Friedman, N. et al. 1998).
- Higher-order DBN (HDBN): an extended transition network for longer delays (Zheng et al. 2006).
- Skip-chain BN: can model very long delays of arbitrary length using feature functions (Galley, M. 2006).
a_i are the parents of gene i.
4. Optimization Using GA
- Each individual is an o-nary interaction matrix, with one entry value for no interaction and another for an o-order interaction.
- Crossover involves swapping several rows between two parents.
5. GA Algorithm
- Initialize: initialize N individuals with 0.7 similarity using mutual information.
- Selection: rank all individuals using the fitness function. The elite individual E is sent to the next generation.
- Mating: select two parents using the roulette strategy for mating.
- Crossover: swap the last few rows of the selected parents to generate two new children. Try another crossover point if the population similarity is < 0.7.
- Mutate: invert a random cell to cause mutation.
- Terminate: if dga < 1 for 20 generations, or the number of generations is greater than Q, stop and take E as the optimal network. Otherwise continue with the next generation.
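The loop on this slide can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `run_ga`, `crossover`, `mutate`, `roulette`, `patience` and the toy fitness are hypothetical, the mutual-information initialization and the 0.7 similarity check are left out, and "dga < 1 for 20 generations" is approximated by "no improvement of the best fitness for `patience` generations".

```python
import random

def crossover(parent_a, parent_b, point):
    """Swap all rows from index `point` onward between two parents."""
    child_a = [r[:] for r in parent_a[:point]] + [r[:] for r in parent_b[point:]]
    child_b = [r[:] for r in parent_b[:point]] + [r[:] for r in parent_a[point:]]
    return child_a, child_b

def mutate(individual, max_order, rng):
    """Invert one random cell: an edge becomes 0, a missing edge gets a random order."""
    n = len(individual)
    i, j = rng.randrange(n), rng.randrange(n)
    individual[i][j] = 0 if individual[i][j] else rng.randint(1, max_order)

def roulette(population, scores, rng):
    """Pick one individual with probability proportional to its shifted score."""
    low = min(scores)
    weights = [s - low + 1e-9 for s in scores]
    return rng.choices(population, weights=weights, k=1)[0]

def run_ga(fitness, n_genes, max_order, pop_size=20, max_gen=200, patience=20, seed=0):
    rng = random.Random(seed)
    # Assumption: sparse random start instead of mutual-information initialization.
    population = [[[rng.randint(1, max_order) if rng.random() < 0.2 else 0
                    for _ in range(n_genes)] for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best, best_score, stall = None, float("-inf"), 0
    for _ in range(max_gen):  # max_gen plays the role of Q
        scores = [fitness(ind) for ind in population]
        top = max(range(pop_size), key=scores.__getitem__)
        if scores[top] > best_score:
            best, best_score, stall = [r[:] for r in population[top]], scores[top], 0
        else:
            stall += 1
        if stall >= patience:
            break
        next_gen = [[r[:] for r in best]]  # the elite individual E is carried over
        while len(next_gen) < pop_size:
            a = roulette(population, scores, rng)
            b = roulette(population, scores, rng)
            for child in crossover(a, b, rng.randrange(1, n_genes)):
                mutate(child, max_order, rng)
                next_gen.append(child)
        population = next_gen[:pop_size]
    return best, best_score

# Toy usage: the fitness rewards agreement with a made-up 3-gene target matrix.
toy_target = [[0, 1, 0], [0, 0, 2], [1, 0, 0]]
toy_fitness = lambda ind: -sum(x != y for r, s in zip(ind, toy_target)
                               for x, y in zip(r, s))
best_net, best_fit = run_ga(toy_fitness, n_genes=3, max_order=2)
```

In a real run the toy fitness would be replaced by the network likelihood, which this sketch does not attempt to reproduce.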
6. Skip-chain Model
- The likelihood of a gene expression x_i is given by a weighted sum of linear and skip-edge scores.
- Linear-chain feature functions represent local dependencies of o-order.
- The skip-chain features represent long-range dependencies in a GRN using an HMM.
7. Viterbi Forward Path
- The skip-edge score is given as the normalized MAP interaction.
- We can use maximum likelihood to estimate the state transition and emission probabilities from the number of occurrences of each transition and emission.
- The most probable path is given by the MAP estimate using dynamic programming.
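As a rough illustration of the two steps on this slide, the sketch below estimates transition probabilities by counting and then recovers the most probable state path with the standard Viterbi recursion. It assumes a plain two-state HMM with made-up probabilities; the function names and all numbers are hypothetical and are not taken from the talk.

```python
import math

def _log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")

def mle_transitions(states, n_states):
    """Maximum-likelihood transition matrix from counts of consecutive states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1
    rows = []
    for row in counts:
        total = sum(row)
        # Assumption: fall back to a uniform row when a state was never left.
        rows.append([c / total if total else 1.0 / n_states for c in row])
    return rows

def viterbi(obs, start, trans, emit):
    """Most probable hidden-state path, computed in log space."""
    n = len(start)
    score = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        pointers, new_score = [], []
        for s in range(n):
            p = max(range(n), key=lambda q: score[q] + _log(trans[q][s]))
            pointers.append(p)
            new_score.append(score[p] + _log(trans[p][s]) + _log(emit[s][o]))
        back.append(pointers)
        score = new_score
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):  # follow back-pointers from the last step
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy usage: state 0 = down-regulated, state 1 = up-regulated.
toy_start = [0.5, 0.5]
toy_trans = [[0.9, 0.1], [0.1, 0.9]]
toy_emit = [[0.8, 0.2], [0.2, 0.8]]
toy_path = viterbi([0, 0, 1, 1], toy_start, toy_trans, toy_emit)
```

Working in log space avoids numerical underflow when the paths get longer.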
8. Priors for Networks
- Most higher-order Markov models are sensitive to changes in pathways and the associated data.
- A Gibbs prior is used to model the target network prior.
- Interaction potentials distinguish an interaction in the target network from no interaction.
- A small value of one potential and a large value of the other make the network reflect the prior more, and vice versa.
- We use adaptation to reduce over-fitting due to sparse feature-specific data.
- The adaptation model can combine the reliable DBN with a volatile feature-specific HMM for long delays.
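One way to picture a Gibbs prior over structures is as an energy that sums a potential over the edges of a candidate network. The sketch below assumes that edges also present in the target network pay a small potential and edges absent from it pay a large one; the function name `gibbs_log_prior`, the parameters `u_in` and `u_out`, and their values are hypothetical, and the normalizing constant is ignored.

```python
def gibbs_log_prior(candidate, target, u_in=0.5, u_out=2.0):
    """Unnormalized log prior: minus the summed potentials of the candidate's edges.

    Assumption: u_in is charged for an edge that also exists in the target
    network, u_out for an edge that does not.
    """
    energy = 0.0
    for cand_row, targ_row in zip(candidate, target):
        for c, t in zip(cand_row, targ_row):
            if c:
                energy += u_in if t else u_out
    return -energy

# Toy usage with a made-up 2-gene target network.
toy_target_net = [[0, 1], [0, 0]]
toy_candidate = [[0, 1], [1, 0]]
agree = gibbs_log_prior(toy_target_net, toy_target_net)
extra_edge = gibbs_log_prior(toy_candidate, toy_target_net)
```

Widening the gap between the two potentials pulls the search more strongly toward the target network, which is the trade-off the slide describes.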
9. Dirichlet Prior over Parameters
- We extend the MLE to Bayesian learning. The Dirichlet is a conjugate prior for the multinomial distribution.
- We can maximize the posterior probability (MAP).
- The linear feature is used as a Dirichlet conjugate prior for the skip feature of a gene.
- Lastly, the interpolated probability of a gene is based on both linear and skip edges.
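A common form of MAP adaptation with a Dirichlet prior is sketched below. It assumes the prior is centered on the linear-chain probabilities with a hypothetical strength `tau`, so the estimate reduces to an interpolation between the skip-edge relative frequencies and the linear probabilities. The slide's own formulas are not reproduced here, so treat the exact form as an assumption.

```python
def map_adapt(skip_counts, linear_probs, tau=1.0):
    """MAP estimate of skip probabilities under a Dirichlet prior.

    Assumption: the Dirichlet parameters are alpha_k = tau * p_k + 1, where p_k
    are the linear-chain probabilities, which gives (n_k + tau * p_k) / (N + tau).
    """
    total = sum(skip_counts)
    return [(n + tau * p) / (total + tau)
            for n, p in zip(skip_counts, linear_probs)]

def interpolate(skip_counts, linear_probs, tau=1.0):
    """The same estimate written as a weighted mix of the two sources."""
    total = sum(skip_counts)
    weight = total / (total + tau)  # more skip data means more trust in the counts
    return [weight * (n / total if total else 0.0) + (1 - weight) * p
            for n, p in zip(skip_counts, linear_probs)]

# Toy usage: 3 "up" and 1 "down" observations on a skip edge, uniform linear prior.
toy_probs = map_adapt([3, 1], [0.5, 0.5], tau=4.0)
```

With no skip observations the estimate falls back to the linear probabilities, which is how a prior of this kind limits over-fitting on sparse data.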
10. Experiments: M. tuberculosis
- Here we looked at the response of the bacteria to drug-induced stress. Treatment with mitomycin C caused DNA damage and hence led to the upregulation of associated repair genes.
- Eight time points are available at the NCBI Gene Expression Omnibus (GSE1642-GPL1396 series): 0.33 hr, 0.75 hr, 1.5 hr, 2 hr, 4 hr, 6 hr, 8 hr and 12 hr after DNA damage.
- Data were discretized into 0 for down-regulation and 1 for up-regulation.
- The corresponding skip probabilities were calculated as described in the methods. Up to seven time points of delay were allowed.
- First, we used the 9 genes previously specified. To get an expanded dataset, the original dataset was subjected to ICA and the components closest to the 9 genes were identified. This gave us a second dataset of 32 genes.
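The preprocessing on this slide can be illustrated with a small sketch: discretize each profile into 0/1 and count how often a regulator's state at time t co-occurs with a target's state d time points later. The threshold of 0 on a log-ratio, the function names and the toy profiles are assumptions for illustration only. With eight time points, delays of 1 to 7 time points are possible.

```python
def discretize(log_ratios, threshold=0.0):
    """Map expression values to 1 (up) or 0 (down).

    Assumption: values are log-ratios and anything above `threshold` is "up".
    """
    return [1 if v > threshold else 0 for v in log_ratios]

def lagged_counts(regulator, target, delay):
    """2x2 table: counts[r][x] for regulator state r at t and target state x at t + delay."""
    counts = [[0, 0], [0, 0]]
    for t in range(len(target) - delay):
        counts[regulator[t]][target[t + delay]] += 1
    return counts

# Toy usage with made-up profiles over eight time points.
toy_regulator = discretize([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
toy_gene = toy_regulator[:]  # identical profile, so a delay of 2 lines up exactly
tables = {d: lagged_counts(toy_regulator, toy_gene, d) for d in range(1, 8)}
```

Longer delays leave fewer usable pairs (only one pair at a delay of seven), which is why the counts for long skips are sparse.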
11. Time delays
Table 1. Time delays predicted by DBN, HDBN, and skip-chain without priors. Higher-order edge columns are indexed by delay (hrs): 1 (0.75), 2 (1.5), 3 (2), 4 (4), 5 (6).

| Genes | Model | Order o | ML | Higher-order edges |
|---|---|---|---|---|
| 9 | DBN | 1 | -14.7 | 9 |
| 9 | HDBN | 3 | -8.69 | 8 2 7 |
| 9 | SKIP-CHAIN | 1/5 | -6.05 | 13 (3) |
| 32 | DBN | 1 | -48.9 | 36 |
| 32 | HDBN | 4 | -39.4 | 20 6 14 20 |
| 32 | SKIP-CHAIN | 2/5 | -37.2 | 54 18 (41) (4) |

The ML of the skip-chain prediction is higher than that of the DBN or HDBN for both datasets (-6.05 versus -14.7 and -8.69 for 9 genes; -37.2 versus -48.9 and -39.4 for 32 genes), indicating that the network fits the data well.
12. Time delays: Priors
Table 2. Time delays predicted by skip-chain models with priors: Gibbs prior for structures and Dirichlet prior for parameters. Higher-order edge columns are indexed by delay (hrs): 1 (0.75), 2 (1.5), 3 (2), 4 (4), 5 (6).

| Genes | Model | Order o | ML | Higher-order edges |
|---|---|---|---|---|
| 9 | SKIP-CHAIN | 1 | -6.05 | 13 (3) |
| 9 | SKIP-CHAIN (Gibbs) | 1 | -5.8 | 11 (2) |
| 9 | SKIP-CHAIN (Dirichlet) | 2 | -5.2 | 7 13 (11) (1) |
| 9 | SKIP-CHAIN (Gibbs and Dirichlet) | 3 | -3.27 | 2 7 (4) (5) |
| 32 | SKIP-CHAIN | 2 | -37.2 | 54 18 (41) (4) |
| 32 | SKIP-CHAIN (Gibbs) | 3 | -35.2 | 37 16 24 (40) (3) |
| 32 | SKIP-CHAIN (Dirichlet) | 2 | -35.05 | 54 16 (37) (4) |
| 32 | SKIP-CHAIN (Gibbs and Dirichlet) | 2 | -34.54 | 50 15 (41) (4) |

Using priors further increased the likelihood and gave many new time-delayed interactions. Combining the Dirichlet and Gibbs priors gave the highest ML for both datasets (-3.27 for 9 genes and -34.54 for 32 genes).
13. Networks: 9 genes
Fig 3. Time-delayed interactions in the predicted network of 9 genes: (a) DBN network, (b) HDBN network, (c) skip-chain network, (d) skip-chain network with Gibbs prior, (e) skip-chain network with Dirichlet prior, (f) skip-chain network with Gibbs and Dirichlet priors.
A small number of transcription factors (TF)
regulate the rest of the repair system. At the
same time the in-degree is low, as each gene is
regulated by just one TF.
14. Networks: 32 genes
Fig 4. Time-delayed interactions in the predicted network of 32 genes: (a) DBN network, (b) HDBN network, (c) skip-chain network, (d) skip-chain network with Gibbs prior, (e) skip-chain network with Dirichlet prior, (f) skip-chain network with Gibbs and Dirichlet priors.
The second dataset of 32 genes indicated that our
method is good for identifying core genes. Use of
priors gave better networks with fewer hubs.
15. Conclusion
- An organism responds to changes in its environment by altering the level of expression of critical genes.
- Skip-chain models address the difficulties of an HDBN by easily incorporating long time-delayed regulations.
- Numerous time delays are identified between the same pair of genes. The forward Viterbi path determines the best long-distance time delay between two genes.
- Using priors gave us a higher likelihood and reduced over-fitting in building the regulatory networks.
- The work can be extended to skip-chain conditional random fields and fusion with protein interaction networks.
16. Thank You