Title: Simulation and Application on learning gene causal relationships
Introduction
- High-throughput genetic technologies make it possible to study how genes interact with each other
- We use simulation to evaluate how well the IC algorithm learns gene causal relationships
- We present an algorithm, the mIC algorithm, for learning causal relationships with knowledge of topological ordering information
- We apply the mIC algorithm to a melanoma dataset
Steps for Simulation Study
- Construct a causal network N
- Generate datasets based on the causal network
- Learn from the simulated data using causal algorithms (e.g. the IC algorithm) to obtain a network N'
- Compare the original network N with the obtained network N' with respect to precision and recall
Modeling and simulation of a causal Boolean network (BN)
- Construct a causal structure
- Assign parameters (proper functions) to each node that has causal parents
- Assign a probability distribution
Constructing Boolean Network
1. Generate M BNs with up to 3 causal parents for each node
2. For each BN, generate a random proper function for each node
3. Assign random probabilities to the root gene(s)
4. Given one configuration, compute the probability distribution
5. Collect 200 data points for each network
6. Repeat steps 3-5 for all M networks
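The six steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: genes are numbered in a topological order, each gene draws 0 to 3 parents uniformly from earlier genes, root genes are Bernoulli with a random probability, and every non-root gene is a deterministic proper function of its parents. All function names are hypothetical.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

Table = Dict[Tuple[int, ...], int]  # truth table: parent values -> gene value


def is_proper_table(table: Table, k: int) -> bool:
    """True if flipping each of the k inputs changes the output for some row."""
    return all(
        any(table[x] != table[x[:v] + (1 - x[v],) + x[v + 1:]] for x in table)
        for v in range(k)
    )


def random_proper_function(k: int, rng: random.Random) -> Table:
    """Draw random truth tables over k inputs until one is proper (rejection sampling)."""
    inputs = list(product((0, 1), repeat=k))
    while True:
        table = {x: rng.randint(0, 1) for x in inputs}
        if is_proper_table(table, k):
            return table


def random_boolean_network(n_genes: int, max_parents: int, rng: random.Random):
    """Genes 0..n-1 are created in topological order; parents come from earlier genes."""
    parents: Dict[int, List[int]] = {}
    tables: Dict[int, Table] = {}
    for g in range(n_genes):
        k = rng.randint(0, min(max_parents, g))
        parents[g] = sorted(rng.sample(range(g), k))
        if k:
            tables[g] = random_proper_function(k, rng)
    # Step 3: random probability P(gene = 1) for each root gene
    p_root = {g: rng.random() for g in range(n_genes) if not parents[g]}
    return parents, tables, p_root


def sample_data(parents, tables, p_root, n_samples: int, rng: random.Random):
    """Roots are Bernoulli(p_root); every other gene is its parents' proper function."""
    data = []
    for _ in range(n_samples):
        state: Dict[int, int] = {}
        for g in range(len(parents)):  # topological order by construction
            if parents[g]:
                state[g] = tables[g][tuple(state[p] for p in parents[g])]
            else:
                state[g] = int(rng.random() < p_root[g])
        data.append([state[g] for g in range(len(parents))])
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    M = 5  # hypothetical number of networks
    for _ in range(M):
        net = random_boolean_network(n_genes=8, max_parents=3, rng=rng)
        data = sample_data(*net, n_samples=200, rng=rng)
        print(len(data), "samples of", len(data[0]), "genes")
```

Drawing parents only from earlier genes guarantees acyclicity, and rejection sampling is a simple way to make every node's function proper; other generators would work equally well.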
Constructing Causal Structure
Steps for constructing causal structure
Proper function (1)
Proper function: a function in which every input (predictor) actually influences the output.
Example: by simplifying f, c turns out to be a function of a alone; b is a pseudo predictor of c and has no effect on c.
Hence f is not a proper function.
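To make the definition concrete, the check below flips each input in turn and asks whether the output ever changes; an input that never matters is a pseudo predictor. The functions `f` and `g` are hypothetical stand-ins, with `f` chosen so that, as in the example above, c reduces to a function of a alone.

```python
from itertools import product
from typing import Callable


def is_proper(f: Callable[..., int], n: int) -> bool:
    """Return True if each of the n Boolean inputs of f influences its output."""
    for var in range(n):
        influences = False
        for bits in product((0, 1), repeat=n):
            flipped = list(bits)
            flipped[var] ^= 1  # flip only this input
            if f(*bits) != f(*flipped):
                influences = True
                break
        if not influences:
            return False  # var is a pseudo predictor
    return True


# Hypothetical example: (a AND b) OR (a AND NOT b) simplifies to a,
# so b is a pseudo predictor and f is not proper.
def f(a: int, b: int) -> int:
    return (a & b) | (a & (1 - b))


# a XOR b depends on both inputs, so it is proper.
def g(a: int, b: int) -> int:
    return a ^ b


print(is_proper(f, 2), is_proper(g, 2))  # False True
```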
Proper function (2)
- With n predictors, the number of proper functions is given by
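Assuming "proper" means the function depends on all n predictors, the count can be obtained by inclusion-exclusion over the subsets of predictors a function is allowed to ignore: $P(n) = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} 2^{2^k}$. This is our own sketch, not necessarily the expression used in the slides; the code checks the formula against brute-force enumeration of all truth tables for small n.

```python
from itertools import product
from math import comb


def count_proper_formula(n: int) -> int:
    """Inclusion-exclusion: Boolean functions of n inputs that depend on every input."""
    return sum((-1) ** (n - k) * comb(n, k) * 2 ** (2 ** k) for k in range(n + 1))


def count_proper_bruteforce(n: int) -> int:
    """Enumerate all 2**(2**n) truth tables and keep the proper ones."""
    inputs = list(product((0, 1), repeat=n))
    count = 0
    for outputs in product((0, 1), repeat=len(inputs)):
        table = dict(zip(inputs, outputs))
        if all(
            any(table[x] != table[x[:v] + (1 - x[v],) + x[v + 1:]] for x in inputs)
            for v in range(n)
        ):
            count += 1
    return count


for n in range(4):
    print(n, count_proper_formula(n), count_proper_bruteforce(n))
```

For n = 1, 2, 3 this gives 2, 10 and 218 proper functions, so with up to 3 causal parents per node there are at most 218 candidate functions per node.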
Probability Distribution
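Step 4 of the network construction ("given one configuration, compute the probability distribution") can be illustrated by enumerating all root states. A minimal sketch, assuming non-root genes are deterministic functions of their parents and root genes are independent; `joint_distribution` and the example configuration are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def joint_distribution(
    order: List[str],
    parents: Dict[str, List[str]],
    funcs: Dict[str, Callable[..., int]],
    p_root: Dict[str, float],
) -> Dict[Tuple[int, ...], float]:
    """Exact P(state) for a BN whose non-root genes are deterministic functions."""
    roots = [g for g in order if not parents[g]]
    dist: Dict[Tuple[int, ...], float] = {}
    for root_vals in product((0, 1), repeat=len(roots)):
        state = dict(zip(roots, root_vals))
        prob = 1.0
        for g in roots:  # independent roots: multiply their marginals
            prob *= p_root[g] if state[g] == 1 else 1.0 - p_root[g]
        for g in order:  # `order` must be topological
            if parents[g]:
                state[g] = funcs[g](*(state[p] for p in parents[g]))
        key = tuple(state[g] for g in order)
        dist[key] = dist.get(key, 0.0) + prob
    return dist


# Hypothetical configuration: roots a, b and c = a XOR b.
order = ["a", "b", "c"]
parents = {"a": [], "b": [], "c": ["a", "b"]}
funcs = {"c": lambda a, b: a ^ b}
p_root = {"a": 0.3, "b": 0.5}
for state, p in sorted(joint_distribution(order, parents, funcs, p_root).items()):
    print(state, round(p, 3))
```

Only states reachable from some root assignment receive probability mass, so the distribution has at most 2^r support points for r root genes.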
Generating dataset
Steps of learning gene causal relationships
- Step 1: obtain the probability distribution and sample data
- Step 2: apply the algorithms to find causal relations
- Step 3: compare the original and obtained networks using the two notions of precision and recall
- Step 4: repeat steps 1-3 for every random network
Comparing two networks
[Figure: an original network and the obtained network, each over genes A, B, C and D]
Precision and Recall
- The original graph is a DAG, while the obtained graph has both directed and undirected edges
[Figure: edges of the original graph vs. the obtained graph, classified as TP, FN, FP and TN]
- Recall $= \frac{TP}{TP + FN}$, Precision $= \frac{TP}{TP + FP}$
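The formulas above can be computed from edge sets as sketched below. How undirected edges in the obtained graph are scored is our assumption: an undirected edge counts as a true positive if either orientation is in the original DAG, while a directed edge must match the true direction. The function name is hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]


def precision_recall(
    original: Set[Edge],
    obt_directed: Set[Edge],
    obt_undirected: Set[FrozenSet[str]],
) -> Tuple[float, float]:
    """Edge-level precision and recall of an obtained graph against the true DAG."""
    matched: Set[Edge] = set()  # original edges recovered by the obtained graph
    tp = fp = 0
    for u, v in obt_directed:
        if (u, v) in original:  # direction must agree
            tp += 1
            matched.add((u, v))
        else:
            fp += 1
    for e in obt_undirected:
        u, v = tuple(e)
        hit = (u, v) if (u, v) in original else (v, u) if (v, u) in original else None
        if hit is None:
            fp += 1
        else:  # assumption: an undirected edge counts if either orientation is true
            tp += 1
            matched.add(hit)
    fn = len(original - matched)  # true edges the obtained graph missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: C -> D is recovered with the wrong direction.
original = {("A", "B"), ("B", "C"), ("C", "D")}
print(precision_recall(original, {("A", "B"), ("D", "C")}, {frozenset({"B", "C"})}))
```

In the example, the reversed edge D -> C is counted both as a false positive and, through the missed C -> D, as a false negative, so precision and recall are both 2/3.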
Observational equivalence and Transitive Closure
- Two DAGs are said to be observationally equivalent (OE) if they have the same skeleton and the same set of v-structures
- Transitive closure (TC)
  - A → B → C implies the closure edge A → C
  - cc(x, y) is true if there is a directed or an undirected edge from x to y
  - pcc(x, y) is true if there is a path from x to y consisting of properly directed and undirected edges
  - $pcc(x, y) \Leftarrow cc(x, y) \lor \big(pcc(x, z) \land pcc(z, y)\big)$
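Both notions can be sketched directly from their definitions. The code below is a minimal illustration with hypothetical function names: observational equivalence compares skeletons and v-structures of two DAGs given as sets of directed edges, and `pcc` computes the closure of cc with Warshall's algorithm, treating an undirected edge as usable in both directions. Self-pairs (x, x) produced by undirected edges are dropped, which is our own choice.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]


def skeleton(edges: Set[Edge]) -> Set[FrozenSet[str]]:
    """Undirected version of a DAG's edge set."""
    return {frozenset(e) for e in edges}


def v_structures(edges: Set[Edge]) -> Set[Tuple[str, str, str]]:
    """All a -> c <- b with a, b nonadjacent, stored as (a, c, b) with a < b."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    skel = skeleton(edges)
    return {
        (a, c, b)
        for c, ps in parents.items()
        for a, b in combinations(sorted(ps), 2)
        if frozenset((a, b)) not in skel
    }


def observationally_equivalent(e1: Set[Edge], e2: Set[Edge]) -> bool:
    return skeleton(e1) == skeleton(e2) and v_structures(e1) == v_structures(e2)


def pcc(nodes: Iterable[str], directed: Set[Edge], undirected: Set[FrozenSet[str]]) -> Set[Edge]:
    """pcc(x, y) <- cc(x, y) or (pcc(x, z) and pcc(z, y)), via Warshall's algorithm."""
    nodes = list(nodes)
    # cc(x, y): a directed edge x -> y, or an undirected edge between x and y
    reach = set(directed)
    for e in undirected:
        x, y = tuple(e)
        reach |= {(x, y), (y, x)}
    for z in nodes:
        for x in nodes:
            if (x, z) in reach:
                for y in nodes:
                    if (z, y) in reach:
                        reach.add((x, y))
    return {(x, y) for x, y in reach if x != y}  # drop trivial self-pairs


chain = {("A", "B"), ("B", "C")}
print(observationally_equivalent(chain, {("B", "A"), ("C", "B")}))  # True
print(observationally_equivalent(chain, {("A", "B"), ("C", "B")}))  # False
closure = pcc("ABCD", {("A", "B"), ("C", "D")}, {frozenset({"B", "C"})})
print(("A", "D") in closure, ("D", "A") in closure)  # True False
```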
Results for the IC algorithm
How to improve the IC algorithm
- The original IC algorithm did not perform well at learning gene causal relationships
- One possible way to improve its performance is to incorporate extra information
- If the topological ordering of the regulatory network is known, it can help improve the learning result
Gene topological ordering
- Topological ordering information can come from knowing:
  - that a specific gene is the causal parent of another gene
  - that, in a pathway, one gene appears before another gene
  - that a gene is at the beginning or at the end of the pathway
IC algorithm + topological ordering information
mIC algorithm
- The mIC algorithm is based on IC, but combines topological ordering information with steady-state data to infer causality
- The mIC algorithm has 3 steps:
  1. Find conditional independence
     - For each pair of genes $g_i$ and $g_j$ in the dataset, test pairwise conditional independence. If they are dependent, search for a set $S_{ij} = \{g_k : g_i \text{ and } g_j \text{ are independent given } g_k\}$, with $i < k < j$ or $j < k < i$
     - Construct an undirected graph G such that $g_i$ and $g_j$ are connected by an edge if and only if they are pairwise dependent and $S_{ij}$ is empty
  2. Find v-structures
     - For each pair of nonadjacent genes $g_i$ and $g_j$ with a common neighbor $g_k$, if $g_k \notin S_{ij}$, $k > i$ and $k > j$, add arrowheads pointing at $g_k$: $g_i \rightarrow g_k \leftarrow g_j$
  3. Orient more edges according to rules
     - Orient the remaining undirected edges without creating new cycles or new v-structures
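The first two steps of mIC, plus the first of the standard orientation rules for step 3, can be sketched as below. This is our own illustration, not the authors' code: genes are indexed by their assumed topological order, and `indep` is a hypothetical conditional-independence oracle (in practice a statistical test on the steady-state data). Only one orientation rule is implemented: if a → x, x - y, and a and y are nonadjacent, orient x → y so that no new v-structure is created.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Hypothetical CI test: indep(i, j, cond) is True when genes i and j are
# judged independent given the genes listed in the tuple `cond`.
IndepTest = Callable[[int, int, Tuple[int, ...]], bool]


def mic_sketch(n: int, indep: IndepTest):
    """Genes are indexed 0..n-1 by their (assumed known) topological order."""
    sep: Dict[Tuple[int, int], Set[int]] = {}
    adj: Set[FrozenSet[int]] = set()

    # Step 1: test independence, conditioning only on single genes g_k
    # that lie strictly between g_i and g_j in the ordering.
    for i, j in combinations(range(n), 2):
        if indep(i, j, ()):
            sep[(i, j)] = set()
            continue
        s_ij = {k for k in range(i + 1, j) if indep(i, j, (k,))}
        sep[(i, j)] = s_ij
        if not s_ij:
            adj.add(frozenset((i, j)))

    def adjacent(a: int, b: int) -> bool:
        return frozenset((a, b)) in adj

    # Step 2: v-structures g_i -> g_k <- g_j, with g_k later than both (k > j > i).
    directed: Set[Tuple[int, int]] = set()
    for i, j in combinations(range(n), 2):
        if adjacent(i, j):
            continue
        for k in range(j + 1, n):
            if adjacent(i, k) and adjacent(j, k) and k not in sep[(i, j)]:
                directed.update({(i, k), (j, k)})

    # Step 3 (partial): only the first orientation rule, applied until nothing changes.
    changed = True
    while changed:
        changed = False
        for edge in adj:
            x0, y0 = tuple(edge)
            if (x0, y0) in directed or (y0, x0) in directed:
                continue
            for x, y in ((x0, y0), (y0, x0)):
                if any((a, x) in directed and a != y and not adjacent(a, y) for a in range(n)):
                    directed.add((x, y))
                    changed = True
                    break

    undirected = {e for e in adj if not any(p in directed for p in (tuple(e), tuple(e)[::-1]))}
    return directed, undirected


# Usage with a hand-written oracle for the hypothetical network 0 -> 2 <- 1, 2 -> 3.
FACTS = {(0, 1, ()), (0, 3, (2,)), (1, 3, (2,))}
print(mic_sketch(4, lambda i, j, c: (min(i, j), max(i, j), tuple(c)) in FACTS))
```

Restricting the conditioning genes to those between $g_i$ and $g_j$ in the ordering keeps step 1 to at most one test per intermediate gene, instead of searching over arbitrary conditioning sets.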
Results from mIC algorithm
Melanoma dataset
- The 10 genes in this study were chosen from the 587 genes in the melanoma data
- Previous studies have identified WNT5A as a gene of interest involved in melanoma
- Controlling the influence of WNT5A in the regulation can reduce the chance of melanoma metastasizing
Applying mIC algorithm to the Melanoma Dataset
Partial biological prior knowledge: MMP3 is expected to be at the end of the pathway
Conclusion
- We evaluated the IC algorithm using simulated data
- We presented the mIC algorithm, which infers gene causal relationships from steady-state data together with gene topological ordering information
- We performed simulations based on Boolean networks to evaluate the performance of the causal algorithms
- We applied the mIC algorithm to real biological microarray data (the melanoma dataset)
- The results show that the mIC algorithm identified some of the important causal relationships associated with the WNT5A gene