Microarray Statistics - PowerPoint PPT Presentation

Title:

Microarray Statistics

Description:

Reverse engineering gene and protein regulatory networks using ... Deregulation carcinogenesis. Extensively studied in the literature gold standard network ... – PowerPoint PPT presentation

Slides: 90
Provided by: Dirk70

Transcript and Presenter's Notes

1
Reverse engineering gene and protein regulatory
networks using Graphical Models. A comparative
evaluation study.
Marco Grzegorczyk, Dirk Husmeier, Adriano Werhli
2
(No Transcript)
3
Systems biology: learning signalling pathways and
regulatory networks from postgenomic data
4
(No Transcript)
5
possibly completely unknown
6
possibly completely unknown
E.g. Flow cytometry experiments
data
Here: concentrations of (phosphorylated) proteins
7
possibly completely unknown
E.g. Flow cytometry experiments
data
data
Machine Learning
statistical methods
8
extracted network
true network
Is the extracted network a good prediction of the
real relationships?
9
extracted network
true network
Evaluation of learning performance
biological knowledge (gold standard network)
10
Reverse Engineering of Regulatory Networks
  • Can we learn network structures from postgenomic
    data themselves?
  • Are there statistical methods to distinguish
    between direct and indirect correlations?
  • Is it worth applying time-consuming Bayesian
    network approaches although computationally
    cheaper methods are available?
  • Do active interventions improve the learning
    performance?
  • Gene knockouts (VIGS, RNAi)

11
direct interaction
common regulator
indirect interaction
co-regulation
12
Reverse Engineering of Regulatory Networks
  • Can we learn network structures from postgenomic
    data themselves?
  • Are there statistical methods to distinguish
    between direct and indirect correlations?
  • Is it worth applying time-consuming Bayesian
    network approaches although computationally
    cheaper methods are available?
  • Do active interventions improve the learning
    performance?
  • Gene knockouts (VIGS, RNAi)

13
Three widely applied methodologies
  • Relevance networks
  • Graphical Gaussian models
  • Bayesian networks

14
  • Relevance networks
  • Graphical Gaussian models
  • Bayesian networks

15
Relevance networks (Butte and Kohane, 2000)
  1. Choose a measure of association A(.,.)
  2. Define a threshold value tA
  3. For all pairs of domain variables (X,Y), compute
     their association A(X,Y)
  4. Connect those variables (X,Y) by an
     undirected edge whose association A(X,Y) exceeds
     the predefined threshold value tA
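The four steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the absolute Pearson correlation is used as the association measure A, and the function name `relevance_network`, the toy variables and the threshold 0.5 are hypothetical choices, not taken from the slides.

```python
import numpy as np

def relevance_network(data, threshold):
    """Connect variable pairs whose association exceeds the threshold.

    data: array of shape (observations, variables).
    Assumption: association A = absolute Pearson correlation.
    """
    assoc = np.abs(np.corrcoef(data, rowvar=False))
    n = assoc.shape[0]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if assoc[i, j] > threshold]
    return edges, assoc

# Toy data: y depends strongly on x, z is unrelated to both.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = x + 0.1 * rng.normal(size=500)
z = rng.normal(size=500)
edges, assoc = relevance_network(np.column_stack([x, y, z]), threshold=0.5)
```

Only the pair (x, y) should be connected here; any other association measure (for example mutual information) could be substituted for the correlation.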

16
Relevance networks (Butte and Kohane, 2000)
17
Relevance networks (Butte and Kohane, 2000)
  1. Choose a measure of association A(.,.)
  2. Define a threshold value tA
  3. For all pairs of domain variables (X,Y), compute
     their association A(X,Y)
  4. Connect those variables (X,Y) by an
     undirected edge whose association A(X,Y) exceeds
     the predefined threshold value tA

18
direct interaction
common regulator
indirect interaction
co-regulation
19
Pairwise associations without taking the context
of the system into consideration
direct interaction
pseudo-correlations between A and B
E.g. the correlation between A and C is disturbed
(weakened) by the influence of B
20
strong correlation s12
21
  • Relevance networks
  • Graphical Gaussian models
  • Bayesian networks

22
Graphical Gaussian Models
Partial correlation, i.e. correlation
conditional on all other domain variables:
Corr(X1, X2 | X3, ..., Xn)
strong partial correlation p12
But usually: #observations < #variables
23
Shrinkage estimation of the covariance matrix
(Schäfer and Strimmer, 2005)
0 < λ0 < 1: estimated (optimal) shrinkage intensity,
with (equation not transcribed)
where (equation not transcribed) is guaranteed
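A minimal sketch of why shrinkage helps when there are fewer observations than variables. Assumptions: the target T is the diagonal of the sample covariance S, the shrunk estimate is the convex combination λT + (1 − λ)S, and λ = 0.2 is picked by hand for illustration; Schäfer and Strimmer estimate the optimal intensity from the data, which is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def shrink_covariance(data, lam):
    """Convex combination of the sample covariance S and a diagonal target T."""
    S = np.cov(data, rowvar=False)
    T = np.diag(np.diag(S))          # assumed target: variances only
    return lam * T + (1.0 - lam) * S

def partial_correlations(cov):
    """Partial correlations from the inverse covariance (precision) matrix."""
    omega = np.linalg.inv(cov)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

rng = np.random.default_rng(0)
data = rng.normal(size=(5, 8))       # 5 observations, 8 variables
S = np.cov(data, rowvar=False)
rank_S = np.linalg.matrix_rank(S)    # below 8, so S cannot be inverted
S_star = shrink_covariance(data, lam=0.2)
pcor = partial_correlations(S_star)  # well defined for the shrunk estimate
```

The sample covariance is rank deficient, whereas the shrunk estimate is positive definite and can be inverted to obtain partial correlations.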
24
direct interaction
common regulator
indirect interaction
co-regulation
25
Graphical Gaussian Models
direct interaction
common regulator
indirect interaction
P(A,B) = P(A)P(B), but P(A,B|C) ≠ P(A|C)P(B|C)
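A small numerical illustration of this effect, assuming a hypothetical linear-Gaussian model in which A and B are independent with unit variance and C = A + B + noise (unit-variance noise). The covariance matrix below follows from that assumed model; it is not taken from the slides.

```python
import numpy as np

# Population covariance of (A, B, C) under the assumed model C = A + B + noise.
cov = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0],
                [1.0, 1.0, 3.0]])

# Marginal correlations.
d = np.sqrt(np.diag(cov))
corr = cov / np.outer(d, d)

# Partial correlations from the precision matrix.
omega = np.linalg.inv(cov)
w = np.sqrt(np.diag(omega))
pcor = -omega / np.outer(w, w)
np.fill_diagonal(pcor, 1.0)

marginal_ab = corr[0, 1]   # A and B are marginally uncorrelated
partial_ab = pcor[0, 1]    # but correlated once we condition on C
```

Conditioning on the common effect C induces a non-zero partial correlation between A and B, so a graphical Gaussian model would draw a spurious A-B edge in this setting.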
26
Further drawbacks
  • Relevance networks and Graphical Gaussian models
    can extract undirected edges only.
  • Bayesian networks promise to extract at least
    some directed edges. But can we trust in these
    edge directions?
  • It may be better to learn undirected edges than
    learning directed edges with false orientations.

27
  • Relevance networks
  • Graphical Gaussian models
  • Bayesian networks

28
Bayesian networks
  • Marriage between graph theory and probability
    theory.
  • Directed acyclic graph (DAG) represents
    conditional independence relations.
  • Markov assumption leads to a factorization of the
    joint probability distribution

[Figure: example DAG with NODES A, B, C, D, E, F connected by directed EDGES]
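The factorization implied by the Markov assumption can be written as follows. The notation pa(X_i) for the parents of node X_i is an assumption, and the three-node chain is a hypothetical example, not the graph shown on the slide (whose edges are not in the transcript).

```latex
% General factorization over a DAG with nodes X_1, ..., X_n
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)

% Hypothetical example: chain A -> B -> C
P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid B)
```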
29
Bayesian networks versus causal networks
Bayesian networks represent conditional
(in)dependency relations - not necessarily causal
interactions.
30
Bayesian networks versus causal networks
31
Bayesian networks
  • Marriage between graph theory and probability
    theory.
  • Directed acyclic graph (DAG) represents
    conditional independence relations.
  • Markov assumption leads to a factorization of the
    joint probability distribution

[Figure: example DAG with NODES A, B, C, D, E, F connected by directed EDGES]
32
Bayesian networks
Parameterisation: Gaussian BGe scoring
metric: data ~ N(µ, Σ) with normal-Wishart
distribution of the (unknown) parameters,
i.e. µ ~ N(µ0, (νW)^-1) and W ~ Wishart(T0)
33
Bayesian networks
BGe metric: closed-form solution
34
Learning the network structure
graph → scoreBGe(graph)
Idea: heuristically search for the graph M
that is most supported by the data:
P(M | data) > P(graph | data), e.g. by greedy search
35
MCMC sampling of Bayesian networks
  • Better idea: Bayesian model averaging via Markov
    chain Monte Carlo (MCMC) simulations
  • Construct and simulate a Markov chain (Mt)t in
    the space of DAGs whose distribution
    converges to the graph posterior distribution as
    stationary distribution, i.e.
    P(Mt = graph | data) → P(graph | data) as t → ∞,
    to generate a DAG sample G1, G2, G3, ..., GT

36
Order MCMC (Friedman and Koller, 2003)
  • Order MCMC generates a sample of node orders from
    which in a second step DAGs can be sampled

Acceptance probability (Metropolis-Hastings)
G1, G2, G3, ..., GT: DAG sample
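The formula on this slide is not part of the transcript. For orientation, the generic Metropolis-Hastings acceptance probability for a move from a node order ≺ to a proposed order ≺' has the form below; the symbols (≺ for an order, Q for the proposal distribution) are notational assumptions and this is the textbook form, not necessarily the expression shown on the slide.

```latex
A(\prec \to \prec') = \min\left\{ 1,\;
  \frac{P(\mathrm{data} \mid \prec')\, P(\prec')\, Q(\prec \mid \prec')}
       {P(\mathrm{data} \mid \prec)\, P(\prec)\, Q(\prec' \mid \prec)} \right\}
```

For a symmetric proposal, such as swapping two nodes in the order, the Q terms cancel.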
37
Equivalence classes of BNs
[Figure: equivalent DAGs over the nodes A, B, C]
P(A,B) ≠ P(A)P(B), P(A,B|C) = P(A|C)P(B|C)
completed partially directed graphs (CPDAGs)
[Figure: v-structure over the nodes A, B, C]
P(A,B) = P(A)P(B), P(A,B|C) ≠ P(A|C)P(B|C)
38
CPDAG representations
CPDAGs
DAGs
Utilise the CPDAG sample for estimating the
posterior probability of edge relation features:
P(A→B | data) ≈ (1/T) Σ_{i=1}^{T} I(Gi)
where I(Gi) is 1 if the CPDAG Gi contains the
directed edge A→B, and 0 otherwise
39
CPDAG representations
CPDAGs
interpretation
DAGs
superposition
Utilise the DAG (CPDAG) sample for estimating the
posterior probability of edge relation features:
P(A→B | data) ≈ (1/T) Σ_{i=1}^{T} I(Gi)
where I(Gi) is 1 if the CPDAG of Gi contains the
directed edge A→B, and 0 otherwise
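The averaging of the indicator over the sampled graphs can be sketched as follows. Assumptions: each sampled graph is already given as a 0/1 adjacency matrix of its CPDAG, with entry [a, b] = 1 meaning a directed edge a → b; the four toy graphs and the function name `edge_posteriors` are hypothetical.

```python
import numpy as np

def edge_posteriors(graph_sample):
    """Average the adjacency matrices: entry [a, b] estimates P(a -> b | data)."""
    return np.mean(np.stack(graph_sample).astype(float), axis=0)

# Hypothetical sample of T = 4 graphs over the nodes (A, B, C) = (0, 1, 2).
g1 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
g2 = np.array([[0, 1, 0], [0, 0, 0], [0, 1, 0]])
g3 = np.array([[0, 0, 0], [1, 0, 1], [0, 0, 0]])
g4 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])

post = edge_posteriors([g1, g2, g3, g4])
```

Here the edge A → B appears in three of the four graphs, so its estimated posterior probability is 0.75.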
40
Interventional data
A and B are correlated
[Figure: candidate structures over A and B]
inhibition of A
A → B: down-regulation of B
A ← B: no effect on B
41
Evaluation of Performance
  • Relevance networks and Graphical Gaussian models
    extract undirected edges (scores: (partial)
    correlations)
  • Bayesian networks extract undirected as well as
    directed edges (scores: posterior probabilities
    of edges)
  • Undirected edges can be interpreted as
    superposition of two directed edges with opposite
    direction.
  • How to cross-compare the learning performances
    when the true regulatory network is known?
  • Distinguish between DGE (directed graph
    evaluation) and UGE (undirected graph evaluation)

42
Probabilistic inference - DGE
true regulatory network
edge scores
data
low
high
Thresholding
concrete network predictions
TP = 1/2, FP = 0/4
TP = 2/2, FP = 1/4
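The thresholding step can be illustrated on a toy case with three nodes, two true directed edges and four possible false ones. The score matrix, the true network and the two thresholds are hypothetical values chosen so that the counts match the pattern above; they are not the data from the slide.

```python
import numpy as np

def dge_counts(scores, true_adj, threshold):
    """Count true and false positive directed edges above a score threshold."""
    off_diag = ~np.eye(true_adj.shape[0], dtype=bool)   # ignore self-loops
    predicted = (scores > threshold) & off_diag
    tp = int(np.sum(predicted & (true_adj == 1)))
    fp = int(np.sum(predicted & (true_adj == 0)))
    return tp, fp

# Hypothetical true network: A -> B and B -> C (nodes 0, 1, 2).
true_adj = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [0, 0, 0]])
# Hypothetical directed edge scores; entry [a, b] scores the edge a -> b.
scores = np.array([[0.00, 0.90, 0.10],
                   [0.05, 0.00, 0.60],
                   [0.65, 0.30, 0.00]])

high = dge_counts(scores, true_adj, threshold=0.8)
low = dge_counts(scores, true_adj, threshold=0.5)
```

With the high threshold one of the two true edges is recovered without false positives; lowering the threshold recovers both true edges at the cost of one false positive.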
43
Probabilistic inference - UGE
skeleton of true regulatory network
undirected edge scores: add up the scores of directed
edges with opposite directions
data
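A short sketch of this symmetrisation, assuming the directed scores are stored in a matrix whose entry [a, b] scores the edge a → b. The score values and function names are hypothetical.

```python
import numpy as np

def undirected_scores(directed_scores):
    """Add the scores of a -> b and b -> a to score the undirected edge a - b."""
    sym = directed_scores + directed_scores.T
    rows, cols = np.triu_indices_from(sym, k=1)   # each pair once
    return {(int(a), int(b)): float(sym[a, b]) for a, b in zip(rows, cols)}

def skeleton(true_adj):
    """Drop edge directions from the true network."""
    return ((true_adj + true_adj.T) > 0).astype(int)

directed = np.array([[0.00, 0.70, 0.10],
                     [0.20, 0.00, 0.60],
                     [0.05, 0.30, 0.00]])
true_adj = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [0, 0, 0]])

u_scores = undirected_scores(directed)
skel = skeleton(true_adj)
```

The undirected scores can then be thresholded and compared with the skeleton in the same way as the directed scores are compared with the directed network.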
44
Probabilistic inference - UGE
skeleton of true regulatory network

undirected edge scores: add up the scores of directed
edges with opposite directions
data


45
Probabilistic inference
skeleton of true regulatory network
undirected edge scores
data
46
Probabilistic inference
skeleton of true regulatory network
undirected edge scores
data
high
low
Thresholding
concrete network (skeleton) predictions
TP = 1/2, FP = 0/1
TP = 2/2, FP = 1/1
47
Evaluation 1: AUC scores
Area under the Receiver Operating Characteristic (ROC) curve
[Figure: ROC curves plotting sensitivity against inverse specificity, for the cases AUC = 0.5, AUC = 1 and 0.5 < AUC < 1]
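One way to compute an AUC score from edge scores without tracing the ROC curve explicitly is the rank-based formulation: the AUC equals the probability that a randomly chosen true edge scores higher than a randomly chosen non-edge, with ties counted as one half. The function below is a sketch of that formulation; its name and the toy inputs are hypothetical.

```python
import numpy as np

def auc_score(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: edge scores; labels: 1 for true edges, 0 for non-edges.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

perfect = auc_score([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])   # all true edges on top
random_like = auc_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])  # uninformative scores
partial = auc_score([0.9, 0.3, 0.6, 0.1], [1, 1, 0, 0])   # in between
```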
48
Evaluation 2: TP scores

We set the threshold such that we obtained 5
spurious edges (5 FPs) and counted the
corresponding number of true edges (TP count).
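A possible implementation of this TP score, shown as a sketch. Assumption: edges are ranked by decreasing score and all true edges ranked ahead of the (max_fp + 1)-th false edge are counted; how ties between scores are resolved is not specified on the slide and is ignored here. The function name and toy inputs are hypothetical.

```python
import numpy as np

def tp_at_fp(scores, labels, max_fp=5):
    """Number of true edges accepted while admitting at most max_fp false edges."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = np.asarray(labels, dtype=bool)[order]
    tp = fp = 0
    for is_true in ranked:
        if is_true:
            tp += 1
        else:
            if fp == max_fp:      # the next false edge would exceed the budget
                break
            fp += 1
    return tp

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [1, 0, 1, 0, 0, 1, 1]
tp_two_fp = tp_at_fp(scores, labels, max_fp=2)
tp_three_fp = tp_at_fp(scores, labels, max_fp=3)
```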
49
Evaluation 2: TP scores
5 FP counts
50
Evaluation 2: TP scores
BN
GGM
RN
5 FP counts
51
(No Transcript)
52
Evaluation
  • On real experimental cytometric data from the Raf
    signalling pathway for which a gold standard
    network is known
  • On synthetic data simulated from this
    gold-standard network topology

53
Evaluation
  • On real experimental cytometric data from the Raf
    signalling pathway for which a gold standard
    network is known
  • On synthetic data simulated from the
    gold-standard network topology

54
Evaluation: Raf signalling pathway
  • Cellular signalling cascade which consists of 11
    phosphorylated proteins and phospholipids in
    human immune system cells
  • Deregulation → carcinogenesis
  • Extensively studied in the literature →
    gold standard network

55
gold standard RAF pathway according to Sachs et
al. (2004)
56
Raf pathway
11 nodes (proteins) and 20 directed edges
57
Data
  • Intracellular multicolour flow cytometry
    experiments: concentrations of 11 proteins
  • 5400 cells have been measured under 9 different
    cellular conditions (cues)
  • We decided to downsample our test data sets to
    100 instances - indicative of microarray
    experiments

58
(No Transcript)
59
Two types of experiments
60
(No Transcript)
61
Evaluation
  • On real experimental data, using the gold
    standard network from the literature
  • On synthetic data simulated from this
    gold-standard network

62
Raf pathway
63
Gaussian simulated data
64
Netbuilder simulated data
DNA + TFs ⇌ DNA·TF → mRNA → protein
65
Netbuilder simulated data
DNA + TFA ⇌ DNA·TFA (KA)
DNA + TFB ⇌ DNA·TFB (KB)
Steady-state approximation
66
Netbuilder simulated data
  • Generating data using the Netbuilder tool
  • The main idea of Netbuilder is that, instead of
    solving the steady-state approximation to the
    ODEs explicitly, we approximate them with a
    qualitatively equivalent combination of
    multiplications and sums of sigmoidal
    transfer functions.
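To give a feel for what "multiplications and sums of sigmoidal transfer functions" could look like, here is a purely illustrative sketch. It is not Netbuilder's actual model: the combination rule (averaged activation multiplied by repression factors), the function names, and the threshold and steepness values are all assumptions made for this example.

```python
import math

def sigmoid(x, threshold=0.5, steepness=10.0):
    """Sigmoidal transfer function; parameter values are assumed."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))

def target_level(activators, inhibitors):
    """Hypothetical rule: sum over activators, product over inhibitors."""
    activation = sum(sigmoid(a) for a in activators) / max(len(activators), 1)
    repression = 1.0
    for r in inhibitors:
        repression *= 1.0 - sigmoid(r)
    return activation * repression

on = target_level(activators=[0.9], inhibitors=[0.1])          # active, not repressed
off = target_level(activators=[0.1], inhibitors=[0.1])         # activator absent
repressed = target_level(activators=[0.9], inhibitors=[0.9])   # inhibitor present
```

The output stays between 0 and 1, rises with activator levels and falls with inhibitor levels, which is the qualitative behaviour such an approximation is meant to capture.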

67
  • Experimental Results

68
Synthetic data, observations
69
Synthetic data, interventions
70
Cytometry data, observations
71
Cytometry data, interventions
72
(No Transcript)
73
(No Transcript)
74
How can we explain the difference between
synthetic and real data?
75
Raf pathway
76
(No Transcript)
77
Disputed structure of the gold-standard network
Dougherty et al. (2005) Regulation of Raf-1 by Direct Feedback
Phosphorylation. Molecular Cell, Vol. 17
78
Complications with real data
  • Interventions are not ideal owing to negative
    feedback loops.
  • Putative negative feedback loops: can we
    trust our gold-standard network?

79
Stabilisation through negative feedback loops
inhibition
80
Conclusions 1
  • BNs and GGMs outperform RNs, most notably on
    Gaussian data.
  • No significant difference between BNs and GGMs on
    observational data.
  • For interventional data, BNs clearly outperform
    GGMs and RNs, especially when taking the edge
    direction (DGE score) rather than just the
    skeleton (UGE score) into account.

81
Conclusions 2
  • Performance on synthetic data better than on
    real data.
  • Real data more complex
  • Real interventions are not ideal
  • Errors in the gold-standard network

82
Additional analysis I: Raf pathway
83
Additional analysis I: Raf pathway
84
Additional analysis I: Raf pathway
85
CPDAGs of networks
ORIGINAL
MODIFIED
3/20 directed edges
13/16 directed edges
86
ORIGINAL
MODIFIED
Gaussian
Netbuilder
87
Some additional analysis II
88
Thank you
89
References
  • Butte, A.S. and Kohane, I.S. (2003) Relevance
    networks: a first step toward finding genetic
    regulatory networks within microarray data.
    In Parmigiani, G., Garrett, E.S., Irizarry, R.A.
    and Zeger, S.L., editors, The Analysis of Gene
    Expression Data, pages 428-446. Springer.
  • Dougherty, M.K. et al. (2005) Regulation of
    Raf-1 by Direct Feedback Phosphorylation.
    Molecular Cell, 17, 215-227.
  • Friedman, N. and Koller, D. (2003) Being
    Bayesian about network structure.
    Machine Learning, 50, 95-126.
  • Madigan, D. and York, J. (1995) Bayesian
    graphical models for discrete data.
    International Statistical Review, 63, 215-232.
  • Sachs, K., Perez, O., Pe'er, D., Lauffenburger,
    D.A., Nolan, G.P. (2004) Protein-signaling
    networks derived from multiparameter
    single-cell data. Science, 308, 523-529.