Using Bayesian Networks to Analyze Expression Data

1
Using Bayesian Networks to Analyze Expression Data

N. Friedman, M. Linial, I. Nachman, D. Pe'er
Hebrew University, Jerusalem

2
Central Dogma
Translation
Protein
Cells express different subsets of the genes in
different tissues and under different conditions
3
Gene Regulation

Regulation of expression of genes is crucial
Regulation occurs at many stages:
pre-transcriptional (chromatin structure)
transcription initiation
RNA editing (splicing) and transport
translation initiation
post-translational modification
RNA and protein degradation
Understanding regulatory processes is a central
problem of biological research

4
Microarrays (aka DNA chips)

New technological breakthrough
Measure RNA expression levels of thousands of
genes in one experiment
Measure expression on a genomic scale
Opens up new experimental designs
Many major labs are using, or will use, this
technology in the near future

5
The Problem
[Figure: expression data matrix indexed by genes (j) and experiments (i)]

Goal:
Learn regulatory/metabolic networks
Identify causal sources of the biological
phenomena of interest

6
Analysis Approaches

Clustering of expression data
Groups together genes with similar expression
patterns
Does not reveal structural relations
between genes
Boolean networks
Deterministic models of the logical interactions
between genes
Deterministic, impractical for real data

7
Example: Cell-Cycle Data (Spellman et al.)
[Figure: gene clusters plotted against cell cycle stages]
8
Our Approach

Characterize statistical relationships between
expression patterns of different genes
Beyond pair-wise interactions
Many interactions are explained by intermediate
factors
Regulation involves combined effects of several
gene-products

We build on the language of Bayesian networks

9
Network Example

Noisy stochastic process
Example: Pedigree
A node represents an individual's genotype

Modeling assumptions:
Ancestors can affect descendants' genotype only
by passing genetic material through intermediate
generations

10
Network Structure
Generalizing to DAGs:
A child is conditionally independent of its
non-descendants, given the value of its parents
Often a natural assumption for causal processes
if we believe that we capture the relevant state
of each intermediate stage.

[Figure: example DAG with nodes labeled Ancestor, Parent, Non-descendant (two nodes), and Descendant]
11
Local Probabilities

Associated with each variable $X_i$ is a conditional
probability distribution $P(X_i \mid \mathrm{Pa}_i)$
Discrete variables: multinomial distribution
Continuous variables: a choice of model, for example
linear Gaussian
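
The two local models named above can be written out as follows. This is a sketch in standard notation; the symbols $\theta$, $a_j$, and $\sigma^2$ are notational assumptions and do not appear on the slide.

```latex
% Discrete X_i: one multinomial per joint parent configuration u
P(X_i = x \mid \mathrm{Pa}_i = u) = \theta_{x \mid u},
\qquad \sum_{x} \theta_{x \mid u} = 1

% Continuous X_i with continuous parents u_1, \dots, u_k: linear Gaussian
P(X_i \mid u_1, \dots, u_k)
  = \mathcal{N}\!\left(a_0 + \sum_{j=1}^{k} a_j u_j,\; \sigma^2\right)
```

In the linear Gaussian case the mean of $X_i$ shifts linearly with each parent while the variance stays fixed.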

12
Bayesian Network Semantics
Qualitative part: DAG specifies
conditional independence statements
Quantitative part: local probability models
Unique joint distribution over domain

Compact, efficient representation
At most $k$ parents per node: $O(2^k n)$ vs. $O(2^n)$ parameters
Parameters pertain to local interactions

$$P(C,A,R,E,B) = P(B)\,P(E \mid B)\,P(R \mid E,B)\,P(A \mid R,B,E)\,P(C \mid A,R,B,E)$$
versus
$$P(C,A,R,E,B) = P(B)\,P(E)\,P(R \mid E)\,P(A \mid B,E)\,P(C \mid A)$$
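
To make the parameter savings concrete, the following minimal sketch counts free parameters for the five-variable example, assuming every variable is binary and every local distribution is a full table. The function names and the binary assumption are illustrative and not from the slides.

```python
from math import prod

def full_joint_params(cardinalities):
    """Free parameters of an unrestricted joint table over all variables."""
    return prod(cardinalities.values()) - 1

def network_params(parents, cardinalities):
    """Free parameters of a Bayesian network with tabular local distributions."""
    total = 0
    for var, pa in parents.items():
        # One distribution over var per joint configuration of its parents.
        rows = prod(cardinalities[p] for p in pa)
        total += rows * (cardinalities[var] - 1)
    return total

# Assumption: all five variables are binary.
card = {v: 2 for v in "BERAC"}
# Structure read off the factorization P(B) P(E) P(R|E) P(A|B,E) P(C|A).
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

print(full_joint_params(card))        # 31
print(network_params(parents, card))  # 10
```

The network needs 1 + 1 + 2 + 4 + 2 = 10 parameters instead of 2^5 - 1 = 31, and the gap widens quickly as the number of variables grows.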
13
Why Bayesian Networks?

Bayesian Networks
Flexible representation of dependency structure
of multivariate distributions
Natural for modeling processes with local
interactions
Learning of Bayesian Networks
Can learn dependencies from observations
Handles stochastic processes
true stochastic behavior
noise in measurements

14
Modeling Biological Regulation

Variables of interest
Expression levels of genes
Concentration levels of proteins
Exogenous variables: nutrient levels, metabolite
levels, temperature, ...
Phenotype information
Bayesian Network Structure
Capture dependencies among these variables

15
Examples

Interactions are represented by a graph
Each gene is represented by a node in the graph
Edges between the nodes represent direct
dependency

16
More Complex Examples

Dependencies can be mediated through other nodes
Common effects can imply conditional dependence

[Figure: two three-node graphs over A, B, and C, labeled "Common cause" and "Intermediate gene"]
17
Outline of Our Approach
Bayesian Network Learning Algorithm
Expression data
Use learned network to make predictions about
structure of the interactions between genes
18
Learning With Many Variables
Sparse Candidate algorithm - efficient heuristic
search that relies on sparseness

Choose candidate set for direct influence for
each gene
Find optimal BN constrained on candidates
Iteratively improve candidate set
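
The first step above, choosing a small candidate set for each gene, can be sketched as follows. This is only an illustration: it assumes discretized data and uses empirical mutual information as the relevance measure, which is one plausible choice and not necessarily the one used here. The constrained search and the iterative refinement are omitted, and the names `mutual_information` and `choose_candidates` are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b)))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def choose_candidates(columns, k):
    """For each variable, keep the k other variables with highest mutual information."""
    candidates = {}
    for target, x in columns.items():
        scores = {
            other: mutual_information(x, y)
            for other, y in columns.items()
            if other != target
        }
        candidates[target] = sorted(scores, key=scores.get, reverse=True)[:k]
    return candidates

# Toy data: g2 copies g1, while g3 is independent of both.
columns = {
    "g1": [0, 0, 1, 1, 0, 0, 1, 1],
    "g2": [0, 0, 1, 1, 0, 0, 1, 1],
    "g3": [0, 1, 0, 1, 0, 1, 0, 1],
}
candidates = choose_candidates(columns, k=1)
print(candidates["g1"], candidates["g2"])
```

Restricting each gene to k candidate parents is what keeps the later search tractable when there are hundreds of variables.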

19
Experiment

Data from Spellman et al. (Mol. Biol. of the Cell,
1998).
Contains 76 samples covering the whole yeast genome.
Different methods for synchronizing the cell cycle in
yeast.
Time series sampled at intervals of a few minutes (5-20 min).
Spellman et al. identified 800 cell-cycle
regulated genes.

20
Methods

Experiment 1: discretized data into 3 levels
Learn multinomial probabilities
Experiment 2:
Learn linear interactions (w/ Gaussian noise)
No prior biological knowledge was used
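
The three-level discretization in Experiment 1 could look like the sketch below. It assumes the inputs are log expression ratios relative to a control and uses a symmetric cutoff of 0.5. Both the cutoff and the function name are assumptions chosen for illustration, since the slide does not state how the levels were defined.

```python
def discretize(values, threshold=0.5):
    """Map each value to -1 (under-expressed), 0 (unchanged), or 1 (over-expressed).

    threshold is a hypothetical cutoff on the log expression ratio.
    """
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in values]

levels = discretize([-1.2, -0.3, 0.0, 0.7])
print(levels)  # [-1, 0, 0, 1]
```

Discretizing loses information about magnitude, but it lets the multinomial model capture dependencies that are not linear.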

21
Network Learned
22
Challenge: Statistical Significance

Sparse Data
Small number of samples
Flat posterior -- many networks fit the data
Solution
Estimate confidence in network features
Two types of features:
Markov neighbors: X directly interacts with Y
Order relations: X is an ancestor of Y

23
Confidence Estimates
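Bootstrap approach [FGW, UAI99]
[Diagram: the data D is resampled into datasets D1, D2, ..., Dm; a network over E, B, R, A, C is learned from each resampled dataset; confidence in each feature is then estimated from the learned networks]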
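
The bootstrap procedure can be sketched as below: resample the data with replacement m times, learn from each replicate, and report the fraction of replicates in which each feature appears. The `toy_learner` is a deliberately simple, hypothetical stand-in for Bayesian network structure learning, so that the example is self-contained. All names and default values here are assumptions.

```python
import random
from collections import Counter

def bootstrap_confidence(data, learn_features, m=200, seed=0):
    """Fraction of bootstrap replicates in which each feature is learned.

    data: list of samples (rows).
    learn_features: callable mapping a dataset to a set of hashable features.
    """
    rng = random.Random(seed)
    n = len(data)
    counts = Counter()
    for _ in range(m):
        # Resample n rows with replacement.
        resampled = [data[rng.randrange(n)] for _ in range(n)]
        counts.update(learn_features(resampled))
    return {feature: c / m for feature, c in counts.items()}

def toy_learner(rows, min_agreement=0.8):
    """Hypothetical stand-in for structure learning: link columns that usually agree."""
    n_vars = len(rows[0])
    features = set()
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            agree = sum(r[i] == r[j] for r in rows) / len(rows)
            if agree >= min_agreement:
                features.add((i, j))
    return features

# Columns 0 and 1 always agree; column 2 agrees with them only half the time.
data = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)] * 5
confidence = bootstrap_confidence(data, toy_learner)
print(confidence[(0, 1)])
```

Features that survive resampling in most replicates are less likely to be artifacts of the small sample size.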
24
Testing for Significance

We run our procedure on randomized data where we
reshuffled the order of values for each gene
Histograms of number of Markov features at each
confidence level

[Figure: two histogram panels, "Randomized Data" and "Original Data"]
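
The randomization described above can be sketched as follows: each gene's values are permuted independently across experiments, which preserves every gene's marginal distribution while destroying dependencies between genes. The layout (one row per experiment, one column per gene) and the function name are assumptions made for this example.

```python
import random

def shuffle_each_gene(matrix, seed=0):
    """Permute the experiment order independently for every gene.

    matrix: list of rows, one row per experiment, one column per gene.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)  # independent permutation for this gene
    return [list(row) for row in zip(*columns)]

original = [
    [1, 10, 100],
    [2, 20, 200],
    [3, 30, 300],
    [4, 40, 400],
]
randomized = shuffle_each_gene(original)
print(randomized)
```

Running the same learning and confidence procedure on the randomized matrix gives a baseline for how many high-confidence features arise by chance.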
25
Testing for Significance

We run our procedure on randomized data where we
reshuffled the order of values for each gene

[Figure: Markov features with Gaussian models. x-axis: confidence threshold t (0 to 1); y-axis: number of features with confidence above t (0 to 4000)]
26
Testing for Significance
[Figure: Markov features with multinomial models. x-axis: confidence threshold t (0 to 1); y-axis: number of features with confidence above t (0 to 1400)]
27
Local Map
28
Finding Key Genes

Key gene: a gene that precedes many other genes
YLR183C
MCD1: Mitotic Chromosome Determinant
RAD27: DNA repair protein
CLN2: role in cell cycle START
SRO4: involved in cellular polarization during budding
YOX1: Homeodomain protein that binds leu-tRNA gene
POL30: required for DNA replication and repair
YLR467W
CDC5
MSH6: Homolog of the human GTBP protein
YML119W
CLN1: role in cell cycle START

29
Strong Markov Relations
30
Future Work

Finding suitable local distribution models
Temporal aspect: dynamic Bayesian networks (DBN)
Correct handling of hidden variables
Can we recognize hidden causes of coordinated
regulation events?
Incorporating prior knowledge
Incorporate large mass of biological knowledge,
and insight from sequence/structure databases
Abstraction
Combine with cluster analysis

31
Future Work -- Causality