Genetic Networks - PowerPoint PPT Presentation

About This Presentation

Title:

Genetic Networks

Description:

Title: Class 9: Phylogenetic Trees Author: Nir Friedman Last modified by: Iftach Nachman Created Date: 12/12/1999 5:49:58 PM Document presentation format – PowerPoint PPT presentation

Slides: 55

Transcript and Presenter's Notes

Title: Genetic Networks

1
Genetic Networks
2
Cellular Networks

Most processes in the cell are controlled by
networks of interacting molecules
Metabolic Networks
Signal Transduction Networks
Regulatory Networks

3
Unifying View

The cell as a state machine
Cell state $S = (P_1, P_2, \ldots, R_1, R_2, \ldots, m_1, m_2, \ldots)$
P: proteins, R: mRNA molecules, m: metabolites
Each cell, at any given time, can be characterized
by its state S
Dynamics
$(\mathrm{Input}(t),\ S(t)) \to S(t + \Delta t)$

4
What does it mean?

Steady cell state ↔ cell type
Neuron
RBC
Muscle cell
Tumor cell
Dynamics ↔ cellular process
Differentiation
Apoptosis
Cell Cycle

5
Gene Regulation Networks

Regulation of expression of genes is crucial
Regulation occurs at many stages
pre-transcriptional (chromatin structure)
transcription initiation
RNA editing (splicing) and transport
Translation initiation
Post-translation modification
RNA and protein degradation
Understanding regulatory processes is a central
problem of biological research

6
Genetic Network Models: Goals

Incorporate rule-based dependencies between genes
Rule-based dependencies may constitute important
biological information.
Allow systematic study of global network
dynamics
In particular, individual gene effects on
long-run network behavior.
Must be able to cope with uncertainty
Small sample size, noisy measurements, biological
noise
Quantify the relative influence and sensitivity
of genes in their interactions with other genes
This allows us to focus on individual (groups of)
genes.
What model should we use?

7
Level of Biochemical Detail

Detailed models require lots of data!
Highly detailed biochemical models are only
feasible for very small systems which are
extensively studied
Example: Arkin et al. (1998), Genetics
149(4):1633-48
lysis-lysogeny switch in lambda phage
5 genes, 67 parameters based on 50 years of
research; stochastic simulation required a
supercomputer

8
Example: Lysis-Lysogeny
Arkin et al. (1998), Genetics 149(4):1633-48
9
Level of Biochemical Detail

In-depth biochemical simulation of e.g. a whole
cell is infeasible (so far)
Less detailed network models are useful when data
is scarce and/or network structure is unknown
Once network structure has been determined, we
can refine the model

10
Boolean or Continuous?

Boolean networks (Kauffman (1993), The Origins of
Order) assume ON/OFF gene states.
Allows analysis at the network-level
Provides useful insights in network dynamics
Algorithms for network inference from binary data

11
Boolean Formalism: Cons

Boolean abstraction is poor fit to real data
Cannot model important concepts
amplification of a signal
subtraction and addition of signals
compensating for smoothly varying environmental
parameter (e.g. temperature, nutrients)
varying dynamical behavior (e.g. cell cycle
period)
Feedback control
negative feedback is used to stabilize expression
→ causes oscillation in the Boolean model

12
Boolean Formalism: Pros

Studies give rise to qualitative phenomena, as
observed by experimentalists.
Some studied systems exhibit multiple steady
states and switchlike transitions between them.
It is experimentally shown that such systems are
robust to exact values of kinetic parameters of
individual reactions.

13
Concentrations or Molecules?

Use of concentrations assumes individual
molecules can be ignored
Known examples (in prokaryotes) where stochastic
fluctuations play an essential role (e.g.
lysis-lysogeny in lambda)
Requires stochastic simulation (Arkin et al.
(1998), Genetics 149(4):1633-48), or modeling
molecule counts (e.g. Petri nets, Goss and
Peccoud (1998), PNAS 95(12):6750-5)
Significantly increases model complexity

14
Concentrations or Molecules?

Eukaryotes: larger cell volume, typically longer
half-lives. Few known stochastic effects.
Yeast: 80% of the transcriptome is expressed at
0.1-2 mRNA copies/cell
Holstege et al. (1998), Cell 95:717-728.
Human: 95% of the transcriptome is expressed at
<5 copies/cell
Velculescu et al. (1997), Cell 88:243-251.

15
Spatial or Non-Spatial

Spatiality introduces additional complexity
intercellular interactions
spatial differentiation
cell compartments
cell types
Spatial patterns also provide more data
e.g. stripe formation in Drosophila
Mjolsness et al. (1991), J. Theor. Biol.
152:429-454.
Few (no?) large-scale spatial gene expression
data sets available so far.

16
Example: Drosophila Segmentation
[Figure: eve (even-skipped) stripe 2 expression along the anterior-posterior axis of the embryo, shown with the expression levels (low to high) of the transcription factors hb, Kr, gt and bcd]
17
Deterministic or Stochastic?

Many sources of stochasticity
Biological stochasticity
Experimental noise
Stochastic models can account for those
Deterministic models are usually simpler to
analyze (dynamics, steady states) and interpret

18
Modeling Approaches

Boolean Networks
Linear Models
Bayesian Networks

19
Boolean Network
20
What is a Boolean Network?

A Boolean network is a kind of graph
G(V, F): V is a set of nodes (genes),
F is a list of Boolean functions
Every node takes only two values: ON (1) and OFF
(0)
Each function gives the next value of one node
Representations: standard, wiring, automata

21
What is a Boolean Network?

Attractor: certain states are revisited infinitely
often, depending on the initial starting state.
Basin of attraction
Limit-cycle attractor

22
Boolean Network Example
Nodes (genes)
23
Boolean Network Example
Nodes (genes)
24
Basic Structure of Boolean Networks

Each node is a gene
1 means active/expressed
0 means inactive/unexpressed

[Figure: genes A and B both regulate gene X]
Boolean function:
A B X
0 0 1
0 1 1
1 0 0
1 1 1
In this example, two genes (A and B) regulate
gene X. In principle, any number of input genes
is possible. Positive/negative feedback is also
common (and necessary for homeostasis).
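The truth table on this slide can be stored directly as a lookup table. The following is a minimal sketch; the names `TRUTH_TABLE_X`, `next_x` and `next_x_logic` are illustrative and do not come from the presentation.

```python
# Truth table for gene X as given on the slide: keys are (A, B), values are X.
TRUTH_TABLE_X = {
    (0, 0): 1,
    (0, 1): 1,
    (1, 0): 0,
    (1, 1): 1,
}

def next_x(a: int, b: int) -> int:
    """Return the state of X given the states of its regulators A and B."""
    return TRUTH_TABLE_X[(a, b)]

def next_x_logic(a: int, b: int) -> int:
    """The same rule written as a logic expression: X = (NOT A) OR B."""
    return int((not a) or b)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, next_x(a, b))
```

A lookup table scales to any number of input genes, at the cost of 2^k entries for k inputs.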
25
Dynamics of Boolean Networks
[Figure: states of genes A-F at one time point]
Gene:  A B C D E F
State: 0 1 1 0 0 1
At a given time point, all the genes form a
genome-wide gene activity pattern (GAP) (binary
string of length n ). Consider the state space
formed by all possible GAPs.
26
State Space of Boolean Networks

Similar GAPs lie close together.
There is an inherent directionality in the state
space.
Some states are attractors (or limit-cycle
attractors). The system may alternate between
several attractors.
Other states are transient.

Picture generated using the program DDLab.
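For a small network, attractors and their basins can be found by following every state until its trajectory repeats. This is only a sketch: the three-gene update rule in `step` is an assumed example, and the code is unrelated to how DDLab works internally.

```python
from itertools import product

def step(state):
    """One synchronous update of an assumed 3-gene network (A, B, C)."""
    a, b, c = state
    return (a | c, a & c, 1 - a)

def find_attractors(step_fn, n_genes):
    """Map each attractor (a tuple of states) to the list of states in its basin."""
    basins = {}
    for start in product((0, 1), repeat=n_genes):
        seen = []
        state = start
        while state not in seen:          # follow the trajectory until it repeats
            seen.append(state)
            state = step_fn(state)
        cycle = seen[seen.index(state):]  # the repeating part is the attractor
        k = cycle.index(min(cycle))       # rotate to a canonical starting state
        key = tuple(cycle[k:] + cycle[:k])
        basins.setdefault(key, []).append(start)
    return basins

basins = find_attractors(step, 3)
for attractor, basin in basins.items():
    kind = "fixed point" if len(attractor) == 1 else "limit cycle"
    print(kind, attractor, "basin size:", len(basin))
```

Exhaustive enumeration visits all 2^n states, so it is only practical for small n.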
27
Reverse Engineering Problem
Can we infer the structure and rules of a genetic
network from gene expression measurements?
28
Reverse Engineering Problem

Input: gene expression data
Output: network structure and parameters (or
regulation rules)

29
Gene Expression Time Series Data
[Figure: expression time series for gene 1, gene 2 and gene 3]
Problem: how can these data be used to infer how
these three genes influence each other?
30
Modelling Gene Expression Data
[Figure: expression time series for gene 1, gene 2 and gene 3]
Assume that genes exist in two states: on and off.
If the expression of gene i is above level t_i,
consider it on; otherwise, consider it off.
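The thresholding rule above can be sketched in a few lines. The expression values and thresholds below are made up for illustration; the presentation does not give numeric data at this point.

```python
def discretize(series, threshold):
    """Map each expression value to 1 (on) if above the threshold, else 0 (off)."""
    return [1 if value > threshold else 0 for value in series]

# Hypothetical expression levels and per-gene thresholds t_i.
expression = {
    "gene 1": [0.2, 0.3, 0.9, 1.4],
    "gene 2": [1.1, 0.4, 0.2, 0.8],
}
thresholds = {"gene 1": 0.5, "gene 2": 0.6}

bits = {gene: discretize(values, thresholds[gene])
        for gene, values in expression.items()}
print(bits)
```

The result depends entirely on the choice of each threshold, which is one of the weaknesses the presentation raises later.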
31
Modelling Gene Expression Data
[Figure: the same time series for gene 1, gene 2 and gene 3, with thresholds t1, t2 and t3 marked]
32
Modelling Gene Expression Data
[Figure: the same time series with thresholds t1, t2 and t3; each time point is labeled on (above its threshold) or off (below it)]
33
Modelling Gene Expression Data

we obtain the following discretized gene
expression data

time 0 5 10 15 20 25 30 35 40 45 50 55
gene 1 0 0 0 0 0 0 1 1 1 1 1 1
gene 2 0 0 0 0 0 0 0 1 1 0 0 0
gene 3 1 1 1 1 1 1 1 0 0 0 0 0

the gene expression data is now in the form of
bit streams

34
Information Theoretic Tools

we define some necessary information theoretic
tools
Shannon entropy of data stream
$H(X) = -\sum_i p_i \log(p_i)$
where $p_i$ is the probability that a random
element of data stream X is i
(the base of the logarithm can be anything, but
must be consistent throughout; usually we use
base 2)

35
Information Theoretic Tools

e.g. Shannon entropy of data streams X and Y
X = 0, 1, 1, 1, 1, 1, 1, 0, 0, 0
Y = 0, 0, 0, 1, 1, 0, 0, 1, 1, 1
$H(X) = -\sum_i p_i \log_2(p_i)$
$= -(p_{X=0} \log_2(p_{X=0}) + p_{X=1} \log_2(p_{X=1}))$
$= -(0.4 \log_2(0.4) + 0.6 \log_2(0.6))$
$= 0.971$
$H(Y) = -\sum_i p_i \log_2(p_i)$
$= -(0.5 \log_2(0.5) + 0.5 \log_2(0.5))$
$= 1.0$
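The two entropies can be checked numerically. This is a minimal sketch using base-2 logarithms; the function name `entropy` is an illustrative choice.

```python
from collections import Counter
from math import log2

def entropy(stream):
    """Shannon entropy (base 2) of a sequence of symbols."""
    n = len(stream)
    return -sum((count / n) * log2(count / n)
                for count in Counter(stream).values())

X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
print(round(entropy(X), 3), round(entropy(Y), 3))
```

Symbols that never occur are simply absent from the counter, which matches the convention that 0 log(0) contributes nothing.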

36
Information Theoretic Tools

e.g. Shannon joint entropy of data streams X and
Y
X = 0, 1, 1, 1, 1, 1, 1, 0, 0, 0
Y = 0, 0, 0, 1, 1, 0, 0, 1, 1, 1
$H(X, Y) = -\sum_{i,j} p_{ij} \log_2(p_{ij})$
$= -(p_{X=0,Y=0} \log_2(p_{X=0,Y=0}) + p_{X=1,Y=0} \log_2(p_{X=1,Y=0})$
$\quad + p_{X=0,Y=1} \log_2(p_{X=0,Y=1}) + p_{X=1,Y=1} \log_2(p_{X=1,Y=1}))$
$= -(0.1 \log_2(0.1) + 0.4 \log_2(0.4) + 0.3 \log_2(0.3) + 0.2 \log_2(0.2))$
$= 1.85$

37
Information Theoretic Tools

Define
Conditional Entropy
$H(X|Y) = H(X, Y) - H(Y)$
$H(Y|X) = H(X, Y) - H(X)$
Mutual Information
$M(X, Y) = H(Y) - H(Y|X)$
$= H(X) - H(X|Y)$
$= H(X) + H(Y) - H(X, Y)$
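As a sketch, these definitions can be applied to the example streams X and Y from the preceding slides, using the standard identities H(X|Y) = H(X, Y) - H(Y) and M(X, Y) = H(X) + H(Y) - H(X, Y). The function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(*streams):
    """Joint Shannon entropy (base 2); with a single stream this is H(X)."""
    n = len(streams[0])
    counts = Counter(zip(*streams))
    return -sum((c / n) * log2(c / n) for c in counts.values())

def conditional_entropy(x, y):
    """H(X|Y) = H(X, Y) - H(Y)."""
    return entropy(x, y) - entropy(y)

def mutual_information(x, y):
    """M(X, Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
print(round(entropy(X, Y), 2))
print(round(mutual_information(X, Y), 3))
```

For these streams the mutual information is small compared with H(Y), so X alone tells us little about Y.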

38
Information Theoretic Tools

Let X be an input data stream
and Y be an output data stream
It is easy to show that
if $M(Y, X) = H(Y)$
then X exactly determines Y
Look for pairs (X, Y) where $M(Y_{t+1}, X_t) = H(Y_{t+1})$
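The test M(Y_{t+1}, X_t) = H(Y_{t+1}) can be sketched by shifting one stream against the other. The streams `x`, `y` and `z` below are invented for illustration, and the tolerance `tol` is an assumption needed for floating-point comparison.

```python
from collections import Counter
from math import log2

def entropy(*streams):
    """Joint Shannon entropy (base 2) of one or more aligned streams."""
    n = len(streams[0])
    counts = Counter(zip(*streams))
    return -sum((c / n) * log2(c / n) for c in counts.values())

def determines(x, y, tol=1e-9):
    """True if M(Y_{t+1}, X_t) equals H(Y_{t+1}), i.e. X_t fixes Y_{t+1}."""
    x_t, y_next = x[:-1], y[1:]
    h_y = entropy(y_next)
    mi = entropy(x_t) + h_y - entropy(x_t, y_next)
    return abs(mi - h_y) < tol

x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [0] + [1 - v for v in x[:-1]]   # y at t+1 is NOT x at t
z = [0, 0, 1, 0, 1, 1, 0, 1]        # not a function of x at the previous step
print(determines(x, y), determines(x, z))
```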

39
Identification of the Network Graph

back to the data

time 1 2 3 4 5 6 1 2 3 1 2 3 1 2
gene A 0 0 1 1 1 1 0 1 1 0 1 1 1 1
gene B 0 0 0 1 0 0 1 0 1 1 0 1 1 1
gene C 0 1 1 0 0 0 0 1 0 1 0 0 1 0

Step 1: put data in state transition table form

40
Identification of the Network Graph

state transition table

Input stream values          Output stream values
A_{i-1}  B_{i-1}  C_{i-1}    A_i  B_i  C_i
0        0        0          0    0    1
0        0        1          1    0    1
0        1        0          0    0    1
0        1        1          1    0    1
1        0        0          1    0    0
1        0        1          1    1    0
1        1        0          1    0    0
1        1        1          1    1    0


41
Identification of the Network Graph

The state transition table tells us how to get from
state i-1 to state i, as a lookup table
However, it is difficult to discern functional
relationships, so
Step 2: use information theoretic tools to
discover which inputs determine the outputs

42
Identification of the Network Graph

Step 2a: calculate entropies

Note: $\lim_{x \to 0^+} x^x = 1$, therefore, in the limit $x \to 0^+$, $0 \log(0) = 0$.
$H(A_i) = -(0.25 \log_2(0.25) + 0.75 \log_2(0.75)) = 0.81$
$H(B_i) = -(0.75 \log_2(0.75) + 0.25 \log_2(0.25)) = 0.81$
$H(C_i) = -(0.5 \log_2(0.5) + 0.5 \log_2(0.5)) = 1$
$H(A_{i-1}) = H(B_{i-1}) = H(C_{i-1}) = -(0.5 \log_2(0.5) + 0.5 \log_2(0.5)) = 1$
$H(A_{i-1}, C_{i-1}) = -(0.25 \log_2(0.25) + 0.25 \log_2(0.25) + 0.25 \log_2(0.25) + 0.25 \log_2(0.25)) = 2$
43
Identification of the Network Graph

Step 2a: calculate entropies

$H(A_i, A_{i-1}, C_{i-1}) = -(0.25 \log_2(0.25) + 0.25 \log_2(0.25) + 0.25 \log_2(0.25) + 0.25 \log_2(0.25)) = 2$
$H(B_i, A_{i-1}, C_{i-1}) = -(0.25 \log_2(0.25) + 0.25 \log_2(0.25) + 0.25 \log_2(0.25) + 0.25 \log_2(0.25)) = 2$
$H(C_i, A_{i-1}) = -(0.5 \log_2(0.5) + 0.5 \log_2(0.5)) = 1$
44
Identification of the Network Graph

Step 2b: calculate mutual information

$M(A_i, [A_{i-1}, C_{i-1}]) = H(A_i) + H(A_{i-1}, C_{i-1}) - H(A_i, A_{i-1}, C_{i-1}) = 0.81 + 2 - 2 = 0.81 = H(A_i)$,
therefore $A_{i-1}$ and $C_{i-1}$ determine $A_i$
$M(B_i, [A_{i-1}, C_{i-1}]) = H(B_i) + H(A_{i-1}, C_{i-1}) - H(B_i, A_{i-1}, C_{i-1}) = 0.81 + 2 - 2 = 0.81 = H(B_i)$,
therefore $A_{i-1}$ and $C_{i-1}$ determine $B_i$
$M(C_i, A_{i-1}) = H(C_i) + H(A_{i-1}) - H(C_i, A_{i-1}) = 1 + 1 - 1 = 1 = H(C_i)$,
therefore $A_{i-1}$ determines $C_i$
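Steps 2a and 2b can be automated by trying input sets of increasing size until the mutual information equals the output entropy. The sketch below applies this to the state transition table from the slides; the search order, the function names and the tolerance `tol` are assumptions, not the presenter's implementation.

```python
from collections import Counter
from itertools import combinations
from math import log2

# State transition table from the slides: (A, B, C) at step i-1 -> (A, B, C) at step i.
TRANSITIONS = [
    ((0, 0, 0), (0, 0, 1)), ((0, 0, 1), (1, 0, 1)),
    ((0, 1, 0), (0, 0, 1)), ((0, 1, 1), (1, 0, 1)),
    ((1, 0, 0), (1, 0, 0)), ((1, 0, 1), (1, 1, 0)),
    ((1, 1, 0), (1, 0, 0)), ((1, 1, 1), (1, 1, 0)),
]
GENES = ("A", "B", "C")

def entropy(rows):
    """Shannon entropy (base 2) of a list of hashable rows."""
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())

def smallest_determining_inputs(target, tol=1e-9):
    """Smallest set of genes at step i-1 whose M with the target equals H(target)."""
    t = GENES.index(target)
    out = [(new[t],) for _, new in TRANSITIONS]
    h_out = entropy(out)
    for size in range(1, len(GENES) + 1):
        for names in combinations(GENES, size):
            idx = [GENES.index(g) for g in names]
            ins = [tuple(old[i] for i in idx) for old, _ in TRANSITIONS]
            joint = [a + b for a, b in zip(ins, out)]
            mi = h_out + entropy(ins) - entropy(joint)
            if abs(mi - h_out) < tol:
                return names
    return None

for gene in GENES:
    print(gene, "is determined by", smallest_determining_inputs(gene))
```

The number of candidate input sets grows combinatorially with the number of genes, so in practice the search is usually capped at a small maximum number of inputs.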
45
Identification of the Boolean Circuits

Step 3: determine the functional relationship between
variables (this is simply the truth table)

A_{i-1}  C_{i-1}  A_i
0        0        0
0        1        1
1        0        1
1        1        1
$A_i = A_{i-1} \text{ OR } C_{i-1}$
46
Identification of the Boolean Circuits

Step 3: determine the functional relationship between
variables

A_{i-1}  C_{i-1}  B_i
0        0        0
0        1        0
1        0        0
1        1        1
$B_i = A_{i-1} \text{ AND } C_{i-1}$
47
Identification of the Boolean Circuits

Step 3: determine the functional relationship between
variables

A_{i-1}  C_i
0        1
1        0
$C_i = \text{NOT } A_{i-1}$
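Once the determining inputs are known, the truth tables of step 3 can be read off the state transition table by projecting onto the relevant columns. This is a sketch; `truth_table` and the column indices are illustrative.

```python
# Rows of the state transition table: (A, B, C at step i-1, then A, B, C at step i).
ROWS = [
    (0, 0, 0, 0, 0, 1), (0, 0, 1, 1, 0, 1),
    (0, 1, 0, 0, 0, 1), (0, 1, 1, 1, 0, 1),
    (1, 0, 0, 1, 0, 0), (1, 0, 1, 1, 1, 0),
    (1, 1, 0, 1, 0, 0), (1, 1, 1, 1, 1, 0),
]

def truth_table(rows, input_idx, output_idx):
    """Project rows onto the chosen input columns and one output column."""
    table = {}
    for row in rows:
        key = tuple(row[i] for i in input_idx)
        value = row[output_idx]
        # A conflicting entry means the chosen inputs do not determine the output.
        if table.setdefault(key, value) != value:
            raise ValueError("inputs do not determine the output")
    return dict(sorted(table.items()))

table_a = truth_table(ROWS, (0, 2), 3)   # A_i from A_{i-1}, C_{i-1}
table_b = truth_table(ROWS, (0, 2), 4)   # B_i from A_{i-1}, C_{i-1}
table_c = truth_table(ROWS, (0,), 5)     # C_i from A_{i-1}
print(table_a, table_b, table_c, sep="\n")
```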
48
Problems With This Approach

no theory exists for determining the
discretization level ti
the assumption that genes can be modeled as
either on or off may be sufficient for some
genes, but will certainly not be sufficient for
all genes
Ignores noise of all kinds (experimental,
biological)

49
Boolean networks are inherently deterministic

Conceptually, the regularity of genetic function
and interaction is not due to hard-wired
logical rules, but rather to the intrinsic
self-organizing stability of the dynamical
system.
Additionally, we may want to model an open system
with inputs (stimuli) that affect the dynamics of
the network.

From an empirical viewpoint, the assumption of
only one logical rule per gene may lead to
incorrect conclusions when inferring these rules
from gene expression measurements, as the latter
are typically noisy and the number of samples is
small relative to the number of parameters to be
inferred.

50
Linear Models

Basic model: weighted sum of inputs
Simple network representation
Only a first-order approximation
Parameters of the model:
weight matrix containing N x N interaction
weights
Fitting the model: find the parameters w_ji, b_i
such that the model best fits the available data
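The model equation itself did not survive in this transcript. A common form, assumed here, is x_i(t+1) = sum_j w_ji x_j(t) + b_i. The sketch below shows one prediction step and the squared error that a fitting procedure would minimize; all numbers are made up.

```python
def linear_step(x, w, b):
    """Assumed model form: x_i(t+1) = sum_j w[j][i] * x[j] + b[i]."""
    n = len(x)
    return [sum(w[j][i] * x[j] for j in range(n)) + b[i] for i in range(n)]

def squared_error(series, w, b):
    """Sum of squared one-step prediction errors over a time series."""
    total = 0.0
    for current, observed in zip(series, series[1:]):
        predicted = linear_step(current, w, b)
        total += sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return total

# Hypothetical 2-gene example: w[j][i] is the influence of gene j on gene i.
w = [[0.5, 0.0],
     [0.2, 1.0]]
b = [0.1, 0.0]
x = [1.0, 2.0]
print(linear_step(x, w, b))
print(squared_error([x, [1.0, 2.0]], w, b))
```

Fitting then means searching for the `w` and `b` that make `squared_error` as small as possible on the measured time series.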

51
Underdetermined problem!

Assumes a fully connected network: need at least as
many data points (arrays, conditions) as
variables (genes)!
Underdetermined (underconstrained, ill-posed)
model: we have many more parameters than data
values to fit
No single solution; rather, an infinite number of
parameter settings that all fit the data
equally well
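A toy case makes the point concrete: with two unknown weights and a single data point, many weight settings fit perfectly. The observation and candidate weights below are invented for illustration.

```python
def predict(weights, inputs):
    """Weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

# One observation (inputs, target) but two unknown weights.
observations = [([1.0, 1.0], 2.0)]

candidates = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [5.0, -3.0]]
errors = [
    sum((predict(w, x) - y) ** 2 for x, y in observations)
    for w in candidates
]
print(errors)
```

Every candidate has zero error, yet the candidates imply very different regulatory relationships, so the data alone cannot choose between them.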

52
Solution 1: reduce N

Rather than trying to model all genes, we can
reduce the dimensionality of the problem
Network of clusters: construct a linear model
based on the cluster centroids
rat CNS data (4 clusters): Wahde and Hertz
(2000), Biosystems 55(1-3):129-136.
yeast cell cycle (15-18 clusters): Mjolsness et
al. (2000), Advances in Neural Information
Processing Systems 12; van Someren et al. (2000),
ISMB 2000, 355-366.
Network of Principal Components: linear model
between characteristic modes of the data
Holter et al. (2001), PNAS 98(4):1693-1698.

53
Solution 2

Take advantage of additional information
replicates
accuracy of measurements
smoothness of time series
Most likely, the network will still be poorly
constrained.
→ Need a method to identify and extract those
parts of the model that are well-determined and
robust

54
Danger of Overfitting

The linear model assumes every gene is regulated
by all other genes (i.e. full connectivity)
This is the richest model of its kind
Danger of overfitting the training data
Will result in poor prediction on new data
Far from reality: only a few regulators for each
gene
