Genetic Networks

Slides: 55
Provided by: NirFri
1
Genetic Networks
2
Cellular Networks
  • Most processes in the cell are controlled by
    networks of interacting molecules
  • Metabolic Networks
  • Signal Transduction Networks
  • Regulatory Networks

3
Unifying View
  • The cell as a state machine
  • Cell state $S = (P_1, P_2, \ldots, R_1, R_2, \ldots, m_1, m_2, \ldots)$
  • P: proteins, R: mRNA molecules, m: metabolites
  • Each cell, at any given time, can be characterized
    by its state S
  • Dynamics
  • $\mathrm{Input}(t),\ S(t) \rightarrow S(t + \Delta t)$

4
What does it mean?
  • Steady cell state = cell type
  • Neuron
  • RBC
  • muscle cell
  • Tumor cell
  • Dynamics = cellular process
  • Differentiation
  • Apoptosis
  • Cell Cycle

5
Gene Regulation Networks
  • Regulation of expression of genes is crucial
  • Regulation occurs at many stages
  • pre-transcriptional (chromatin structure)
  • transcription initiation
  • RNA editing (splicing) and transport
  • Translation initiation
  • Post-translation modification
  • RNA and protein degradation
  • Understanding regulatory processes is a central
    problem of biological research

6
Genetic Network Models: Goals
  • Incorporate rule-based dependencies between genes
  • Rule-based dependencies may constitute important
    biological information.
  • Allow systematic study of global network
    dynamics
  • In particular, individual gene effects on
    long-run network behavior.
  • Must be able to cope with uncertainty
  • Small sample size, noisy measurements, biological
    noise
  • Quantify the relative influence and sensitivity
    of genes in their interactions with other genes
  • This allows us to focus on individual (groups of)
    genes.
  • What model should we use?

7
Level of Biochemical Detail
  • Detailed models require lots of data!
  • Highly detailed biochemical models are only
    feasible for very small systems which are
    extensively studied
  • Example: Arkin et al. (1998), Genetics
    149(4):1633-48
  • Lysis-lysogeny switch in lambda phage
  • 5 genes, 67 parameters, based on 50 years of
    research
  • Stochastic simulation required a supercomputer!

8
Example: Lysis-Lysogeny
Arkin et al. (1998), Genetics 149(4):1633-48
9
Level of Biochemical Detail
  • In-depth biochemical simulation of e.g. a whole
    cell is infeasible (so far)
  • Less detailed network models are useful when data
    is scarce and/or network structure is unknown
  • Once network structure has been determined, we
    can refine the model

10
Boolean or Continuous?
  • Boolean networks (Kauffman (1993), The Origins of
    Order) assume ON/OFF gene states.
  • Allows analysis at the network-level
  • Provides useful insights in network dynamics
  • Algorithms for network inference from binary data

11
Boolean Formalism: Cons
  • Boolean abstraction is poor fit to real data
  • Cannot model important concepts
  • amplification of a signal
  • subtraction and addition of signals
  • compensating for smoothly varying environmental
    parameter (e.g. temperature, nutrients)
  • varying dynamical behavior (e.g. cell cycle
    period)
  • Feedback control
  • negative feedback is used to stabilize expression
  • → causes oscillation in a Boolean model

12
Boolean Formalism: Pros
  • Studies give rise to qualitative phenomena, as
    observed by experimentalists.
  • Some studied systems exhibit multiple steady
    states and switchlike transitions between them.
  • It is experimentally shown that such systems are
    robust to exact values of kinetic parameters of
    individual reactions.

13
Concentrations or Molecules?
  • Use of concentrations assumes individual
    molecules can be ignored
  • Known examples (in prokaryotes) where stochastic
    fluctuations play an essential role (e.g.
    lysis-lysogeny in lambda)
  • Requires stochastic simulation (Arkin et al.
    (1998), Genetics 149(4):1633-48), or modeling
    molecule counts (e.g. Petri nets, Goss and
    Peccoud (1998), PNAS 95(12):6750-5)
  • Significantly increases model complexity

14
Concentrations or Molecules?
  • Eukaryotes: larger cell volume, typically longer
    half-lives. Few known stochastic effects.
  • Yeast: 80% of the transcriptome is expressed at
    0.1-2 mRNA copies/cell (Holstege et al. (1998),
    Cell 95:717-728)
  • Human: 95% of the transcriptome is expressed at
    <5 copies/cell (Velculescu et al. (1997),
    Cell 88:243-251)

15
Spatial or Non-Spatial
  • Spatiality introduces additional complexity
  • intercellular interactions
  • spatial differentiation
  • cell compartments
  • cell types
  • Spatial patterns also provide more data
  • e.g. stripe formation in Drosophila
  • Mjolsness et al. (1991), J. Theor. Biol.
    152:429-454.
  • Few (no?) large-scale spatial gene expression
    data sets available so far.

16
Example: Drosophila Segmentation
[Figure: eve (even-skipped) stripe 2 expression along the anterior-posterior axis of the embryo, plotted on a low-to-high scale together with the expression of the transcription factors hb, Kr, gt and bcd.]
17
Deterministic or Stochastic?
  • Many sources of stochasticity
  • Biological stochasticity
  • Experimental noise
  • Stochastic models can account for those
  • Deterministic models are usually simpler to
    analyze (dynamics, steady states) and interpret

18
Modeling Approaches
  • Boolean Networks
  • Linear Models
  • Bayesian Networks

19
Boolean Network
20
What is a Boolean Network?
  • A Boolean network is a kind of graph
  • G(V, F): V is a set of nodes (genes),
    F is a list of Boolean functions
  • Every node has only two values: ON (1) and
    OFF (0)
  • Each function gives the resulting value of one node
  • Representations: standard, wiring, automaton
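
As a minimal sketch of the G(V, F) definition above: each gene gets one Boolean function of the current state, and all genes are updated together. The three rules are borrowed from slides 45-47 of this deck; the synchronous update and the names `FUNCTIONS` and `step` are assumptions made for illustration.

```python
from typing import Callable, Dict

State = Dict[str, int]

# F: one Boolean function per node (gene), mapping the current state
# to that node's next value. Rules taken from slides 45-47.
FUNCTIONS: Dict[str, Callable[[State], int]] = {
    "A": lambda s: s["A"] | s["C"],   # A <- A OR C
    "B": lambda s: s["A"] & s["C"],   # B <- A AND C
    "C": lambda s: 1 - s["A"],        # C <- NOT A
}

def step(state: State) -> State:
    """Synchronous update: every node is recomputed from the same current state."""
    return {gene: f(state) for gene, f in FUNCTIONS.items()}

state = {"A": 0, "B": 0, "C": 0}
trajectory = [state]
for _ in range(4):
    state = step(state)
    trajectory.append(state)

for s in trajectory:
    print(s)
```

Starting from all genes OFF, this trajectory settles into the state A=1, B=0, C=0, which maps to itself.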

21
What is a Boolean Network?
  • Attractor: a set of states that is revisited infinitely
    often; which one is reached depends on the initial state.
  • Basin of attraction
  • Limit-cycle attractor
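
Because the update is deterministic and the state space is finite, every trajectory must eventually repeat a state. The sketch below follows one trajectory until that happens and splits it into a transient part and an attractor. The rule in `toy_update` is a hypothetical 3-gene example, not one from the slides.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]

def find_attractor(update: Callable[[State], State],
                   start: State) -> Tuple[List[State], List[State]]:
    """Return (transient, attractor) reached from `start` under a deterministic update."""
    seen = {}   # state -> position in path
    path = []
    state = start
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        state = update(state)
    first = seen[state]          # where the cycle begins
    return path[:first], path[first:]

def toy_update(state: State) -> State:
    # Hypothetical rule: a <- b, b <- a, c <- a AND b
    a, b, c = state
    return (b, a, a & b)

print(find_attractor(toy_update, (0, 0, 1)))  # fixed-point attractor
print(find_attractor(toy_update, (0, 1, 1)))  # limit-cycle attractor of length 2
```

An attractor of length one is a steady state; a longer one is a limit cycle.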

22
Boolean Network Example
[Figure: example Boolean network; the nodes are genes.]
23
Boolean Network Example
[Figure: example Boolean network; the nodes are genes.]
24
Basic Structure of Boolean Networks
  • Each node is a gene
  • 1 means active/expressed
  • 0 means inactive/unexpressed

[Figure: genes A and B are inputs to gene X.]
Boolean function:
A B | X
0 0 | 1
0 1 | 1
1 0 | 0
1 1 | 1
In this example, two genes (A and B) regulate
gene X. In principle, any number of input genes
are possible. Positive/negative feedback is also
common (and necessary for homeostasis).
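
A Boolean function like the one on this slide can be stored directly as a lookup table. The sketch below reads the flattened table as the rows (A, B, X) = (0,0,1), (0,1,1), (1,0,0), (1,1,1). The closed form (NOT A) OR B is an observation about those rows, not something the slide states.

```python
# Truth table for X as a lookup keyed by the input pair (A, B).
X_TABLE = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1}

def x_next(a: int, b: int) -> int:
    """Next value of gene X given the current values of its regulators A and B."""
    return X_TABLE[(a, b)]

# Every row of the table agrees with (NOT A) OR B.
matches_formula = all(x_next(a, b) == ((1 - a) | b) for (a, b) in X_TABLE)
print(matches_formula)
```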
25
Dynamics of Boolean Networks
[Figure: genes A to F with their values at one time point: 0, 1, 1, 0, 0, 1.]
At a given time point, all the genes form a
genome-wide gene activity pattern (GAP), a binary
string of length n. Consider the state space
formed by all possible GAPs.
26
State Space of Boolean Networks
  • Similar GAPs lie close together.
  • There is an inherent directionality in the state
    space.
  • Some states are attractors (or limit-cycle
    attractors). The system may alternate between
    several attractors.
  • Other states are transient.

Picture generated using the program DDLab.
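
For a small network the whole state space of 2^n GAPs can be enumerated and each state assigned to the attractor it flows into, which gives the basins of attraction. This is a sketch using a hypothetical 3-gene rule; it is not the network shown in the DDLab picture.

```python
from collections import defaultdict
from itertools import product

def update(state):
    # Hypothetical rule: a <- b, b <- a, c <- a AND b
    a, b, c = state
    return (b, a, a & b)

def attractor_of(state):
    """Follow the trajectory until a state repeats; return the cycle as a frozenset."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = update(state)
    return frozenset(seen[seen.index(state):])

# Group all 2**3 states by the attractor they end up in.
basins = defaultdict(list)
for state in product((0, 1), repeat=3):
    basins[attractor_of(state)].append(state)

for attractor, basin in basins.items():
    print(sorted(attractor), "<- basin:", basin)
```

Exhaustive enumeration only works for small n, since the number of states doubles with every added gene.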
27
Reverse Engineering Problem
Can we infer the structure and rules of a genetic
network from gene expression measurements?
28
Reverse Engineering Problem
  • Input: gene expression data
  • Output: network structure and parameters (or
    regulation rules)

29
Gene Expression Time Series Data
[Figure: expression time series for gene 1, gene 2 and gene 3.]
Problem: how can these data be used to infer how
these three genes influence each other?
30
Modelling Gene Expression Data
[Figure: expression time series for gene 1, gene 2 and gene 3.]
Assume that genes exist in two states, on and off:
if the expression of gene i is above level $t_i$,
consider it on; otherwise, consider it off.
31
Modelling Gene Expression Data
[Figure: the same time series with the thresholds t1, t2 and t3 drawn for gene 1, gene 2 and gene 3.]
32
Modelling Gene Expression Data
[Figure: the same time series with every time point labeled on (above the gene's threshold t1, t2 or t3) or off (below it).]
33
Modelling Gene Expression Data
  • we obtain the following discretized gene
    expression data
  • the gene expression data is now in the form of
    bit streams
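
The thresholding rule can be sketched in a few lines. The expression values and thresholds below are made up for illustration, since the original plots are not part of this transcript.

```python
from typing import Dict, List

def discretize(series: List[float], threshold: float) -> List[int]:
    """Map each measurement to 1 (on) if it is above the threshold, else 0 (off)."""
    return [1 if value > threshold else 0 for value in series]

# Made-up expression values and thresholds t_i, for illustration only.
expression: Dict[str, List[float]] = {
    "gene 1": [0.2, 0.8, 1.5, 1.1, 0.3],
    "gene 2": [2.0, 1.9, 0.4, 0.2, 0.1],
    "gene 3": [0.1, 0.1, 0.9, 1.2, 1.0],
}
thresholds = {"gene 1": 0.5, "gene 2": 1.0, "gene 3": 0.5}

bit_streams = {gene: discretize(values, thresholds[gene])
               for gene, values in expression.items()}
for gene, bits in bit_streams.items():
    print(gene, bits)
```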

34
Information Theoretic Tools
  • we define some necessary information theoretic
    tools
  • Shannon entropy of a data stream
  • $H(X) = -\sum_i p_i \log(p_i)$
  • where $p_i$ is the probability that a random
    element of data stream X is i
  • (the base of the logarithm can be anything, but
    must be consistent throughout; usually we use
    base 2)

35
Information Theoretic Tools
  • e.g. Shannon entropy of data streams X and Y
  • X = 0, 1, 1, 1, 1, 1, 1, 0, 0, 0
  • Y = 0, 0, 0, 1, 1, 0, 0, 1, 1, 1
  • $H(X) = -\sum_i p_i \log_2(p_i)$
  • $= -(p_{X=0} \log_2 p_{X=0} + p_{X=1} \log_2 p_{X=1})$
  • $= -(0.4 \log_2 0.4 + 0.6 \log_2 0.6)$
  • $= 0.971$
  • $H(Y) = -\sum_i p_i \log_2(p_i)$
  • $= -(0.5 \log_2 0.5 + 0.5 \log_2 0.5)$
  • $= 1.0$
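
The two entropy values above can be checked with a short script. This is a sketch that estimates each probability as a relative frequency in the stream; the function name `entropy` is an illustrative choice.

```python
from collections import Counter
from math import log2

def entropy(stream):
    """Shannon entropy in bits, with probabilities estimated as relative frequencies."""
    n = len(stream)
    return -sum((count / n) * log2(count / n) for count in Counter(stream).values())

X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

print(round(entropy(X), 3))  # 0.971
print(round(entropy(Y), 3))  # 1.0
```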

36
Information Theoretic Tools
  • e.g. Shannon joint entropy of data streams X and
    Y
  • X = 0, 1, 1, 1, 1, 1, 1, 0, 0, 0
  • Y = 0, 0, 0, 1, 1, 0, 0, 1, 1, 1
  • $H(X, Y) = -\sum_i p_i \log_2(p_i)$
  • $= -(p_{X=0,Y=0} \log_2 p_{X=0,Y=0} + p_{X=1,Y=0} \log_2 p_{X=1,Y=0}
    + p_{X=0,Y=1} \log_2 p_{X=0,Y=1} + p_{X=1,Y=1} \log_2 p_{X=1,Y=1})$
  • $= -(0.1 \log_2 0.1 + 0.4 \log_2 0.4 + 0.3 \log_2 0.3 + 0.2 \log_2 0.2)$
  • $= 1.85$
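
The joint entropy is the ordinary entropy of the stream of pairs (X_t, Y_t). A sketch under that reading, with illustrative function names:

```python
from collections import Counter
from math import log2

def entropy(stream):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(stream)
    return -sum((count / n) * log2(count / n) for count in Counter(stream).values())

def joint_entropy(*streams):
    """Entropy of the tuples formed by reading the streams in parallel."""
    return entropy(list(zip(*streams)))

X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

print(Counter(zip(X, Y)))              # pair counts: 1, 4, 3 and 2 out of 10
print(round(joint_entropy(X, Y), 2))   # 1.85
```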

37
Information Theoretic Tools
  • Define
  • Conditional entropy
  • $H(X \mid Y) = H(X, Y) - H(Y)$
  • $H(Y \mid X) = H(X, Y) - H(X)$
  • Mutual information
  • $M(X, Y) = H(Y) - H(Y \mid X)$
  • $= H(X) - H(X \mid Y)$
  • $= H(X) + H(Y) - H(X, Y)$
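
The three expressions for the mutual information can be checked numerically on the X and Y streams from the previous slides. This sketch uses the standard identities H(X|Y) = H(X,Y) - H(Y) and H(Y|X) = H(X,Y) - H(X); the function names are illustrative.

```python
from collections import Counter
from math import isclose, log2

def entropy(stream):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(stream)
    return -sum((count / n) * log2(count / n) for count in Counter(stream).values())

def conditional_entropy(x, given):
    """H(x | given) = H(x, given) - H(given)."""
    return entropy(list(zip(x, given))) - entropy(given)

def mutual_information(x, y):
    """M(x, y) = H(x) + H(y) - H(x, y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

m = mutual_information(X, Y)
# All three forms on the slide give the same number.
same = (isclose(m, entropy(Y) - conditional_entropy(Y, X)) and
        isclose(m, entropy(X) - conditional_entropy(X, Y)))
print(round(m, 3), same)
```

For these two streams the mutual information is small (about 0.125 bits), so neither stream says much about the other.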

38
Information Theoretic Tools
  • It is easy to show that
  • Let X be an input data stream
  • and Y be an output data stream
  • If $M(Y, X) = H(Y)$
  • then X exactly determines Y
  • Look for pairs (X, Y) where $M(Y_{t+1}, X_t) = H(Y_{t+1})$
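
The criterion can be sketched as a test on time-shifted streams: pair each input value at time t with the output at time t+1 and compare the mutual information with the output entropy. The streams below are hypothetical; Y is constructed as NOT X with a one-step delay, and Z is an unrelated stream.

```python
from collections import Counter
from math import isclose, log2

def entropy(stream):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(stream)
    return -sum((count / n) * log2(count / n) for count in Counter(stream).values())

def determines(x, y):
    """True if M(Y_{t+1}, X_t) equals H(Y_{t+1}) for the time-shifted streams."""
    x_t, y_next = x[:-1], y[1:]
    mi = entropy(x_t) + entropy(y_next) - entropy(list(zip(x_t, y_next)))
    return isclose(mi, entropy(y_next), abs_tol=1e-9)

# Hypothetical streams: Y at time t+1 is NOT X at time t; Z is unrelated to Y.
X = [0, 1, 1, 0, 1, 0, 0, 1, 1]
Y = [0] + [1 - x for x in X[:-1]]
Z = [0, 0, 1, 1, 0, 0, 1, 1, 0]

print(determines(X, Y))  # True
print(determines(Z, Y))  # False
```

With short or noisy streams the equality rarely holds exactly, so a tolerance or a statistical test would be needed in practice.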

39
Identification of the Network Graph
  • back to the data
  • Step 1: put data into state transition table form

40
Identification of the Network Graph
  • state transition table
  • Step 1: put data into state transition table form

41
Identification of the Network Graph
  • the state transition table tells us how to get from
    state i-1 to state i, as a lookup table
  • however, it is difficult to discern functional
    relationships, so
  • Step 2: use information theoretic tools to
    discover which inputs determine the outputs
42
Identification of the Network Graph
  • Step 2a: calculate entropies

Note: $\lim_{x \to 0} x^x = 1$, therefore, in the limit, $0 \log 0 = 0$.
$H(A_i) = -(0.25 \log_2 0.25 + 0.75 \log_2 0.75) = 0.81$
$H(B_i) = -(0.75 \log_2 0.75 + 0.25 \log_2 0.25) = 0.81$
$H(C_i) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$
$H(A_{i-1}) = H(B_{i-1}) = H(C_{i-1}) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$
$H(A_{i-1}, C_{i-1}) = -(0.25 \log_2 0.25 + 0.25 \log_2 0.25 + 0.25 \log_2 0.25 + 0.25 \log_2 0.25) = 2$
43
Identification of the Network Graph
  • Step 2a: calculate entropies

$H(A_i, A_{i-1}, C_{i-1}) = -(0.25 \log_2 0.25 + 0.25 \log_2 0.25 + 0.25 \log_2 0.25 + 0.25 \log_2 0.25) = 2$
$H(B_i, A_{i-1}, C_{i-1}) = -(0.25 \log_2 0.25 + 0.25 \log_2 0.25 + 0.25 \log_2 0.25 + 0.25 \log_2 0.25) = 2$
$H(C_i, A_{i-1}) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$
44
Identification of the Network Graph
  • Step 2b: calculate mutual information

$M(A_i, [A_{i-1}, C_{i-1}]) = H(A_i) + H(A_{i-1}, C_{i-1}) - H(A_i, A_{i-1}, C_{i-1}) = 0.81 + 2 - 2 = 0.81 = H(A_i)$,
therefore $A_{i-1}$ and $C_{i-1}$ determine $A_i$.
$M(B_i, [A_{i-1}, C_{i-1}]) = H(B_i) + H(A_{i-1}, C_{i-1}) - H(B_i, A_{i-1}, C_{i-1}) = 0.81 + 2 - 2 = 0.81 = H(B_i)$,
therefore $A_{i-1}$ and $C_{i-1}$ determine $B_i$.
$M(C_i, A_{i-1}) = H(C_i) + H(A_{i-1}) - H(C_i, A_{i-1}) = 1 + 1 - 1 = 1 = H(C_i)$,
therefore $A_{i-1}$ determines $C_i$.
45
Identification of the Boolean Circuits
  • Step 3: determine the functional relationship between
    variables (this is simply the truth table)

$A_i = A_{i-1}$ OR $C_{i-1}$
46
Identification of the Boolean Circuits
  • Step 3: determine the functional relationship between
    variables

$B_i = A_{i-1}$ AND $C_{i-1}$
47
Identification of the Boolean Circuits
  • Step 3: determine the functional relationship between
    variables

$C_i$ = NOT $A_{i-1}$
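
Steps 1 to 3 can be sketched end to end. The state transition table itself is not part of this transcript, so the code below regenerates one from the three rules on slides 45-47 over all eight states, which is an assumption. It then searches for the smallest input set whose mutual information with each output equals the output's entropy. The function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product
from math import isclose, log2

def entropy(stream):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(stream)
    return -sum((count / n) * log2(count / n) for count in Counter(stream).values())

GENES = ("A", "B", "C")

# Step 1: state transition table (state i-1 -> state i), generated from the
# rules on the slides because the original table is missing from the transcript.
prev_states = list(product((0, 1), repeat=3))
next_states = [(a | c, a & c, 1 - a) for a, b, c in prev_states]

def minimal_inputs(target):
    """Step 2: first smallest input set with M(output, inputs) == H(output)."""
    out = [s[target] for s in next_states]
    for k in range(1, len(GENES) + 1):
        for idx in combinations(range(len(GENES)), k):
            ins = [tuple(s[i] for i in idx) for s in prev_states]
            mi = entropy(out) + entropy(ins) - entropy(list(zip(out, ins)))
            if isclose(mi, entropy(out), abs_tol=1e-9):
                return idx
    return None

def truth_table(target, idx):
    """Step 3: read the rule off the transition table as a lookup."""
    return {tuple(p[i] for i in idx): n[target]
            for p, n in zip(prev_states, next_states)}

for t, gene in enumerate(GENES):
    idx = minimal_inputs(t)
    print(gene, "<-", [GENES[i] for i in idx], truth_table(t, idx))
```

On this table the search recovers inputs (A, C) for A and B and input A for C, matching the slides. The search over input subsets grows combinatorially with the number of genes, so real uses usually cap the number of inputs per gene.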
48
Problems With This Approach
  • No theory exists for determining the
    discretization level $t_i$
  • the assumption that genes can be modeled as
    either on or off may be sufficient for some
    genes, but will certainly not be sufficient for
    all genes
  • Ignores noise of all kinds (experimental,
    biological)

49
Boolean networks are inherently deterministic
  • Conceptually, the regularity of genetic function
    and interaction is not due to hard-wired
    logical rules, but rather to the intrinsic
    self-organizing stability of the dynamical
    system.
  • Additionally, we may want to model an open system
    with inputs (stimuli) that affect the dynamics of
    the network.
  • From an empirical viewpoint, the assumption of
    only one logical rule per gene may lead to
    incorrect conclusions when inferring these rules
    from gene expression measurements, as the latter
    are typically noisy and the number of samples is
    small relative to the number of parameters to be
    inferred.

50
Linear Models
  • Basic model: weighted sum of inputs
  • Simple network representation
  • Only a first-order approximation
  • Parameters of the model:
    weight matrix containing N x N interaction
    weights
  • Fitting the model: find the parameters $w_{ji}$, $b_i$
    such that the model best fits the available data
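
The model equation did not survive in this transcript. A common form, assumed here, is x_i(t+1) = sum_j w_ji x_j(t) + b_i, where w_ji is the influence of gene j on gene i and b_i is a bias term. The sketch below applies one such update; the numbers are made up.

```python
from typing import List

def linear_step(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """One update of the assumed model: x_i(t+1) = sum_j W[j][i] * x[j] + b[i]."""
    n = len(x)
    return [sum(W[j][i] * x[j] for j in range(n)) + b[i] for i in range(n)]

# Made-up 2-gene example: W[j][i] is the weight of gene j on gene i.
W = [[0.5, 0.25],
     [0.25, 0.5]]
b = [0.0, 0.5]
x = [2.0, 4.0]

x_next = linear_step(x, W, b)
print(x_next)  # [2.0, 3.0]
```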

51
Underdetermined problem!
  • Assumes fully connected network: need at least as
    many data points (arrays, conditions) as
    variables (genes)!
  • Underdetermined (underconstrained, ill-posed)
    model: we have many more parameters than data
    values to fit
  • No single solution; rather, an infinite number of
    parameter settings that all fit the data
    equally well
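
A tiny made-up case shows the problem: with two weights and a single observation, several weight vectors reproduce the data exactly but disagree on a new input. The numbers and names below are hypothetical.

```python
def predict(w, x):
    """Weighted sum of the inputs (bias omitted to keep the example small)."""
    return sum(wi * xi for wi, xi in zip(w, x))

# One observation: input levels (1, 2) were followed by an output level of 4.
observations = [((1.0, 2.0), 4.0)]

# Three different weight vectors for the same target gene.
candidates = [(2.0, 1.0), (0.0, 2.0), (4.0, 0.0)]

# Squared error of each candidate on the available data: all zero.
errors = [sum((predict(w, x) - y) ** 2 for x, y in observations) for w in candidates]
print(errors)

# On a new input the candidates disagree, so the data cannot tell them apart.
new_input = (2.0, 1.0)
predictions = [predict(w, new_input) for w in candidates]
print(predictions)
```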

52
Solution 1: reduce N
  • Rather than trying to model all genes, we can
    reduce the dimensionality of the problem
  • Network of clusters: construct a linear model
    based on the cluster centroids
  • rat CNS data (4 clusters): Wahde and Hertz
    (2000), Biosystems 55(1-3):129-136.
  • yeast cell cycle (15-18 clusters): Mjolsness et
    al. (2000), NIPS 12; van Someren et al. (2000),
    ISMB 2000, 355-366.
  • Network of principal components: linear model
    between characteristic modes of the data
  • Holter et al. (2001), PNAS 98(4):1693-1698.
53
Solution 2
  • Take advantage of additional information
  • replicates
  • accuracy of measurements
  • smoothness of time series
  • Most likely, the network will still be poorly
    constrained.
  • → Need a method to identify and extract those
    parts of the model that are well-determined and
    robust

54
Danger of Overfitting
  • The linear model assumes every gene is regulated
    by all other genes (i.e. full connectivity)
  • This is the richest model of its kind
  • Danger of overfitting the training data
  • Will result in poor prediction on new data
  • Far from reality: each gene has only a few
    regulators