Network motifs: discovery and applications

Title: Slide 1 Author: Guy Last modified by: Guy Created Date: 3/30/2005 10:24:13 AM Document presentation format: Custom Company: Zinman Other titles – PowerPoint PPT presentation

Slides: 84

Provided by: guy80

1
Network motifs: discovery and applications

Guy Zinman
Seminar in Bioinformatics
Technion, Spring 2005

2
Outline

Theory of network motifs
Definition, Algorithm
Application to the E. coli transcription network
The dynamic behavior of the motifs
Finding active subnetworks
Simulated annealing
experiments

3
Network
4
Network

Dictionary definition
A group or system of (electric) components and
connecting circuitry designed to function in a
specific manner.
A network is the backbone of a complex system
Studying networks is similar to paleontology:
learning about an animal
from its backbone

5
Network motifs

The notion of motif, widely used for sequence
analysis, is generalized to the level of
networks.
Network Motifs are defined as patterns of
interconnections that recur in many different
parts of a network at frequencies much higher
than those found in randomized networks.

6
Network motifs (cont.)

Such motifs are found in networks from
Biochemistry
Transcriptional regulation networks
Neurobiology
Neuron connectivity
Ecology
Food webs
Engineering
Electronic circuits
World Wide Web

7
Network motifs (cont.)
9
Schematic view of motif detection

Occurrence of the FFL motif

10
Random vs designed/evolved features

Large networks may contain information about
design principles and/or evolution of the complex
system
Which features are there for a reason?
design principles (e.g. feed-forward loops)
constraints (e.g. all nodes on the Internet
must be connected to each other)
evolution, growth dynamics (e.g. network growth
is mainly due to gene duplication)

11
Network motifs

Alon U. et al., "Network Motifs: Simple Building
Blocks of Complex Networks", Science, 2002.
Different motifs were found in different classes
of networks.
The motifs reflect the underlying processes that
generate each type of network.

12
Motifs detected

Two significant motifs
Both appeared numerous times in non-homologous
gene systems that perform diverse biological
functions

13
Motifs detected
14
Motifs detected
15
Main tasks for detecting network motifs

There are two main tasks in detecting network
motifs
(1) generating an ensemble of proper random
networks
(2) counting the subgraphs in the real network
and in random networks.

16
The algorithm

Starting point: a graph with directed edges
Scan for n-node subgraphs (n = 3, 4) and count the
number of occurrences
Compare to randomized graphs (the randomization
preserves the in-, out- and mutual-edge degree of
each node, unlike a plain Erdős-Rényi graph)

17
All 3-node connected subgraphs

13 non-isomorphic types of 3-node connected
subgraphs
There are
199 types of 4-node subgraphs,
9364 types of 5-node subgraphs

18
Generation of randomized network

Algorithm A
Employ a Markov-chain algorithm that starts
with the real network and repeatedly swaps
randomly chosen pairs of connections (X1 -> Y1,
X2 -> Y2 is replaced by X1 -> Y2, X2 -> Y1) until
the network is well randomized.
Switching is prohibited if either of the
connections X1 -> Y2 or X2 -> Y1 already exists.
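
The swap procedure of Algorithm A can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the `n_swaps` parameter and the extra self-loop check are assumptions, and mutual (two-way) edges are not treated separately here.

```python
import random

def randomize_by_swaps(edges, n_swaps, seed=0):
    """Randomize a directed graph by repeated edge swaps.

    edges: iterable of (source, target) pairs.
    n_swaps: number of swap attempts (hypothetical parameter).
    In- and out-degree of every node are preserved.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (x1, y1), (x2, y2) = edge_list[i], edge_list[j]
        # Assumption: also skip swaps that would create a self-loop.
        if x1 == y2 or x2 == y1:
            continue
        # Prohibited: one of the new connections already exists.
        if (x1, y2) in edge_set or (x2, y1) in edge_set:
            continue
        edge_set -= {(x1, y1), (x2, y2)}
        edge_set |= {(x1, y2), (x2, y1)}
        edge_list[i], edge_list[j] = (x1, y2), (x2, y1)
    return edge_list
```

How many swaps count as "well randomized" is left open by the slide; a multiple of the number of edges is a common rule of thumb, not a value taken from the source.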

19
Generation of randomized network

Algorithm B
Each network is represented as a connectivity
matrix M, such that M_ij = 1 if there is a
connection directed from node i to node j, and 0
otherwise.
The goal is to create a randomized connectivity
matrix Mrand, which has the same number of
nonzero elements in each row and column as the
corresponding row and column of the real
connectivity matrix.

20
Generation of randomized network

$R_i = \sum_j M_{rand,ij} = \sum_j M_{ij}$, $C_j = \sum_i M_{rand,ij} = \sum_i M_{ij}$.
To generate the randomized networks, we start
with an empty matrix Mrand.
We then repeatedly choose at random a row m
according to the weights $p_i = R_i / \sum_i R_i$ and a column
n according to the weights $q_j = C_j / \sum_j C_j$.
If $M_{rand,mn} = 0$, we set $M_{rand,mn} = 1$.
We then set $R_m = R_m - 1$ and $C_n = C_n - 1$. If the
entry (m, n) was previously entered into the
randomized matrix, that is, if $M_{rand,mn} = 1$, or if
m = n, we choose a new (m, n).
This process is repeated until all $R_i = 0$ and
$C_j = 0$.
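
A rough sketch of this matrix-filling procedure is given below. The restart logic (`max_restarts`, `max_rejects`) is an assumption added for the sketch, because the plain procedure can reach a state where no admissible entry is left; the slides do not say how the original code handles that case.

```python
import random

def random_matrix_same_sums(M, seed=0, max_restarts=1000, max_rejects=500):
    """Build a random 0/1 matrix with the same row and column sums as M."""
    rng = random.Random(seed)
    size = len(M)
    for _ in range(max_restarts):
        # Remaining row sums R_i and column sums C_j still to be placed.
        R = [sum(row) for row in M]
        C = [sum(M[i][j] for i in range(size)) for j in range(size)]
        Mrand = [[0] * size for _ in range(size)]
        rejects = 0
        while sum(R) > 0 and rejects < max_rejects:
            m = rng.choices(range(size), weights=R)[0]  # row, weight R_i
            n = rng.choices(range(size), weights=C)[0]  # column, weight C_j
            if m == n or Mrand[m][n] == 1:
                rejects += 1  # entry not allowed: choose a new (m, n)
                continue
            Mrand[m][n] = 1
            R[m] -= 1
            C[n] -= 1
            rejects = 0
        if sum(R) == 0:
            return Mrand
    raise RuntimeError("no valid randomized matrix found")
```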

21
Network motif detection

For each nonzero element (i, j) of M:
Loop through all connected elements M_ik = 1,
M_ki = 1, M_jk = 1 and M_kj = 1. This is
recursively repeated with elements (i, k), (k,
i), (j, k) and (k, j) until an n-node subgraph is
obtained.
A table is formed that counts the number of
appearances of each type of subgraph in the
network, correcting for the fact that multiple
submatrices of M can correspond to one isomorphic
architecture owing to symmetries.
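
As a much simplified illustration of subgraph counting, the sketch below counts only one 3-node type, the feed-forward loop (X -> Y, Y -> Z, X -> Z and no other edge among the three nodes). It uses a brute-force scan over node triples rather than the recursive expansion described above; the function name is hypothetical.

```python
from itertools import permutations

def count_feed_forward_loops(M):
    """Count induced feed-forward loops in a 0/1 connectivity matrix M."""
    size = len(M)
    count = 0
    for x, y, z in permutations(range(size), 3):
        has_ffl_edges = M[x][y] and M[y][z] and M[x][z]
        has_other_edges = M[y][x] or M[z][y] or M[z][x]
        # The three roles (X, Y, Z) are distinct in an FFL, so each
        # occurrence matches exactly one ordered triple and no
        # symmetry correction is needed for this subgraph type.
        if has_ffl_edges and not has_other_edges:
            count += 1
    return count
```

Subgraph types with symmetries (for example a 3-cycle) would be matched by several ordered triples, which is the double counting the slide says must be corrected.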

22
Network motif detection

This process is repeated for each of the
randomized networks. The number of appearances of
each type of subgraph in the random ensemble is
recorded, to assess its statistical significance.
The present concepts and algorithms are easily
generalized to nondirected or directed graphs
with several colors of edges and nodes,
multipartite graphs, and so forth.

23
Criteria for Network Motif Selection

A subgraph is selected as a motif if the
probability that it appears in a randomized
network an equal or greater number of times than
in the real network is smaller than P = 0.01.

Reminder: the p-value is the probability of
obtaining a result at least as extreme as the
observed one when the null hypothesis holds (the
tested subject is not affected by the experiment).
If the p-value < 0.01, the null hypothesis is
rejected and the subject is considered to be
affected.
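
The selection criterion can be estimated directly from the random ensemble. The sketch below is an illustration with hypothetical names; the counts in the usage note are made up for the example.

```python
def empirical_p_value(real_count, random_counts):
    """Fraction of randomized networks in which the subgraph appears
    at least as many times as in the real network."""
    hits = sum(1 for c in random_counts if c >= real_count)
    return hits / len(random_counts)

def is_motif(real_count, random_counts, threshold=0.01):
    """Apply the P < 0.01 criterion to the empirical estimate."""
    return empirical_p_value(real_count, random_counts) < threshold
```

For example, with a real count of 40 and made-up random counts `[5, 7, 6, 41, 3]`, one of the five random networks reaches the real count, so the estimate is 1/5 and the subgraph would not pass the threshold.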
24
Run time complexity

The performance of this algorithm scales with the
total number of n-node subgraphs in the network.
The number of subgraphs and the algorithm runtime
also increase dramatically for subgraphs with
n ≥ 5.

25
Sampling method for subgraph counting

Kashtan et al., "Efficient sampling algorithm for
estimating subgraph concentrations and detecting
network motifs", Bioinformatics, 2004.
This algorithm samples subgraphs in order to
estimate their relative frequency.
The runtime of the algorithm asymptotically does
not depend on the network size.
Surprisingly, few samples are needed to detect
network motifs reliably.

26
Subgraph sampling

Procedure description
pick a random edge from the network and then
expand the subgraph iteratively by picking random
neighboring edges until the subgraph reaches n
nodes.
For each random choice of an edge, in order to
pick an edge that will expand the subgraph size
by one, prepare a list of all such candidate
edges and then randomly choose an edge from the
list.

27
Subgraph sampling

Finally, the sampled subgraph is defined by the
set of n nodes and all the edges that connect
between these nodes in the original network.
Finding n-node subgraphs for n ≥ 5 is much easier
now.
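
The sampling step can be sketched as below. This is only the edge-expansion part of the procedure, with hypothetical names; it leaves out the per-sample weighting that the published algorithm applies to correct for the fact that different subgraphs are sampled with different probabilities.

```python
import random

def sample_subgraph(edges, n, rng):
    """Sample one connected n-node subgraph of a directed graph.

    Returns (nodes, induced_edges), or None if the component that
    contains the first edge has fewer than n nodes.
    """
    edges = list(edges)
    nodes = set(rng.choice(edges))  # start from a random edge
    while len(nodes) < n:
        # Candidate edges: exactly one endpoint inside the current
        # node set, so picking one grows the subgraph by one node.
        candidates = [(u, v) for (u, v) in edges
                      if (u in nodes) != (v in nodes)]
        if not candidates:
            return None
        nodes.update(rng.choice(candidates))
    # The subgraph is all original edges among the chosen nodes.
    induced = {(u, v) for (u, v) in edges if u in nodes and v in nodes}
    return frozenset(nodes), induced
```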

28
Comparing sampling method results with exhaustive
enumeration
29
Transcriptional Regulation Network of Escherichia
coli

Operon: a group of contiguous genes that are
transcribed into a single mRNA molecule.
The transcriptional network is represented as a
directed graph: each operon is a node and
edges represent
direct transcriptional
interactions.

30
Application to E. coli

Alon U. et al., "Network motifs in the transcriptional
regulation network of Escherichia coli", Nature
Genetics, 2002.
Database: RegulonDB
Contains interactions between transcription
factors (TFs) and the operons they regulate
Contains 577 interactions, 424 operons and 116
TFs
35 more TFs were added from the literature
The previously described algorithm was run on this
data (1000 random networks)

31
Significant motifs

Feedforward loop (FFL)
found in 22 different systems,
10 TFs and 40 operons
P-value: 0.001

32
Concentration of FFL
33
Same in the yeast regulatory network

Young et al., "Transcriptional Regulatory Networks
in Saccharomyces cerevisiae", Science, 2002

Can you think of a possible role for this motif?

35
Dynamics for the FFL
36

Mangan et al., "Structure and function of the
feed-forward loop", PNAS, 2003.
Consider Sx and Sy as
input signals: small molecules
that activate or inhibit the
activity of X and Y.

37
Coherency of FFLs

The FFL is coherent if the direct effect of the
general TF on the effector has the same sign as
its net indirect effect through the specific TF.
85% of the FFLs found were coherent.
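
The coherence test amounts to comparing two signs. In the sketch below, each regulation is encoded as +1 (activation) or -1 (repression); this encoding and the function name are assumptions made for the illustration. X is the general TF, Y the specific TF and Z the effector.

```python
def is_coherent(sign_xy, sign_yz, sign_xz):
    """True if the direct effect X -> Z has the same sign as the
    net indirect effect X -> Y -> Z (the product of the two signs)."""
    indirect = sign_xy * sign_yz
    return sign_xz == indirect
```

For instance, an FFL in which all three regulations are activations is coherent, while one where X activates Y, Y activates Z, but X represses Z is incoherent.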

38
Significant motif

Single Input Motif (SIM)
A single transcription factor controls a set of
operons.
All operons in a SIM are regulated
with the same sign.
Appeared in 24 different systems

39
Dynamics for the SIM
40
Significant motif

Dense Overlapping Regulon (DOR) -
a layer of overlapping interactions between
operons and a group of TFs, much denser than this
structure would appear in an Erdos-Renyi random
graph

41
E. Coli network
42
DOR detection

Briefly
Define a (nonmetric) distance measure between
operons k and j.
The operons were clustered.
DORs corresponded to clusters with more than C = 10
connections, with a ratio of connections to TFs
greater than R = 2.

43
mFinder

A software tool for estimating subgraph
concentrations and detecting network motifs.
www.weizmann.ac.il/mcb/UriAlon/

44
Discussion

The concept of homology between genes based on
sequence motifs has been crucial for
understanding the function of uncharacterized
genes.
Likewise, the notion of similarity between
connectivity patterns in networks, based on
network motifs, may be helpful in gaining insight
into the dynamic behavior of newly identified
gene circuits.

45
Discussion

Until now we considered only transcription
interactions specifically manifested by
transcription factors that bind regulatory sites.
This transcriptional network can be thought of as
the slow part of the cellular regulation network
(time scale of minutes).

46
Discussion

An additional layer of faster interactions, which
include interaction between proteins (often
subsecond timescale), contributes to the full
regulatory behavior.

47
Finding active subnetworks

Ideker, T. et al., "Discovering regulatory and signaling
circuits in molecular interaction networks",
Bioinformatics, 2002.
Integrates protein-protein and protein-DNA
interactions with mRNA expression data, with the goal
of better understanding the molecular mechanisms
behind the observed gene expression.
Searches the network to find
active subnetworks, i.e., connected sets of
genes with unexpectedly high levels of
differential expression under one or more
perturbations.

48
Methodology

Using a molecular interaction network to analyze
changes in expression over 20 perturbations to
the yeast galactose utilization (GAL) pathway.
Determining which conditions significantly
affected the gene expression in each active
subnetwork.

49
The means

Combining a rigorous statistical measure for
scoring subnetworks with a search algorithm for
identifying subnetworks with high score.

50
Basic z-score calculation

To rate the biological activity of a particular
subnetwork, begin with assessing the significance
of differential expression for each gene.
The error model is provided by the VERA (Variability and
ERror Assessment) program.
VERA estimates the parameters of a statistical
model using the method of maximum likelihood.
Output p-values (pi), representing the
significance of expression change.

51
Basic z-score calculation

Each p_i is converted to a z-score:
$z_i = \Phi^{-1}(1 - p_i)$
$\Phi^{-1}$: the inverse normal CDF (cumulative
distribution function)
Smaller p-values correspond to larger z-scores
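
The conversion from p-value to z-score can be tried out with Python's standard library. This is a small sketch; the function name is hypothetical and the p-value must lie strictly between 0 and 1.

```python
from statistics import NormalDist

def p_to_z(p):
    """Convert a p-value to a z-score with the inverse standard
    normal CDF: z = Phi^{-1}(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p)
```

A p-value of 0.5 maps to a z-score of about 0, and smaller p-values map to larger z-scores.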

52
Scoring of Subnetworks

Aggregate z-score for an entire subnetwork A of k
genes:
$z_A = \frac{1}{\sqrt{k}} \sum_{i \in A} z_i$
Notice
z_A will also be distributed according to the
standard normal (because the variables are
independent).
Subnetworks of all sizes are comparable under
this scoring system, independent of k.
A high z_A indicates a biologically active
subnetwork.
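
A minimal sketch of the aggregate score, assuming the usual form in which the gene z-scores are summed and divided by the square root of k (the function name is hypothetical):

```python
import math

def aggregate_z(z_scores):
    """Aggregate z-score of a subnetwork: sum of the gene z-scores
    divided by sqrt(k), where k is the number of genes."""
    k = len(z_scores)
    return sum(z_scores) / math.sqrt(k)
```

Dividing by sqrt(k) rather than k is what keeps the variance at 1 for independent standard normal inputs, so scores of subnetworks of different sizes stay comparable.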

53
Calibrating z against background distribution

Randomly sample gene sets of size k using a Monte
Carlo approach, compute their scores z_A, and
calculate the mean $\mu_k$ and standard deviation
$\sigma_k$ for each k.
The corrected subnetwork score S_A is:
$S_A = \frac{z_A - \mu_k}{\sigma_k}$
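
The calibration against random gene sets might look like the sketch below. It assumes the aggregate score is the sum of z-scores divided by sqrt(k) and that the corrected score subtracts the background mean and divides by the background standard deviation; the function names and the `n_samples` default are hypothetical.

```python
import math
import random
from statistics import mean, pstdev

def background_params(all_z, k, n_samples=1000, seed=0):
    """Estimate the mean and standard deviation of the aggregate
    score over randomly sampled gene sets of size k."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        sample = rng.sample(all_z, k)  # random gene set, no repeats
        scores.append(sum(sample) / math.sqrt(k))
    return mean(scores), pstdev(scores)

def corrected_score(z_a, mu_k, sigma_k):
    """Corrected subnetwork score: (z_A - mu_k) / sigma_k."""
    return (z_a - mu_k) / sigma_k
```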

54
Scoring an example subnetwork
55
Scoring over multiple conditions

Start with a matrix of p-values (genes vs.
conditions) and the corresponding z-scores.
Produce m different aggregate scores, one for
each condition, and sort them.
Find the probability that at least j of the m
conditions had scores above z_A(j).
A Monte Carlo technique is used for estimating the
mean and the standard deviation from random gene
sets of size k.
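
The probability in the third step can be written as a binomial tail, under the assumption that, by chance, each condition independently exceeds the score z with probability 1 - Phi(z). The sketch below follows that assumption; the function name is hypothetical.

```python
from math import comb
from statistics import NormalDist

def prob_at_least_j(z_j, j, m):
    """Probability that at least j of m independent conditions have
    a standard normal score above z_j."""
    p = 1.0 - NormalDist().cdf(z_j)  # chance for a single condition
    return sum(comb(m, h) * p ** h * (1.0 - p) ** (m - h)
               for h in range(j, m + 1))
```

For z_j = 0 each condition exceeds the score with probability 0.5, so with m = 2 and j = 1 the result is 0.75.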

56
Scoring over multiple conditions
57
Finding the maximal scoring

Problem
Finding the maximal scoring connected subgraph
is NP-hard.

58
The Difficulty in Searching Global Optima
[Figure: significance score plotted over subnetworks, showing one global maximum and two local maxima]
59
Rugged landscapes and local maxima problem
60
Monte Carlo random search

Known also as the Metropolis algorithm
A simulation technique for conformational
sampling and optimization based on a random
search for energetically favourable conformations
Finding global (or at least good local) maximum
by biased random walk may take some luck

61
[Figure: significance score plotted over subnetworks, showing one global maximum and two local maxima]
62
Climbing mountains more easily: simulated annealing
In order to escape a local maximum, one needs
to allow locally unfavorable moves
[Figure: significance score plotted over subnetworks, showing one global maximum and two local maxima]
63
Introduction to simulated annealing

Simulated annealing (Kirkpatrick et al., 1983).
A mathematical method developed together with
Monte Carlo techniques to avoid false maxima
The method simulates the slow cooling of a solidifying
solution to form a single crystal
Origin
The annealing process of heated solids
Intuition
By allowing occasional descents in the search
process, we might be able to escape the trap of
local maxima.
In our context
Allow nodes to be removed from the subnetwork, even
if the resulting subnetwork's score is a (little)
lower.

What can be an adverse effect of this method?

65
Consequences of the Occasional Descents
Desired effect: helps escape local optima.
Adverse effect: might leave the global optimum
after reaching it.

So the result is not guaranteed to be optimal.
But here we don't care: any high-scoring
subnetwork is suspected to be biologically
significant.

66
Climbing mountains more easily: simulated annealing

Defining a temperature function.
Increasing the effective temperature means a
higher probability of accepting moves that
increase the energy (here, moves that lower the
score). Thus, the likelihood of
escaping from a local maximum may be tuned.

67
Control of Annealing Process
Acceptance of a search step (Metropolis
criterion)
Assume the performance change in the search
direction is $\Delta$.
Always accept an ascending step, i.e. $\Delta > 0$.
Accept a descending step only if it passes a random
test, i.e. with probability $p = e^{\Delta / T}$, where T is the annealing temperature.
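
The acceptance rule can be sketched as follows, assuming the standard Metropolis form in which a descending step with score change delta (negative or zero) is accepted with probability exp(delta / T). The function name and the `rng` argument are hypothetical.

```python
import math
import random

def accept_step(delta, temperature, rng=random):
    """Metropolis criterion for a maximization problem.

    delta: change in score caused by the step (new minus old).
    temperature: current annealing temperature, must be > 0.
    """
    if delta > 0:
        return True  # ascending steps are always accepted
    # Descending step: accept with probability exp(delta / T).
    return rng.random() < math.exp(delta / temperature)
```

At high temperature almost every descending step passes the test; as the temperature approaches 0 the rule becomes a plain hill climb.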
68
Control of Annealing Process
Cooling schedule
T, the annealing temperature, is the parameter
that controls how often descending steps are
accepted.
We gradually reduce the temperature T(k) from 1
towards 0.
The probability of accepting descending steps
falls together with the temperature.
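
One common choice is a geometric schedule, which a later slide mentions for the experiments. The sketch below is an assumption-laden illustration: a geometric sequence cannot reach exactly 0, so a small positive end temperature (`t_end`, hypothetical) stands in for it.

```python
def geometric_schedule(t_start, t_end, n_steps):
    """Temperatures decreasing geometrically from t_start to t_end.

    Requires t_start > t_end > 0 and n_steps >= 2.
    """
    ratio = (t_end / t_start) ** (1.0 / (n_steps - 1))
    return [t_start * ratio ** i for i in range(n_steps)]
```

Each temperature is the previous one multiplied by the same constant factor, so the cooling is fast at first in absolute terms and slower near the end.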
69
In our context

Input
Graph G = (V, E) of molecular interactions,
N: number of iterations
T_i: temperature function which decreases from
T_start to T_end
Output
G_w: subgraph of G
Initialize G_w by setting each node to an
active/inactive state randomly (with p = ½).

70
Simulated Annealing Algorithm

FOR i = 1 TO N DO
Randomly pick a node v from V and toggle its
state.
Compute the score s_i for the working subgraph G_w
IF (s_i > s_{i-1}), keep v toggled
ELSE keep v toggled with probability $p = e^{(s_i - s_{i-1}) / T_i}$
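
The loop above can be sketched in Python as follows. This is a simplified illustration, not the published implementation: the score function is passed in by the caller, the toy score ignores network connectivity, the cooling is assumed geometric, and all names and default values are hypothetical.

```python
import math
import random

def anneal(nodes, score, n_iter, t_start=1.0, t_end=0.01, seed=0):
    """Toggle nodes at random and keep each toggle by the Metropolis rule.

    nodes: list of node identifiers.
    score: function mapping a set of active nodes to a number.
    """
    rng = random.Random(seed)
    active = {v for v in nodes if rng.random() < 0.5}  # random start
    current = score(active)
    ratio = (t_end / t_start) ** (1.0 / max(n_iter - 1, 1))
    temp = t_start
    for _ in range(n_iter):
        v = rng.choice(nodes)
        active ^= {v}  # toggle the state of v
        new = score(active)
        if new > current or rng.random() < math.exp((new - current) / temp):
            current = new  # keep v toggled
        else:
            active ^= {v}  # undo the toggle
        temp *= ratio  # geometric cooling
    return active, current

def make_toy_score(z):
    """Toy score: aggregate z-score of the active nodes (0 if none)."""
    def toy_score(active):
        if not active:
            return 0.0
        return sum(z[v] for v in active) / math.sqrt(len(active))
    return toy_score
```

A call such as `anneal(list(z), make_toy_score(z), 2000)` with a dictionary `z` of per-node z-scores returns the final active set and its score. As the slides note, the result is not guaranteed to be the global maximum.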

71
Heuristics for improved annealing

Look for M active subnetworks simultaneously.
M is a user defined variable
Maintaining multiple components can improve the
efficiency of annealing.
Can be done by
multiple annealing runs
Or by
extending the annealing approach to maintain a
graph state vector of the top M component scores.

72
Galactose metabolic flow
73
Results
Experiment 1: a small network of 362 interactions.
Expression data from 2 conditions: gal80 deletion
vs. wild type (WT). 5 significant subnetworks were found,
including 41 out of 77 significant genes.
74
Score and temperature vs. number of iteration

Temperature cooling is geometric from 1 to 0.
N
By the end of the run, each of the 5 subnetworks
reaches a (local) maximum.

75
Evaluation of the subnetworks
Z-score distribution of the top 5 active networks.
Z-score distribution with real data
Z-score distribution with random data (scrambled
node z-scores)
76
Experiment 2
Results

Network consists of all known interactions: 7145
protein-protein interactions from BIND and 317
regulation interactions from TRANSFAC
Expression data includes 20 perturbations to
genes in the galactose pathway.
7 active subnetworks were found. The biggest consists
of 340 genes.
Repeating the annealing on the network above
generated 5 significant sub-subnetworks.
All results were evaluated with methods similar
to what we have seen.

78
Discussion
79
Cytoscape

www.cytoscape.org

80
Summary

Theory of network motifs
Definition, Algorithm
Application to the E. coli transcription network
The dynamic behavior of the motifs
Finding active subnetworks
Simulated annealing
2 experiments

81
References

S Shen-Orr, R Milo, S Mangan & U Alon,
Network motifs in the transcriptional regulation
network of Escherichia coli.
Nature Genetics 31:64-68 (2002).
R Milo, S Shen-Orr, S Itzkovitz, N Kashtan, D
Chklovskii & U Alon,
Network Motifs: Simple Building Blocks of Complex
Networks.
Science 298:824-827 (2002).
T Ideker, O Ozier, B Schwikowski & A Siegel,
Discovering regulatory and signaling circuits in
molecular interaction networks.
Bioinformatics 18:S233 (2002).

S Mangan & U Alon,
Structure and function of the feed-forward loop
network motif.
PNAS 100:11980-11985 (2003).
N Kashtan, S Itzkovitz, R Milo & U Alon,
Efficient sampling algorithm for estimating
subgraph concentrations and detecting network
motifs.
Bioinformatics 20:1746-175 (2004).
S Kirkpatrick, C D Gelatt & M P Vecchi,
Optimization by simulated annealing.
Science 220:671-680 (1983).

83
Thank you
