Title: Semester schedule
1 Semester schedule
25.11. Ass. 6
30.11. Quality analysis in PI networks; Bayesian statistics; Ass. 7
2.12.
7.12. Phylogeny
9.12.
14.12. Genome rearrangement; Ass. 8
16.12. Christmas lecture
11.1. V19 Introduction to metabolic networks
13.1. V20 Extreme pathways; Ass. 9
18.1. V21 Elementary mode analysis
20.1. V22 Integration of metabolic and regulatory networks; Ass. 10
25.1. V23 Modeling of signal transduction cascades
27.1. Modeling of signal transduction cascades (II)
1.2. Chemical genomics
3.2. V12 Pharmacogenomics
8.2. Integrative network analysis
10.2. Summary for the exam
Exam date: when?
2 V11: Modules in cellular networks (wrap-up)
Traditional biology (reductionist approach) produces long lists: lists of genes in genomes, lists of transcripts in different cell types, lists of protein interactions in model organisms
→ genomes, transcriptomes, proteomes, interactomes, databases of genetic perturbations, and corresponding phenotypes.
How to make sense of it all? Will meaningful hypotheses and discoveries emerge?
Systems biology: formalized mathematical modeling, with still room for reductionism.
Simulations → test hypotheses derived from the quantitative measurements of systems biology experiments.
Gagneur et al. Genome Biology 5, R57 (2004)
3 Strategies to detect communities in networks
"Community" stands for module, class, group, cluster, ...
Define a community as a subset of nodes within the graph such that connections between these nodes are denser than connections with the rest of the network.
The detection of community structure is generally intended as a procedure for mapping the network into a tree (called a dendrogram in the social sciences).
Leaves = nodes; branches join nodes or (at a higher level) groups of nodes.
Radicchi et al. PNAS 101, 2658 (2004)
4 Agglomerative algorithms for mapping to a tree
Traditional method to perform this mapping: hierarchical clustering.
For every pair i,j of nodes in the network, compute a weight W_ij that measures how closely connected the vertices are.
Starting from the set of all nodes and no edges, links are iteratively added between pairs of nodes in order of decreasing weight. In this way nodes are grouped into larger and larger communities, and the tree is built up to the root, which represents the whole network.
→ agglomerative algorithm
Figure: 3 communities of densely connected vertices (circles with solid lines) with a much lower density of connections (gray lines) between them.
Girvan, Newman, PNAS 99, 7821 (2002); Radicchi et al. PNAS 101, 2658 (2004)
5 Possible definitions of the weights
(1) Number of node-independent paths between vertices. Two paths that connect the same pair of vertices are said to be node-independent if they share no vertices other than their initial and final vertices.
(2) Number of edge-independent paths, i.e. paths between the two vertices that share no edges.
It has been shown that the number of node-independent (edge-independent) paths between 2 vertices i and j in a graph is equal to the minimum number of vertices (edges) that must be removed from the graph to disconnect i and j from one another (Menger, 1927).
→ These numbers are a measure of the robustness of the network to deletion of nodes (edges).
Girvan, Newman, PNAS 99, 7821 (2002)
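The edge-independent path count in (2) can be computed as a unit-capacity maximum flow, which is one standard way to use Menger's theorem. The following is a minimal sketch, not the method of the cited paper; the function name `edge_independent_paths` and the toy graph (two triangles joined by one bridge) are illustrative assumptions.

```python
from collections import deque


def edge_independent_paths(edges, s, t):
    """Count edge-independent paths between s and t in an undirected graph.

    Assumption of this sketch: the graph is simple and s != t. Each undirected
    edge gets residual capacity 1 in both directions, and BFS augmenting paths
    are pushed until t is no longer reachable (unit-capacity max flow).
    """
    if s == t:
        raise ValueError("s and t must be different vertices")
    cap, adj = {}, {}
    for u, v in edges:
        cap[(u, v)] = 1
        cap[(v, u)] = 1
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # Breadth-first search for a path with spare residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the bridge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
print(edge_independent_paths(toy_edges, "a", "b"))  # within one triangle
print(edge_independent_paths(toy_edges, "a", "f"))  # across the bridge
```

Vertices inside one triangle are joined by two edge-independent paths, while any pair separated by the bridge has only one, which matches the robustness reading above: removing the single bridge edge disconnects them.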
6 Possible definitions of the weights (II)
(3) Count the total number of paths that run between the two vertices (not just those that are node- or edge-independent). Because the number of paths between any 2 vertices is either 0 or infinite, one typically weights paths of length l by a factor α^l with small α, so that the weighted count of paths converges. Thus long paths contribute exponentially less weight than short paths.
These path-based definitions of the weights (node-independent, edge-independent, or weighted total path counts) work reasonably well for certain community structures, but show typical pathologies.
Girvan, Newman, PNAS 99, 7821 (2002)
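The weighted path count in (3) can be written in closed form. The block below is a short worked equation under the assumption that A denotes the adjacency matrix of the graph (a symbol not used on the slide), so that the (i,j) element of A^l is the number of paths of length l from i to j.

```latex
% A: adjacency matrix (assumed notation); (A^l)_{ij} = number of paths of length l from i to j
W \;=\; \sum_{l=0}^{\infty} (\alpha A)^{l} \;=\; (I - \alpha A)^{-1},
\qquad \text{convergent if } \alpha < 1/\lambda_{\max}(A),
```

where λ_max(A) is the largest eigenvalue of A. The element W_ij is then the weight assigned to the vertex pair (i,j).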
7 Problems
In particular, counting either node-independent or edge-independent paths tends to separate single peripheral vertices from the communities to which they should rightly belong.
If a vertex is, e.g., connected to the rest of a network by only a single edge then, to the extent that it belongs to any community, it should clearly be considered to belong to the community at the other end of that edge.
Unfortunately, both the numbers of independent paths and the weighted path counts for such vertices are small, and hence single nodes often remain isolated from the network when the communities are constructed.
This and other pathologies make the hierarchical clustering method, although useful, far from perfect.
Girvan, Newman, PNAS 99, 7821 (2002)
8 New strategy: use betweenness as definition of weights
Focus on those edges that are least central to communities, i.e. the edges that lie most "between" communities.
Define the edge betweenness of an edge as the number of shortest paths between pairs of vertices that run along it. If there is more than one shortest path between a pair of vertices, each path is given equal weight such that the total weight of all of the paths is 1.
If a network contains communities or groups that are only loosely connected by a few intergroup edges, then all shortest paths between different communities must go along one of these few edges.
→ The edges connecting communities will have high edge betweenness. By removing these edges we separate groups from one another and so reveal the underlying community structure of the graph.
Girvan, Newman, PNAS 99, 7821 (2002)
9 GN algorithm
1. Calculate the betweenness for all m edges in a graph of n vertices (can be done in O(mn) time).
2. Remove the edge with the highest betweenness.
3. Recalculate the betweenness for all edges affected by the removal.
4. Repeat from step 2 until no edges remain.
Because the recalculation in step 3 may be needed after each of the m edge removals, the algorithm runs in worst-case time O(m^2 n).
Girvan, Newman, PNAS 99, 7821 (2002)
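The four steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: betweenness is recomputed for all remaining edges after every removal (rather than only for the affected ones), the graph is assumed to be simple and undirected, and the names `edge_betweenness`, `components`, and `girvan_newman` as well as the toy graph are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours. Multiple shortest paths
    between a pair share a total weight of 1, as in the definition above.
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0.0
                    preds[v] = []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate path fractions from the farthest nodes back towards s.
        delta = {v: 0.0 for v in order}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                bet[frozenset((u, v))] += share
                delta[u] += share
    # Every unordered vertex pair was visited from both of its endpoints.
    return {edge: value / 2.0 for edge, value in bet.items()}


def components(adj):
    """Connected components as a list of frozensets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def girvan_newman(edges):
    """Return the successive partitions produced by removing edges (steps 1-4)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    partitions = [components(adj)]
    while any(adj.values()):                    # step 4: until no edges remain
        bet = edge_betweenness(adj)             # steps 1 and 3 (full recalculation)
        u, v = max(bet, key=bet.get)            # step 2: highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(partitions[-1]):    # record each new split
            partitions.append(comps)
    return partitions


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the bridge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
for level in girvan_newman(toy_edges):
    print(sorted(sorted(c) for c in level))
```

In the toy graph the bridge carries all nine shortest paths between the two triangles, so it is removed first and the first split separates the triangles. Ties among equally central edges are broken arbitrarily in this sketch.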
10 Application of the Girvan-Newman algorithm
(a) The friendship network from Zachary's karate club study. Nodes associated with the club administrator's faction are drawn as circles, those associated with the instructor's faction are drawn as squares.
(b) Hierarchical tree showing the complete community structure for the network calculated by using the algorithm presented in this article. The initial split of the network into two groups is in agreement with the actual factions observed by Zachary, with the exception that node 3 is misclassified.
(c) Hierarchical tree calculated by using edge-independent path counts, which fails to extract the known community structure of the network.
Girvan, Newman, PNAS 99, 7821 (2002)
11 Divisive algorithms for mapping to a tree
Reverse order of tree construction compared with agglomerative algorithms: start with the whole graph and iteratively cut edges
→ divide the network progressively into smaller and smaller disconnected subnetworks, which are identified as the communities.
Crucial point: how to select the edges to be cut.
Example: Girvan-Newman (GN) algorithm.
Problem of the GN algorithm: it requires the repeated evaluation of a global property, the betweenness, for each edge, whose value depends on the properties of the whole system.
→ This becomes computationally very expensive for networks with, e.g., on the order of 10000 nodes or more.
Radicchi et al. PNAS 101, 2658 (2004)
12 Faster algorithm
Introduce a divisive algorithm that only requires the consideration of local quantities. This needs a quantity that can single out edges connecting nodes that belong to different communities.
Consider the edge-clustering coefficient: the number of triangles to which a given edge belongs, divided by the number of triangles that might potentially include it, given the degrees of the adjacent nodes.
For the edge connecting node i to node j, the edge-clustering coefficient is
$$C_{i,j}^{(3)} = \frac{z_{i,j}^{(3)} + 1}{\min[(k_i - 1), (k_j - 1)]}$$
where $z_{i,j}^{(3)}$ is the number of triangles built on that edge, $k_i$ and $k_j$ are the degrees of nodes i and j, and $\min[(k_i - 1), (k_j - 1)]$ is the maximal possible number of such triangles. 1 is added to $z_{i,j}^{(3)}$ to remove the degeneracy for $z_{i,j}^{(3)} = 0$.
Radicchi et al. PNAS 101, 2658 (2004)
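The edge-clustering coefficient only needs the neighbour sets of the two end nodes, which is what makes it a local quantity. A minimal sketch follows; the function name `edge_clustering`, the toy graph, and the convention of returning infinity when min(k_i - 1, k_j - 1) = 0 are assumptions of this example.

```python
def edge_clustering(adj, i, j):
    """Edge-clustering coefficient of order 3 for the edge (i, j).

    adj maps each node to the set of its neighbours. z is the number of
    triangles on the edge, i.e. the number of common neighbours of i and j.
    """
    z = len(adj[i] & adj[j])
    max_triangles = min(len(adj[i]) - 1, len(adj[j]) - 1)
    if max_triangles == 0:
        # Assumed convention: an edge to a degree-1 node gets an infinite
        # coefficient, so it is never the edge with the smallest value.
        return float("inf")
    return (z + 1) / max_triangles


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the bridge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
toy_adj = {}
for u, v in toy_edges:
    toy_adj.setdefault(u, set()).add(v)
    toy_adj.setdefault(v, set()).add(u)

for u, v in toy_edges:
    print(u, v, edge_clustering(toy_adj, u, v))
```

In this toy graph the bridge c-d lies on no triangle and gets the smallest coefficient, so it is the edge that separates the two triangles.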
13 Faster algorithm
Edges connecting nodes in different communities are included in few or no triangles and tend to have small values of $C_{i,j}^{(3)}$. On the other hand, many triangles exist within clusters.
By considering higher-order cycles one can define coefficients of order g,
$$C_{i,j}^{(g)} = \frac{z_{i,j}^{(g)} + 1}{s_{i,j}^{(g)}}$$
where $z_{i,j}^{(g)}$ is the number of cyclic structures of order g to which the edge (i,j) belongs, and $s_{i,j}^{(g)}$ is the number of possible cyclic structures of order g that can be built given the degrees of the nodes.
Define, for every g, a detection algorithm that works exactly like the GN method, with the difference that, at every step, the removed edges are those with the smallest value of $C_{i,j}^{(g)}$.
By considering increasing values of g, one can smoothly interpolate between a local and a nonlocal algorithm.
Radicchi et al. PNAS 101, 2658 (2004)
14 Comparison with GN method
Test of the efficiency of the different algorithms in the analysis of the artificial graph with four communities. Here N = 128, and p_in is changed together with p_out to keep the average degree equal to 16.
(Left) Strong definition: fraction of successes for the different algorithms compared with the analytical probability that four communities are actually defined.
(Right) Weak definition: in addition to the same quantities plotted in Left, here we report, for every algorithm, the fraction f of nodes not correctly classified.
Radicchi et al. PNAS 101, 2658 (2004)
15 Comparison with GN algorithm
Plot of the dendrograms for the network of college football teams, obtained by using the GN algorithm (Left) and the algorithm of Radicchi et al. with g = 4 (Right). Different symbols denote teams belonging to different conferences. In both cases, the observed communities perfectly correspond to the conferences, with the exception of the six members of the Independent conference, which are misclassified.
Radicchi et al. PNAS 101, 2658 (2004)
16 Simple network clustering based on shortest-path distance
Aim: compute the modular organization of cellular networks controlling specific biological responses.
Ideas:
(i) the shortest path between any two vertices (proteins) is probably the most relevant for functional associations;
(ii) each vertex in a network has a unique profile of shortest-path distances through the network to every other vertex;
(iii) module comembers are likely to have similar (clustered) shortest-path-distance profiles.
Rives & Galitski, PNAS 100, 1128 (2003)
17 Network clustering
Yeast PI network: 4079 proteins, 6761 protein interactions. MIPS: 133 signaling proteins, of which 64 have ≥ 1 interaction with another signaling protein.
Algorithm:
Assign length 1 to each edge in the protein interaction network.
Compute the all-pairs shortest-path distance matrix, which contains the length of the shortest path (distance) d between every pair of vertices in the network.
Convert it into an association matrix using 1/d^2.
→ Associations range from 0 to 1. This emphasizes local associations in the subsequent clustering.
Use hierarchical agglomerative average-linkage clustering.
Rives & Galitski, PNAS 100, 1128 (2003)
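The pipeline above (unit edge lengths, all-pairs shortest paths, 1/d^2 associations, average-linkage clustering) can be sketched without external libraries. Several details are assumptions of this sketch rather than statements about the published method: self-association is set to 1, unreachable pairs get association 0, nodes are compared by the Manhattan distance between their association profiles, and all function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path lengths from source when every edge has length 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def association_profiles(adj):
    """Row of 1/d^2 associations for every node (assumed: self = 1, unreachable = 0)."""
    nodes = list(adj)
    profiles = {}
    for u in nodes:
        dist = bfs_distances(adj, u)
        profiles[u] = [1.0 if v == u else (1.0 / dist[v] ** 2 if v in dist else 0.0)
                       for v in nodes]
    return profiles


def average_linkage(adj):
    """Agglomerative average-linkage clustering; returns the list of merges."""
    profiles = association_profiles(adj)

    def node_distance(u, v):
        # Manhattan distance between association profiles (assumption).
        return sum(abs(x - y) for x, y in zip(profiles[u], profiles[v]))

    def cluster_distance(a, b):
        return sum(node_distance(u, v) for u in a for v in b) / (len(a) * len(b))

    clusters = [frozenset([u]) for u in adj]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        merges.append((a, b))
    return merges


# Hypothetical toy network: triangles a-b-c and d-e-f joined by the edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
toy_adj = {}
for u, v in toy_edges:
    toy_adj.setdefault(u, set()).add(v)
    toy_adj.setdefault(v, set()).add(u)

for a, b in average_linkage(toy_adj):
    print(sorted(a), "+", sorted(b))
```

Comparing whole profiles rather than single pairwise associations avoids the ties that arise because every directly interacting pair has association 1. In the toy network the final merge joins the two triangles, so each triangle appears as one cluster just below the root.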
18 Clustering of yeast signaling protein interaction network
A symmetrical matrix of 64 proteins of the MIPS-database signaling category was clustered identically in both dimensions. The cluster tree is not shown. Each row or column represents a protein. Each feature is the intersection of two proteins and is a grayscale representation of the pairwise protein association.
Columns to the right of the clustered network represent MIPS-defined signaling pathways: P, polarity-PKC; R, Ras; H, HOG; M, mating/filamentation MAPK (mfMAPK). White bars in the MIPS-pathway columns indicate protein members of the pathway.
Ras-pathway proteins form a single cluster. The 3 MAPK pathways appear as clusters.
Rives & Galitski, PNAS 100, 1128 (2003)
19 Network clustering of high-throughput data sets
High-throughput (HTS) data usually have high (50%) false-positive error frequencies! Also, many binary interactions may not occur within modules.
Because interacting proteins usually localize in the same subcellular compartment, one may integrate interaction and localization data for the identification of modules.
Single proteins with many interactions in Y2H screens (hubs) nucleate large clusters that are not modules.
Rives & Galitski, PNAS 100, 1128 (2003)
20 Examples of derived clusters
Clustering of the yeast nuclear-protein network
derived from high-throughput interaction and
localization data. (A) Examples of clusters
representing module rudiments are labeled. The
cluster tree is not shown. Arrows indicate
high-connectivity hub proteins. (B) Example
clusters are shown in detail. Cluster comembers
participating in some common structure or
function have large bold labels.
Rives & Galitski, PNAS 100, 1128 (2003)
21 Properties of hubs
All hub proteins indicated bind > 90 proteins in the global Y2H network. The proteins bound by these hubs are randomly distributed in cellular compartments. The nuclear-localized proteins bound by these hubs form the 4 largest clusters.
Proteins bound by high-connectivity hubs will have few or no interactions among themselves if they are not functionally associated (hub-and-spokes structure).
→ Proteins bound by each high-connectivity hub are not functionally associated with each other, and their clusters do not represent modules.
Rives & Galitski, PNAS 100, 1128 (2003)
22 Connectivity vs. neighborhood clustering
Global protein connectivity versus neighborhood clustering. Each protein in the global protein network is plotted by its connectivity, k, and its neighborhood clustering, C. Arrows indicate high-connectivity proteins shown in Fig. 2A.
The 4 high-connectivity hubs are among 15 outliers. Although these proteins have exceedingly high connectivity, they almost completely lack neighborhood clustering.
→ Useful criterion to distinguish modules from nonmodules?
Rives & Galitski, PNAS 100, 1128 (2003)
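The k versus C comparison can be reproduced on any interaction list. The sketch below assumes that neighborhood clustering C is the usual clustering coefficient (fraction of possible links among a node's neighbours that actually exist); the function name and the toy data, a hub with five unconnected spokes plus a separate triangle, are hypothetical.

```python
def connectivity_and_clustering(adj):
    """Return {node: (k, C)} for an undirected graph given as neighbour sets.

    k is the number of neighbours; C is taken here to be the standard
    clustering coefficient 2 * links / (k * (k - 1)), with C = 0 for k < 2.
    """
    result = {}
    for u, nbrs in adj.items():
        k = len(nbrs)
        # Each link among the neighbours is seen from both of its ends.
        links = sum(1 for v in nbrs for w in adj[v] if w in nbrs) // 2
        c = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        result[u] = (k, c)
    return result


# Hypothetical toy data: hub-and-spokes structure plus one triangle.
toy_edges = [("hub", "s1"), ("hub", "s2"), ("hub", "s3"), ("hub", "s4"), ("hub", "s5"),
             ("x", "y"), ("y", "z"), ("x", "z")]
toy_adj = {}
for u, v in toy_edges:
    toy_adj.setdefault(u, set()).add(v)
    toy_adj.setdefault(v, set()).add(u)

for node, (k, c) in connectivity_and_clustering(toy_adj).items():
    print(node, k, c)
```

The hub combines the highest connectivity with zero neighborhood clustering, whereas the triangle members have low connectivity but C = 1, which is the contrast the plot described above is meant to expose.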
23 Application to biological-response networks
Incorporate network clustering into a 3-step process to study complex biomolecular systems
→ generates a modular network-structure model:
(i) compile known and suspected components of the response network (from databases, expression profiling, proteomics, genetic screens, metabolite profiles, ...);
(ii) cluster the network based on interactions between vertices; edges can represent any type of interaction;
(iii) abstract a modular network-structure model showing modules.
Example: cluster the 90 filamentation-network proteins that have ≥ 1 interaction with other filamentation proteins.
Rives & Galitski, PNAS 100, 1128 (2003)
24 Clustering of the yeast filamentation network
Proteins of the yeast filamentation network were clustered. A tree-depth threshold was set. Tree branches with ≥ 3 leaves (clusters with ≥ 3 proteins) below the tree threshold are shown. Bullets and large bold labels indicate proteins of highest intracluster connectivity.
Rives & Galitski, PNAS 100, 1128 (2003)
25 Modular model of the yeast filamentation network
Clusters indicated in Fig. 4 are abstracted as
modules. All intermodule paths in the
filamentation network are indicated as black
lines with the interacting proteins at the
termini. A gray line connecting the Ras and
protein kinase A modules was added to indicate a
connection mediated by the small molecule cAMP.
Rives & Galitski, PNAS 100, 1128 (2003)
26 Filamentous growth response of yeast cells
(A) Wild-type yeast-form cells grown in SHAD liquid medium. (B) Wild-type filamentous-form cells grown for 10 h on SLAD agar medium.
For budding yeast diploid cells, low availability of ammonium and a solid growth substrate trigger a dimorphic switch to filamentous-form growth, characterized by cell elongation, unipolar distal budding, adhesion and invasion.
Prominent pathways involved: cAMP-dependent protein kinase, fMAP kinase, Cdc28 kinase activity, ubiquitination by the SCF ubiquitin ligase.
Here: investigate the next step, ubiquitin-dependent degradation by the 26S proteasome.
Prinz et al. Genome Research 14, 380 (2004)
27 Integrated filamentation network
The filamentation network includes proteins (rectangular nodes) implicated in filamentous growth by expression profiling or known phenotypes, and metabolites (triangular nodes) that are either substrates or products of filamentation-protein enzymes. Not shown are filamentation proteins with neither a protein-metabolite interaction nor a protein-protein interaction with another filamentation protein.
Blue edges: protein-protein interactions. Green edges: protein-metabolite interactions.
Each gene node is colored based on its expression log-ratio. Shades of red indicate higher expression in the filamentous form relative to the yeast form; shades of blue indicate the opposite response; white indicates no difference.
Prinz et al. Genome Research 14, 380 (2004)
28 Collective functions of network clusters
If clusters in an integrated network represent biological modules, the clusters should have collective functions in specific biological processes.
Specific biological-process gene annotations (taken from the GO database) are found overrepresented in specific filamentation-network clusters.
Significance = -log(cumulative probability of the observed data and all more extreme probabilities).
Prinz et al. Genome Research 14, 380 (2004)
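A significance score of this kind can be sketched with a one-sided hypergeometric test. The slide does not name the distribution or the base of the logarithm, so the hypergeometric model, the base-10 logarithm, the function name, and the example numbers below are all assumptions made for illustration.

```python
from math import comb, log10


def enrichment_significance(n_total, n_annotated, cluster_size, n_hits):
    """-log10 of P(X >= n_hits) for X ~ Hypergeometric (assumed null model).

    n_total:      number of genes in the whole network
    n_annotated:  genes in the network carrying the annotation
    cluster_size: genes in the cluster being tested
    n_hits:       annotated genes observed in the cluster
    """
    upper = min(n_annotated, cluster_size)
    tail = sum(comb(n_annotated, i) * comb(n_total - n_annotated, cluster_size - i)
               for i in range(n_hits, upper + 1))
    p_value = tail / comb(n_total, cluster_size)
    return -log10(p_value)


# Hypothetical numbers: 10 genes, 5 annotated, cluster of 5 containing all 5.
print(enrichment_significance(10, 5, 5, 5))
```

Summing the tail from the observed count upward implements "the observed data and all more extreme" outcomes; larger scores mean the annotation is less likely to be concentrated in the cluster by chance.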
29 Modular abstraction of the filamentation network
Network clusters are abstracted as circular "module nodes." The area of each module node is proportional to the number of member molecules. The color of each module node reflects the average expression log-ratio of member genes.
Each module node is assigned the name of the member node of highest intracluster degree (the highest number of interactions with cluster co-members); most are proteins, some are metabolites.
Prinz et al. Genome Research 14, 380 (2004)
30 Quantitative identification of network clusters
Nodes of the filamentation network were iteratively joined into clusters.
(A) A cluster was defined as a joined group containing at least 3 protein nodes. The number of clusters is plotted as a function of join number.
(B) The selection of nodes/clusters to join was based on the average-linkage Manhattan distance of node shortest-path-distance profiles. This distance metric is plotted as a function of join number.
The arrows indicate join 535, corresponding to the highest join number with the highest number of clusters.
Prinz et al. Genome Research 14, 380 (2004)
31 RPN12, GRR1, and CDC28 modules and their components
Modules (A), and their respective components (B)
with collective functions in cell-cycle control
and ubiquitin-dependent proteolysis are shown.
Prinz et al. Genome Research 14, 380 (2004)
32 Growth behavior of rpn4Δ mutants
rpn4Δ mutants show Cln1-dependent hyperelongation and cell type-independent agar adhesion.
(A) Diploid wild-type, rpn4Δ, cln1Δ, and rpn4Δ cln1Δ strains were grown on SLAD agar plates and photographed after 9 h.
(B) Patches of strains of the indicated cell types and genotypes were subjected to a wash-off assay of adhesion. The plate was imaged before and after washing with water.
Prinz et al. Genome Research 14, 380 (2004)
33 Stabilization of Cln1 protein in rpn4Δ mutants
(A) Northern blot analysis of total RNA from wild-type and rpn4Δ strains, and a cln1Δ strain. The blot was probed consecutively with probes for CLN1 and RPN12. The asterisk in the CLN1 blot indicates a cross-hybridizing band that also serves as a loading control.
(B) Western blot analysis of Cln1 protein in diploid wild-type and rpn4Δ strains carrying HA-tagged CLN1, and a no-tag wild-type control strain. Protein extracts were prepared from cells grown for 10 h on SLAD agar plates. Pgk1 protein levels served as a loading control.
(C) Cln1-HA protein was immunoprecipitated from an rpn4Δ strain. Aliquots of the immunoprecipitate were incubated with calf-intestine phosphatase (CIP), or without CIP, and analyzed by Western blotting.
(D) myc-tagged Cln1 protein was immunoprecipitated in diploid wild-type and rpn4Δ strains, and a no-tag control strain. All strains had a multicopy plasmid expressing HA-tagged ubiquitin. Immunoprecipitates were analyzed by gel electrophoresis and immunoblotting with anti-HA antibody to detect ubiquitin conjugates. The blot membrane was stripped and reprobed with anti-myc antibodies to detect the immunoprecipitated Cln1.
Prinz et al. Genome Research 14, 380 (2004)
34 Non-random interaction among filamentation proteins
Table footnotes:
a Interaction data include all protein-protein interactions plus all metabolic interactions. Each analysis used either biological interaction data or 10 data sets in which interactions were assigned randomly to pairs of proteins.
b Each analysis included a list of either the 1026 filamentation proteins, or the 873 expression-implicated proteins, or 10 sets of random proteins.
c The number of proteins in the list that have at least one interaction with another protein in the list.
d The number of direct interactions between pairs of proteins in the list.
e Node degree = number of incident edges of the node. Mean node degree = ratio of 2 × (number of interactions) to the number of interacting proteins.
Prinz et al. Genome Research 14, 380 (2004)
35 Expression change within clusters
RPN12, GRR1, and CDC28 modules and their
components. Modules (A), and their respective
components (B) with collective functions in
cell-cycle control and ubiquitin-dependent
proteolysis are shown. Graphic representations
are as in Figures 2 and 3.
... table continues ...
Prinz et al. Genome Research 14, 380 (2004)
36 Biological insights from modular network abstraction
(1) In an integrated network, data on molecules and interactions show a clustered organization that can be identified quantitatively.
(2) Cluster co-member genes show significant coordination of expression change, as expected for genes involved in a collective function.
(3) Cluster co-member genes show significant overrepresentation of biological-process annotations, indicating collective function.
(4) The modular network abstraction intuitively stimulates testable biological insights on complex biological properties.
Prinz et al. Genome Research 14, 380 (2004)
37 Evolutionary conservation of motif constituents in the yeast protein interaction network
Question: why are some cellular components conserved across species while others evolve rapidly?
Many biological functions are carried out by the integrated activity of highly interacting cellular components: functional modules.
Motifs (topologically distinct interaction patterns within complex networks) may represent the simplest building blocks of modules.
Here, test the correlation between a protein's evolutionary rate and the structure of the motif it is embedded in
→ identify all 2-, 3-, 4-node motifs and some 5-node motifs.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
38 shared components
Data from the DIP database: 3183 interacting yeast proteins.
If there is evolutionary pressure to maintain specific motifs, their components should be evolutionarily conserved and have identifiable orthologs in other organisms.
Study the conservation of 678 S. cerevisiae proteins with an ortholog in each of 5 higher eukaryotes: Arabidopsis thaliana, C. elegans, Drosophila melanogaster, Mus musculus, Homo sapiens.
Algorithm to detect all n-node subgraphs: scan all rows of the adjacency matrix M. For each non-zero element (i,j) representing a link, scan through all neighbors of (i,j) until a specific n-node subgraph is detected.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
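For the simplest fully connected motif, the triangle, the scan described above reduces to checking the common neighbours of every linked pair (i,j). The sketch below illustrates this and the notion of a fully conserved motif (every component has an ortholog); it uses neighbour sets instead of an explicit adjacency matrix, and the function names, toy network, and ortholog set are hypothetical.

```python
def triangles(adj):
    """All 3-node fully connected subgraphs, each returned once as a frozenset."""
    found = set()
    for i in adj:                      # scan every row
        for j in adj[i]:               # every link (i, j)
            for k in adj[i] & adj[j]:  # neighbours shared by i and j
                found.add(frozenset((i, j, k)))
    return found


def fully_conserved_fraction(motifs, orthologs):
    """Fraction of motifs whose components all belong to the ortholog set."""
    if not motifs:
        return 0.0
    conserved = sum(1 for motif in motifs if motif <= orthologs)
    return conserved / len(motifs)


# Hypothetical toy network: triangles a-b-c and d-e-f joined by the link c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
toy_adj = {}
for u, v in toy_edges:
    toy_adj.setdefault(u, set()).add(v)
    toy_adj.setdefault(v, set()).add(u)

toy_orthologs = {"a", "b", "c", "d"}   # hypothetical set of proteins with orthologs
found = triangles(toy_adj)
print(len(found), fully_conserved_fraction(found, toy_orthologs))
```

Comparing this fraction with the value obtained when the same number of orthologs is assigned to random proteins gives the kind of conservation ratio that the motif analysis relies on.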
39 shared components
Table columns: number of motifs of a given kind in the yeast PI network; fraction of the original yeast motifs that is evolutionarily fully conserved (each of their protein components belongs to the 678 orthologous proteins); fraction of motifs that is fully conserved for the random ortholog distribution; ratio column 4 / column 5.
Less than 5% of motifs #2 (linear chains of 3 proteins) are completely maintained.
47% of the fully connected pentagons (#11) are fully conserved!
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
40 Topology ↔ conservation of individual proteins
Larger motifs tend to be conserved as a whole, where each component has an ortholog.
E.g. less than 1% of the fully connected pentagon motifs disappeared completely; for 69% of them, each of the subunits had an ortholog in human.
Clear correlation between the conservation rate and the degree of saturation of a motif.
Participation in motifs substantially influences the evolutionary conservation of specific components.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
41 Clustering coefficient ↔ conservation of proteins?
From 65% (C = 0) to 84% (C = 1) of the neighbors of a human ortholog were also human orthologs (filled circles). The conserved fraction of the nonorthologous proteins' neighborhood is markedly smaller.
Enrichment: ratio between the percentages of orthologous proteins at distance d from an ortholog in the natural and the random orthologous sets.
d: shortest distance between i and the target protein, measured along network links.
Proteins that interact directly with an ortholog (d = 1) have a 50% higher chance of conservation than at random!
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
42 Function ↔ conservation?
Examine whether the specific function of the yeast proteins within motifs affects their rate of evolutionary conservation. Assign each motif to the functional class to which its protein components belong.
Larger motifs have a notable functional homogeneity:
- for 95% of the fully connected yeast pentagon motifs (#11), all components shared at least one common functional class,
- only 10% of the 2-node motifs (#1) are functionally conserved.
Identify the type and number of evolutionarily fully conserved motifs of each functional class in S. cerevisiae, for those that have an ortholog in humans.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
43 shared components
For 3 functional classes (subcellular localization, protein fate, transcription), each of the 11 studied motifs is considerably overrepresented. Some other functional classes have only 1 or 2 characteristic motifs.
No motifs are found for transposable elements, energy, cellular fate, cellular communication, cellular rescue, cellular organization, metabolism, protein activity, protein binding.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
44 shared components
The fully connected motifs (#9 and #11) tend to identify protein complexes. However, the mere existence of protein complexes cannot explain the observed trends towards higher conservation rates of the highly connected motifs.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
45 shared components
Shared components (proteins or groups of proteins occurring in different complexes) are fairly common.
A shared component may be a small part of many complexes, acting as a unit that is constantly reused for its function. It may also be the main part of the complex, e.g. in a family of variant complexes that differ from each other by distinct proteins that provide functional specificity.
Aim: identify and properly represent the modularity of protein-protein interaction networks by identifying the shared components and the way they are arranged to generate complexes.
Wuchty, Oltvai, Barabasi, Nature Gen 35, 176 (2003)
46 Summary
Modules are a key intermediate level in the organizational hierarchy of cells.
Biological module: a loose association of preferred molecular interaction partners that interact to perform a collective function.
Modules can be identified based on structural characteristics such as their closely connected members and their interfaces to other modules.
There is evidence that modules are evolutionarily conserved.
Module co-members tend to be coordinately expressed.