Title: Complex networks in nature
1 Complex networks in nature
PHYSBIO 2007
Imre Derényi
Dept. of Biological Physics, Eötvös University,
Budapest
Complex systems are often made of many
non-identical elements connected by diverse
interactions.
networks
graphs
2 Outline
- Lectures 1-3: Graph theoretical basics, examples
of real networks, basic models (Erdős-Rényi,
small world, scale free graphs) and their
properties, examples.
- Lecture 4: Dynamics on networks: error and attack
tolerance, disease spreading, metabolic
networks.
- Lecture 5: Network motifs and communities.
3 Graph theory basics
A graph, usually denoted as G(V,E), consists of a
set of vertices (or nodes) V together with a set
of edges (or links) E. Every edge connects its
two endvertices. The order of a graph (denoted by
N) is the number of its vertices.
A graph is a simple graph if it has no multiple
edges or loops. If not stated otherwise, a graph
is usually assumed to be simple.
4 Two vertices are adjacent (or neighbors of each
other) if there is an edge connecting them.
Every graph can be represented by its adjacency
matrix A, which is an $N \times N$ symmetric binary
matrix with elements $A_{ij} = A_{ji} = 1$ if vertex i is
adjacent to vertex j and $A_{ij} = A_{ji} = 0$ otherwise.
The degree $k_i$ of vertex i is the number of its
neighbors (or edges):
$k_i = \sum_j A_{ij}$
The sum of the degrees of all the vertices is
twice the number M of the edges of the graph:
$\sum_i k_i = 2M$
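The adjacency matrix and the degree sum rule can be checked on a small example. This is a minimal sketch; the 4-vertex graph and the function names are made up for illustration.

```python
def adjacency_matrix(n, edges):
    """Build the N x N symmetric binary adjacency matrix of a simple graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("a simple graph has no loops")
        A[i][j] = A[j][i] = 1
    return A


def degrees(A):
    """The degree k_i is the sum of row i of the adjacency matrix."""
    return [sum(row) for row in A]


# Hypothetical example graph with N = 4 vertices and M = 4 edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)
k = degrees(A)
print(k)                           # degrees of vertices 0..3
print(sum(k) == 2 * len(edges))    # the degree sum equals 2M
```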
5 A sequence of vertices in which consecutive
vertices are adjacent is a walk. A walk
is closed if its first and last vertices are the
same, and open if they are different.
A walk in which no edge occurs more than once is
known as a trail. A closed trail is called a tour
or circuit.
A trail or circuit is Eulerian if it uses all
edges precisely once.
A walk in which no vertex occurs more than once
is known as a path. A cycle can be defined as a
closed path.
A path or cycle is Hamiltonian if it uses all
vertices exactly once.
Two vertices are reachable from each other if
there exists a path between them.
A graph is connected if any of its vertices can
be reached from any other.
6 A component of a graph is defined as a maximal
connected subgraph.
A subgraph of a graph G is a graph whose vertices
and edges are subsets of those of G.
A subgraph of G is a spanning subgraph, or
factor, if it contains all the vertices of G.
k-cliques are complete subgraphs of order (size)
k.
Cliques are maximal complete subgraphs.
A tree is an acyclic connected graph. It has N-1
edges.
7 The length l of a walk is the number of edges
that it uses.
The distance d(i, j) between two (not necessarily
distinct) vertices i and j is the length of a
shortest path between them.
The eccentricity e(i) of a vertex i is its
maximum distance from any other vertex:
$e(i) = \max_j d(i, j)$
The diameter D of a graph is its maximum
eccentricity:
$D = \max_i e(i)$
The radius R of a graph is its minimum
eccentricity:
$R = \min_i e(i)$
The characteristic path length (sometimes also
called diameter) is defined as the average distance between vertex pairs:
$\ell = \frac{1}{N(N-1)} \sum_{i \ne j} d(i, j)$
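The distance-based quantities of this slide can be computed with a breadth-first search from every vertex. The sketch below assumes an unweighted, connected graph given as adjacency lists, and takes the characteristic path length to be the mean distance over all ordered pairs of distinct vertices; the function names and the example graph are illustrative only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Distances d(source, j) to every vertex reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(adj):
    """Return (eccentricities, diameter, radius, mean distance).

    Assumes a connected graph with at least two vertices.
    """
    n = len(adj)
    ecc = {}
    total = 0
    for i in adj:
        dist = bfs_distances(adj, i)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        ecc[i] = max(dist.values())
        total += sum(dist.values())
    diameter = max(ecc.values())
    radius = min(ecc.values())
    mean_distance = total / (n * (n - 1))
    return ecc, diameter, radius, mean_distance


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ecc, D, R, ell = path_statistics(adj)
print(ecc, D, R, ell)
```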
8 Extensions
If weight or cost is assigned to each edge, then
we get a weighted graph. In the calculation of
lengths the weights are taken into account.
If the edges are directed, then we have a
directed graph or digraph. In-neighbors and
out-neighbors, and in-degrees and out-degrees can
be distinguished.
In a hypergraph more than two vertices can be
connected by hyperedges.
9 Random graphs
Graph theory was invented by Euler in the 18th
century. The early work was concentrated on small
graphs with a high degree of regularity.
Random-graph theory was introduced by Erdős and Rényi in
the late 1950s. As complex networks often appear
to be random, random-graph theory appears to be a
useful tool in the study of large complex
networks.
10 The Erdős-Rényi model
- Original model: Connect N nodes by M edges
randomly.
- Alternative model: Connect every pair of the N
nodes with probability p.
Pál Erdős (1913-1996)
The two models (or ensembles) become equivalent
in the thermodynamic limit ($N \to \infty$).
The average degree of a node is
$\langle k \rangle = 2M/N = p(N-1) \approx pN$
(Figure label: p = 1/6)
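Both variants of the model are easy to generate for small N. The sketch below enumerates all vertex pairs, so it only suits small graphs; the function names, the fixed seed, and the parameter values are arbitrary choices for illustration.

```python
import itertools
import random


def er_gnm(n, m, rng):
    """Original model: choose M distinct edges uniformly at random."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, m)


def er_gnp(n, p, rng):
    """Alternative model: include every pair independently with probability p."""
    return [pair for pair in itertools.combinations(range(n), 2)
            if rng.random() < p]


def average_degree(n, edges):
    """<k> = 2M / N."""
    return 2 * len(edges) / n


rng = random.Random(0)      # fixed seed, only for reproducibility
n, p = 200, 0.05
edges_p = er_gnp(n, p, rng)
edges_m = er_gnm(n, len(edges_p), rng)
# The measured average degree should be close to p (N - 1).
print(average_degree(n, edges_p), p * (n - 1))
```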
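11 The Erdős-Rényi model
Degree distribution (binomial, approaching a Poisson distribution for large N):
$P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k} \approx e^{-\langle k \rangle} \langle k \rangle^k / k!$
The characteristic path length $\ell$ can be estimated
from
$\langle k \rangle^{\ell} \approx N$
resulting in
$\ell \approx \ln N / \ln \langle k \rangle$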
12 The greatest discovery of Erdős and Rényi was
that many network properties appear suddenly as p
is increased.
As an example let us consider the occurrence of
an arbitrary subgraph consisting of n vertices
and m edges.
Their number can be estimated as
$\sim N^n p^m$
Thus the critical probability of appearance is
$p_c(N) \sim N^{-n/m}$
13 A giant (percolating) component also appears
suddenly.
This can easily be understood with the help of a
branching process:
- Let us start to grow a component from a seed
vertex by randomly selecting its neighbors from
the remaining N-1 vertices with probability p.
- Let us repeat this process with the newly
selected vertices as seeds, over and over again.
- The branching process stops when no new neighbor
is selected.
If $p < p_c \approx 1/N$, then the expected number of new
neighbors is smaller than the number of seeds,
and the branching process quickly comes to a halt.
If, on the other hand, $p > p_c \approx 1/N$, then the
component can easily grow to infinity.
The giant component has a tree-like structure.
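The sudden appearance of the giant component can be seen in a small simulation. The sketch below generates a random graph with connection probability p and grows components from seed vertices, as in the branching picture above; N = 1000, the seed, and the two values of p (half and twice 1/N) are arbitrary illustrative choices.

```python
import random


def largest_component_fraction(n, p, rng):
    """Fraction of vertices in the largest component of a G(N, p) sample."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    best = 0
    for seed in range(n):
        if seen[seed]:
            continue
        # Grow the component of this seed by repeatedly visiting new neighbors.
        seen[seed] = True
        frontier = [seed]
        size = 0
        while frontier:
            u = frontier.pop()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    frontier.append(v)
        best = max(best, size)
    return best / n


rng = random.Random(1)      # fixed seed, only for reproducibility
n = 1000
below = largest_component_fraction(n, 0.5 / n, rng)   # p < 1/N
above = largest_component_fraction(n, 2.0 / n, rng)   # p > 1/N
print(below, above)
```

Below the threshold the largest component holds only a small fraction of the vertices, while above it a finite fraction of the graph belongs to a single component.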
14 Are complex networks really random?
No!
15 Watts-Strogatz model
Watts and Strogatz, Nature 393, 440 (1998)
16 Watts-Strogatz model
n nodes per block
Optimal n
17 WWW
World Wide Web
Nodes: WWW documents. Links: URL links.
800 million documents (S. Lawrence, 1999)
ROBOT: collects all URLs found in a document
and follows them recursively
R. Albert, H. Jeong, A.-L. Barabási, Nature 401,
130 (1999)
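18 WWW-power
What can we expect for ER and WS networks?
$\langle k \rangle \sim 6$, $N_{WWW} \sim 10^9$
$P(k=500) \sim 10^{-99} \Rightarrow N(k=500) \sim 10^{-90}$
The results: Scale-free network
$\gamma_{out} = 2.45$
$\gamma_{in} = 2.1$
$P(k=500) \sim 10^{-6} \Rightarrow N(k=500) \sim 10^{3}$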
19 Internet
INTERNET BACKBONE
Nodes: computers, routers. Links: physical lines.
(Faloutsos, Faloutsos and Faloutsos, 1999)
20 ACTOR CONNECTIVITIES
Nodes: actors. Links: cast jointly.
N = 212,250 actors, $\langle k \rangle = 28.78$
$P(k) \sim k^{-\gamma}$
$\gamma = 2.3$
21 SCIENCE CITATION INDEX
Nodes: papers. Links: citations.
1736 PRL papers (1988)
$P(k) \sim k^{-\gamma}$
($\gamma = 3$)
(S. Redner, 1998)
22 Coauthorship
SCIENCE COAUTHORSHIP
Nodes: scientists (authors). Links: joint
publication.
(Newman, 2000; Barabási et al., 2001)
23 Coauthorship
Online communities
Nodes: online users. Links: email contact.
Kiel University log files, 112 days, N = 59,912
nodes
Ebel, Mielsch, Bornholdt, PRE 2002.
24 Food Web
Nodes: trophic species. Links: trophic
interactions.
R.J. Williams, N.D. Martinez, Nature (2000)
R. Sole (cond-mat/0011195)
25 Sex-web
Nodes: people (females, males). Links: sexual
relationships.
4781 Swedes, aged 18-74; 59% response rate.
Liljeros et al., Nature 2001
26 Most real world networks have the same internal
structure
Scale-free networks
Why?
What does it mean?
27 Origins SF
SCALE-FREE NETWORKS
(1) The number of nodes (N) is NOT fixed.
Networks continuously expand by the addition of
new nodes.
Examples:
WWW: addition of new documents
Citation: publication of new papers
28 BA model
Scale-free model
(1) GROWTH
At every timestep we
add a new node with m edges (connected to the
nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT
The probability $\Pi$ that a new node will be
connected to node i depends on the degree $k_i$ of
that node:
$\Pi(k_i) = k_i / \sum_j k_j$
A.-L. Barabási, R. Albert, Science 286, 509 (1999)
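The two ingredients, growth and preferential attachment, can be sketched in a few lines. The code below is an illustrative implementation, not the authors' own: it assumes a complete graph on m + 1 nodes as the starting configuration (the slide does not specify the seed), and it redraws duplicate targets so that no multiple edges arise, which slightly perturbs the exact attachment probabilities.

```python
import random


def barabasi_albert(n, m, rng):
    """Grow a network to n nodes; every new node brings m edges."""
    # Assumed seed graph: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` picks node i with probability k_i / sum_j k_j.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # redraw duplicates
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = barabasi_albert(500, 2, random.Random(42))
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
# Early nodes end up as hubs: the maximum degree is far above m.
print(len(edges), min(degree.values()), max(degree.values()))
```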
29 MFT
Mean Field Theory
$\partial k_i / \partial t = m\,\Pi(k_i) = k_i / (2t)$, with the initial condition $k_i(t_i) = m$
A.-L. Barabási, R. Albert and H. Jeong, Physica A
272, 173 (1999)
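A sketch of how the mean-field calculation proceeds, assuming the continuum approximation of the cited paper: node i joins at time $t_i$ with m edges, the total degree at time t is approximately $2mt$, and the arrival times $t_i$ are uniformly distributed.

```latex
\frac{\partial k_i}{\partial t} = m\,\frac{k_i}{\sum_j k_j} = \frac{k_i}{2t},
\qquad k_i(t_i) = m
\quad\Longrightarrow\quad
k_i(t) = m \left( \frac{t}{t_i} \right)^{1/2}

P\bigl(k_i(t) < k\bigr) = P\!\left( t_i > \frac{m^2 t}{k^2} \right)
\quad\Longrightarrow\quad
P(k) = \frac{\partial P\bigl(k_i(t) < k\bigr)}{\partial k} \;\propto\; \frac{2 m^2}{k^3},
\qquad \gamma = 3
```

Older nodes (small $t_i$) therefore have the largest degrees, and the exponent does not depend on m.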
30 Growth without preferential attachment
31 Preferential Attachment
For given $\Delta t$: $\Delta k \propto \Pi(k)$
Citation network
Internet
(Jeong, Néda, A.-L. B., cond-mat/0104131)
32 The $\gamma$ exponent is not universal
Extended Model
- prob. p internal links
- prob. q link deletion
- prob. 1-p-q add node
33 More models
Other Models
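34 Presence of a giant (percolating) component
Branching process
The probability that an edge leads to a vertex
with degree k is
$k P(k) / \langle k \rangle$
The condition that the branching process prevails:
$\sum_k (k-1)\, k P(k) / \langle k \rangle > 1$, i.e., $\langle k^2 \rangle / \langle k \rangle > 2$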
35 Prot Interaction map
Yeast protein network
Nodes: proteins. Links:
physical interactions (binding).
P. Uetz, et al., Nature 403, 623-7 (2000).
36 C. elegans: Li et al., Science 2004
Drosophila M.: Giot et al., Science 2003
37 Origin of the scale-free topology of PPI
networks: gene duplication
Proteins with more interactions are more likely
to obtain new links: $\Pi(k) \sim k$
(preferential attachment)
Wagner 2001; Vazquez et al. 2003; Sole et al.
2001; Rzhetsky & Gomez 2001; Qian et al. 2001;
Bhan et al. 2002.
38 Metabolic network
Nodes: chemicals (substrates). Links: biochemical
reactions.
Archaea
Bacteria
Eukaryotes
The metabolic networks of organisms from all
three domains of life are scale-free!
H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and
A.L. Barabasi, Nature, 407 651 (2000)
39 Characterizing the links
Metabolism: Flux Balance Analysis
(Palsson). Metabolic flux for each reaction.
Maximize $c \cdot v$, where v is the vector of reaction fluxes and c is the unit vector in the
direction of growth (biomass production).
Edwards, J. S. & Palsson, B. O., PNAS 97, 5528
(2000). Edwards, J. S., Ibarra, R. U. & Palsson,
B. O., Nat Biotechnol 19, 125 (2001). Ibarra, R.
U., Edwards, J. S. & Palsson, B. O., Nature 420,
186 (2002).
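In its usual form, flux balance analysis is a linear program. The statement below is a sketch under the standard assumptions, none of which are spelled out on the slide: S denotes the stoichiometric matrix, the steady-state condition is $S v = 0$, and $v^{\min}$, $v^{\max}$ are hypothetical lower and upper bounds on the fluxes.

```latex
\max_{v}\; c \cdot v
\quad \text{subject to} \quad
S\,v = 0, \qquad v^{\min} \le v \le v^{\max}
```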
40 Global flux organization in the E. coli metabolic
network
E. Almaas, B. Kovács, T. Vicsek, Z. N. Oltvai,
A.-L. B., Nature 2004; Goh et al., PRL 2002.
41 Inhomogeneity in the local flux distribution
42 Robustness
Complex systems maintain their basic functions
even under errors and
failures
(cell → mutations; Internet →
router breakdowns)
43 Robust-SF
Robustness of scale-free networks
Attacks
Failures
fc
Albert, Jeong, Barabasi, Nature 406 378 (2000)
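44 Absence of a critical percolation threshold for $\gamma \le 3$
After random removal of a fraction f of the
vertices
The new degree distribution:
$P'(k') = \sum_{k \ge k'} P(k) \binom{k}{k'} (1-f)^{k'} f^{k-k'}$
Percolation:
$\langle k'^2 \rangle / \langle k' \rangle = 2$
Critical fraction:
$f_c = 1 - \frac{1}{\kappa_0 - 1}$, where $\kappa_0 = \langle k^2 \rangle / \langle k \rangle$ is computed for the original network
Cohen, Erez, ben-Avraham, Havlin, PRL 85, 4626
(2000)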
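The critical fraction can be evaluated directly from a degree sequence. The sketch below assumes the result of the cited paper, $f_c = 1 - 1/(\kappa_0 - 1)$ with $\kappa_0 = \langle k^2 \rangle / \langle k \rangle$ measured on the original network; the function name and the two degree sequences are made up for illustration.

```python
def critical_fraction(degrees):
    """Critical fraction f_c of randomly removed vertices.

    Assumes f_c = 1 - 1 / (kappa_0 - 1), kappa_0 = <k^2> / <k>.
    Returns 0.0 when kappa_0 <= 2, i.e. when there is no giant
    component even before any removal.
    """
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    kappa0 = k2 / k1
    if kappa0 <= 2:
        return 0.0
    return 1 - 1 / (kappa0 - 1)


# Hypothetical degree sequences: a regular one and one with a single hub.
regular = [3] * 100
with_hub = [2] * 99 + [100]
print(critical_fraction(regular))    # kappa_0 = 3
print(critical_fraction(with_hub))   # the hub inflates <k^2>
```

A broad degree distribution increases $\langle k^2 \rangle$ and pushes $f_c$ toward 1, which is the mechanism behind the missing threshold in scale-free networks.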
45 Achilles Heel
Achilles Heel of complex networks
failure
attack
Internet
R. Albert, H. Jeong, A.L. Barabasi, Nature 406
378 (2000)
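The difference between failure and attack can be illustrated on a toy network. This is a minimal sketch, not the simulation of the cited paper: the network (two hubs with five leaves each) is made up, a failure is modeled as the removal of one low-degree vertex, and an attack as the removal of the highest-degree vertex.

```python
def largest_component(adj, removed):
    """Size of the largest component after deleting the `removed` vertices."""
    seen = set(removed)
    best = 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack = [s]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Hypothetical toy network: hubs 0 and 1 are linked, each has 5 leaves.
adj = {0: [1] + list(range(2, 7)), 1: [0] + list(range(7, 12))}
for leaf in range(2, 7):
    adj[leaf] = [0]
for leaf in range(7, 12):
    adj[leaf] = [1]

hub = max(adj, key=lambda v: len(adj[v]))            # highest-degree vertex
failure = largest_component(adj, removed=[5])        # a leaf breaks down
attack = largest_component(adj, removed=[hub])       # the hub is targeted
print(failure, attack)
```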
46 Prot-robustness
Yeast protein network - lethality and topological
position -
Highly connected proteins are more essential
(lethal)...
H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N.
Oltvai, Nature 411, 41-42 (2001)
47 Disease spreading in the susceptible-infected-susceptible
(SIS) epidemic model
Rate of becoming infected by an infected
neighbor: $\lambda$
Rate of recovery: 1
Steady state solution
Epidemic threshold
Pastor-Satorras and Vespignani, PRE 65, 036104
(2002)
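48 SIS in complex networks
Mean-field approximation (where $\rho_k$ is the fraction of infected vertices among those with degree k):
$d\rho_k / dt = -\rho_k + \lambda k (1 - \rho_k) \Theta$
The probability that an edge leads to a vertex
with degree k is
$k P(k) / \langle k \rangle$
The probability that a neighbor is infected:
$\Theta = \sum_k k P(k) \rho_k / \langle k \rangle$
Steady state solution:
$\rho_k = \lambda k \Theta / (1 + \lambda k \Theta)$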
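49 SIS in complex networks
The self-consistency equation for $\Theta$ (the probability that a neighbor is infected),
$\Theta = \frac{1}{\langle k \rangle} \sum_k k P(k) \frac{\lambda k \Theta}{1 + \lambda k \Theta}$,
has a nontrivial solution when
$\lambda \langle k^2 \rangle / \langle k \rangle \ge 1$
from which we get that the epidemic threshold is
$\lambda_c = \langle k \rangle / \langle k^2 \rangle$
Uniform immunization with probability g does not
help in scale-free networks if $\gamma \le 3$.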
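The threshold and the steady state can be evaluated numerically for any degree distribution. The sketch below assumes the mean-field results of the cited paper: $\lambda_c = \langle k \rangle / \langle k^2 \rangle$, the self-consistency equation $\Theta = \langle k \rangle^{-1} \sum_k k P(k)\, \lambda k \Theta / (1 + \lambda k \Theta)$, and that uniform immunization with probability g acts as a rescaling $\lambda \to \lambda (1 - g)$. The function names, the fixed-point iteration, and the example distribution are illustrative choices.

```python
def epidemic_threshold(pk):
    """lambda_c = <k> / <k^2> for a degree distribution {k: P(k)}."""
    k1 = sum(k * p for k, p in pk.items())
    k2 = sum(k * k * p for k, p in pk.items())
    return k1 / k2


def steady_state_theta(pk, lam, iterations=2000):
    """Solve the self-consistency equation for Theta by fixed-point iteration."""
    k1 = sum(k * p for k, p in pk.items())
    theta = 1.0
    for _ in range(iterations):
        theta = sum(k * p * lam * k * theta / (1 + lam * k * theta)
                    for k, p in pk.items()) / k1
    return theta


def critical_immunization(pk, lam):
    """Immunized fraction g_c at which lam * (1 - g) drops to lambda_c."""
    return 1 - epidemic_threshold(pk) / lam


# Hypothetical example: every vertex has degree 4, so lambda_c = 1/4.
pk = {4: 1.0}
print(epidemic_threshold(pk))
print(steady_state_theta(pk, 0.5), steady_state_theta(pk, 0.1))
print(critical_immunization(pk, 0.5))
```

When $\langle k^2 \rangle$ diverges, $\lambda_c$ tends to zero and $g_c$ tends to 1, so uniform immunization would have to cover practically the whole network.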
50 Non-uniform immunization of complex networks
51 Motifs
Function is often carried out by
subnetworks, rather than by single components.
Motifs: Subgraphs that have a significantly
higher density in the real network than in the
randomized version of the studied
network.
Randomized networks: Ensemble of
maximally random networks preserving the degree
distribution of the original network.
R. Milo et al., Science 298, 824-827 (2002)
52 Three-node connected subgraphs
53 Network motifs
54 Why do we have motifs?
Hypothesis: they are dynamically desirable
building blocks.
The Feed-Forward (FF) motif is a noise filter.
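Detecting a motif starts with counting how often the subgraph occurs. The sketch below counts feed-forward loops (X → Y, Y → Z, X → Z) in a directed graph without self-loops; it counts every occurrence, including those with extra edges among the three vertices, and it does not perform the comparison against a degree-preserving randomized ensemble that the motif definition requires. The function name and the test graphs are illustrative.

```python
def count_feed_forward_loops(edges):
    """Count triples (x, y, z) with edges x->y, y->z and x->z."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    count = 0
    for x in succ:
        for y in succ[x]:
            for z in succ[y]:
                # z must be a third vertex that x also points to directly.
                if z != x and z != y and z in succ[x]:
                    count += 1
    return count


ffl = [(0, 1), (1, 2), (0, 2)]       # one feed-forward loop
cycle = [(0, 1), (1, 2), (2, 0)]     # a feedback cycle, no feed-forward loop
print(count_feed_forward_loops(ffl), count_feed_forward_loops(cycle))
```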
55 Communities: densely connected subgraphs
56 Traditional method: hierarchical clustering
(agglomerative method)
dendrogram
All edges are removed, and then added back one by
one in decreasing order of their
strengths. Communities are defined as the
components that form.
The strength of the relationship between any pair
of vertices can, e.g., be defined as
where
The matrix $A^l$ contains the number of walks with
length l between the vertex pairs.
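That the powers of the adjacency matrix count walks can be verified on a small graph. The sketch below uses plain nested lists for the matrix product; the function names and the triangle example are illustrative.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(A, length):
    """A^length: entry (i, j) is the number of walks of that length from i to j."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
    for _ in range(length):
        result = mat_mul(result, A)
    return result


# Triangle graph on vertices 0, 1, 2.
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
W2 = walk_counts(A, 2)
W3 = walk_counts(A, 3)
print(W2)
print(W3)
```

In the triangle, each vertex has two closed walks of length 2 (out to a neighbor and back) and one walk of length 2 to each other vertex (via the third vertex).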
57 Girvan-Newman method (divisive method)
It also results in a dendrogram, by cutting the
edges one by one. In each step the edge with the
highest betweenness centrality (BC) is
removed. The BC of an edge is the number of
shortest paths between all pairs of vertices that
use this edge.
Girvan and Newman, PNAS 99, 7821 (2002)
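One step of the divisive procedure can be sketched as follows. The edge betweenness is computed with a breadth-first search from every vertex followed by a backward accumulation (the approach usually attributed to Brandes, which is an implementation choice here, not something stated on the slide); when a vertex pair has several shortest paths, each path contributes an equal share. Function names and the example graph are illustrative.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an undirected graph given as adjacency lists."""
    bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Forward pass: BFS from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Backward pass: distribute path counts from the farthest vertices.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                bc[frozenset((u, w))] += share
                delta[u] += share
    # Every unordered vertex pair was counted once from each endpoint.
    return {e: b / 2 for e, b in bc.items()}


def girvan_newman_step(adj):
    """Remove the edge with the highest betweenness and return it."""
    bc = edge_betweenness(adj)
    u, v = max(bc, key=bc.get)
    adj[u].remove(v)
    adj[v].remove(u)
    return u, v


# Hypothetical example: two triangles joined by the bridge 2 - 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
bc = edge_betweenness(adj)
removed = girvan_newman_step(adj)
print(bc[frozenset((2, 3))], removed)
```

The bridge carries all 3 × 3 shortest paths between the two triangles, so it is cut first and the graph falls apart into the two triangles.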
58 Modularity
When should one stop with the agglomeration/division?
Newman and Girvan, PRE 69, 026113 (2004)
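A common stopping criterion is to pick the level of the dendrogram with the highest modularity. The sketch below assumes the definition of the cited paper, $Q = \sum_c (e_{cc} - a_c^2)$, where $e_{cc}$ is the fraction of edges inside community c and $a_c$ is the fraction of edge ends attached to vertices of c; the function name and the example are illustrative.

```python
def modularity(edges, community):
    """Q = sum_c (e_cc - a_c^2) for an undirected, unweighted graph.

    `community` maps every vertex to the label of its community.
    """
    m = len(edges)
    inside = {}     # number of edges with both ends in community c
    ends = {}       # number of edge ends attached to community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)


# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {v: "A" for v in range(6)}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the example into its two triangles gives a positive Q, while putting all vertices into one community gives Q = 0.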
59 Potts model
Minimization of the Hamiltonian
Reichardt and Bornholdt, PRL 93, 218701 (2004)
60 Clique percolation method (CPM)
Most real networks are characterized by
overlapping and nested communities.
Divisive/agglomerative methods fail to identify
the communities when overlaps are significant.
Palla, Derényi, Farkas, and Vicsek, Nature 435,
814-818 (2005)
Derényi, Palla, and Vicsek, Phys. Rev. Lett. 94,
160202 (2005)
61 We define a community as a k-clique percolation
cluster.
k-cliques are complete subgraphs of size k.
An example of overlapping k-clique communities
for k = 4
Advantages of this method:
- local,
- allows overlaps,
- density (not distance) based,
- produces no cut-nodes.
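A k-clique percolation cluster can be found by brute force on small graphs. The sketch below assumes the adjacency rule of the cited papers: two k-cliques are adjacent if they share k-1 vertices, and a community is the union of all k-cliques that can be reached from one another through adjacent k-cliques. It enumerates all k-subsets of the vertices, so it is only meant for toy examples and is not the algorithm of the CFinder software; all names are illustrative.

```python
from itertools import combinations


def k_cliques(adj, k):
    """All complete subgraphs of size k (brute force over k-subsets)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def k_clique_communities(adj, k):
    """Unions of k-cliques connected through shared (k-1)-vertex overlaps."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


# Hypothetical example: triangles {0,1,2} and {1,2,3} share an edge,
# triangle {3,4,5} shares only vertex 3 with them.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = k_clique_communities(adj, 3)
print(sorted(sorted(c) for c in communities))
```

Vertex 3 belongs to both communities, which is the kind of overlap that divisive and agglomerative methods cannot represent.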
62 Studied systems
- Co-authorship network: Los Alamos cond-mat
archive, 30,739 nodes and 136,065 links
- Word association network: South Florida Free
Association norms list, 10,617 nodes and 63,788
links
- Protein-protein interaction network: DIP core list
of the yeast S. cerevisiae, 2,609 nodes and 6,355
links
Links are usually weighted ($w_{ij}$). For each value
of k (typically k = 3, 4, 5) a threshold weight can
be introduced. (Note that there is a critical
threshold at which a giant cluster
appears. Optimally the threshold weight should be
chosen close to this critical value.)
64 Web of communities for the protein interaction
network of yeast
Links represent overlaps between the communities.
65 Community statistics
community size distribution
overlap size distribution
membership number distr.
community degree distribution
66 Clique percolation in an ER graph
Branching process
67 Dedicated web page for the CPM (software, papers,
data)
http://www.cfinder.org/
Some review papers
Albert and Barabasi, Rev. Mod. Phys. 74, 47
(2002).
Dorogovtsev and Mendes, Adv. Phys. 51, 1079
(2002).
Useful web page with papers, data, and ppt
presentations
http://www.nd.edu/networks/
(Many of the slides of this course have
been borrowed from there.)