Title: Evolution of Networks
1Evolution of Networks
Social links in Canberra, Australia By A. S.
Klovdahl email Alden.Klovdahl_at_anu.edu.au
Itai Yanai Department of Biology Technion
Israel Institute of Technology
2Network Biology
A network is defined by a set of nodes and
connecting edges. Here we will use the terms
networks, graphs, and pathways
interchangeably
Jeong et al. Nature 411, 41 - 42 (2001)
3The evolution of the meaning of protein function
traditional view
post-genomic view
Eisenberg et al. Nature 2000 405 823-6
4Some relevant Zen
- "Things derive their being and nature by mutual
dependence and are nothing in themselves." Nagarjuna,
second-century Buddhist philosopher
- "An elementary particle is not an independently
existing, unanalyzable entity. It is, in essence,
a set of relationships that reach outward to
other things." H.P. Stapp, twentieth-century
physicist
5Evolution of Networks
Outline for today
- The Evolution of Biological Systems
- Flagellum
- Blood clotting
- The Structure of Biological Networks
- Scale-free networks
- Error and Attack
- Network motifs
6The Evolution of Biological Systems
7Bacterial flagellum
"I must say, for my part, that no more pleasant
sight has ever yet come before my eye than these
many thousands of living creatures, seen all
alive in a little drop of water, moving among one
another, each several creature having its own
proper motion." Antony van Leeuwenhoek
8Flagellum is the poster-child of the
intelligent-design movement
Bacteria swim by rotating their flagellum
9How did such a beautiful structure of a flagellum
evolve?
Filament
Hook
Basal body
The base of the flagellum is a motor that rotates
the filament. M is the rotor and S is the stator
KEGG database http://www.genome.ad.jp/kegg/kegg2.html
10William Paley (1743-1805)
The argument from design at the molecular level?
http://www.ucmp.berkeley.edu/history/paley.html
11The Argument from Personal Incredulity
"Never say, and never take seriously anyone who
says, 'I cannot believe that so-and-so could have
evolved by gradual selection.' I have dubbed this
kind of fallacy 'the Argument from Personal
Incredulity.'" Richard Dawkins, River Out of
Eden
12The modern form of the Argument from Personal
Incredulity: Irreducible Complexity
"An irreducibly complex system cannot be produced
directly by numerous, successive, slight
modifications of a precursor system, because any
precursor to an irreducibly complex system that
is missing a part is by definition nonfunctional.
.... Since natural selection can only choose
systems that are already working, then if a
biological system cannot be produced gradually it
would have to arise as an integrated unit, in one
fell swoop, for natural selection to have
anything to act on."
13A mousetrap is irreducibly complex
This argument seems to rise above the "argument
from personal incredulity" by asserting
that the system is a structure "in which the removal of
an element would cause the whole system to cease
functioning."
14Darwin on networks
"If it could be demonstrated that any complex
organ existed which could not possibly have been
formed by numerous, successive, slight
modifications, my theory would absolutely break
down."
15Hopeful monsters
Richard Goldschmidt believed that between species
lie "bridgeless gaps" that could only be
accounted for by large sudden jumps, resulting in
"hopeful monsters."
16Is a hopeful monster required to explain the
evolution of the flagellum?
Filament
Hook
Basal body
KEGG database http://www.genome.ad.jp/kegg/kegg2.html
17A detour into bacterial pathogenicity
The type III secretory system (TTSS) allows
Gram-negative bacteria to translocate proteins
directly into the cytoplasm of a host cell.
Yersinia pestis
KEGG database http://www.genome.ad.jp/kegg/kegg2.html
18Type III secretion system proteins are homologous
to flagellar proteins
Hueck, C. J., 1998, Microbiol. Mol. Biol. Rev.
62: 379-433.
19The homologous components correspond to the motor
Hueck, C. J., 1998, Microbiol. Mol. Biol. Rev.
62: 379-433.
20Phylogenetic profiles of flagellar ortholog
families
Note that genome i (Chlamydia) has almost exclusively
genes that are also annotated in the type III
secretory pathway.
21Consequences for the irreducible complexity
argument
- The existence of the type III secretory system in
a wide variety of bacteria demonstrates that a
small portion of the irreducibly complex
flagellum can indeed carry out an important
biological function.
- Since such a function is clearly favored by
natural selection, the contention that the
flagellum must be fully assembled before any of
its component parts can be useful is obviously
incorrect.
- What this means is that the argument for
intelligent design of the flagellum has failed.
Miller K.R., in "Debating Design: From Darwin to
DNA," edited by Michael Ruse and William Dembski
22Blood Clotting
Red blood cell Fibrin (blood clot)
The ability of the body to control the flow of
blood following vascular injury is paramount to
continued survival
http://www.indstate.edu/thcme/mwking/blood-coagulation.html
23Fibrinogen, a fibrous soluble protein, makes up
about 3% of the protein in blood plasma
http://baximg.baxter.com/images/investors/financial/annual_report/1996/fibrinsealant.gif
From Stryer's Biochemistry
24The heart of the reaction involves just two
molecules: fibrinogen and thrombin
Thrombin
Fibrinogen
Fibrin
Thrombin removes the A and B peptides, converting
fibrinogen to fibrin. Fibrin proteins clump
together due to the affinity of α and β for the
sticky crevices left by the excision of A and B.
25Rube Goldberg machines Pencil Sharpener
Open window (A) and fly kite (B). String (C)
lifts small door (D) allowing moths (E) to escape
and eat red flannel shirt (F). As weight of
shirt becomes less, shoe (G) steps on switch (H)
which heats electric iron (I) and burns hole in
pants (J). Smoke (K) enters hole in tree (L),
smoking out opossum (M) which jumps into basket
(N), pulling rope (O) and lifting cage (P),
allowing woodpecker (Q) to chew wood from pencil
(R), exposing lead. Emergency knife (S) is
always handy in case opossum or the woodpecker
gets sick and can't work.
26The blood clotting system looks like a Rube
Goldberg machine
Is it irreducibly complex?
From Stryer's Biochemistry
27The domain architecture of blood coagulation
proteins reveals a history of exon shuffling
Peer Bork's Modules page http://www.bork.embl-heidelberg.de/Modules/extra.html
28Most of the enzymes involved in clotting are
serine proteases
The serine proteases are homologous. They are
also homologous to the pancreatic serine
proteases trypsin, chymotrypsin, and elastase.
The N-terminal segments are thought to be
responsible, at least in part, for the
specificities of the proteolytic blood clotting
factors.
The Evolution of Vertebrate Blood Clotting, by
Kenneth Miller, http://biocrs.biomed.brown.edu/Darwin/DI/clot/Clotting.html
29Tree of serine proteases reveals a history of the
duplications
A single event can account for the introduction
of the gamma domain and also of the two EGF
domains in the FIX, FX, and protein C
Proteases with the gamma domain form a discrete
cluster. Prothrombin is the deepest division.
Doolittle, R. F., and Feng, D. F., (1987) Cold
Spring Harbor Symposia on Quantitative Biology
52 869-874.
30Based upon the sequence comparisons, a scenario
for when clotting proteins made their appearance
Doolittle, R. F., and Feng, D. F., (1987) Cold
Spring Harbor Symposia on Quantitative Biology
52 869-874.
31Fibrinogen is composed of two each of three
homologous polypeptide chains (α, β, γ)
32Invertebrates should have at least one fibrinogen.
human
lamprey
Diverged 450 MYA
50
Thus, the gene duplication giving rise to beta
and gamma ought to have occurred at least 600
MYA.
33Invertebrates DO have fibrinogen.
Parastichopus parvimensis (warty sea cucumber)
34A model for the evolution of blood clotting
Most serine proteases, including trypsin and
thrombin, are auto-catalytic.
The inactive form of the protease (A) is changed
into its active form when two things happen:
it is bound to tissue factor (TF), and it is
activated by tissue proteases, including our
protease itself (that's the autocatalytic part).
This means, and this is important, that our
protease is actually involved in cutting two
things: fibrinogen, and also itself, converting
its own inactive precursor protein into the active form.
Ken Miller, http://biocrs.biomed.brown.edu/Darwin/DI/clot/Clotting.html
35A gene duplication occurs in the gene for our
protease, producing a new (B) version of the gene
Proteins A and B are identical. Each can bind to
TF, each can cleave fibrinogen into fibrin, and
each can activate itself or its sister serum
protease. So nothing has really changed - we've
just got two copies of the same gene.
Ken Miller, http://biocrs.biomed.brown.edu/Darwin/DI/clot/Clotting.html
36Network rearrangements
A mutation in the active site of B changes its
behavior, making it a little less likely to cut
fibrinogen and a little more likely to activate
protease A.
But why would natural selection favor a mutation
like this in B's active site?
Ken Miller, http://biocrs.biomed.brown.edu/Darwin/DI/clot/Clotting.html
37Why a cascade?
The multiple steps of the cascade amplify the
signal from the first stimulus. A cascade
increases the efficiency of the clotting
process. With so many more active proteases in
the neighborhood of the injury, clotting can
occur more quickly, increasing the chances of
surviving a hemorrhage.
Ken Miller, http://biocrs.biomed.brown.edu/Darwin/DI/clot/Clotting.html
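The amplification argument can be sketched with simple arithmetic. This is only an illustration: it assumes, hypothetically, that every active protease activates the same fixed number of proteases at the next level. The function name `active_proteases` and the gain of 10 are made up for the example and are not measured values.

```python
def active_proteases(levels, gain):
    """Number of active proteases at each cascade level.

    Assumption (illustrative only): one active protease at level 0, and
    each active protease activates `gain` proteases at the next level.
    """
    return [gain ** level for level in range(levels + 1)]


# A hypothetical three-step cascade with a tenfold gain per step.
for level, count in enumerate(active_proteases(levels=3, gain=10)):
    print(f"level {level}: {count} active proteases")
```

Under these assumptions a direct, one-step system keeps a single active protease per stimulus, while each added level multiplies the response by the gain.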
38The evolution of networks
- Duplication: First, gene duplications occur.
- Modification: Second, simple point mutations
(amino acid replacements) modulate the function
of the duplicate.
- Regulation: Finally, mechanisms are brought into
play that control the amounts of the various
homologous factors.
Doolittle, R.F., Boston Review 1996
39Summary: the system evolved by a process of gene
duplications from serine proteases that once were
digestive enzymes.
- Fibrinogen is found in other contexts.
- The clotting serine proteases are homologous to
digestive enzymes.
- The domain composition of the proteins involved
is consistent with an exon shuffling model.
- The branching pattern of the proteins involved is
also consistent with a gene duplication model.
- A plausible scenario of cascade evolution was
presented.
40The Structure of Biological Networks
41The Erdős-Rényi model
N nodes, every pair of nodes being connected with
probability p
Albert and Barabasi. REVIEWS OF MODERN PHYSICS,
74 2002 48-97
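The definition above (N nodes, each pair connected independently with probability p) can be sketched in a few lines. This is a minimal illustration using only the standard library; the function name `erdos_renyi` and the parameter values are assumptions chosen for the example.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """Return the edge list of a random graph on nodes 0..n-1.

    Every one of the n*(n-1)/2 node pairs is connected independently
    with probability p.
    """
    rng = random.Random(seed)
    return [(u, v)
            for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


n, p = 100, 0.05
edges = erdos_renyi(n, p, seed=1)
# The expected number of edges is p * n * (n - 1) / 2.
print("observed edges:", len(edges))
print("expected edges:", p * n * (n - 1) / 2)
```

Because every pair is treated identically, node degrees cluster around the mean p(N - 1), which is the homogeneity that later slides contrast with scale-free networks.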
42What network structure should be used to model a
biological network?
Strogatz S.H., Nature (2001) 410 268
lattice
random
43Metabolic networks
KEGG database http://www.genome.ad.jp/kegg/kegg2.html
44Metabolic networks
Jeong et al. Nature (2000) 407 651-654
45Graph theoretic description of metabolic networks
A graph theoretic description for a simple pathway
(catalysed by Mg2+-dependent enzymes) is
illustrated (a). In the most abstract approach
(b) all interacting metabolites are considered
equally.
Barabasi & Oltvai. NRG. (2004) 5 101-113
46Calculating the degree connectivity of a network
[Figure: an example network with each node labeled by its degree, and the resulting degree connectivity distribution plotted as frequency against degree connectivity (1 to 8)]
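The calculation on this slide can be sketched as follows: count the edges at each node (its degree), then count how many nodes share each degree. The edge list below is a hypothetical toy network, not the one drawn on the slide, and `degree_distribution` is an illustrative name.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes that have degree k.

    Nodes that appear in no edge are not counted, since an edge list
    cannot represent them.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Hypothetical toy network: node "a" has three links, "d" has two.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
for k, frequency in sorted(degree_distribution(toy_edges).items()):
    print(f"degree {k}: {frequency} node(s)")
```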
47Connectivity distributions for metabolic networks
E. coli (bacterium)
A. fulgidus (archaea)
averaged over 43 organisms
C. elegans (eukaryote)
Jeong et al. Nature (2000) 407 651-654
48Protein-protein interaction networks
Jeong et al. Nature 411, 41 - 42 (2001); Wagner.
RSL (2003) 270 457-466
(color of nodes is explained later)
49Degree connectivity distributions differ between
random and observed (metabolic and
protein-protein interaction) networks.
Strogatz S.H., Nature (2001) 410 268
[Figure: two panels, each plotting log frequency against log degree connectivity]
50What is so scale-free about these networks?
No matter which scale is chosen the same
distribution of degrees is observed among nodes
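One way to make "scale-free" concrete, as a sketch: for a power law P(k) proportional to k^(-gamma), doubling k changes the frequency by the same factor at every scale, whereas for an exponentially decaying distribution the factor depends on k. The exponent 2.9 is borrowed from the model slides later in the deck; the exponential decay scale of 5 is an arbitrary assumption for contrast.

```python
import math


def power_law(k, gamma=2.9):
    """Unnormalized power-law frequency k**(-gamma)."""
    return k ** -gamma


def exponential(k, scale=5.0):
    """Unnormalized exponential frequency, as in a homogeneous network."""
    return math.exp(-k / scale)


# Ratio P(2k) / P(k) at three very different scales.
for k in (1, 10, 100):
    print(k,
          power_law(2 * k) / power_law(k),
          exponential(2 * k) / exponential(k))
```

The power-law column stays constant at 2^(-gamma), while the exponential column shrinks rapidly as k grows.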
51A simple model for generating scale-free
networks
- Evolution: networks expand continuously by the
addition of new vertices, and
- Preferential attachment (rich get richer): new
vertices attach preferentially to sites that are
already well connected.
Barabasi & Bonabeau Sci. Am. May 2003 60-69
Barabasi and Albert. Science (1999) 286 509-512
52Scale-free network model
To incorporate the growing character of the
network, starting with a small number ($m_0$) of
vertices, at every time step we add a new vertex
with $m$ ($\le m_0$) edges that link the new vertex to
$m$ different vertices already present in the
system.
Barabasi and Albert. Science (1999) 286 509-512
53Scale-free network model
To incorporate preferential attachment, we assume
that the probability $\Pi$ that a new vertex will be
connected to vertex $i$ depends on the connectivity
$k_i$ of that vertex, so that $\Pi(k_i) = k_i / \sum_j k_j$.
Barabasi and Albert. Science (1999) 286 509-512
54Scale-free network model
This network evolves into a scale-invariant state
with the probability that a vertex has $k$ edges
following a power law with an exponent $\gamma = 2.9 \pm 0.1$.
After $t$ time steps, the model leads to a
random network with $t + m_0$ vertices and $mt$ edges.
Barabasi and Albert. Science (1999) 286 509-512
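The two ingredients of the model (growth and preferential attachment) can be sketched together as below. This is a minimal illustration, not the authors' code. Two details are assumptions: at the first step, when every seed vertex still has degree zero, the new vertex links to m seed vertices chosen uniformly; and the m distinct targets are drawn by repeated sampling, which only approximates the stated probability when m > 1.

```python
import random
from collections import Counter


def barabasi_albert(m0, m, t, seed=None):
    """Grow a network for t steps from m0 seed vertices; return the edges."""
    if not 1 <= m <= m0:
        raise ValueError("need 1 <= m <= m0")
    rng = random.Random(seed)
    edges = []
    # Each vertex appears in `stubs` once per edge end, so a uniform draw
    # from `stubs` picks vertex i with probability k_i / sum_j k_j.
    stubs = []
    for new in range(m0, m0 + t):
        if stubs:
            targets = set()
            while len(targets) < m:
                targets.add(rng.choice(stubs))
        else:
            # Assumption: no vertex has any edge yet, so choose uniformly.
            targets = set(rng.sample(range(m0), m))
        for old in sorted(targets):
            edges.append((new, old))
            stubs.extend((new, old))
    return edges


edges = barabasi_albert(m0=3, m=2, t=1000, seed=0)
degree = Counter(v for edge in edges for v in edge)
print("edges (m * t):", len(edges))
print("five largest degrees:", [k for _, k in degree.most_common(5)])
```

Seed vertices that receive no edge at the first step keep degree zero forever in this sketch, because their attachment probability is zero.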
55Comparing Random Vs. Scale-free Networks (both
with 130 nodes and 215 links)
The importance of the connected nodes in the
scale-free network: in the random network, 27% of
the nodes are reached by the five most connected
nodes; in the scale-free network, more than 60%
are reached.
Modified from Albert et al. Nature (2000) 406
378-382
56Failure and Attack
Failure: removal of a random node.
Attack: the selection and removal of a few nodes
that play a vital role in maintaining the
network's connectivity.
Albert et al. Nature (2000) 406 378-382
a macroscopic snapshot of Internet connectivity
by K. C. Claffy
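The difference between failure and attack can be sketched on a toy hub-and-spoke network. This is an illustration under stated assumptions: the cited study tracks network diameter, while this sketch uses the size of the largest connected component as a simpler stand-in, and "attack" is taken to mean removing the highest-degree nodes. All function names are hypothetical.

```python
import random
from collections import deque


def largest_component(adj, removed):
    """Size of the largest connected component once `removed` is deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def failure(adj, n, rng):
    """Failure: choose n nodes uniformly at random."""
    return set(rng.sample(sorted(adj), n))


def attack(adj, n):
    """Attack: choose the n most connected nodes."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:n])


# Toy network: hub 0 linked to nine leaves.
adj = {0: set(range(1, 10)), **{leaf: {0} for leaf in range(1, 10)}}
print("after failure:", largest_component(adj, failure(adj, 1, random.Random(1))))
print("after attack: ", largest_component(adj, attack(adj, 1)))
```

Removing a random node usually hits a leaf and leaves the rest connected, while removing the hub isolates every remaining node.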
57Random networks are homogeneous so there is no
difference between failure and attack
Diameter of the network
Fraction of nodes removed from network
Modified from Albert et al. Nature (2000) 406
378-382
58Scale-free networks are robust to failure but
susceptible to attack
Diameter of the network
Fraction of nodes removed from network
Modified from Albert et al. Nature (2000) 406
378-382
59Yeast protein-protein interaction networks
the phenotypic effect of removing the
corresponding protein
Jeong et al. Nature 411, 41 - 42 (2001)
60A positive correlation between lethality and
connectivity
Pearson's linear correlation coefficient: 0.75
Average and standard deviation for the various
clusters.
Jeong et al. Nature 411, 41 - 42 (2001)
61Network expansion by gene duplication
Panel b shows a small protein interaction network
(blue) and the genes that encode the proteins
(green). When cells divide, occasionally one or
several genes are copied twice into the
offspring's genome (illustrated by the green and
red circles). This induces growth in the protein
interaction network because now we have an extra
gene that encodes a new protein (red circle). The
new protein has the same structure as the old
one, so they both interact with the same
proteins. Ultimately, the proteins that
interacted with the original duplicated protein
will each gain a new interaction to the new
protein. Therefore proteins with a large number
of interactions tend to gain links more often, as
it is more likely that they interact with the
protein that has been duplicated. This is a
mechanism that generates preferential attachment
in cellular networks. Indeed, in the example that
is shown it does not matter which gene is
duplicated, the most connected central protein
(hub) gains one interaction. In contrast, the
square, which has only one link, gains a new link
only if the hub is duplicated.
Barabasi & Oltvai. NRG. (2004) 5 101-113
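The duplication mechanism described above can be sketched on a toy network. The network below is hypothetical (a hub "H", two partners "A" and "B", and a singly linked node "S" standing in for the square), and `duplicate` is an illustrative name. The sketch assumes the copy inherits every interaction of the original and that proteins do not interact with themselves.

```python
def duplicate(adj, gene, copy):
    """Return a new network in which `copy` inherits all links of `gene`."""
    new = {node: set(neighbors) for node, neighbors in adj.items()}
    new[copy] = set(adj[gene])
    for partner in adj[gene]:
        new[partner].add(copy)
    return new


# Hypothetical toy network: H is the hub, S has a single link.
adj = {
    "H": {"A", "B", "S"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "S": {"H"},
}

for gene in adj:
    grown = duplicate(adj, gene, copy="new")
    print(f"duplicate {gene}: degree of H = {len(grown['H'])}, "
          f"degree of S = {len(grown['S'])}")
```

In this toy case the hub gains a link whenever any of its partners is duplicated, while S gains a link only when the hub itself is duplicated, which is the rich-get-richer effect.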
62The transcriptional regulation network of
Escherichia coli.
Shai S. Shen-Orr, Ron Milo, Shmoolik Mangan & Uri
Alon (2002) Nature Genetics 31 64 - 68
63Motifs in the networks
- Deployed a motif detection algorithm on the
transcriptional regulation network.
- Identified three recurring motifs (significant
with respect to random graphs).
Shai S. Shen-Orr, Ron Milo, Shmoolik Mangan & Uri
Alon (2002) Nature Genetics 31 64 - 68
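As a rough sketch of the counting step behind motif detection, the code below finds feed-forward loops (X regulates Y, and both X and Y regulate Z) in a directed network. It is not the published algorithm: it only counts occurrences, and the comparison against randomized graphs that establishes significance is left out. The edge list and the name `feed_forward_loops` are hypothetical.

```python
def feed_forward_loops(edges):
    """List every (x, y, z) with edges x->y, y->z and x->z."""
    targets = {}
    for source, target in edges:
        targets.setdefault(source, set()).add(target)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # skip autoregulation
            for z in targets.get(y, ()):
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return loops


# Hypothetical regulation network with one feed-forward loop.
toy_edges = [("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "W")]
print(feed_forward_loops(toy_edges))
```

To judge significance, one would repeat the count on many randomized networks with the same degree sequence and compare; that step is omitted here.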
64Convergent evolution of gene circuits
Are the components of the feed-forward loop, for
example, homologous?
Circuit duplication is rare in the transcription
network
Conant and Wagner. Nature Genetics (2003) 34
264-266
65The End, Thanks