Title: break
1break
2Evolutionary rates
Reference: Dan's book, chapter 4
3Evolutionary rates - history
- The first to suggest using DNA and proteins to
investigate evolutionary history.
- (They discussed molecular evolution before the
genetic code was established).
4Linus Pauling (1901-1994)
- The only person ever to receive two unshared
Nobel Prizes: for Chemistry (1954) and for Peace
(1962).
- His introductory textbook General Chemistry,
revised three times since its first printing in
1947 and translated into 13 languages, has been
used by generations of undergraduates.
5Linus Pauling (1901-1994)
- Also wrote popular science books, e.g., How to
Live Longer and Feel Better, and Vitamin C and
the Common Cold.
- Published over 1,000 articles and books.
- Used to protest against nuclear testing.
6Linus Pauling (1901-1994)
- He received a Ph.D. in chemistry and mathematical
physics from California Institute of Technology
(Caltech) in 1925 (age 24).
7Evolutionary rates
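Rate is distance divided by time. Distance (d) is the
number of substitutions per site between two sequences.
Time (T) is the time since their divergence, in
years. The time must be doubled, because the two
lineages evolved independently after they diverged:
r = d / (2T)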
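A minimal sketch of the rate calculation described on this slide. The function name and the example numbers are illustrative assumptions, not values from the lecture.

```python
def substitution_rate(d, t_years):
    """Rate r = d / (2T).

    d       -- substitutions per site between the two sequences
    t_years -- time since the two lineages diverged, in years
    """
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    # The factor 2 reflects that both lineages accumulated changes independently.
    return d / (2.0 * t_years)


# Hypothetical example: 0.1 substitutions/site, divergence 50 million years ago.
rate = substitution_rate(0.1, 50e6)
print(f"{rate:.2e} substitutions/site/year")
```

With these made-up inputs the result is on the order of 10^-9 substitutions per site per year, the unit used on the following slides.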
8Evolutionary rates
This formula is not accurate for closely related
taxa, in which polymorphism must be taken into
account (Takahata and Satta 1997).
9Mean Rate of Nucleotide Substitutions in
Mammalian Genomes
Units: 10^-9 substitutions/site/year
Evolution is a very slow process at the molecular
level (Nothing happens)
10Sequence alignments
Alignment is needed for phylogeny and for
molecular evolution. We will assume that the
alignment is given. How to construct an alignment
is outside the scope of this course.
11Synonymous vs. nonsynonymous substitutions
For most proteins, it is observed that the rate
of synonymous substitutions (silent
substitutions) is much larger than the
nonsynonymous rate (amino-acid modifying
substitutions).
UUU -> UUC (both encode phenylalanine): synonymous
UUU -> CUU (phenylalanine to leucine): nonsynonymous
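The two examples above can be checked mechanically. The sketch below builds the standard genetic code table and classifies a single codon change. The names `CODON_TABLE` and `classify` are illustrative, and the standard nuclear code is assumed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(codon_a, codon_b):
    """Label a codon change as synonymous or nonsynonymous."""
    # Accept RNA (U) or DNA (T) spelling.
    a = codon_a.upper().replace("U", "T")
    b = codon_b.upper().replace("U", "T")
    return "synonymous" if CODON_TABLE[a] == CODON_TABLE[b] else "nonsynonymous"


print(classify("UUU", "UUC"))  # Phe -> Phe
print(classify("UUU", "CUU"))  # Phe -> Leu
```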
12A lot
A little
13Synonymous vs. nonsynonymous substitutions
14Synonymous vs. nonsynonymous substitutions
16Empirical findings
Important proteins evolve slower than unimportant
ones.
17break
18Insulin
19Insulin
In 1953, Frederick Sanger determined the amino-acid
sequence of insulin. This was the FIRST protein
whose amino-acid sequence was determined. It
demonstrated that insulin is composed only of
L-amino acids.
20Insulin
Insulin was found to be composed of two
chains (A and B), linked together by S-S bonds.
A chain: 21 AA
B chain: 30 AA
21Insulin
How is the two-chain protein synthesized? Donald
Steiner (University of Chicago) gave the
answer. He studied an islet-cell adenoma of the
pancreas, a rare human tumor producing large
amounts of insulin.
22Adenoma
An adenoma is a benign tumor (not a malignant
tumor).
Benign (in plain English): harmless.
Benign tumor: a tumor that does not recur locally and
does not spread to other parts of the body.
An adenoma is of glandular origin (i.e., it arises from a
gland). Adenomas can grow in many
organs, including the colon, adrenal, pituitary, and
thyroid.
23Insulin
He sliced the pancreatic tumor and incubated it
with tritiated leucine and then analyzed it. He
found a new protein that was later proven to be
the biosynthetic precursor of insulin, the
proinsulin.
24Insulin
Proinsulin has 30 residues that are absent from
insulin.
26Insulin
There is an even earlier precursor of proinsulin, called
preproinsulin. It contains an additional 19 AA at
the N-terminus. This 19-AA hydrophobic stretch
directs the preproinsulin to the ER.
Preproinsulin -> Proinsulin (ER membrane)
From the ER it moves on to the Golgi and then to
secretory granules.
Proinsulin -> Insulin (granules)
27Alignment of preproinsulin
Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
30Empirical findings
Functional regions evolve slower than
nonfunctional regions.
31Rates of amino-acid replacements in different
proteins
32Clotting: the end reaction
fibrinogen -> fibrin (catalyzed by thrombin)
36Synonymous vs. nonsynonymous substitutions
Histone H4 between human and wheat: an excess of
synonymous substitutions
37Mean nonsynonymous rate: 0.74 ± 0.67 (× 10^-9
substitutions per site per year)
Mean synonymous rate: 3.51 ± 1.01 (× 10^-9
substitutions per site per year)
38The coefficient of variation is an attribute of a
distribution: its standard deviation divided by
its mean.
Coefficient of variation of nonsynonymous rate: 91%
Coefficient of variation of synonymous rate: 29%
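As a quick check of the two percentages, the sketch below recomputes them from the mean rates on the previous slide. It assumes the second number in each pair there is the standard deviation; the function name is illustrative.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return 100.0 * sd / mean


# Means and (assumed) standard deviations, in units of 10^-9 subs/site/year.
cv_nonsyn = coefficient_of_variation(0.74, 0.67)
cv_syn = coefficient_of_variation(3.51, 1.01)
print(round(cv_nonsyn), round(cv_syn))  # 91 29
```

The nonsynonymous rate varies among genes roughly three times more, relative to its mean, than the synonymous rate.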
39Transition vs. transversion rates
Ratio 1.5 4.4 1.1
Degeneracy class
0
4
2
40break
41Computing synonymous and non-synonymous rates
Silent and non-silent
42Computing synonymous and non-synonymous rates
3
3
43Ka/Ks
Our goal is to compare two (or later,
more) sequences and to contrast the rate of
neutral evolution (approximated by the synonymous
rate) with the nonsynonymous rate. The
lower the ratio of nonsynonymous substitutions
to synonymous ones, the stronger
the purifying selection.
44Computing synonymous and non-synonymous rates
p-distance of synonymous subs. = 3/6
p-distance of nonsynonymous subs. = 3/6
Problem: the p-distance does not correct for
multiple substitutions.
Solution: compute the
JC correction to the p-distance.
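A minimal sketch of the Jukes-Cantor (JC) correction, applied to the p-distance of 3/6 from this slide. The function name is illustrative.

```python
import math


def jukes_cantor(p):
    """JC-corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    if not 0.0 <= p < 0.75:
        # At p >= 0.75 the sequences are saturated and the formula is undefined.
        raise ValueError("p-distance must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jukes_cantor(3 / 6), 3))  # 0.824
```

The corrected distance (about 0.82) is noticeably larger than the raw p-distance (0.5), because some sites are inferred to have changed more than once.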
45Computing synonymous and non-synonymous rates
Assume a protein without selection (evolving
neutrally).
GAA (Glu)
TAA (Stop)
CAA (Gln)
AAC (Asn)
ACA (Thr)
AAA (Lys)
AAG (Lys)
AGA (Arg)
AAT (Asn)
ATA (Ile)
The random chance of a synonymous substitution is
much smaller than the chance of a nonsynonymous
one.
46Computing synonymous and non-synonymous rates
Assume a protein without selection (evolving
neutrally).
CCA (Pro)
TCA (Ser)
ACA (Thr)
GCC (Ala)
GAA (Glu)
GCA (Ala)
GCG (Ala)
GGA (Gly)
GCT (Ala)
GTA (Val)
This is also different for different codons.
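The two codon neighborhoods shown above (AAA and GCA) can be tallied with a short sketch. It assumes the standard genetic code; `neighbour_counts` is an illustrative name.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbour_counts(codon):
    """Count (synonymous, nonsynonymous, stop) among the 9 one-step neighbours."""
    syn = nonsyn = stop = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == "*":
                stop += 1
            elif CODON_TABLE[mutant] == CODON_TABLE[codon]:
                syn += 1
            else:
                nonsyn += 1
    return syn, nonsyn, stop


print(neighbour_counts("AAA"))  # (1, 7, 1): only AAG is synonymous
print(neighbour_counts("GCA"))  # (3, 6, 0): GCC, GCG and GCT are synonymous
```

Even without selection, random changes are therefore mostly nonsynonymous, and the proportion depends on the codon.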
47Computing synonymous and non-synonymous rates
So when one observes 6 times more nonsynonymous
substitutions than synonymous ones, does it
indicate that the protein is under purifying
selection? We must normalize for the
potential for silent vs. non-silent mutations of
the codons in question.
48break
49Nei and Gojobori (1986) method
Masatoshi Nei
Takashi Gojobori
50Counting synonymous sites
Consider a particular
position in a codon (j = 1, 2, 3). Let fj be the
fraction of synonymous changes at this site.
51In TTT (Phe), the first two positions are
nonsynonymous, because no synonymous changes can
occur in them, and the third position is 1/3
synonymous and 2/3 nonsynonymous because one of
the three possible changes is synonymous.
52Counting synonymous sites
Let s be the number of
synonymous sites for each codon: s = f1 + f2 + f3. s is, in fact,
the proportion, out of 3, of synonymous
substitutions, assuming equal probability for
each type of substitution.
For this example (TTT), s = 0 + 0 + 1/3 = 1/3.
53Counting synonymous sites
Let n be the number of
nonsynonymous sites for each codon: n = 3 - s. n is, in
fact, the proportion, out of 3, of nonsynonymous
substitutions, assuming equal probability for
each type of substitution.
For this example (TTT), n = 3 - 1/3 = 2 2/3.
54Counting synonymous sites
Assume we have r
codons (3r sites). Let si and ni be s and n for
the ith codon. We define
S = s1 + s2 + ... + sr and N = n1 + n2 + ... + nr = 3r - S.
55Classification of sites S is in fact, the
proportion, out of 3r, of synonymous
substitutions, assuming equal probability for
each type of substitution.
56Classification of sites
We have two sequences:
ACG CCG ATT
ATG CCT CTA
S for these two sequences is the average of the S values of the two
sequences. The same goes for N.
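The site counts defined on the preceding slides can be sketched as follows. Two assumptions are made here: the six codons on this slide are read as two sequences of three codons each, and changes to stop codons are counted as nonsynonymous (a simplification). Function names are illustrative.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """s for one codon: the sum over positions j of fj."""
    s = Fraction(0)
    for pos in range(3):
        # Number of the 3 possible changes at this position that keep the amino acid.
        syn = sum(
            CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == CODON_TABLE[codon]
            for b in BASES if b != codon[pos]
        )
        s += Fraction(syn, 3)
    return s


def site_counts(seq):
    """Return (S, N) for a coding sequence whose length is a multiple of 3."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    S = sum((synonymous_sites(c) for c in codons), Fraction(0))
    return S, 3 * len(codons) - S


print(synonymous_sites("TTT"))            # 1/3
S1, N1 = site_counts("ACGCCGATT")
S2, N2 = site_counts("ATGCCTCTA")
print((S1 + S2) / 2, (N1 + N2) / 2)       # 5/2 13/2
```

Exact fractions are used so that values such as 1/3 are not blurred by floating-point rounding.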
57Counting synonymous substitutions
So far we have counted the potential for
synonymous and nonsynonymous substitutions. Now
we wish to count the actual number of synonymous
and nonsynonymous substitutions.
58Counting synonymous substitutions
For two codons that differ by only one
nucleotide, the difference is easily inferred.
GTC (Val) <-> GTT (Val): synonymous
GTC (Val) <-> GCC (Ala): nonsynonymous
59Counting synonymous substitutions
We define sd and nd to be the number of
synonymous and nonsynonymous substitutions per
codon.
GTC (Val) <-> GTT (Val): sd = 1, nd = 0
GTC (Val) <-> GCC (Ala): sd = 0, nd = 1
60Counting synonymous substitutions
For two codons that differ by two or more
nucleotides, the estimation problem is more
complicated, because we need to determine the
order in which the substitutions occurred.
61Pathway (1) requires one synonymous and one
nonsynonymous substitution, whereas pathway (2)
requires two nonsynonymous substitutions.
62If there are 3 differences between two codons,
there are 6 possible paths. ABC <-> XYZ:
- A changed first, B second and finally C.
- A changed first, C second and finally B.
- B changed first, A second and finally C.
- B changed first, C second and finally A.
- C changed first, A second and finally B.
- C changed first, B second and finally A.
63There are two approaches to deal with multiple
substitutions at a codon
64The unweighted method: Average the numbers of the
different types of substitutions over all the
possible scenarios. For example, if we assume
that the two pathways are equally likely, then
the number of nonsynonymous substitutions is (1 +
2)/2 = 1.5, and the number of synonymous
substitutions is (1 + 0)/2 = 0.5.
65The weighted method: Employ a priori criteria
to assign a probability to each pathway. For
instance, if the weight of pathway 1 is 0.9, and
the weight of pathway 2 is 0.1, then the number
of nonsynonymous substitutions between the two
codons is (0.9 × 1) + (0.1 × 2) = 1.1, and the
number of synonymous substitutions is (0.9 × 1) + (0.1 × 0) = 0.9.
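A sketch of the unweighted method: enumerate every order in which the differing positions could have changed and average the counts. Two assumptions are made here. Pathways that pass through a stop codon are skipped, and the pair CCC/CAA is used purely as an illustration of a two-difference case. Function names are illustrative.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_unweighted(codon_a, codon_b):
    """Return (sd, nd) averaged over all pathways free of stop codons."""
    diff = [i for i in range(3) if codon_a[i] != codon_b[i]]
    if not diff:
        return 0.0, 0.0
    totals = []
    for order in permutations(diff):
        current, syn, nonsyn, valid = codon_a, 0, 0, True
        for pos in order:
            nxt = current[:pos] + codon_b[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                valid = False  # assumption: ignore pathways through a stop codon
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            totals.append((syn, nonsyn))
    if not totals:
        raise ValueError("no pathway without a stop codon")
    sd = sum(s for s, _ in totals) / len(totals)
    nd = sum(n for _, n in totals) / len(totals)
    return sd, nd


print(count_unweighted("GTC", "GTT"))  # (1.0, 0.0)
print(count_unweighted("CCC", "CAA"))  # (0.5, 1.5)
```

For CCC/CAA one pathway goes through CCA (Pro) and the other through CAC (His), which reproduces the averages 0.5 and 1.5 quoted for the unweighted method. The weighted method would replace the plain average with pathway-specific weights.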
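67Counting synonymous sites
Assume we have r
codons (3r sites). Let sdi and ndi be sd and nd
for the ith codon. We define
Sd = sd1 + sd2 + ... + sdr and Nd = nd1 + nd2 + ... + ndr.
Sd and Nd are the total numbers of observed synonymous and nonsynonymous substitutions.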
68Counting synonymous substitutions per synonymous
site
We define p-distances for each type of
substitution:
ps = Sd / S and pn = Nd / N
These distances are then corrected using the JC
formula:
ds = -(3/4) ln(1 - (4/3) ps) and dn = -(3/4) ln(1 - (4/3) pn)
69Three types of selection
If dn < ds -> purifying selection
If dn = ds -> neutral evolution
If dn > ds -> positive selection
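Putting the pieces together, the sketch below goes from counts (Sd, Nd, S, N) to JC-corrected dn and ds and then applies the three rules above. The input counts are hypothetical, and the numerical tolerance used to decide that dn equals ds is a placeholder for illustration; in practice the difference would be judged with a statistical test.

```python
import math


def jukes_cantor(p):
    """JC-corrected distance for a p-distance p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(Sd, Nd, S, N):
    """Return (dn, ds) from substitution counts and site counts."""
    ps = Sd / S  # synonymous substitutions per synonymous site
    pn = Nd / N  # nonsynonymous substitutions per nonsynonymous site
    return jukes_cantor(pn), jukes_cantor(ps)


def selection_type(dn, ds, tol=1e-9):
    """Classify by comparing dn with ds (tol is an arbitrary placeholder)."""
    if abs(dn - ds) <= tol:
        return "neutral evolution"
    return "purifying selection" if dn < ds else "positive selection"


# Hypothetical counts: 3 synonymous changes over 6 synonymous sites,
# 2 nonsynonymous changes over 20 nonsynonymous sites.
dn, ds = dn_ds(Sd=3, Nd=2, S=6, N=20)
print(round(dn, 3), round(ds, 3), selection_type(dn, ds))
```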
70Humans are not so special?
71break
72Generation time and genomic evolution in
primates
Vincent M. Sarich and Allan C.
Wilson, Science 179: 1144-1147 (1973).
A primate
73Some background on Primates
New world monkeys (Platyrrhines)
Haplorhines (Higher primates)
Gibbons
Hominidae
Catarrhines
Old world monkeys
Tarsiers
Prosimians (Strepsirhines)
http://www.whozoo.org/mammals/Primates/primatephylogeny.htm
74Some background on Primates
- Primates: 233 species and 13 families
- The smallest living primate is the pygmy marmoset
(NW monkey), which weighs around 70 g; the
largest is the gorilla, weighing up to around 175
kg.
http://animaldiversity.ummz.umich.edu/site/accounts/information/Primates.html
75Some background on Primates
- Most primate species live in the tropics or
subtropics, although a few, most notably humans,
also inhabit temperate regions.
- Except for a few terrestrial species, primates
are arboreal. Some species eat leaves or fruit;
others are insectivorous or carnivorous.
Arbor = tree in Latin
76Prosimians
77Great apes
Hominidae is the primate family that includes
the extant species of humans, chimpanzees,
gorillas, and orangutans, as well as many extinct
species. The members of the family are called
hominids. The family is also called the great apes.
78Great apes
Originally non-human great apes were called
Pongidae. However, this original definition
suggests that Pongidae is a monophyletic family
which is not the case.
79Many studies have shown a correlation between
time of divergence and amount of evolutionary
(molecular) distance: protein sequences of
species that diverged earlier show more
differences.
p-dist
time
80There's a big disagreement over whether time should be
measured in astronomical time (i.e.,
years) or in generations.
81The generation-time hypothesis: The number of
substitutions is proportional to the number of
generations.
A (human)
O
B (tree shrew)
Prediction: short generation time -> more generations
since divergence -> more substitutions (in B)
82Absolute rates of evolution demand knowledge of
divergence dates (from the fossil
record). However, relative rates of evolution
can be computed from the phylogeny. This will be
done using the relative rate method.
83Assume 3 taxa, A, B and C.
A (human)
O
B (tree shrew)
C (outgroup)
T1
T2
84Assume 3 taxa, A, B and C.
85Assume 3 taxa, A, B and C.
The generation-time hypothesis predicts:
BO > AO
BO + OC > AO + OC
BC > AC
In words, the distance from an outgroup of a species with a short
generation time should be
higher than that of a species with a longer generation
time.
86Assume 3 taxa, A, B and C.
They used (C) modern carnivore species as their
outgroup.
87The authors compared immunological distances
between a few species and carnivore species. The
distance between Homo sapiens and each of 4
carnivore species was computed, and they reported
the average. The 4 carnivore genera are
Hyaena, Genetta, Ursus, and Arctogalidia.
88Hyaena, Genetta, Ursus, and Arctogalidia.
89Genetta genetta (small-spotted genet)
Although catlike in appearance and habit, the
genet is not a cat but a member of the family
Viverridae. Genets were kept as pets by the
ancient Egyptians as they are today by Berbers in
North Africa. From the Greek empire to the Middle
Ages, the genet was kept as a rat catcher and was
often portrayed on tapestries of the period. The
domestic cat eventually replaced the genet,
probably because it is more efficient in killing
rats, and perhaps because it is less smelly.
90Results: immunological distances from carnivore
species
Homo sapiens                       162
Macaca mulatta (rhesus monkey)     166
Ateles geoffroyi (spider monkey)   149
Nycticebus coucang (slow loris)    125
Lemur fulvus (brown lemur)         135
Tarsius spectrum (tarsier)         137
Tupaia glis (tree shrew)           156
91Results: immunological distances from carnivore
species
Prosimian
92India, Malaysia, Sumatra, Java, Borneo,
Philippines
Nycticebus coucang (slow loris)
Life span is 20 years (generation time < 20
years). Nocturnal and arboreal, they spend the
day sleeping in a tight ball up a tree.
93These results are against the generation-time
hypothesis: in the Homo-prosimian comparison, distances show no
correlation with generation length.
Prosimian
94Results: immunological distances from carnivore
species
Scandentia
95Common tree shrew - TUPAIA GLIS
Order: climbing mammals (Scandentia). Family:
Tupaiidae.
96Common tree shrew - TUPAIA GLIS
This small order of tree shrews was at one time
in the midst of a controversy: is it a
primate (order Primates) or an insectivore (order
Insectivora)?
For several years, different groups placed the
tree shrews in either one of these orders.
Finally, in 1984 this issue was resolved when
they were placed in their own order, called
Scandentia. Some researchers still argue that
they are the most primitive form of the primates,
however.
97Tarsius spectrum (tarsier)
Although data are not available on the lifespan
of this species, another member of the genus, T.
syrichta, is reported to have lived 13.5 years in
captivity. Tarsius spectrum is likely to have a
similar maximum lifespan.
98Results: no correlation of distances with generation
length. Homo has the longest generation time and the tree shrew the
shortest, yet their distances from the carnivores are nearly the same (162 vs. 156).
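The comparison can be made explicit with a small sketch that takes the reported immunological distances at face value and, for each species B, computes BC - AC with Homo sapiens as A and the carnivores as the outgroup C. The generation-time hypothesis predicts a positive difference for every species with a shorter generation time than humans. Variable names are illustrative, and no statistical test is attempted.

```python
# Immunological distances from the carnivore outgroup, as listed in the results.
DISTANCES = {
    "Homo sapiens": 162,
    "Macaca mulatta": 166,
    "Ateles geoffroyi": 149,
    "Nycticebus coucang": 125,
    "Lemur fulvus": 135,
    "Tarsius spectrum": 137,
    "Tupaia glis": 156,
}

human = DISTANCES["Homo sapiens"]
# BC - AC for each species B, with A = Homo sapiens.
differences = {
    species: d - human
    for species, d in DISTANCES.items()
    if species != "Homo sapiens"
}

for species, diff in differences.items():
    print(f"{species}: BC - AC = {diff:+d}")

n_positive = sum(diff > 0 for diff in differences.values())
print(f"{n_positive} of {len(differences)} species are farther from the outgroup than Homo")
```

Only one of the six differences is positive, which is the opposite of what the generation-time hypothesis predicts.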
99break
100An evolutionary experiment
Spalax ehrenbergi
103The structural protein composing the lens is
called α-crystallin. It is composed of two
subunits, αA and αB. Each subunit is encoded by a
single-copy gene located on a different
chromosome. The two subunits have approximately
57% sequence homology, probably reflecting an
ancient gene duplication. They also have low
sequence similarity to heat-shock proteins
(possible origin of the family).
104In Spalax, αA-crystallin lost its functional role
more than 25 million years ago, when the mole rat
became subterranean and presumably lost use of
its eyes.
105The αA-crystallin of Spalax evolves 4 times
faster than the αA-crystallins in other rodents,
such as rats, mice, hamsters, gerbils and
squirrels. This indicates functional relaxation.
The αA-crystallin of Spalax evolves 5 times slower
than pseudogenes. This suggests that it is still functional.
106The αA-crystallin of Spalax possesses all the
prerequisites for normal function and expression,
including the proper signals for alternative
splicing. The αA-crystallin of Spalax was shown
to still be present in the rudimentary lens of
the mole rat. This suggests that it is functional.
107Explanation 1: There is good evidence that the
rudimentary eye, though no longer able to detect light,
is still of vital importance for
photoperiod perception, which is required for the
physiological adaptations of the animal to
seasonal changes.
108Explanation 2: The blind mole rat lost its
vision more recently than 25 million years ago.
The rate of nonsynonymous substitutions after
nonfunctionalization has been underestimated.
Contradicting evidence: The αA-crystallin gene
is still an intact gene as far as the essential
molecular structures for its expression are
concerned.
109Explanation 3: The αA-crystallin gene product
serves a function unrelated to that of the eye.
Supporting evidence:
1. αA-crystallin has been found in other tissues.
2. αA-crystallin also functions as a chaperone that binds denaturing
proteins and prevents their aggregation.
3. The regions within αA-crystallin responsible for
chaperone activity are conserved in the mole rat.