Title: Molecular basis of evolution.
1 Molecular basis of evolution.
- Goal: to reconstruct the evolutionary history of all organisms in the form of phylogenetic trees.
- Classical approach: phylogenetic trees were constructed from comparative morphology and physiology.
- Molecular phylogenetics: phylogenetic trees are constructed by comparing DNA/protein sequences between organisms.
2 Evolution of mankind.
- Analysis of mitochondrial DNA suggests that Homo sapiens evolved from one group of Homo erectus in Africa (African Eve) 100,000-200,000 years ago.
[Figure labels (population, age in years):]
American Indians I, 25,000-35,000
Europeans, 40,000-50,000
American Indians II, 7,000-9,000
Asians, 55,000-75,000
Africans, 100,000
Adam appeared 250,000 years ago, much earlier!
3 Mechanisms of evolution.
- By mutations of genes. Mutations spread through the population via genetic drift and/or natural selection.
- By gene duplication and recombination.
4 Mutational changes of DNA sequences.
- 1. Substitution.
  Before: ACC TAT TTG CTG (Thr Tyr Leu Leu)
  After:  ACC TCT TTG CTG (Thr Ser Leu Leu)
- 2. Deletion.
  Before: ACC TAT TTG CTG (Thr Tyr Leu Leu)
  After:  ACC TAT TGC TG- (Thr Tyr Cys)
- 3. Insertion.
  Before: ACC TAT TTG CTG (Thr Tyr Leu Leu)
  After:  ACC TAC TTT GCT G (Thr Tyr Phe Ala)
- 4. Inversion.
  Before: ACC TAT TTG CTG (Thr Tyr Leu Leu)
  After:  ACC TTT ATG CTG (Thr Phe Met Leu)
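The four mutation types can be reproduced as simple string operations. This is a minimal sketch: the function names and 0-based positions are illustrative choices, and the inversion reverses the segment without complementing it, as in the slide example.

```python
def substitute(seq, pos, base):
    """Replace the nucleotide at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq, pos):
    """Remove the nucleotide at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]


def insert(seq, pos, base):
    """Insert a nucleotide before 0-based position pos."""
    return seq[:pos] + base + seq[pos:]


def invert(seq, start, end):
    """Reverse the segment seq[start:end] (no complementing)."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


def codons(seq):
    """Split a sequence into space-separated triplets for display."""
    return " ".join(seq[i:i + 3] for i in range(0, len(seq), 3))


original = "ACCTATTTGCTG"
print(codons(substitute(original, 4, "C")))  # ACC TCT TTG CTG
print(codons(delete(original, 6)))           # ACC TAT TGC TG
print(codons(insert(original, 5, "C")))      # ACC TAC TTT GCT G
print(codons(invert(original, 4, 7)))        # ACC TTT ATG CTG
```

Note that the deletion and the insertion shift the reading frame, so every downstream codon changes.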
5 Synonymous and nonsynonymous nucleotide substitutions.
- Synonymous substitutions in codons do not change the encoded amino acid; nonsynonymous substitutions do.
- ds/dn < 1 indicates positive natural selection.
- ds, dn: number of synonymous substitutions per synonymous site and number of nonsynonymous substitutions per nonsynonymous site, respectively.
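Whether a single codon change is synonymous can be checked against the standard genetic code. The sketch below builds the codon table from the usual TCAG ordering; the names CODON_TABLE and is_synonymous are illustrative, and it only classifies one codon pair, it does not estimate ds or dn.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_before, codon_after):
    """True if both codons encode the same amino acid."""
    return CODON_TABLE[codon_before] == CODON_TABLE[codon_after]


print(is_synonymous("TAT", "TAC"))  # True: Tyr -> Tyr
print(is_synonymous("TAT", "TCT"))  # False: Tyr -> Ser
```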
6 Gene duplication and recombination.
- New genes/proteins arise through gene duplication and recombination.
[Figure: duplication of an ancestral globin gene gives rise to myoglobin and the hemoglobin globins; duplication and recombination of Gene 1 and Gene 2 produce a new gene.]
7 Measures of evolutionary distance between amino acid sequences.
- 1. P-distance. Evolutionary distance is usually measured by the number of amino acid substitutions.
- $p = n_d / n$
  n_d: number of amino acid differences between two sequences; n: number of aligned amino acids.
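A minimal sketch of the p-distance for two aligned sequences. It assumes gaps are written as "-" and that any column with a gap in either sequence is skipped; the function name is illustrative.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues among the gap-free aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    n_d = sum(1 for a, b in pairs if a != b)
    return n_d / len(pairs)


print(p_distance("ACDE", "ACDF"))  # 0.25: one difference in four columns
```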
8 Poisson correction for evolutionary distance.
- 2. PC-distance. Takes into account multiple substitutions at the same site and is therefore proportional to divergence time.
- PC-distance can be expressed through the p-distance: $d = -\ln(1 - p)$
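The Poisson correction is a one-line transformation of the p-distance, d = -ln(1 - p). A small sketch (the function name is illustrative):

```python
import math


def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p) for a p-distance p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


# The correction grows with p, because more multiple hits are hidden.
for p in (0.121, 0.186, 0.486):
    print(p, round(poisson_distance(p), 3))
```

For small p the corrected distance is close to p; the two diverge as p grows.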
9 Another method to estimate evolutionary distances: amino acid substitution matrices.
- 3. Distance from amino acid substitution matrices. Substitutions occur more often between amino acids with similar properties.
  - Dayhoff (1978) derived the first matrices from multiple alignments of close homologs.
  - The number of aa substitutions is measured in units of accepted point mutations (PAM): 1 PAM is one aa substitution per 100 sites.
  - Dayhoff-distance can be approximated by the gamma-distance with a = 2.25.
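The slide does not give the gamma-distance formula. The sketch below assumes the commonly used form d = a[(1 - p)^(-1/a) - 1], where a is the gamma shape parameter; the function name and default are illustrative.

```python
def gamma_distance(p, a=2.25):
    """Gamma distance d = a * ((1 - p) ** (-1 / a) - 1), assumed standard form."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if a <= 0:
        raise ValueError("shape parameter a must be positive")
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)


print(gamma_distance(0.5, a=1.0))  # 1.0
```

A smaller a means stronger rate variation among sites and a larger correction; for very large a the value approaches the Poisson-corrected distance.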
10 Fixation of mutations.
- Not all mutations spread through the population. Fixation: a mutation becomes incorporated into the genome of the species.
- The majority of mutations are neutral (Kimura): they do not affect the fitness of the organism.
- Fixation rate depends on the size of the population (N), fitness (s) and mutation rate (µ).
11 Phylogenetic analysis.
- Phylogenetic trees are derived from multiple sequence alignments. Each column describes the evolution of one site.
- Each position/site in proteins/nucleic acids is assumed to change in evolution independently of the others.
- Insertions/deletions are usually ignored and trees are constructed only from the aligned regions.
12 Evolutionary tree constructed from rRNA analysis.
13 The concept of evolutionary trees.
- Trees consist of nodes and branches; the topology is the branching pattern.
- The length of each branch represents the number of substitutions occurring between two nodes. If the rate of evolution is constant, branches will have the same length (molecular clock hypothesis).
- The distance along the tree is calculated by summing up all intervening branch lengths.
- Trees can be binary or bifurcating.
- Trees can be rooted or unrooted. The root is placed by including a taxon (an outgroup) which is known to have branched off earlier than the others.
14 Accuracies of phylogenetic trees.
- Two types of errors:
  - Topological error
  - Branch length error
- Bootstrap test:
  - Resample the alignment columns with replacement, recalculate the tree, and count how many times a given topology (interior branch) occurs; this gives the bootstrap confidence value. If it is close to 100%, the topology/interior branch is reliable.
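The resampling step of the bootstrap test can be sketched as follows. The tree-building step is left to a caller-supplied function: build_topology is a hypothetical callable that returns any comparable description of the tree, and it does not come from a real library.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same number of columns)."""
    n_columns = len(alignment[0])
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_support(alignment, build_topology, reference, replicates=100, seed=0):
    """Percentage of replicates whose topology equals the reference topology."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        replicate = bootstrap_alignment(alignment, rng)
        if build_topology(replicate) == reference:
            hits += 1
    return 100.0 * hits / replicates
```

In practice the support is usually reported per interior branch rather than for the whole topology; comparing whole topologies keeps the sketch short.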
15 Estimation of species divergence time.
- Assumption: rate constancy (molecular clock).
- Find T1, if T2 is known.
[Figure: tree of taxa A, B and C with divergence times T1 and T2.]
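Under a molecular clock, distance grows linearly with time, so an unknown divergence time can be scaled from a known one. The sketch below assumes d1 and d2 are the evolutionary distances between taxa that split at T1 and T2 respectively; the function name and the numbers are hypothetical.

```python
def divergence_time(d_unknown, d_known, t_known):
    """Estimate T1 = T2 * d1 / d2, assuming a constant substitution rate."""
    if d_known <= 0:
        raise ValueError("the calibrating distance must be positive")
    return t_known * d_unknown / d_known


# Hypothetical values: d1 = 0.1, d2 = 0.2, T2 = 100 million years.
print(divergence_time(0.1, 0.2, 100.0))  # about 50 (million years)
```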
16 Estimation of evolutionary rates in hemoglobin alpha-chains.
                 P-distance   PC-distance   Gamma-distance
Human/cow        0.121        0.129         0.134
Human/kangaroo   0.186        0.205         0.216
Human/carp       0.486        0.665         0.789
Estimate the evolutionary rate of divergence between human and cow (the time of divergence between these groups is 90 million years).
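One way to approach this exercise, assuming the usual definition of the rate as r = d / (2T): the factor 2 is there because substitutions accumulate along both lineages since the split. The slide does not state the formula, so treat this as a sketch.

```python
def evolutionary_rate(distance, divergence_time_years):
    """Substitutions per site per year per lineage, r = d / (2 * T)."""
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2.0 * divergence_time_years)


T = 90e6  # human/cow divergence time in years, from the slide
for name, d in (("p", 0.121), ("PC", 0.129), ("gamma", 0.134)):
    print(name, evolutionary_rate(d, T))
```

All three distance measures give a rate on the order of 7e-10 substitutions per site per year for this pair.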
17 Methods for phylogenetic tree construction.
- Set of related sequences
- Multiple sequence alignment
- Strong sequence similarity?
  - Yes: maximum parsimony methods
  - No: recognizable sequence similarity?
    - Yes: distance methods
    - No: maximum likelihood methods
- Analyze reliability of prediction
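18 1. Distance methods. Calculating branch lengths from distances.
      A       B       C
A   -----    20      30
B   -----   -----    44
C   -----   -----   -----
[Figure: tree in which taxa A, B and C are joined to one interior node by branches of length a, b and c.]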
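For three taxa joined at one interior node, the pairwise distances are d_AB = a + b, d_AC = a + c and d_BC = b + c, which can be solved for the branch lengths. A short sketch with the matrix values above (the function name is illustrative):

```python
def three_taxon_branches(d_ab, d_ac, d_bc):
    """Solve d_ab = a + b, d_ac = a + c, d_bc = b + c for (a, b, c)."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


print(three_taxon_branches(20, 30, 44))  # (3.0, 17.0, 27.0)
```

The result can be checked by summing branches: a + b = 20, a + c = 30, b + c = 44.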
19 Neighbor-joining method.
- NJ is based on the minimum evolution principle (the sum of branch lengths should be minimized).
- Given the distance matrix between all sequences, NJ joins sequences into a tree and gives estimates of the branch lengths.
- 1. Start with the star tree and calculate the sum of branch lengths.
[Figure: star tree with taxa A, B, C, D and E joined to a central node by branches a, b, c, d and e.]
20 Neighbor-joining method.
- 2. Combine two sequences into a pair and modify the tree.
- 3. Treat cluster CDE as one sequence X, calculate the average distances between A and X and between B and X, and calculate a and b.
- 4. Treat AB as a single sequence and calculate c, d and e.
- 5. Calculate the sum of branch lengths, S.
- 6. Repeat the cycle and calculate S for the other pairs; choose the pair with the lowest S.
[Figure: tree in which A and B (branches a and b) are joined at one node and C, D and E (branches c, d and e) at the other.]
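The pair-selection step can be sketched with the closed-form sum of branch lengths used in the original Saitou and Nei formulation, which gives S for the tree where taxa i and j are joined. The slide does not show this formula, so take it as an assumption; the function names and the example matrix are hypothetical.

```python
from itertools import combinations


def pair_tree_length(dist, i, j):
    """Sum of branch lengths S_ij of the tree in which taxa i and j are joined."""
    n = len(dist)
    others = [k for k in range(n) if k not in (i, j)]
    if not others:
        return dist[i][j]
    to_pair = sum(dist[i][k] + dist[j][k] for k in others)
    among_others = sum(dist[k][l] for k, l in combinations(others, 2))
    return to_pair / (2 * (n - 2)) + dist[i][j] / 2 + among_others / (n - 2)


def best_pair(dist):
    """Pair of taxa whose joining gives the smallest S (first one on ties)."""
    return min(combinations(range(len(dist)), 2),
               key=lambda pair: pair_tree_length(dist, *pair))


# Hypothetical additive distances for the tree ((A,B),(C,D)).
dist = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(pair_tree_length(dist, 0, 1))  # 15.0, the true total tree length
print(pair_tree_length(dist, 0, 2))  # 17.5, a wrong pairing gives a longer tree
print(best_pair(dist))               # (0, 1)
```

A full NJ implementation would then merge the chosen pair into one node, recompute the distances, and repeat until the tree is resolved.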
21 Classwork I
- Given a multiple sequence alignment, construct the distance matrix (p-distance) and calculate the branch lengths.
- APTHASTRLKHHDDHH
- ALTKKSTRIRHIPD-H
- DLTPSSTIIR-YPDLH
22 Classwork II: NJ tree using MEGA.
- 1. Go to the CDD webpage and retrieve the alignment of cd00157 in FASTA format.
- 2. Import this alignment into MEGA and convert it to MEGA format: http://www.megasoftware.net/mega3/mega.html .
- 3. Construct the NJ tree using different distance measures with bootstrap.
- 4. Analyze the obtained trees.
23 2.1 Maximum parsimony: definition of informative sites.
- Maximum parsimony tree: the tree that requires the smallest number of evolutionary changes to explain the differences between external nodes.
- Informative site: a site which favors some trees over the others.
  Site   1 2 3 4 5 6 7
  Seq 1  A A G A C T G
  Seq 2  A G C C C T G
  Seq 3  A G A T T T C
  Seq 4  A G A G T T C
- A site is informative (for nucleotide sequences) if there are at least two different kinds of letters at the site, each of which is represented in at least two of the sequences.
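The definition translates directly into a column test. A minimal sketch applied to the four sequences above (function names are illustrative; gaps are not handled):

```python
from collections import Counter


def is_informative(column):
    """At least two different letters, each present in at least two sequences."""
    counts = Counter(column)
    return sum(1 for count in counts.values() if count >= 2) >= 2


def informative_sites(alignment):
    """1-based positions of the parsimony-informative columns."""
    return [i + 1 for i, column in enumerate(zip(*alignment))
            if is_informative(column)]


alignment = ["AAGACTG", "AGCCCTG", "AGATTTC", "AGAGTTC"]
print(informative_sites(alignment))  # [5, 7]
```

Site 3 (G, C, A, A) fails the test because only A occurs twice.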
24 2. Maximum parsimony.
[Figure: Trees 1, 2 and 3, the three possible trees for sequences 1-4, with the site 3 states 1.G, 2.C, 3.A, 4.A at the tips.]
Site 3 is not informative: all trees are realized by the same number of substitutions.
Advantage:
- deals with characters; there is no need to compute distance matrices
Disadvantages:
- multiple substitutions are not considered
- branch lengths are difficult to calculate
- slow
25 2.3 Maximum parsimony method.
- 1. Identify all informative sites in the alignment.
- 2. Calculate the minimum number of substitutions at each informative site.
- 3. Sum the number of changes over all informative sites for each tree.
- 4. Choose the tree with the smallest number of changes.
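The steps above can be sketched for four sequences using Fitch's counting rule, a standard way to get the minimum number of changes at a site (the slides do not name the algorithm, so this is a swapped-in technique). Trees are written as nested tuples of 0-based sequence indices; all names are illustrative. For simplicity the sketch sums over every site, which ranks the trees the same way because non-informative sites add the same count to each tree.

```python
def fitch(tree, states):
    """Return (possible states, minimum changes) for a nested-tuple tree at one site."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0  # leaf: the observed letter, no changes
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one substitution is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all alignment columns."""
    return sum(fitch(tree, column)[1] for column in zip(*alignment))


alignment = ["AAGACTG", "AGCCCTG", "AGATTTC", "AGAGTTC"]
trees = {
    "((1,2),(3,4))": ((0, 1), (2, 3)),
    "((1,3),(2,4))": ((0, 2), (1, 3)),
    "((1,4),(2,3))": ((0, 3), (1, 2)),
}
for name, tree in trees.items():
    print(name, parsimony_score(tree, alignment))
# ((1,2),(3,4)) 8
# ((1,3),(2,4)) 10
# ((1,4),(2,3)) 10
```

The tree pairing sequences 1,2 and 3,4 needs the fewest changes, driven by informative sites 5 and 7.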
26 Maximum likelihood methods.
- Similarity with maximum parsimony:
  - for each column of the alignment all possible trees are calculated
  - trees with the least number of substitutions are more likely
- Advantages of maximum likelihood over maximum parsimony:
  - takes into account different rates of substitution between different amino acids and/or different sites
  - applicable to more diverse sequences
27 Classwork: maximum parsimony.
- Search the NCBI Conserved Domain Database for pfam00127.
- Construct a maximum parsimony tree using MEGA3.
- Analyze this tree and compare it with the phylogenetic tree from the research paper.