Molecular Evolution

Slides: 13

Provided by: isrecI

1
Molecular Evolution

The neutral theory of molecular evolution
The molecular clock hypothesis
Positive selection
Phylogenetic trees
Evolutionary distance measures
Distance-based tree construction methods (UPGMA
and neighbour joining)

EPFL Bioinformatics I 16 Jan 2006
2
Neutral theory of molecular evolution

Historical background (Haldane's paradox):
Early population geneticists believed that most
polymorphisms are maintained by balancing
selection.
Balancing selection implies a genetic load
because homozygotes are less fit than
heterozygotes.
When protein electrophoresis became available, it
was found that a very large number of genes were
actually polymorphic. This appeared to imply an
unacceptably high genetic load for the human and
other populations.

Kimura's neutral theory of molecular evolution
provides an explanation for Haldane's paradox.
Claim: The large majority of observed molecular
polymorphisms reflect neutral changes. Likewise,
most substitutions observed between homologous
genes are selectively neutral.
Implications: Gene (protein) families evolve
through neutral mutations and purifying
selection. Most genes (proteins) have not been
improved during the period of metazoan evolution.

3
The molecular clock hypothesis
In 1965, Zuckerkandl and Pauling proposed that
for any given lineage, the rate of molecular
evolution (amino acid substitutions per year) is
constant over time. In other words, there exists
a universal molecular clock.
Implications: Mutation and substitution rates are
the same in all lineages. This supports the
neutral theory of molecular evolution (no
dependence on population size and generation
time) but is difficult to reconcile with claims
that mutation rates differ along chromosomes.
Exceptions and limitations: The primate lineage
appears to have a somewhat lower evolutionary
rate than other lineages. The theory is not
readily testable for lower eukaryotes and
bacteria, for which a fossil record is lacking.
Note further: The number of substitutions appears
to be proportional to elapsed time, and not to
the number of generations or cell divisions. The
independence of generation time speaks against
positive selection as a driving force of
evolution. The independence of the number of cell
divisions suggests that most mutations do not
happen during replication.
4
Proteins evolve at different rates

Making the following assumptions:
1. All amino acid replacements are selectively
neutral (neutral theory).
2. There is a constant molecular clock.
3. A given protein (e.g. an enzyme) has the same
function and thus evolves under the same
purifying selection conditions in all species.
It follows that a given protein evolves at a
constant rate in all lineages.
However:
1. Different proteins may evolve at different
rates due to varying levels of functional
constraints.
2. At the nucleotide sequence level, the different
rates are primarily reflected by non-silent base
substitutions (assuming that silent substitutions
are selectively neutral).
These predictions are matched by many protein
families.

5
Positive Selection

Positive selection may occur:
1. When the function of a protein is improved, e.g.
the efficiency or substrate specificity of an
enzyme.
2. When a protein is undergoing adaptation to
changes in the environment, e.g. viral surface
proteins evolving to escape the immune system.
This case is documented by many examples and may
generally be more frequent than functional
improvement.
Potential evidence for positive selection:
1. The ratio of silent to non-silent nucleotide
substitutions (an excess of non-silent changes).
2. An accelerated or population-size-dependent
rate of amino acid substitutions in a particular
lineage.
3. A higher proportion of silent changes among
within-species polymorphisms than among
between-species substitutions (evidence for
positive selection in the past).
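The comparison of within-species polymorphisms with between-species substitutions can be sketched as a simple ratio of ratios, assuming the silent and non-silent changes have already been counted. This kind of comparison is commonly associated with the McDonald-Kreitman test. The function name `neutrality_index` and the counts below are hypothetical illustrations, not data from the lecture.

```python
def neutrality_index(poly_nonsilent, poly_silent, fixed_nonsilent, fixed_silent):
    """Ratio of (non-silent / silent) within species to the same ratio between species.

    Values well below 1 mean that non-silent changes are relatively more common
    among between-species differences than among polymorphisms, which is the
    pattern expected after past positive selection.
    """
    if poly_silent <= 0 or fixed_silent <= 0 or fixed_nonsilent <= 0:
        raise ValueError("counts in the denominators must be positive")
    within = poly_nonsilent / poly_silent      # polymorphism ratio
    between = fixed_nonsilent / fixed_silent   # substitution ratio
    return within / between


# Hypothetical counts: 2 non-silent and 40 silent polymorphisms,
# 8 non-silent and 16 silent fixed differences.
print(round(neutrality_index(2, 40, 8, 16), 3))
```

A real analysis would add a significance test on the 2x2 table of counts; the sketch only shows which quantities are compared.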

6
Phylogenetic trees
[Figure: three example trees over leaves 1-5 with
inferred nodes 6-8, drawn along a time axis: a
rooted tree; a rooted tree satisfying the
molecular clock hypothesis (all leaves at the
same distance from the root); and an unrooted
tree.]
Note: Nodes 1-5 are called leaves, or leaf nodes.
Nodes 6-8 are inferred nodes corresponding to
ancestral species or molecules. Branches are also
called edges. The edge lengths reflect
evolutionary distances.
7
Phylogenetic trees
A phylogenetic tree is a graph reflecting the
approximate distances between a set of objects in
a hierarchical fashion. A tree is also called a
dendrogram. There are different types of trees:
1. Unrooted versus rooted trees: A rooted tree has
an additional node representing the origin, in
molecular phylogeny the last common ancestor of
the sequences analyzed. In general, the root
cannot be directly inferred from the data. It may
be inferred from the paleontological record, from
a trusted outgroup, or on the basis of the
molecular clock hypothesis.
2. Scaled and unscaled trees: In an unscaled tree,
the lengths of the branches are not important.
Only the topology counts. In phylogeny, trees are
usually scaled.
3. Binary trees: Each node branches into two
daughter nodes. Other trees are usually not
considered in phylogeny as they can easily be
approximated by binary trees with very short
edges between nodes.
Note: A rooted (unrooted) binary tree connecting n
objects (leaves) has 2n-1 (2n-2) nodes altogether
and 2n-2 (2n-3) edges.
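The node and edge counts in the note can be checked with a few lines of code. This is a minimal sketch; the helper name `tree_size` is an assumption made for illustration.

```python
def tree_size(n_leaves, rooted):
    """Return (nodes, edges) of a binary tree connecting n_leaves leaves."""
    if n_leaves < 2:
        raise ValueError("need at least two leaves")
    # A rooted binary tree has one more node (the root) than the unrooted one.
    nodes = 2 * n_leaves - 1 if rooted else 2 * n_leaves - 2
    # Any tree has exactly one edge fewer than it has nodes.
    return nodes, nodes - 1


print(tree_size(5, rooted=False))  # five leaves, unrooted
print(tree_size(5, rooted=True))   # five leaves, rooted
```

For five leaves the unrooted case gives 8 nodes, which agrees with leaves 1-5 plus inferred nodes 6-8 in the example trees.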
8
Phylogenetic tree reconstruction, overview

Computational challenge: There is an enormous
number of different unrooted topologies even for
a relatively small number of sequences:
3 sequences: 1
4 sequences: 3
5 sequences: 15
10 sequences: 2,027,025
20 sequences: 221,643,095,476,699,771,875
Consequence: Most tree construction algorithms
are heuristic methods not guaranteed to find the
optimal topology.
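The counts in the table follow from a simple argument: a new leaf can be attached to any of the 2k-3 edges of an unrooted binary tree with k leaves, so the number of unrooted topologies for n leaves is the product 3 x 5 x ... x (2n-5). A minimal sketch (the function name is an assumption for illustration):

```python
def count_unrooted_topologies(n):
    """Number of distinct unrooted binary tree topologies for n labeled leaves."""
    if n < 3:
        raise ValueError("need at least three leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for factor in range(3, 2 * n - 4, 2):
        count *= factor
    return count


for n in (3, 4, 5, 10, 20):
    print(n, count_unrooted_topologies(n))
```

Python integers have arbitrary precision, so the value for 20 sequences is computed exactly.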
Input data for two major classes of algorithms:
1. Input data: distance matrix. Examples: UPGMA,
neighbor-joining.
2. Input data: multiple alignment. Examples:
parsimony, maximum likelihood.
Distance matrix methods use distances computed
from pairwise or multiple alignments as input.

9
Distance measures for phylogenetic tree
construction
Distance measures respect the following
constraints: d = 0 if the sequences are
identical, d > 0 if the sequences are
different.
Distances between molecular sequences are
computed from pairwise alignment scores.
For closely related DNA sequences, one could
simply use f, the fraction of non-identical
residues (readily computed from the identity
value returned by an alignment program).
For more distantly related sequences, the
Jukes-Cantor distance,
$d_{ij} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}f\right)$,
is preferred. This measure is supposed to be
proportional to evolutionary time. It takes into
account that the percent identity value
saturates at 25% over time.
For protein sequences aligned with the aid of a
substitution matrix, an approximate distance is
often computed from the following quantities:
Sobs: observed pairwise alignment score
Smax: maximum score (average of the scores of the
two sequences aligned against themselves)
Srand: expected score for random sequences of the
same length and composition
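The two DNA distances can be sketched in a few lines, assuming the sequences are already aligned and of equal length. The function names are hypothetical. The protein score distance shown last is an assumption: the slide's own formula is not preserved in this transcript, and the sketch uses the normalization commonly attributed to Feng and Doolittle.

```python
import math


def p_distance(a, b):
    """Fraction f of non-identical residues, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(f):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4f/3); undefined for f >= 0.75."""
    if not 0 <= f < 0.75:
        raise ValueError("f must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * f / 3)


def score_distance(s_obs, s_max, s_rand):
    """ASSUMED form: d = -ln((Sobs - Srand) / (Smax - Srand))."""
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        raise ValueError("observed score is not above the random expectation")
    return -math.log(s_eff)


f = p_distance("ACGTACGTAC", "ACGTACGTAA")  # one mismatch in ten columns
print(f, round(jukes_cantor(f), 4))
```

The correction grows with f: for small f the Jukes-Cantor distance is close to f itself, and it diverges as f approaches 0.75, which corresponds to the 25% identity saturation mentioned above.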
10
Distance matrix-based methods: Example UPGMA

Unweighted pair-group method with arithmetic mean
(UPGMA)
Initialization:
1. Assign each sequence to its own cluster.
2. Define one leaf node for each single-sequence
cluster.
3. Put all leaf nodes at height zero.
Iteration:
1. Determine the two clusters for which the
distance is minimal and combine them in a new
cluster.
2. Compute the distance between the new cluster and
all other clusters by averaging over all
pairwise distances between cluster elements.
3. Define a new node for the new cluster and place
it at a height of half the average distance
between the elements of the two merged clusters.
Termination:
When only two clusters remain, place the root at
a height of half the average distance between the
elements of the two clusters.
Limitation of UPGMA: The algorithm implicitly
assumes a constant evolutionary rate in all
branches. It is therefore unfit to test the
molecular clock hypothesis.
An alternative method called neighbour-joining
provides more realistic branch lengths.
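The UPGMA steps can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes a symmetric distance matrix given as a list of lists, recomputes cluster distances directly from the leaf distances for clarity rather than speed, and places each new node at half the average distance between the merged clusters. The function name `upgma` and the toy matrix are hypothetical.

```python
from itertools import combinations


def upgma(labels, d):
    """Return (tree, root_height); tree is a nested tuple of leaf labels."""
    # Each cluster is (subtree, indices of member leaves, node height).
    clusters = [(label, [i], 0.0) for i, label in enumerate(labels)]

    def avg_dist(a, b):
        # Average over all pairwise distances between cluster elements.
        total = sum(d[i][j] for i in a[1] for j in b[1])
        return total / (len(a[1]) * len(b[1]))

    while len(clusters) > 1:
        # Find the two closest clusters and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merged = ((a[0], b[0]), a[1] + b[1], avg_dist(a, b) / 2)
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(merged)

    tree, _, height = clusters[0]
    return tree, height


# Toy ultrametric matrix: A and B are close, C is equally far from both.
dist = [[0, 2, 6],
        [2, 0, 6],
        [6, 6, 0]]
print(upgma(["A", "B", "C"], dist))
```

Because every merge places the new node at a single height, all leaves end up at the same distance from the root, which is exactly the constant-rate assumption named as the method's limitation.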

11
Distance matrix-based methods: Neighbor joining

Underlying assumptions:
Additivity: The distance between two leaves is
the sum of the lengths of the edges on the path
connecting them (may at best be approximately
true).
Motivation:
1. Additivity is a less stringent assumption than
the molecular clock assumption.
2. If different branches of a tree evolve at
different rates, the closest pair of leaves may
not be neighboring leaves, see the example below.
Output: an unrooted tree
Principle:
A modified distance measure $D_{ij}$ is used to
detect neighbors, which is obtained by subtracting
the average distances to all other leaves from the
distance $d_{ij}$.

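Example: The closest pair of leaves, 1 and 2, are
not neighbors: $d_{12} = 0.3$, $d_{13} = 0.5$,
however $D_{12} = -1.1$, $D_{13} = -1.2$.
[Figure: unrooted tree with four leaves. Leaves 1
and 3 are neighbors, as are leaves 2 and 4. Edge
lengths: 0.1 (leaf 1), 0.1 (leaf 2), 0.4 (leaf 3),
0.4 (leaf 4), 0.1 (internal edge).]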
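The numbers in the example can be reproduced with a short calculation. The sketch assumes the usual neighbor-joining correction $D_{ij} = d_{ij} - (r_i + r_j)$ with $r_i = \frac{1}{|L|-2}\sum_k d_{ik}$, and a four-leaf tree in which leaves 1 and 3 are neighbors (edges 0.1 and 0.4), leaves 2 and 4 are neighbors (edges 0.1 and 0.4), and the internal edge has length 0.1. This tree is reconstructed from the distances quoted in the example, and the function name is hypothetical.

```python
def nj_criterion(d):
    """Return the matrix D_ij = d_ij - (r_i + r_j) for a distance matrix d."""
    n = len(d)
    # r_i: sum of distances from i to all other nodes, divided by (n - 2).
    r = [sum(row) / (n - 2) for row in d]
    return [[d[i][j] - r[i] - r[j] if i != j else 0.0 for j in range(n)]
            for i in range(n)]


# Path-length distances in the four-leaf tree (index 0 is leaf 1, and so on).
d = [[0.0, 0.3, 0.5, 0.6],
     [0.3, 0.0, 0.6, 0.5],
     [0.5, 0.6, 0.0, 0.9],
     [0.6, 0.5, 0.9, 0.0]]

D = nj_criterion(d)
print(round(D[0][1], 2), round(D[0][2], 2))  # D12 and D13
```

Although leaves 1 and 2 have the smallest raw distance, the corrected value is lower for the pair 1 and 3, so neighbor joining picks the true neighbors.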
12
Distance matrix-based methods: Neighbor joining

Initialization:
Define T to be the set of leaf nodes.
Initialize the current set of nodes as L = T.
Iteration:
1. Pick a pair i, j from L for which $D_{ij}$ is
minimal.
2. Define a new node k and set
$d_{km} = \frac{1}{2}(d_{im} + d_{jm} - d_{ij})$
for all m in L; also compute $D_{km}$ for all m.
3. Add k to T with edges of length
$d_{ik} = \frac{1}{2}(d_{ij} + r_i - r_j)$ and
$d_{jk} = d_{ij} - d_{ik}$.
4. Remove i and j from L and add k.
Termination:
When L consists of two nodes i and j, add the
remaining edge between i and j, with length
$d_{ij}$.
Formulae:
$D_{ij} = d_{ij} - (r_i + r_j)$, with
$r_i = \frac{1}{|L| - 2} \sum_{k \in L} d_{ik}$,
where |L| is the size of the current set of
nodes.
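The whole procedure can be sketched compactly. This is an illustration under stated assumptions, not a reference implementation: it uses $D_{ij} = d_{ij} - (r_i + r_j)$ with $r_i$ the sum of distances from i to the other current nodes divided by |L| - 2, takes a symmetric matrix as a list of lists, numbers new internal nodes consecutively after the leaves, and returns the tree as a list of edges. The names `neighbor_joining`, `active` (the set L) and `edges` (the edges of T) are hypothetical.

```python
def neighbor_joining(d):
    """Return a list of edges (node_a, node_b, length) of the unrooted tree."""
    n = len(d)
    dist = {i: {j: d[i][j] for j in range(n) if j != i} for i in range(n)}
    active = list(range(n))   # current set of nodes L
    edges = []                # edges of the growing tree T
    next_id = n               # label for the next internal node

    while len(active) > 2:
        m = len(active)
        r = {i: sum(dist[i][x] for x in active if x != i) / (m - 2)
             for i in active}
        # Pick the pair with minimal D_ij = d_ij - r_i - r_j.
        pairs = ((a, b) for idx, a in enumerate(active) for b in active[idx + 1:])
        i, j = min(pairs, key=lambda p: dist[p[0]][p[1]] - r[p[0]] - r[p[1]])

        k, next_id = next_id, next_id + 1
        d_ik = 0.5 * (dist[i][j] + r[i] - r[j])
        d_jk = dist[i][j] - d_ik
        edges += [(i, k, d_ik), (j, k, d_jk)]

        # Distances from the new node k to every remaining node.
        dist[k] = {}
        for x in active:
            if x not in (i, j):
                dist[k][x] = dist[x][k] = 0.5 * (dist[i][x] + dist[j][x] - dist[i][j])
        active = [x for x in active if x not in (i, j)] + [k]

    i, j = active
    edges.append((i, j, dist[i][j]))
    return edges


# Additive distances from a four-leaf tree with unequal rates (leaves 0..3).
d = [[0.0, 0.3, 0.5, 0.6],
     [0.3, 0.0, 0.6, 0.5],
     [0.5, 0.6, 0.0, 0.9],
     [0.6, 0.5, 0.9, 0.0]]
for a, b, length in neighbor_joining(d):
    print(a, b, round(length, 3))
```

With additive input distances such as these, the recovered edge lengths match the tree that generated them; with real data, additivity holds only approximately and negative edge lengths can occur.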
