Title: Molecular Evolution NUI Maynooth June 2001
1Molecular Evolution NUI Maynooth June 2001
2Aims of the course
- To introduce the theory and practice of phylogenetic inference from molecular data
- To provide an introduction to some of the most useful methods and computer programmes
3Introduction to the course
- Some basic concepts, e.g. phylogeny, monophyly, homology and analogy
- Exploring patterns in sequence data
- Alignment - ClustalW (done)
- Phylogenetic analysis
- Parsimony
- Distance matrix analysis
- Maximum likelihood
- How robust are phylogenetic hypotheses?
4Phylogenetics (Cladistics)
- Based upon evolutionary relationships, i.e. upon common ancestry
- A cladogram is a tree diagram which depicts a hypothesised evolutionary history
- A phylogram is a tree which indicates by branch length the degree of change believed to have occurred along each lineage
5Cladograms and phylograms
Cladograms show branching order - branch lengths are meaningless
[Figure: cladogram of Bacteria 1-3 and Eukaryote 1-4]
Phylograms show branch order and branch lengths
[Figure: phylogram of Bacteria 1-3 and Eukaryote 1-4]
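The distinction can also be seen in how trees are written as text. The sketch below assumes the Newick format, which the slides do not mention, and uses made-up branch lengths. Stripping the lengths from a phylogram leaves only the branching order, i.e. a cladogram.

```python
import re

# Hypothetical phylogram in Newick format: the number after each ':' is a
# branch length (made-up values, for illustration only).
PHYLOGRAM = (
    "((Bacteria1:0.10,Bacteria2:0.30):0.05,"
    "(Eukaryote1:0.20,Eukaryote2:0.02):0.40);"
)


def to_cladogram(newick: str) -> str:
    """Drop every ':length' annotation, keeping only the branching order."""
    return re.sub(r":[0-9.eE+-]+", "", newick)


print(to_cladogram(PHYLOGRAM))
```

The regular expression is a simplification: it assumes taxon names contain no colons and that the tree carries no other annotations.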
6Rooting using outgroups
Rooted by outgroup
[Figure: tree of Bacteria 1-3 and Eukaryote 1-4 with an Archaea outgroup; the root is marked]
7How to construct a phylogeny?
- What kind of data?
- How to analyse it?
8Richard Owen
9Owen's definition of homology
- Homologue: the same organ under every variety of form and function (true or essential correspondence)
- Analogy: superficial or misleading similarity
- Richard Owen 1843
10Charles Darwin
11Darwin and homology
- The natural system is based upon descent with modification ... the characters that naturalists consider as showing true affinity (i.e. homologies) are those which have been inherited from a common parent, and, in so far as all true classification is genealogical, that community of descent is the common bond that naturalists have been seeking
- Charles Darwin, Origin of Species, 1859, p. 413
12Homology is...
- Homology: similarity that is the result of inheritance from a common ancestor - identification and analysis of homologies is central to phylogenetics
13Phylogenetics
- Sees homology as evidence of common ancestry
- Uses tree diagrams to portray relationships based upon recency of common ancestry
- Monophyletic groups (clades) - contain species which are more closely related to each other than to any outside of the group
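The idea of a monophyletic group can be made concrete with a small sketch. It assumes a rooted tree written as nested tuples (a representation chosen here for illustration, with hypothetical taxon names) and treats a group as monophyletic when some node of the tree has exactly that group as its set of leaves.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some node of the rooted tree has exactly `group` as its leaves."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree: an outgroup plus two clades.
TREE = ("Archaea", (("Bacteria1", "Bacteria2"), ("Eukaryote1", "Eukaryote2")))

print(is_monophyletic(TREE, {"Bacteria1", "Bacteria2"}))   # the two bacteria
print(is_monophyletic(TREE, {"Bacteria1", "Eukaryote1"}))  # a mixed pair
```

The first group corresponds to a single node of the tree and so is a clade; the mixed pair does not correspond to any node.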
14Monophyletic groups
[Figure: tree with an Archaea outgroup, three Bacteria and four Eukaryotes; the monophyletic groups (clades) are marked]
15How to construct a phylogeny?
- What kind of data?
- How to analyse it?
16Fossil primate skulls
17Microbial morphologies - some are complex but
many are simple - for example look at a drop of
lake water
18Linus Pauling
Linus Pauling and his co-workers asked the questions: Where in organisms has the greatest amount of information about their past history survived? How can it be extracted?
19Molecules can tell us about the past
The sequences of DNA, RNA and protein molecules
are documents of evolutionary history
20What sequences should we use?
- Choice of sequence - appropriate for question (fast or slow evolving - close or distant relationships)
- Many sequences are a mosaic of different rates
- 16S rRNA: different structural regions evolve at different rates
- Proteins: synonymous (silent) rate (codon position 3) is often faster than nonsynonymous (positions 1 and 2 - changes aa) rate of change
- Transitions occur more readily than transversions
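The transition/transversion distinction can be sketched in a few lines. The example assumes two aligned DNA sequences of equal length, written in upper case with no gaps; the sequences themselves are made up for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(x, y):
    """Classify one aligned pair of bases."""
    if x == y:
        return "same"
    pair = {x, y}
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_changes(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for x, y in zip(seq1, seq2):
        kind = classify(x, y)
        if kind != "same":
            counts[kind] += 1
    return counts


# Made-up sequences: A/G is a transition, G/T is a transversion.
print(count_changes("ACGT", "GCTT"))
```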
21 16S rRNA structure
22Exploring patterns in sequence data
- Do the sequences contain phylogenetic signal for the relationships of interest? (too conserved or too variable)
- Are sequences saturated for change at the level of relationship to be investigated?
- Do sequences manifest biased base compositions (e.g. thermophilic convergence) or biased codon usage patterns which may obscure phylogenetic signal?
23Saturation in sequence data
- Saturation is due to multiple changes at the same site subsequent to lineage splitting
- Models of evolution attempt to infer the missing information through correcting for multiple hits
- Most data will contain some fast evolving sites which are potentially saturated (e.g. in proteins often position 3)
- In severe cases the data becomes essentially random and all information about relationships can be lost
24Multiple changes at a single site
Seq 1 AGCGAG
Seq 2 GCGGAC
[Figure: number of changes, Seq 1 and Seq 2]
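Correcting for multiple hits can be sketched with the two sequences on this slide. The example assumes the Jukes-Cantor model, chosen here because it is the simplest such correction; the slides do not name a specific model. It compares the observed proportion of differing sites with the corrected estimate of changes per site.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of sites that differ between aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(p):
    """Jukes-Cantor estimate of changes per site from the observed p.

    Returns infinity once p reaches 0.75, the level expected for random
    sequences, where the correction is undefined (full saturation).
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("AGCGAG", "GCGGAC")
print(f"observed p = {p:.3f}, corrected d = {jukes_cantor(p):.3f}")
```

Four of the six sites differ, so the observed proportion is already close to the 0.75 saturation level and the corrected distance is much larger than the observed one. Six sites are far too few for a real estimate; the numbers only illustrate the direction of the correction.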
25Biased base compositions?
- Do sequences manifest biased base compositions (e.g. thermophilic convergence) or biased codon usage patterns which may obscure phylogenetic signal?
26A case study in phylogenetic analysis: Deinococcus and Thermus
- Deinococcus are radiation resistant bacteria
- Thermus are thermophilic bacteria
- BUT
- Both have the same very unusual cell wall based upon ornithine
- Both have the same menaquinones (Mk 9)
- Both have the same unusual polar lipids
- Congruence between these complex characters
supports a phylogenetic relationship between
Deinococcus and Thermus
27 Guanine + Cytosine in 16S rRNA genes
% guanine + cytosine at variable sites
Thermophiles
  Thermus thermophilus 72
  Aquifex pyrophilus 73
Mesophiles
  Deinococcus radiodurans 52
  Bacillus subtilis 50
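A G+C percentage of this kind can be computed with a short function. This is a minimal sketch: it assumes a nucleotide string in which anything other than A, C, G, T or U (for example an alignment gap) is ignored, and it does not itself select the variable sites, which would have to be extracted from an alignment first.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


# Made-up fragment, for illustration only.
print(gc_percent("GGCACU-GC"))
```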
28A four taxon problem for Deinococcus and Thermus (Thermus, Deinococcus, Bacillus, Aquifex)
- Aquifex is a thermophile and Bacillus is a mesophile
- No data suggest that Aquifex and Bacillus are specifically related to either Deinococcus or Thermus
- If all four bacteria are included in an analysis the true tree should place Thermus and Deinococcus together
The true tree
[Figure: four-taxon tree of Thermus, Aquifex, Deinococcus and Bacillus in which Thermus and Deinococcus are placed together]
29Most methods of analysis will be fooled by the
base compositional biases in the data
Aquifex 73% GC
Bacillus 50% GC
Deinococcus 52% GC
Thermus 72% GC
The wrong tree - places taxa which share similar base compositions together
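The effect can be illustrated with the four G+C values above. The sketch below is a toy, not a phylogenetic method: it simply finds the pair of taxa whose base compositions are most alike, to show which grouping composition alone would favour.

```python
from itertools import combinations

# G+C percentages at variable sites, as given on the slides.
GC = {"Aquifex": 73, "Bacillus": 50, "Deinococcus": 52, "Thermus": 72}


def closest_pair(values):
    """Return the pair of taxa with the smallest difference in G+C."""
    return min(
        combinations(sorted(values), 2),
        key=lambda pair: abs(values[pair[0]] - values[pair[1]]),
    )


print(closest_pair(GC))
```

On composition alone the two thermophiles pair up, which is the wrong tree: it separates Thermus from Deinococcus.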
30Is there a molecular clock?
- The idea of a molecular clock was initially suggested by Zuckerkandl and Pauling in 1962
- They noted that rates of amino acid replacements in animal haemoglobins were roughly proportional to time - as judged against the fossil record
31The molecular clock for alpha-globin
Each point represents the number of substitutions separating each animal from humans
[Figure: number of substitutions plotted against time to common ancestor (millions of years), with points for shark, carp, platypus, chicken and cow]
32There is no universal molecular clock
- The initial proposal saw the clock as a Poisson process with a constant rate
- Now known to be more complex - differences in rates occur for
- different sites in a molecule
- different genes
- different regions of genomes
- different genomes in the same cell
- different taxonomic groups for the same gene
- there is no universal molecular clock
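The constant-rate Poisson clock of the initial proposal can be written down directly: over a time t at rate r, the number of substitutions follows a Poisson distribution with mean r * t. The sketch below assumes hypothetical values for the rate and the time, chosen only for illustration.

```python
import math


def poisson_pmf(k, rate, time):
    """Probability of exactly k substitutions under a constant-rate clock."""
    mean = rate * time
    return math.exp(-mean) * mean**k / math.factorial(k)


# Hypothetical rate (substitutions per unit time) and elapsed time.
RATE, TIME = 0.5, 4.0
for k in range(5):
    print(k, round(poisson_pmf(k, RATE, TIME), 4))
```

Under this model the mean and the variance of the number of substitutions are both r * t, which is why differences in rate between sites, genes and taxa show up as departures from the simple clock.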
33Rates of amino acid replacement in different
proteins
34Are there local molecular clocks?
- If there is no universal molecular clock are there local clocks?
- Can individual molecular data sets yield useful estimates of times of divergence?
- Requires
- demonstration of rate constancy in the data set (some kind of relative rate test)
- sufficient external data - fossils - to reliably calibrate the clock
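Once rate constancy has been shown, calibration and dating reduce to simple arithmetic. The sketch below assumes a strict local clock and uses made-up distances and a made-up fossil date. The factor of 2 is there because change accumulates along both lineages descending from the split.

```python
def calibrate_rate(distance, split_age):
    """Rate per lineage per unit time, from one fossil-dated split."""
    return distance / (2.0 * split_age)


def divergence_time(distance, rate):
    """Estimated age of a split from a distance and a calibrated rate."""
    return distance / (2.0 * rate)


# Hypothetical calibration: a distance of 0.2 across a split dated by
# fossils to 100 million years ago.
rate = calibrate_rate(0.2, 100.0)

# Hypothetical undated pair separated by a distance of 0.05.
print(divergence_time(0.05, rate), "million years")
```

The estimate is only as good as the calibration date and the distance estimates that go into it.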
35Relative rate test (Wilson & Sarich, 1973)
Under a molecular clock the distance (K) from A to O (the common ancestor of A and B), and from B to O, should be the same. We can measure relative rates for A and B by reference to an outgroup C:
K_AC - K_BC = 0 is expected under a clock
K_AC - K_BC > 0 indicates rate A > rate B
K_AC - K_BC < 0 indicates rate B > rate A
[Figure: tree in which A and B share the common ancestor O, with C as the outgroup]
One can therefore exclude taxa which violate rate constancy
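The comparison at the heart of the relative rate test can be sketched as a small function. The `tolerance` parameter is a hypothetical stand-in for a proper statistical assessment of whether the difference departs from zero, which a real analysis would need; the distances in the example are made up.

```python
def relative_rate(k_ac, k_bc, tolerance=0.0):
    """Compare the distances from A and from B to the outgroup C."""
    diff = k_ac - k_bc
    if diff > tolerance:
        return "rate A > rate B"
    if diff < -tolerance:
        return "rate B > rate A"
    return "consistent with a clock"


# Hypothetical distances to the outgroup C.
print(relative_rate(0.30, 0.22, tolerance=0.02))
print(relative_rate(0.25, 0.26, tolerance=0.02))
```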
36Some potential problems with clocks
- Need a good fossil record to calibrate the clock - often missing (e.g. for bacteria?)
- Windows on fossil time estimates are often large
- How to calculate the amount of divergence between 2 sequences? (use a model - subject of much of the present course)
37Using fossils to date splitting events
[Figure: trees of taxa A, B and C drawn against a time axis, with the inferred timing of the split marked]
Therefore estimates may be very imprecise
38Phylogenetic inferences are premised on
- Phylogenetic inferences are premised on the inheritance of ancestral characters, and on the existence of an evolutionary history defined by changes in these characters
- A tree-like model of evolution (paralogy, lateral transfer?)
39Gene trees and species trees
ORTHOLOGY
40Paralogy can produce misleading trees
[Figure: organism phylogeny of A, B and C alongside gene phylogenies in which a gene duplication produces two sets of copies, a1, b1, c1 and a2, b2, c2 (PARALOGY); a misleading tree results from incomplete sampling]
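How incomplete sampling of paralogues misleads can be sketched by pruning a gene tree. The example assumes a hypothetical organism phylogeny (A,(B,C)) and a gene duplication that predates both splits, so that each gene copy has its own subtree with that shape; trees are written as nested tuples.

```python
def prune(tree, keep):
    """Return the tree induced by the leaves in `keep`, or None if empty."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree)
            if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node left with a single child
    return tuple(kept)


# Hypothetical gene tree: the duplication separates copy 1 from copy 2.
GENE_TREE = (("a1", ("b1", "c1")), ("a2", ("b2", "c2")))

# Orthologues only: recovers the organism phylogeny (A,(B,C)).
print(prune(GENE_TREE, {"a1", "b1", "c1"}))

# A mixture of paralogues: groups A with B, which is misleading.
print(prune(GENE_TREE, {"a1", "b1", "c2"}))
```

In the second case the deepest split in the sampled tree is the gene duplication rather than a speciation event, so the gene tree no longer matches the organism phylogeny.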
41The malic enzyme tree contains paralogues
Anas: a duck!
42Phylogenetic analysis requires careful thought
- Phylogenetic analysis is frequently treated as a black box into which data are fed (often gathered at considerable cost) and out of which "The Tree" springs
- (Hillis, Moritz & Mable 1996, Molecular Systematics)