Title: Molecular Systematics
1 Molecular Systematics and Evolution
2 Darwin's letter to Thomas Huxley, 1857
- The time will come I believe, though I shall not
live to see it, when we shall have fairly true
genealogical (phylogenetic) trees of each great
kingdom of nature
Haeckel's pedigree of man
3 Aims of the course
- To introduce the theory and practice of
phylogenetic inference from molecular data
- To introduce some of the most useful methods and
computer programmes
4 How to construct a phylogeny
5 Richard Owen
7 Owen's definition of homology
- Homologue: the same organ under every variety of
form and function (true or essential
correspondence)
- Analogy: superficial or misleading similarity
- Richard Owen, 1843
9 Charles Darwin
10 Darwin and homology
- The natural system is based upon descent with
modification ... the characters that naturalists
consider as showing true affinity (i.e.
homologies) are those which have been inherited
from a common parent, and, in so far as all true
classification is genealogical, that community of
descent is the common bond that naturalists have
been seeking
- Charles Darwin, On the Origin of Species, 1859, p. 413
11 Homology is...
- Homology: similarity that is the result of
inheritance from a common ancestor
- The identification and analysis of homologies is
central to phylogenetic systematics
12 Phylogenetic systematics
- Sees homology as evidence of common ancestry
- Uses tree diagrams to portray relationships based
upon recency of common ancestry
- Monophyletic groups (clades) contain species
which are more closely related to each other than
to any species outside the group
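The monophyly criterion can be checked mechanically once a rooted tree is written down. The sketch below assumes a rooted tree stored as nested Python tuples (a hypothetical representation, with placeholder taxa A to D): a group is monophyletic exactly when its members form the complete set of descendants of a single node.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a leaf is a taxon name, an internal
# node is a tuple of its children (the tree is assumed to be rooted).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxa descended from this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set of every node (every clade) in the rooted tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def is_monophyletic(tree: Tree, group) -> bool:
    """A group is monophyletic if it is exactly the leaf set of some node."""
    return frozenset(group) in clades(tree)


tree = ((("A", "B"), "C"), "D")
print(is_monophyletic(tree, {"A", "B"}))       # True
print(is_monophyletic(tree, {"B", "C"}))       # False
print(is_monophyletic(tree, {"A", "B", "C"}))  # True
```

{B, C} fails because the smallest node containing both B and C also contains A.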
13 Cladograms and phylograms
[Figure: the same tree of Bacterium 1 to 3 and Eukaryote 1 to 4 drawn as a cladogram and as a phylogram]
- Cladograms show branching order only; branch lengths
are meaningless
- Phylograms show both branching order and branch lengths
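The difference between the two kinds of diagram can be seen in Newick notation, a common text format for trees in which an optional branch length follows a colon. In this minimal sketch the phylogram string and its branch lengths are made up, and the taxon names are written without spaces so the example stays simple; deleting the lengths leaves only the branching order, i.e. a cladogram.

```python
import re

# Hypothetical phylogram in Newick format: branch lengths follow ':'.
phylogram = ("((Bacterium1:0.10,Bacterium2:0.25):0.05,"
             "(Eukaryote1:0.30,Eukaryote2:0.12):0.40);")


def to_cladogram(newick: str) -> str:
    """Drop numeric branch lengths, keeping only the branching order.

    Assumes unquoted labels that contain no ':' characters.
    """
    return re.sub(r":[0-9.eE+-]+", "", newick)


print(to_cladogram(phylogram))
# ((Bacterium1,Bacterium2),(Eukaryote1,Eukaryote2));
```

Both strings describe the same topology; only the phylogram also records how much change occurred along each branch.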
14 Rooting using an outgroup
[Figure: an unrooted tree of archaea and eukaryotes, and the same tree rooted by using bacteria as the outgroup; once rooted, the archaea form one monophyletic group and the eukaryotes another]
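Rooting with an outgroup amounts to placing the root on the branch that joins the outgroup to the rest of the tree. The sketch below assumes the unrooted tree is given as a list of branches between labelled nodes (internal node names x, y and z and the taxon names are hypothetical), and that the outgroup is a single leaf rather than a larger clade.

```python
from collections import defaultdict


def root_on_outgroup(edges, outgroup):
    """Root an unrooted tree on the branch leading to `outgroup`.

    `edges` lists branches as (node, node) pairs; leaves are taxon names
    and internal nodes have arbitrary labels. Returns the rooted tree as
    nested tuples of the form (outgroup, ingroup_subtree).
    """
    adjacent = defaultdict(list)
    for a, b in edges:
        adjacent[a].append(b)
        adjacent[b].append(a)
    (attach,) = adjacent[outgroup]  # a leaf has exactly one neighbour

    def build(node, parent):
        # Walk away from the root, turning neighbours into children.
        children = [n for n in adjacent[node] if n != parent]
        if not children:
            return node  # a leaf
        return tuple(build(child, node) for child in children)

    return (outgroup, build(attach, outgroup))


# Hypothetical unrooted tree: x, y and z are internal nodes.
edges = [("bacterium", "x"), ("x", "y"), ("x", "z"),
         ("y", "archaeon1"), ("y", "archaeon2"),
         ("z", "eukaryote1"), ("z", "eukaryote2")]
rooted = root_on_outgroup(edges, "bacterium")
print(rooted)
# ('bacterium', (('archaeon1', 'archaeon2'), ('eukaryote1', 'eukaryote2')))
```

Only after rooting does it make sense to ask whether the archaea or the eukaryotes form monophyletic groups.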
15 Fossil skulls
16 From left to right, the primate species depicted
on the tree are a lemur (Lemur catta), an
adapid (Hoanghonius stehlini), a tarsier (Tarsius
bancanus), an omomyid (Shoshonius cooperi), a
proto-monkey (Eosimias centennicus), a South
American monkey (Saimiri sciureus), an Old World
monkey (Mandrillus sphinx), a great ape (Gorilla
gorilla), and a human (Homo sapiens).
17 Microbial morphologies: some are complex, but
many are simple (for example, look at a drop of
lake water)
18 Linus Pauling
19 Molecules as documents of evolutionary history
- We may ask the question where in the now living
systems the greatest amount of information of
their past history has survived and how it can be
extracted
- Best fit are the different types of
macromolecules (sequences) which carry the
genetic information
24 Exploring patterns in sequence data (1)
- Which sequences should we use?
- Do the sequences contain phylogenetic signal for
the relationships of interest? (They might be too
conserved or too variable.)
- Are there features of the data that might
mislead us about evolutionary relationships?
25 The importance of 16S rRNA for those interested
in the early evolution of life lies in its slow
rate of evolution, which keeps the historical
record preserved in its gene sequence relatively
intact.
26 Small subunit ribosomal RNA
(16S rRNA in prokaryotes, 18S rRNA in eukaryotes)
27 An alignment involves hypotheses of positional
homology between bases or amino acids
[Figure: alignment of 16S rRNA sequences from different bacteria]
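Because each alignment column is a hypothesis of positional homology, simple measures of divergence are computed column by column. This is a minimal sketch of the uncorrected p-distance between two aligned sequences; the sequences are hypothetical, and skipping gapped columns is one common convention among several.

```python
def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Proportion of aligned positions at which two sequences differ.

    Each column is treated as a hypothesis of positional homology;
    columns with a gap in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


# Hypothetical aligned fragments: 1 difference in 6 gap-free columns.
print(p_distance("ACG-TTA", "ACGATCA"))
```

If the alignment were changed, the columns being compared would change too, which is why alignment errors propagate directly into the phylogeny.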
28 But not all proteins change at the same rate:
rates of amino acid replacement differ between
proteins, for example:
29 Case study: the Florida dentist case, 1990
- Each branch represents the sequence from part of
the env gene of HIV-1
- Viral sequences were obtained from the dentist
and seven of his former patients (A-G), also
infected with HIV
- A, B, C, E and G have sequences very closely related
to those of the dentist, suggesting that he
infected them
- D and F had other risk factors for infection;
their viruses are separated from the dentist's by
sequences taken from controls
- Because HIV is so variable, two different sequences
are included for the dentist and for patient A
30 Is there a molecular clock?
- The idea of a molecular clock was initially
suggested by Zuckerkandl and Pauling in 1962
- They noted that the numbers of amino acid
replacements in animal haemoglobins were roughly
proportional to time, as judged against the
fossil record
31 The molecular clock for alpha-globin
[Figure: number of substitutions plotted against time to common ancestor (millions of years) for shark, carp, chicken, platypus and cow; each point represents the number of substitutions separating that animal from humans]
32 There is no universal molecular clock
- The initial proposal saw the clock as a Poisson
process with a constant rate
- It is now known to be more complex: rates differ
between
  - different sites in a molecule
  - different genes
  - different regions of genomes
  - different genomes in the same cell
  - different taxonomic groups for the same gene
- There is no universal molecular clock
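The original constant-rate proposal can be written as a Poisson process: along a lineage, the expected number of substitutions is rate × time, and the actual count scatters around that value. The sketch below uses a hypothetical rate of 0.1 substitutions per million years; it illustrates the idealised clock, not the more complex rate variation listed above.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson process with mean lam."""
    # Computed on the log scale to avoid overflow for large k.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def expected_substitutions(rate: float, time: float) -> float:
    """Constant-rate clock: expected substitutions grow linearly with time."""
    return rate * time


# Hypothetical rate: 0.1 substitutions per million years along one lineage.
rate = 0.1
for time in (10, 20, 40):
    lam = expected_substitutions(rate, time)
    probs = [poisson_pmf(k, lam) for k in range(100)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    print(f"t={time} Myr  expected={lam:.1f}  mean={mean:.3f}  variance={var:.3f}")
```

Under this model the variance equals the mean, so even a perfect clock produces substitution counts that scatter around the straight line; rate differences between sites, genes and lineages add further variation on top of this.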
33 Unequal rates in different lineages may cause
problems for phylogenetic analysis
- Felsenstein (1978) made a simple model phylogeny
including four taxa and a mixture of short and
long branches
[Figure: the true tree and the wrong tree recovered when p > q, where p and q are the probabilities of change on the long and short branches]
- All methods are susceptible to long-branch
problems
- Methods that do not assume that all sites change
at the same rate are generally better at
recovering the true tree
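One way to see how long branches mislead is to count the site patterns that parsimony uses for four taxa. This minimal sketch counts parsimony-informative sites supporting each of the three unrooted quartets; the alignment is hypothetical, constructed so that the true tree is AB|CD but the long branches A and C share two parallel changes.

```python
def quartet_support(a: str, b: str, c: str, d: str) -> dict:
    """Count parsimony-informative sites supporting each unrooted quartet.

    With four taxa, a site is informative only if it shows two states,
    each in two taxa (pattern xxyy, xyxy or xyyx).
    """
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for sa, sb, sc, sd in zip(a, b, c, d):
        if len({sa, sb, sc, sd}) != 2:
            continue  # constant site or more than two states: uninformative
        if sa == sb and sc == sd:
            support["AB|CD"] += 1
        elif sa == sc and sb == sd:
            support["AC|BD"] += 1
        elif sa == sd and sb == sc:
            support["AD|BC"] += 1
        # otherwise a 3+1 pattern (change on one branch): uninformative
    return support


# Hypothetical alignment: site 1 reflects the true tree AB|CD,
# sites 2 and 3 are parallel changes on the long branches A and C.
print(quartet_support("ATGA", "ACCA", "GTGA", "GCCT"))
# {'AB|CD': 1, 'AC|BD': 2, 'AD|BC': 0}
```

Parallel changes on the two long branches outnumber the genuine signal, so parsimony groups A with C: the wrong tree.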
34 Saturation in sequence data
- Saturation is due to multiple changes at the same
site subsequent to lineage splitting
- Most data will contain some fast-evolving sites
which are potentially saturated (e.g. third codon
positions in protein-coding genes)
- In severe cases the data become essentially
random and all information about relationships
can be lost
35 Multiple changes at a single site: hidden changes
Seq 1  AGCGAG
Seq 2  GCGGAC
[Figure: number of changes at each site along the lineages leading to Seq 1 and Seq 2]
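Hidden changes mean that the observed proportion of differing sites underestimates the true amount of change. A standard correction, swapped in here because the slide names no method, is the Jukes-Cantor (JC69) model, which assumes equal base frequencies and equal substitution rates. The sketch applies it to the two six-base sequences above.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Observed proportion of differing sites (ignores hidden changes)."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor (JC69) estimate of substitutions per site.

    Corrects the observed p-distance for multiple hits:
    d = -3/4 * ln(1 - 4/3 * p). Undefined once p >= 0.75, the level at
    which random DNA sequences with equal base frequencies differ.
    """
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("sequences are saturated: p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


seq1, seq2 = "AGCGAG", "GCGGAC"
print(round(p_distance(seq1, seq2), 3))    # 0.667
print(round(jukes_cantor(seq1, seq2), 3))  # 1.648
```

Four visible differences in six sites correspond to an estimated 1.65 substitutions per site, well over twice the observed 0.67. As p approaches 0.75 the correction grows without bound, which is the numerical face of saturation.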
36 Exploring patterns in sequence data (2)
- Do the sequences show biased base compositions
(e.g. thermophilic convergence) or biased codon
usage patterns that may obscure phylogenetic
signal?
37 A case study in phylogenetic analysis: Deinococcus
and Thermus
- Deinococcus are radiation-resistant bacteria
- Thermus are thermophilic bacteria
- BUT
- Both have the same very unusual cell wall based
upon ornithine
- Both have the same menaquinones (MK-9)
- Both have the same unusual polar lipids
- Congruence between these complex characters
supports a phylogenetic relationship between
Deinococcus and Thermus
38 Congruence: the agreement between estimates of
phylogeny (relationships) based on different
characters.
39 Guanine + cytosine (GC) content in 16S rRNA genes
from mesophiles and thermophiles

Species                   Type         GC % (all sites)   GC % (variable sites)
Thermotoga maritima       Thermophile  62                 72
Thermus thermophilus      Thermophile  64                 72
Aquifex pyrophilus        Thermophile  65                 73
Deinococcus radiodurans   Mesophile    55                 52
Bacillus subtilis         Mesophile    55                 50
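The two columns of the table differ only in which alignment columns are counted. This minimal sketch computes GC content per sequence over all sites or over variable sites only; the two aligned sequences are hypothetical toy data, and gaps and ambiguous bases are simply ignored.

```python
def gc_percent(alignment: dict, variable_only: bool = False) -> dict:
    """GC content (%) of each aligned sequence.

    If variable_only is True, only alignment columns where the sequences
    do not all share the same character are counted. Characters other
    than A, C, G and T (e.g. gaps) are ignored.
    """
    names = list(alignment)
    columns = list(zip(*(alignment[name].upper() for name in names)))
    if variable_only:
        columns = [col for col in columns if len(set(col)) > 1]
    result = {}
    for i, name in enumerate(names):
        bases = [col[i] for col in columns if col[i] in "ACGT"]
        gc = sum(base in "GC" for base in bases)
        result[name] = 100 * gc / len(bases) if bases else float("nan")
    return result


# Hypothetical toy alignment of a GC-rich and a GC-poor sequence.
aln = {"thermophile": "GCGCAT", "mesophile": "GCATAT"}
print(gc_percent(aln))                      # all sites
print(gc_percent(aln, variable_only=True))  # variable sites only
```

In this toy case the GC difference between the two sequences grows from about 33 to 100 percentage points when only variable sites are counted, mirroring how the table's compositional contrast sharpens at variable sites.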
40 Shared nucleotide or amino acid composition
biases can also cause problems for phylogenetic
analysis
[Figure: true and wrong 16S rRNA trees for Aquifex (73% GC), Thermus (72% GC), Deinococcus (52% GC) and Bacillus (50% GC); in the true tree Thermus groups with Deinococcus, while in the wrong tree the GC-rich Thermus and Aquifex group together]
The correct tree can be obtained if a model is
used that allows base or amino acid composition to
vary between sequences:
- LogDet/paralinear distances
- Heterogeneous maximum likelihood
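LogDet-type distances tolerate unequal base composition because they normalise the joint substitution pattern by each sequence's own base frequencies. This is a minimal sketch of the paralinear formula, d = -1/4 [ln det J - 1/2 (ln det Dx + ln det Dy)], written from scratch rather than taken from any phylogenetics package; heterogeneous maximum likelihood is not shown, and the test sequences are hypothetical.

```python
import math

BASES = "ACGT"


def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def paralinear_distance(x: str, y: str) -> float:
    """Paralinear (LogDet-family) distance between two aligned DNA sequences.

    J[i][j] is the proportion of sites with base i in x and base j in y;
    fx and fy are each sequence's own base frequencies, so the two
    sequences are not assumed to share the same base composition.
    """
    idx = {b: i for i, b in enumerate(BASES)}
    pairs = [(a, b) for a, b in zip(x.upper(), y.upper())
             if a in idx and b in idx]
    J = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        J[idx[a]][idx[b]] += 1.0 / len(pairs)
    fx = [sum(J[i]) for i in range(4)]
    fy = [sum(J[i][j] for i in range(4)) for j in range(4)]
    det_j = determinant(J)
    if det_j <= 0 or min(fx) == 0 or min(fy) == 0:
        raise ValueError("distance undefined for these sequences")
    log_dx = sum(math.log(f) for f in fx)  # ln det of diag(fx)
    log_dy = sum(math.log(f) for f in fy)  # ln det of diag(fy)
    return -0.25 * (math.log(det_j) - 0.5 * (log_dx + log_dy))


print(paralinear_distance("AACCGGTT", "AACCGGTT"))  # identical: about 0
print(paralinear_distance("AACCGGTT", "AACCGGTA"))  # one difference: > 0
```

Real analyses need long sequences: with short or saturated data the determinant can be zero or negative and the distance is undefined, which the sketch reports as an error.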
41 Gene trees and species trees
[Figure: a species tree compared with a gene tree for the same taxa]
We often assume that gene trees give us species
trees
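42 Orthologues and paralogues
[Figure: an ancestral gene duplicates to give two copies, which are paralogues on the same genome; each copy is then inherited by species A, B and C. Genes descended from the same copy in different species are orthologous, genes descended from different copies are paralogous, and a sample of genes may contain a mixture of orthologues and paralogues.]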
43 Two homologous genes are orthologous if their
most recent common ancestor did not undergo a
gene duplication; otherwise they are called
paralogues (or paralogous genes).
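The definition above translates directly into a test on a gene tree whose internal nodes are labelled as speciation or duplication events. This minimal sketch assumes such a labelled tree (a hypothetical dict-based format, with made-up gene names such as A_alpha for copy alpha in species A): find the two genes' most recent common ancestor and inspect its event.

```python
def leaf_set(node) -> set:
    """All gene names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaf_set(child) for child in node["children"]))


def mrca(node, a: str, b: str):
    """Most recent common ancestor of genes a and b (both below node)."""
    for child in node["children"]:
        if not isinstance(child, str) and {a, b} <= leaf_set(child):
            return mrca(child, a, b)
    return node


def relationship(tree, a: str, b: str) -> str:
    """Orthologous if the MRCA is a speciation, paralogous if a duplication."""
    event = mrca(tree, a, b)["event"]
    return "orthologous" if event == "speciation" else "paralogous"


# Hypothetical gene tree: one duplication (copies alpha and beta),
# then speciation into species A, B and C.
tree = {"event": "duplication", "children": [
    {"event": "speciation", "children": [
        "A_alpha", {"event": "speciation", "children": ["B_alpha", "C_alpha"]}]},
    {"event": "speciation", "children": [
        "A_beta", {"event": "speciation", "children": ["B_beta", "C_beta"]}]},
]}

print(relationship(tree, "A_alpha", "B_alpha"))  # orthologous
print(relationship(tree, "A_alpha", "B_beta"))   # paralogous
print(relationship(tree, "A_alpha", "A_beta"))   # paralogous
```

If A_alpha and B_beta were sampled as if they were the same gene, the gene tree would place the split between species A and B at the duplication, far earlier than the real speciation.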
44 The malic enzyme gene tree contains a mixture of
orthologues and paralogues
[Figure: malic enzyme gene tree with a gene duplication marked; labelled sequences include Anas (a duck!), a plant chloroplast and a plant mitochondrion]
45 Summary
- There may be conflicting patterns in data that
can potentially mislead us about evolutionary
relationships
- Our methods of analysis need to be able to deal
with the complexities of sequence evolution and
to recover any underlying phylogenetic signal
- Some methods may do this better than others,
depending on the properties of individual data
sets
- All trees are simply hypotheses!
46 Phylogenetic analysis requires careful thought
- Phylogenetic analysis is frequently treated as a
black box into which data are fed (often gathered
at considerable cost) and out of which The Tree
springs
- Hillis, Moritz & Mable 1996, Molecular
Systematics