Molecular Systematics - PowerPoint PPT Presentation

Provided by: marti283
1
Molecular Systematics and Evolution
2
Darwin's letter to Thomas Huxley, 1857
  • The time will come I believe, though I shall not
    live to see it, when we shall have fairly true
    genealogical (phylogenetic) trees of each great
    kingdom of nature

Haeckel's pedigree of man
3
Aims of the course
  • To introduce the theory and practice of
    phylogenetic inference from molecular data
  • To introduce some of the most useful methods and
    computer programmes

4
How to construct a phylogeny
  • What kind of data?

5
Richard Owen
7
Owen's definition of homology
  • Homologue: the same organ under every variety of
    form and function (true or essential
    correspondence)
  • Analogy: superficial or misleading similarity
  • Richard Owen, 1843

9
Charles Darwin
10
Darwin and homology
  • The natural system is based upon descent with
    modification ... the characters that naturalists
    consider as showing true affinity (i.e.
    homologies) are those which have been inherited
    from a common parent, and, in so far as all true
    classification is genealogical that community of
    descent is the common bond that naturalists have
    been seeking
  • Charles Darwin, On the Origin of Species, 1859, p. 413

11
Homology is...
  • Homology: similarity that is the result of
    inheritance from a common ancestor - the
    identification and analysis of homologies is
    central to phylogenetic systematics

12
Phylogenetic systematics
  • Sees homology as evidence of common ancestry
  • Uses tree diagrams to portray relationships based
    upon recency of common ancestry
  • Monophyletic groups (clades) - contain species
    which are more closely related to each other than
    to any outside of the group
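
A minimal sketch of what "monophyletic" means in practice, assuming a rooted tree written as nested tuples. The tree and the taxon names are hypothetical and are not taken from the slides: a group counts as monophyletic here if some subtree contains exactly those taxa and no others.

```python
# Hypothetical rooted tree as nested tuples; leaf names are illustrative only.
tree = ("bacterium", (("archaeon_1", "archaeon_2"),
                      ("eukaryote_1", "eukaryote_2")))


def leaves(node):
    """Return the set of leaf names found below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(node, group):
    """True if some subtree of `node` contains exactly the taxa in `group`."""
    group = set(group)
    if leaves(node) == group:
        return True
    if isinstance(node, str):
        return False
    return any(is_monophyletic(child, group) for child in node)


print(is_monophyletic(tree, {"eukaryote_1", "eukaryote_2"}))
print(is_monophyletic(tree, {"bacterium", "archaeon_1"}))
```

The second query fails because the smallest subtree holding both taxa is the whole tree, which also contains every other taxon.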

13
Cladograms and phylograms
Cladograms show branching order - branch lengths
are meaningless
Phylograms show branch order and branch lengths
[Figure: a cladogram and a phylogram, each showing Bacterium 1-3 and Eukaryote 1-4]
14
Rooting using an outgroup
[Figure: an unrooted tree and a tree rooted by a bacterial outgroup; the archaea and the eukaryotes are each marked as a monophyletic group, and the root is indicated]
15
Fossil skulls
16
From left to right the primate species depicted
on the tree are a lemur (Lemur catta), an
adapid (Hoanghonius stehlini), a tarsier (Tarsius
bancanus), an omomyid (Shoshonius cooperi), a
proto-monkey (Eosimias centennicus), a South
American monkey (Saimiri sciureus), an Old World
monkey (Mandrillus sphinx), a great ape (Gorilla
gorilla), and a human (Homo sapiens)
17
Microbial morphologies - some are complex but
many are simple - for example look at a drop of
lake water
18
Linus Pauling
19
Molecules as documents of evolutionary history
  • We may ask the question where in the now living
    systems the greatest amount of information of
    their past history has survived and how it can be
    extracted
  • Best fit are the different types of
    macromolecules (sequences) which carry the
    genetic information

24
Exploring patterns in sequence data 1
  • Which sequences should we use?
  • Do the sequences contain phylogenetic signal for
    the relationships of interest? (might be too
    conserved or too variable)
  • Are there features of the data which might
    mislead us about evolutionary relationships?
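
One rough way to ask whether sequences are "too conserved or too variable" is to count how many alignment columns vary at all. The sketch below assumes the sequences are already aligned to equal length. The toy alignment is hypothetical, and the function name is illustrative rather than part of any library.

```python
def variable_site_fraction(alignment):
    """Fraction of alignment columns that contain more than one character state."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same, non-zero length")
    n_columns = lengths.pop()
    if n_columns == 0:
        raise ValueError("alignment has no columns")
    # zip(*alignment) walks the alignment column by column.
    variable = sum(len(set(column)) > 1 for column in zip(*alignment))
    return variable / n_columns


# Hypothetical toy alignment: only the last column varies.
toy_alignment = ["ACGT", "ACGA", "ACGT"]
print(variable_site_fraction(toy_alignment))
```

A value near 0 suggests too little variation to resolve relationships. A value near 1 does not prove saturation, but it is a prompt to look more closely.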

25
The importance of 16S rRNA for those interested
in the early evolution of life lies in its slow
evolution, which allows the historical record
preserved in its gene sequences to be kept
relatively intact.
26
Small subunit ribosomal RNA
18S or 16S rRNA
27
An alignment involves hypotheses of positional
homology between bases or amino acids
Alignment of 16S rRNA sequences from different
bacteria
28
But not all proteins change at the same rate.
Rates of amino acid replacement differ between
proteins.
29
Case study: the Florida dentist case, 1990
  • Each branch represents the sequence from part of
    the env gene of HIV-1
  • Viral sequences were obtained from the dentist
    and seven of his former patients (A-G), also
    infected with HIV
  • A, B, C, E and G have sequences very closely related
    to those of the dentist, suggesting that he
    infected them
  • D and F had other risk factors for infection;
    their viruses are separated from the dentist's by
    sequences taken from controls
  • Because HIV is so variable, two different sequences
    are included for the dentist and for patient A

30
Is there a molecular clock?
  • The idea of a molecular clock was initially
    suggested by Zuckerkandl and Pauling in 1962
  • They noted that rates of amino acid replacements
    in animal haemoglobins were roughly proportional
    to time - as judged against the fossil record

31
The molecular clock for alpha-globin
Each point represents the number of substitutions
separating each animal from humans
[Figure: number of substitutions plotted against time to common ancestor (millions of years) for shark, carp, platypus, chicken and cow]
32
There is no universal molecular clock
  • The initial proposal saw the clock as a Poisson
    process with a constant rate
  • Now known to be more complex - differences in
    rates occur for
      • different sites in a molecule
      • different genes
      • different regions of genomes
      • different genomes in the same cell
      • different taxonomic groups for the same gene
  • There is no universal molecular clock
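
A minimal sketch of the initial constant-rate proposal only, which this slide says is too simple for real data. Under a Poisson clock, two lineages that split t million years ago are expected to differ by 2 x rate x t substitutions per site, and the count scatters around that mean with variance equal to the mean. The rate, time and sequence length below are made-up illustration values, not estimates from any data set.

```python
import math


def expected_substitutions(rate_per_site_per_myr, divergence_time_myr, n_sites):
    """Expected substitutions separating two lineages under a constant rate.

    The factor 2 is there because both lineages accumulate changes
    independently after the split.
    """
    return 2 * rate_per_site_per_myr * divergence_time_myr * n_sites


def poisson_pmf(k, mean):
    """Probability of observing exactly k events when the Poisson mean is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Hypothetical values: 0.001 substitutions/site/Myr, 100 Myr, 100 sites.
mean = expected_substitutions(0.001, 100, 100)
print(mean)
print(poisson_pmf(20, mean))
```

Rate differences between sites, genes or lineages of the kind listed above would show up as more scatter than this simple model allows.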

33
Unequal rates in different lineages may cause
problems for phylogenetic analysis
  • Felsenstein (1978) made a simple model phylogeny
    including four taxa and a mixture of short and
    long branches

[Figure: the true tree and the wrong tree; p > q]
  • All methods are susceptible to long branch
    problems
  • Methods which do not assume that all sites change
    at the same rate are generally better at
    recovering the true tree

34
Saturation in sequence data
  • Saturation is due to multiple changes at the same
    site subsequent to lineage splitting
  • Most data will contain some fast evolving sites
    which are potentially saturated (e.g. in
    protein-coding genes, often the third codon position)
  • In severe cases the data becomes essentially
    random and all information about relationships
    can be lost

35
Multiple changes at a single site - hidden changes
Seq 1: AGCGAG
Seq 2: GCGGAC
[Figure: number of changes along Seq 1 and Seq 2]
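
To make the idea of hidden changes concrete, the sketch below counts the observed differences between Seq 1 and Seq 2 and then applies the Jukes-Cantor correction. That correction is not named on the slides; it is brought in here as the simplest standard model (equal base frequencies, equal rates for all changes) for estimating how many substitutions the observed differences may conceal.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def jukes_cantor_distance(p):
    """Estimated substitutions per site under the Jukes-Cantor model.

    The estimate grows without bound as p approaches 0.75, the level of
    difference expected between unrelated random sequences (full saturation).
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("AGCGAG", "GCGGAC")  # 4 of the 6 sites differ
print(p, jukes_cantor_distance(p))
```

The corrected distance is larger than the observed one, and the gap widens as sequences approach saturation. With only six sites the numbers are purely illustrative.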
36
Exploring patterns in sequence data 2
  • Do sequences manifest biased base compositions
    (e.g. thermophilic convergence) or biased codon
    usage patterns which may obscure phylogenetic
    signal?

37
A case study in phylogenetic analysis: Deinococcus
and Thermus
  • Deinococcus are radiation resistant bacteria
  • Thermus are thermophilic bacteria
  • BUT
  • Both have the same very unusual cell wall based
    upon ornithine
  • Both have the same menaquinones (Mk 9)
  • Both have the same unusual polar lipids
  • Congruence between these complex characters
    supports a phylogenetic relationship between
    Deinococcus and Thermus

38
Congruence: the agreement between estimates of
phylogeny (relationships) based on different
characters.
39
Guanine + Cytosine content (%) in 16S rRNA genes from
mesophiles and thermophiles

                             GC, all sites   GC, variable sites
Thermophiles
  Thermotoga maritima             62               72
  Thermus thermophilus            64               72
  Aquifex pyrophilus              65               73
Mesophiles
  Deinococcus radiodurans         55               52
  Bacillus subtilis               55               50
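
GC percentages like those on this slide can be computed directly from a sequence. A minimal sketch, assuming the input is a plain DNA or RNA string. Gaps and ambiguity codes are ignored, and the function name is illustrative.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [base for base in sequence.upper() if base in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(base in "GC" for base in bases)
    return 100.0 * gc_count / len(bases)


# Toy sequences, not real 16S rRNA data.
print(gc_content("ATGC"))
print(gc_content("GGCC-NN"))
```

Restricting the calculation to the variable columns of an alignment gives the second kind of figure reported on the slide.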
40
Shared nucleotide or amino acid composition
biases can also cause problems for phylogenetic
analysis
[Figure: true tree and wrong tree from 16S rRNA, with taxa labelled by GC content: Aquifex (73), Thermus (72), Deinococcus (52) and Bacillus (50)]
The correct tree can be obtained if a model is
used which allows base/aa composition to vary
between sequences - LogDet/paralinear distances,
heterogeneous maximum likelihood
41
Gene trees and species trees
[Figure: a species tree (A, B, D) beside a gene tree (a, b, c)]
We often assume that gene trees give us species
trees
42
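Orthologues and paralogues
[Figure: an ancestral gene undergoes duplication to give 2 copies (paralogues) on the same genome; the resulting gene tree, with tips A, B, C and a, b, c, marks orthologous and paralogous relationships, and a second tree shows a mixture of orthologues and paralogues sampled (A, C, b)]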
43
Two homologous genes are orthologous if their
most recent common ancestor did not undergo a
gene duplication; otherwise they are called
paralogues (or paralogous genes)
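
The definition above can be sketched as a lookup on a gene tree whose internal nodes are labelled with the event that produced them. The tree below is hypothetical: one duplication at the root gives two copies, and each copy then diverges by speciation in species A, B and C. All node labels and gene names are illustrative assumptions.

```python
# Hypothetical gene tree: (event, left_child, right_child); leaves are gene names.
gene_tree = ("duplication",
             ("speciation", "A1", ("speciation", "B1", "C1")),
             ("speciation", "A2", ("speciation", "B2", "C2")))


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def mrca_event(node, gene_x, gene_y):
    """Event label at the most recent common ancestor of two genes."""
    event, left, right = node
    for child in (left, right):
        # Descend while a single child subtree still contains both genes.
        if not isinstance(child, str) and {gene_x, gene_y} <= leaves(child):
            return mrca_event(child, gene_x, gene_y)
    return event


def relationship(tree, gene_x, gene_y):
    """Classify two distinct genes as orthologous or paralogous."""
    if gene_x == gene_y or not {gene_x, gene_y} <= leaves(tree):
        raise ValueError("need two distinct genes that are both in the tree")
    if mrca_event(tree, gene_x, gene_y) == "duplication":
        return "paralogous"
    return "orthologous"


print(relationship(gene_tree, "A1", "B1"))
print(relationship(gene_tree, "A1", "B2"))
```

In real analyses the duplication and speciation labels are not given. They have to be inferred, for example by comparing the gene tree with a species tree.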
44
The malic enzyme gene tree contains a mixture of
orthologues and paralogues
[Figure: malic enzyme gene tree marking a gene duplication, with labels for Anas (a duck), plant chloroplast and plant mitochondrion]
45
Summary
  • There may be conflicting patterns in data which
    can potentially mislead us about evolutionary
    relationships
  • Our methods of analysis need to be able to deal
    with the complexities of sequence evolution and
    to recover any underlying phylogenetic signal
  • Some methods may do this better than others
    depending on the properties of individual data
    sets
  • All trees are simply hypotheses!

46
Phylogenetic analysis requires careful thought
  • Phylogenetic analysis is frequently treated as a
    black box into which data are fed (often gathered
    at considerable cost) and out of which The Tree
    springs
  • (Hillis, Moritz & Mable 1996, Molecular
    Systematics)