Title: CS 177 Phylogenetics I
1 CS 177 Phylogenetics I
Taxonomy and phylogenetics; Phylogenetic trees; Cladistic versus phenetic analyses; Model of sequence evolution
Phylogenetic trees and networks; Cladistic and phenetic methods; Computer software and demos
Taxonomy and phylogenetics; Phylogenetic trees; Cladistic versus phenetic analyses; Homology and homoplasy
2 Phylogenetic Inference I
Recommended readings
- A science primer: Phylogenetics. http://www.ncbi.nlm.nih.gov/About/primer/phylo.html
- Brown, S.M. (2000) Bioinformatics, Eaton Publishing, pp. 145-160
- Brown, S.M. Molecular Phylogenetics. www.med.nyu.edu/rcr/rcr/course/PPT/phylogen.ppt
- Hillis, D.M., Moritz, C. & Mable, B.K. (1996) Molecular Systematics, 2nd Edition, Sinauer Associates, 655 pp.
- Mount, D.W. (2001) Bioinformatics, Cold Spring Harbor Lab Press, pp. 237-280
3 CS 177 Phylogenetic Inference I
Evolution
The theory of evolution is the foundation upon which all of modern biology is built.
From anatomy to behavior to genomics, the scientific method requires an appreciation of changes in organisms over time.
It is impossible to evaluate relationships among gene sequences without taking into consideration the way these sequences have been modified over time.
Ernst Haeckel (1834-1919)
4 CS 177 Phylogenetic Inference I
Relationships
Similarity searches and multiple alignments of sequences naturally lead to the question "How are these sequences related?" and, more generally, "How are the organisms from which these sequences come related?"
5 Classifying Organisms
Nomenclature is the science of naming organisms.
Evolution has created an enormous diversity, so how do we deal with it? Names allow us to talk about groups of organisms.
- Scientific names were originally descriptive phrases, which was not practical
- Binomial nomenclature
  - Developed by Linnaeus, a Swedish naturalist
  - Names are in Latin, formerly the language of science
  - Binomials are names consisting of two parts
  - The generic name is a noun
  - The epithet is a descriptive adjective
- Thus a species' name is two words, e.g. Homo sapiens
Carolus Linnaeus (1707-1778)
6 Classifying Organisms
Taxonomy is the science of the classification of organisms. Taxonomy deals with the naming and ordering of taxa.
The Linnaean hierarchy:
1. Kingdom
2. Division
3. Class
4. Order
5. Family
6. Genus
7. Species
Evolutionary distance
7 Classifying Organisms
Systematics is the science of the relationships of organisms.
Systematics is the science of how organisms are related and the evidence for those relationships.
Systematics is divided primarily into phylogenetics and taxonomy.
Speciation: the origin of new species from previously existing ones
- anagenesis: one species changes into another over time
- cladogenesis: one species splits to make two
Reconstruct evolutionary history
Phylogeny
8 Phylogenetics
Phylogenetics is the science of the pattern of evolution.
A. Evolutionary biology is the study of the processes that generate diversity, while phylogenetics is the study of the pattern of diversity produced by those processes.
B. The central problem of phylogenetics
   1. How do we determine the relationships between species?
   2. Use evidence from shared characteristics, not differences
   3. Use homologies, not analogies
   4. Use the derived condition, not the ancestral one
      a. synapomorphy: shared derived characteristic
      b. plesiomorphy: ancestral characteristic
C. Cladistics is phylogenetics based on synapomorphies.
   1. Cladistic classification creates and names taxa based only on synapomorphies.
   2. This is the principle of monophyly
   3. monophyletic, paraphyletic, polyphyletic
   4. Cladistics is now the preferred approach to phylogeny
The phylogeny and classification of life as
proposed by Haeckel (1866)
9 Phylogenetics
Evolutionary theory states that groups of similar organisms are descended from a common ancestor.
Phylogenetic systematics is a method of classifying organisms based on their evolutionary history.
It was developed by Hennig, a German entomologist, in 1950.
Willi Hennig (1913-1976)
10 Phylogenetics
Phylogenetics is the science of the pattern of evolution.
Evolutionary biology versus phylogenetics
- Evolutionary biology is the study of the processes that generate diversity
- Phylogenetics is the study of the pattern of diversity produced by those processes
11 Phylogenetics
Who uses phylogenetics? Some examples:
- Evolutionary biologists (e.g. reconstructing the tree of life)
- Systematists (e.g. classification of groups)
- Anthropologists (e.g. origin of human populations)
- Forensics (e.g. transmission of HIV to a rape victim)
- Parasitologists (e.g. phylogeny of parasites, co-evolution)
- Epidemiologists (e.g. reconstruction of disease transmission)
- Genomics/Proteomics (e.g. homology comparison of new proteins)
12 Phylogenetic trees
The central problem of phylogenetics: how do we determine the relationships between taxa?
In phylogenetic studies, the most convenient way of presenting evolutionary relationships among a group of organisms is the phylogenetic tree.
14 Phylogenetic trees
- Node: a branchpoint in a tree (a presumed ancestral OTU)
- Branch: defines the relationship between the taxa in terms of descent and ancestry
- Topology: the branching pattern of the tree
- Branch length (scaled trees only): represents the number of changes that have occurred in the branch
- Root: the common ancestor of all taxa
- Clade: a group of two or more taxa or DNA sequences that includes both their common ancestor and all of its descendants
- Operational Taxonomic Unit (OTU): the taxonomic level of sampling selected by the user for a study, such as individuals, populations, species, genera, or bacterial strains
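The terms above map onto a small data structure. The following is a minimal sketch, assuming a rooted tree stored as nested Python tuples; this representation, the taxon names A to E, and the function names are all illustrative choices, not part of the lecture.

```python
# A rooted tree as nested tuples: a string is a leaf (an OTU), a tuple is an
# internal node whose elements are its child subtrees.
# The topology and the taxon names are hypothetical, chosen for illustration.
tree = ((("A", "B"), "C"), ("D", "E"))

def leaves(node):
    """Return the set of OTUs that descend from a node."""
    if isinstance(node, str):
        return frozenset([node])
    result = frozenset()
    for child in node:
        result |= leaves(child)
    return result

def clades(node):
    """Return the leaf sets of all internal nodes (every clade of two or more taxa)."""
    if isinstance(node, str):
        return []
    found = [leaves(node)]
    for child in node:
        found.extend(clades(child))
    return found

def is_monophyletic(node, group):
    """A group is monophyletic if it is exactly the leaf set of some internal node."""
    return frozenset(group) in clades(node)
```

With this tree, `is_monophyletic(tree, {"A", "B"})` holds because A and B share a node with no other descendants, while `{"B", "C"}` does not form a clade because their common ancestor also gave rise to A.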
[Figure: example tree with the labels Branch, Node, Clade and Root]
15 Phylogenetic trees
There are many ways of drawing a tree
16 Phylogenetic trees
There are many ways of drawing a tree
[Figure: tree with taxa A, B, C, D and E]
17 Phylogenetic trees
There are many ways of drawing a tree
[Figure annotation: no meaning]
18 Phylogenetic trees
There are many ways of drawing a tree
[Figure: trees illustrating a bifurcation and a trifurcation]
Bifurcation versus multifurcation (e.g. trifurcation)
Multifurcation (also called polytomy): a node in a tree that connects more than three branches. A multifurcation may represent a lack of resolution because too few data are available for inferring the phylogeny (in which case it is said to be a soft multifurcation), or it may represent the hypothesized simultaneous splitting of several lineages (in which case it is said to be a hard multifurcation).
19 Phylogenetic trees
Trees can be scaled or unscaled (with or without branch lengths)
20 Phylogenetic trees
Trees can be unrooted or rooted
21 Phylogenetic trees
Trees can be unrooted or rooted
These trees show five different evolutionary relationships among the taxa!
22 Phylogenetic trees
Possible evolutionary trees
Taxa (n)   Unrooted/rooted
2          1/1
3          1/3
4          3/15
23 Phylogenetic trees
Possible evolutionary trees
Rooted trees: (2n-3)! / (2^(n-2) (n-2)!)
Unrooted trees: (2n-5)! / (2^(n-3) (n-3)!), for n >= 3
Taxa (n)   Rooted        Unrooted
2          1             1
3          3             1
4          15            3
5          105           15
6          945           105
7          10,395        945
8          135,135       10,395
9          2,027,025     135,135
10         34,459,425    2,027,025
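The number of possible bifurcating trees can be computed directly from the two factorial formulas. The sketch below is one way to do so; the function names are illustrative, and the case n = 2 for unrooted trees is handled separately because the closed form only applies from three taxa upward.

```python
from math import factorial

def rooted_trees(n):
    """Number of rooted bifurcating trees for n taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    if n == 2:
        return 1  # two taxa can only be joined by a single branch
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

if __name__ == "__main__":
    for n in range(2, 11):
        print(n, rooted_trees(n), unrooted_trees(n))
```

Note that the rooted count for n taxa equals the unrooted count for n + 1 taxa: placing a root on an unrooted tree is equivalent to attaching one extra lineage to one of its branches.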
24 Phylogenetic trees
How to root?
In most cases not available
25 Phylogenetic trees
How to root?
This must involve assumptions. BEWARE!
26 Phylogenetic trees
How to root?
Using outgroups
27 Phylogenetic trees
Exercise: rooted/unrooted, scaled/unscaled
[Figure: panels labeled A, B, C, D, E and F]
28 Phylogenetics
- What are useful characters?
- Use homologies, not analogies!
- Homology: common ancestry of two or more character states
- Analogy: similarity of character states not due to shared ancestry
- Homoplasy: a collection of phenomena that lead to similarities in character states for reasons other than inheritance from a common ancestor (e.g. convergence, parallelism, reversal)
- Homoplasy is a huge problem in morphological data sets!
- But in molecular data sets, too!
Euphorbiaceae (euphorb spines are modified shoots)
Cactaceae (cactus spines are modified leaves)
29 Phylogenetics
Molecular data and homoplasy
- Gene sequences represent character data
- Characters are positions in the sequence (not all workers agree; some say one gene is one character)
- Character states are the nucleotides in the sequence (or amino acids in the case of proteins)
Problems:
- The probability that two nucleotides are the same just by chance is 25% (one in four, if all four bases are equally frequent)
- What to do with insertions or deletions (which may themselves be characters)?
- Homoplasy in sequences may cause alignment errors
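The chance-identity figure and the treatment of gaps can be made concrete with a short sketch. It assumes that chance identity is the sum of squared base frequencies and that alignment columns containing a gap are simply skipped; skipping gaps is only one of several possible conventions, and the function names are illustrative.

```python
def chance_identity(freqs):
    """Probability that two independently drawn bases match: sum of squared frequencies."""
    return sum(f * f for f in freqs.values())

def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(chance_identity(equal))
print(p_distance("ACGT-A", "ACCTGA"))
```

With unequal base frequencies the chance of a match rises above one in four, which is one reason why raw similarity overstates relatedness.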
30 Phylogenetics
Molecular data and homoplasy: orthologs vs. paralogs
When comparing gene sequences, it is important to distinguish between the same gene and merely similar genes in different organisms.
- Orthologs are homologous genes in different species that diverged through speciation; they typically have analogous functions
- Paralogs are homologous genes that are the result of a gene duplication
- A phylogeny that includes both orthologs and paralogs is likely to be incorrect
- Sometimes phylogenetic analysis is the best way to determine if a new gene is an ortholog or paralog of other known genes
31 Phylogenetics
What are useful characters?
- Use the derived condition, not the ancestral one
- Synapomorphy (shared derived character): homologous traits share the same character state because it originated in their immediate common ancestor
- Plesiomorphy (shared ancestral character): homologous traits share the same character state because they are inherited from a distant common ancestor
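To illustrate the distinction, here is a minimal sketch that scans a character matrix for candidate synapomorphies. It assumes binary characters whose ancestral state is already known (for example from an outgroup); the matrix, the taxon names and the function name are hypothetical, and the code does not test whether a shared state is truly homologous.

```python
def candidate_synapomorphies(matrix, ancestral):
    """Map each character index to the taxa sharing its derived state.

    Only characters whose derived (non-ancestral) state occurs in two or
    more taxa are reported; a derived state in a single taxon groups nothing.
    """
    result = {}
    for i, anc_state in enumerate(ancestral):
        derived = frozenset(t for t, states in matrix.items() if states[i] != anc_state)
        if len(derived) >= 2:
            result[i] = derived
    return result

# Hypothetical data: one string of character states per taxon.
matrix = {"A": "1100", "B": "1100", "C": "1010", "D": "0010"}
ancestral = "0000"
print(candidate_synapomorphies(matrix, ancestral))
```

In this made-up matrix, character 0 groups A, B and C while character 2 groups C and D. Both groupings cannot be clades of the same tree, so at least one of the shared states would have to be a homoplasy.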
32 Phenetics versus cladistics
Within the field of taxonomy there are two different methods and philosophies of building phylogenetic trees: cladistic and phenetic.
Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity.
Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters).
Cladistics is becoming the method of choice: it is considered to be more powerful and to provide more realistic estimates; however, it is slower than phenetic algorithms.
33 Phenetics vs. cladistics
An example
34 Phenetics vs. cladistics
Phenetic (overall similarity)
[Figure: tree of taxa A, B and C]
35 Phenetics vs. cladistics
Cladistics (character evolution, e.g. shared derived characters)
[Figure: tree of taxa A, B and C]
36 Model of sequence evolution
The problem
- A basic process in the evolution of a sequence is change in that sequence over time
- We are interested in a mathematical model to describe that change
- Such a model is essential for understanding the mechanisms of change and is required to estimate both the rate of evolution and the evolutionary history of sequences
37 Model of sequence evolution
Nucleotide
38 Models of sequence evolution
Examples
Jukes-Cantor model (1969)
All substitutions have an equal probability and base frequencies are equal
39 Models of sequence evolution
Examples
Felsenstein (1981)
All substitutions have an equal probability, but there are unequal base frequencies
40 Models of sequence evolution
Examples
Kimura 2-parameter model (K2P) (1980)
Transitions and transversions have different probabilities
41 Models of sequence evolution
Examples
Hasegawa, Kishino & Yano (HKY) (1985)
Transitions and transversions have different probabilities, and base frequencies are unequal
42 Models of sequence evolution
Examples
General time reversible model (GTR)
Different probabilities for each substitution, and base frequencies are unequal
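For the two simplest models, the corrected evolutionary distance between two aligned sequences has a closed form. Under Jukes-Cantor, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites; under K2P, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transition and transversion differences. The sketch below implements both; the function names are illustrative, and sites other than A, C, G or T are skipped as a simplifying assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def site_differences(seq1, seq2):
    """Return (P, Q): proportions of transition and transversion differences.

    Assumption: only columns where both sequences have A, C, G or T are counted.
    """
    ts = tv = n = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    return ts / n, tv / n

def jc69_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def k2p_distance(P, Q):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) proportions."""
    w1 = 1 - 2 * P - Q
    w2 = 1 - 2 * Q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)

P, Q = site_differences("AGCTAGCTAG", "AGCTGGCTAC")
print(jc69_distance(P + Q), k2p_distance(P, Q))
```

When one third of the observed differences are transitions (P = Q/2), the two formulas give the same distance, which reflects that Jukes-Cantor is the special case of K2P with equal transition and transversion rates.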
43 Models of sequence evolution
[Figure: substitution diagram among A, G, C and T with rate labels a and b, shown for the Jukes-Cantor, K2P, Felsenstein, HKY and GTR models]
44 More models of sequence evolution
- Currently, there are more than 60 models described
- plus gamma distribution and invariable sites
- the accuracy of models rapidly decreases for highly divergent sequences
- problem: more complicated models tend to be less accurate (and slower)
- How to pick an appropriate model?
- use a likelihood ratio test
- implemented in Modeltest 3.06 (Posada & Crandall, 1998)
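A likelihood ratio test compares two nested models through the statistic 2 (lnL_complex - lnL_simple), which is referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters. The sketch below covers only the one-extra-parameter case (for example Jukes-Cantor versus K2P) so that it needs nothing beyond the standard library. The log-likelihood values are invented for illustration, and the code is not a description of how Modeltest itself works.

```python
import math

def lrt_statistic(lnl_simple, lnl_complex):
    """Likelihood ratio statistic for two nested models (maximized log-likelihoods)."""
    return 2.0 * (lnl_complex - lnl_simple)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

def prefer_complex(lnl_simple, lnl_complex, alpha=0.05):
    """True if the extra parameter improves the fit significantly at level alpha."""
    return chi2_sf_df1(lrt_statistic(lnl_simple, lnl_complex)) < alpha

# Hypothetical log-likelihoods of the same alignment under two nested models.
lnl_jc, lnl_k2p = -1000.0, -995.0
print(lrt_statistic(lnl_jc, lnl_k2p), prefer_complex(lnl_jc, lnl_k2p))
```

The simpler model is kept unless the improvement in likelihood is large enough to justify the additional parameter, which guards against choosing a needlessly complicated model.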