Title: Introduction to Phylogenetic Trees
- Jana Sperschneider
- Institute for Computer Science
- Albert-Ludwigs-Universität, Freiburg
Contents
- Introduction
  - Biological foundations
  - Terminology
  - Applications of phylogenetic trees
- Construction of phylogenetic trees
  - Molecular phylogenetic analysis
    - Advantages
    - Procedure
  - Distance-based methods
    - Example: UPGMA algorithm
    - Neighbor Joining
Biological Foundations
- Evolution is driven by
  - Inheritance
  - Variation
    - Mutations
    - Recombination
  - Selection
- All organisms share a common ancestry
Terminology
- Phylogeny
  - The evolutionary relationships among organisms, based on descent from a common ancestor
- Phylogenetics
  - The area of research concerned with finding the evolutionary relationships between species
  - From the Greek phylon, "race", and genesis, "birth"
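- Phylogenetic tree
  - A visual representation of the evolutionary relationships, and evolutionary distances, between species
- Rooted tree and unrooted tree
  [Figure: a rooted tree next to an unrooted tree]
  - A rooted tree has a root node representing the common ancestor of all species shown, so it indicates the direction of evolution
  - An unrooted tree shows only how the species are related to one another, without identifying a common ancestor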
Applications of phylogenetic trees
- Evolution studies
- Systematic biology
- Medical research and epidemiology
- Ecology
- Other areas, such as grouping languages or medieval manuscripts
Definitions
- Classic phylogenetic analysis uses morphological features
  - Anatomy, size, number of legs, beak shape
- Modern phylogenetic analysis uses molecular information
  - Genetic material (DNA and protein sequences)
  - This approach is called molecular phylogenetic analysis
Advantages of molecular phylogenetic analysis
- Morphological comparisons can be misled by analogous features, which share a common function but NOT a common ancestry
- DNA sequences are simpler to model: there are only four states, A, C, G and T
- DNA samples for sequence analysis are easy to prepare
Procedure: steps of a molecular phylogenetic analysis
- 1. Decide which sequences to examine
- 2. Determine the evolutionary distances between the sequences and build a distance matrix
- 3. Construct the phylogenetic tree
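1. Decide what to examine
- Choose homologous sequences in different species
- Homologous sequences are, by definition, derived from a common ancestral sequence
- Homology is not similarity: similarity can be measured directly, whereas homology is an inference that two sequences share a common ancestor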
2. Determine the evolutionary distances and build the distance matrix
- For molecular data, the evolutionary distance between two species can be taken as the observed number (or proportion) of nucleotide differences between their sequences
- A distance matrix is simply a table showing the evolutionary distances between all pairs of sequences in the dataset
2. Determine the evolutionary distances and build the distance matrix: a simple example
- Sequence 1: AGGCCATGAATTAAGAATAA
- Sequence 2: AGCCCATGGATAAAGAGTAA
- Sequence 3: AGGACATGAATTAAGAATAA
- Sequence 4: AAGCCAAGAATTACGAATAA
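- Distance matrix (proportion of the 20 sites that differ):
  - 1-2: 0.20, 1-3: 0.05, 1-4: 0.15
  - 2-3: 0.25, 2-4: 0.35
  - 3-4: 0.20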
- In this example the evolutionary distance is the number of nucleotide differences for each sequence pair, divided by the sequence length. For example, sequences 1 and 2 are 20 nucleotides long and differ at four positions, giving an evolutionary distance of 4/20 = 0.2.
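To make the calculation concrete, here is a small Python sketch that computes the proportion of differing sites (often called the p-distance) for every pair of the four example sequences. The function names p_distance and distance_matrix are our own, and the sketch assumes the sequences are already aligned to the same length, as they are here.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(x != y for x, y in zip(a, b))
    return differences / len(a)


def distance_matrix(seqs):
    """p-distance for every pair of named sequences, keyed by (name1, name2)."""
    return {(i, j): p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}


seqs = {
    "1": "AGGCCATGAATTAAGAATAA",
    "2": "AGCCCATGGATAAAGAGTAA",
    "3": "AGGACATGAATTAAGAATAA",
    "4": "AAGCCAAGAATTACGAATAA",
}
D = distance_matrix(seqs)
for (i, j), d in D.items():
    print(f"{i}-{j}: {d:.2f}")  # 1-2: 0.20, 1-3: 0.05, ...
```

Only the upper triangle is stored, since the matrix is symmetric and its diagonal is zero.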
3. Phylogenetic tree construction example (UPGMA algorithm)
UPGMA, the Unweighted Pair Group Method with Arithmetic mean (Michener and Sokal, 1957)
[Figure: Bear and Raccoon joined, each on a branch labeled 0.13]
- 1. Pick the smallest entry Dij in the distance matrix
- 2. Join the two species whose row and column meet at Dij, and give each of them a branch of length Dij/2 to the new node
- 3. Compute the distance from the new cluster to every other species as the arithmetic mean of the pairwise distances between their members
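Written as a formula, step 3 can be expressed as follows; the notation n_i and n_j for the number of species in clusters i and j, k for any other cluster, and h for a node's height is ours, not the slides'. Weighting by cluster size makes the result equal to the plain average over all species pairs. As a worked example with the four-sequence data, sequences 1 and 3 have the smallest distance (0.05) and are joined first, so the distance from the new cluster to sequence 2 becomes the mean of 0.20 and 0.25.

```latex
D_{k,(ij)} = \frac{n_i \, D_{ki} + n_j \, D_{kj}}{n_i + n_j},
\qquad h_{(ij)} = \frac{D_{ij}}{2}

D_{2,(13)} = \frac{1 \cdot 0.20 + 1 \cdot 0.25}{1 + 1} = 0.225
```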
[Figure: tree with Bear, Raccoon and Seal; branch labels 0.13, 0.1825, 0.1825]
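- 1. Pick the smallest entry Dij again
- 2. Join the two clusters whose row and column meet at Dij. The new node sits at height Dij/2, so each branch length is Dij/2 minus the height of the node below it (0 for a single species)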
- 3. Compute the new distances to the other species using arithmetic means
[Figure: final tree with Bear, Raccoon, Seal and Weasel; branch labels 0.13, 0.1825, 0.2, 0.2]
- 1. Pick the smallest entry Dij
- 2. Join the two clusters whose row and column meet at Dij, with the new node at height Dij/2
- Done! All species are now joined in a single rooted tree.
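The three UPGMA steps above can be sketched as a short Python loop. This is a minimal illustration, not the slides' own implementation: the function name upgma, the input format (a dict keyed by pairs of species names), the Newick output and the tie-breaking (the first smallest pair found) are all our assumptions. Because the Bear/Raccoon/Seal/Weasel matrix is not reproduced here, the usage example runs on the four-sequence matrix from the distance example.

```python
def upgma(labels, dist):
    """Sketch of UPGMA. dist maps (a, b) name pairs to distances.

    Returns (newick, merges), where merges lists (left, right, height) per join.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    # Each active cluster: name -> (Newick subtree, number of species, node height)
    clusters = {name: (name, 1, 0.0) for name in labels}
    merges = []
    while len(clusters) > 1:
        # Step 1: pick the smallest entry D_ij
        pair = min(d, key=d.get)
        i, j = sorted(pair)
        dij = d[pair]
        # Step 2: the new node sits at height D_ij / 2; each branch length is
        # that height minus the height of the child node (0 for a species)
        height = dij / 2
        tree_i, n_i, h_i = clusters.pop(i)
        tree_j, n_j, h_j = clusters.pop(j)
        new = f"({i},{j})"
        # Step 3: distance to every other cluster = size-weighted arithmetic mean
        for k in clusters:
            d[frozenset((new, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        # Drop the rows and columns of the two clusters that were merged
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        subtree = f"({tree_i}:{height - h_i:.4g},{tree_j}:{height - h_j:.4g})"
        clusters[new] = (subtree, n_i + n_j, height)
        merges.append((i, j, height))
    (tree, _, _), = clusters.values()
    return tree + ";", merges


# Distances from the four-sequence example (proportion of differing sites)
D = {("1", "2"): 0.20, ("1", "3"): 0.05, ("1", "4"): 0.15,
     ("2", "3"): 0.25, ("2", "4"): 0.35, ("3", "4"): 0.20}
newick, merges = upgma(["1", "2", "3", "4"], D)
print(newick)
for left, right, h in merges:
    print(f"join {left} + {right} at height {h:.4f}")
```

Sequences 1 and 3 are joined first, then sequence 4, then sequence 2. Every species ends up at the same distance from the root, which is exactly the constant-clock assumption discussed next.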
Weaknesses of UPGMA
- UPGMA assumes a constant molecular clock, i.e. that all lineages accumulate mutations at the same rate
- As a result, all leaves lie at the same level (the same distance from the root)
- It only constructs rooted trees
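Because UPGMA always puts every leaf at the same distance from the root, its tree can reproduce the input distances exactly only when they are ultrametric. A common check is the three-point condition: in every triple of species, the two largest pairwise distances are equal. The sketch below is illustrative; is_ultrametric and its tolerance parameter tol are hypothetical names, and the second matrix is made up for the example.

```python
from itertools import combinations


def is_ultrametric(labels, dist, tol=1e-9):
    """Three-point condition: in every triple of species, the two largest
    pairwise distances are equal (within tol)."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    for a, b, c in combinations(labels, 3):
        _, second, largest = sorted(
            (d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))])
        )
        if largest - second > tol:
            return False
    return True


# Four-sequence example: in the triple (1, 2, 3) the distances are 0.05, 0.20, 0.25
D = {("1", "2"): 0.20, ("1", "3"): 0.05, ("1", "4"): 0.15,
     ("2", "3"): 0.25, ("2", "4"): 0.35, ("3", "4"): 0.20}
print(is_ultrametric(["1", "2", "3", "4"], D))  # False

# Hypothetical clock-like matrix: a and b split recently, c much earlier
clock = {("a", "b"): 2.0, ("a", "c"): 6.0, ("b", "c"): 6.0}
print(is_ultrametric(["a", "b", "c"], clock))  # True
```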
Neighbor Joining (Saitou and Nei, 1987)
- Most widely used distance-based method
- The weaknesses of UPGMA show that simply joining the closest neighbors is not enough when lineages evolve at different rates
- Neighbor Joining
  - Principle: join nodes that are close to each other and far from everything else
  - Produces an unrooted tree
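The principle above is usually made concrete with the Q criterion: with n current nodes and ri the sum of distances from node i to all others, neighbor joining joins the pair minimizing Q(i, j) = (n - 2) Dij - ri - rj, gives i the branch length Dij/2 + (ri - rj) / (2(n - 2)), and sets the distance from the new node u to each other node k to (Dik + Djk - Dij) / 2. Below is a minimal Python sketch of that standard formulation. The function name, the internal node names u0, u1 and so on, and the tie-breaking (first pair found) are illustrative choices. The example uses the raw difference counts from the four-sequence example; dividing all of them by 20 would scale every branch length by the same factor.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Sketch of neighbor joining. dist maps (a, b) name pairs to distances.

    Returns the edges of the unrooted tree as (node, neighbor, branch_length).
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(labels)
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: total distance from node i to every other current node
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}

        def q(pair):
            # Q criterion: small when i and j are close to each other
            # but far from everything else
            i, j = pair
            return (n - 2) * d[frozenset(pair)] - r[i] - r[j]

        i, j = min(combinations(nodes, 2), key=q)  # ties: first pair found
        dij = d[frozenset((i, j))]
        u = f"u{len(edges) // 2}"  # hypothetical name for the new internal node
        # Branch lengths from i and j to the new node u
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, u, li), (j, u, dij - li)]
        # Distances from u to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Two nodes left: connect them directly (the tree stays unrooted)
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Raw difference counts from the four-sequence example (out of 20 sites)
counts = {("1", "2"): 4, ("1", "3"): 1, ("1", "4"): 3,
          ("2", "3"): 5, ("2", "4"): 7, ("3", "4"): 4}
nj_edges = neighbor_joining(["1", "2", "3", "4"], counts)
for edge in nj_edges:
    print(edge)
```

Here sequence 1 and the internal edge both get length 0: these distances fit a star-shaped tree exactly (d12 + d34 = d13 + d24 = d14 + d23 = 8), so the data do not resolve which pairs group together.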
In the next talk
- How to construct a tree from incomplete distance matrices
- Maximum-likelihood methods
  - Under a model of sequence evolution, find the tree that gives the highest likelihood of the observed data
- Parsimony methods
  - Choose the tree that minimizes the number of changes required to explain the data
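The parsimony criterion can be made concrete by counting changes on a fixed tree. A minimal sketch using Fitch's algorithm, a standard small-parsimony method (our choice of illustration; the slides do not name an algorithm). The function names fitch_site and parsimony_score are hypothetical, trees are written as nested pairs of species names, and where the root is placed does not change the count.

```python
def fitch_site(tree, states):
    """Fitch's small-parsimony pass for one site on a rooted binary tree.

    tree: a species name (str) or a pair (left, right) of subtrees.
    states: species name -> nucleotide at this site.
    Returns (set of candidate ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum of Fitch changes over all sites of equal-length aligned sequences."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, {name: s[pos] for name, s in seqs.items()})[1]
               for pos in range(length))


seqs = {
    "1": "AGGCCATGAATTAAGAATAA",
    "2": "AGCCCATGGATAAAGAGTAA",
    "3": "AGGACATGAATTAAGAATAA",
    "4": "AAGCCAAGAATTACGAATAA",
}
# The three possible unrooted tree shapes for four species
trees = [(("1", "2"), ("3", "4")), (("1", "3"), ("2", "4")), (("1", "4"), ("2", "3"))]
for tree in trees:
    print(tree, parsimony_score(tree, seqs))
```

For these four sequences every variable site differs in just one sequence, so all three tree shapes need the same 8 changes and parsimony has nothing to choose between.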
References
- Mona Singh, Phylogenetics, http://www.cs.princeton.edu/mona/Lecture/phylogeny-slides.pdf
- T. A. Brown, Genomes, http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes
- Joe Felsenstein, Phylogeny methods, http://evolution.gs.washington.edu/gs541/2005/lecture26.pdf
- Microbial diversity lecture, http://www.mbio.ncsu.edu/JWB/MB409/home.htm
- Michener, C.D. and Sokal, R.R. (1957) A quantitative approach to a problem in classification. Evolution, 11, 130-162.
Thanks for listening!