Title: The Genome Access Course Phylogenetic Analysis
1The Genome Access Course Phylogenetic Analysis
2The Great Chain of Being
From Didacus Valades, Rhetorica Christiana (1579)
3Phylogenetics
- Developed by Willi Hennig (Grundzüge einer
Theorie der phylogenetischen Systematik, 1950;
Phylogenetic Systematics, 1966)
4What is the ancestral sequence?
- pfeffer
- pepper
- (pf/p)e(ff/pp)er
5What is the ancestral sequence?
- five
- quinque
- pento
- fünf
- panj
- viisi
- pompe
- queig
- pindzé
- penkwe
6What is the ancestral sequence?
- HGLCSAIP
- HGICSGIP
- HG(L/I)CS(A/G)IP
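The protein example above can be reproduced mechanically. This is a minimal sketch, assuming the two sequences are already aligned and of equal length; the function name `ancestral_consensus` is made up for illustration, and the output only marks ambiguous positions rather than inferring a true ancestor.

```python
def ancestral_consensus(seq_a, seq_b):
    """Merge two aligned, equal-length sequences position by position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    parts = []
    for a, b in zip(seq_a, seq_b):
        # Identical residues are kept; differing ones are left ambiguous.
        parts.append(a if a == b else f"({a}/{b})")
    return "".join(parts)


print(ancestral_consensus("HGLCSAIP", "HGICSGIP"))  # HG(L/I)CS(A/G)IP
```

With only two descendants there is no way to decide between the alternatives at an ambiguous site; more sequences or an outgroup are needed for that.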
7Evolutionary Trees
- A tree is a connected, acyclic 2D graph
- Leaf = Taxon
- Node = Vertex
- Branch = Edge
- Tree length = sum of all branch lengths
- Phylogenetic trees are binary trees
8A Generic Tree
9Evolutionary Trees
- Rooted: common ancestor, unique path to any
leaf, directed
- Unrooted: root could be placed anywhere, fewer
possible trees than rooted
10Rooted Tree
generated by DRAWGRAM (PHYLIP)
11Unrooted Tree
generated by DRAWTREE (PHYLIP)
12Possible Evolutionary Trees
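The number of possible trees grows very quickly with the number of taxa. The sketch below assumes strictly bifurcating trees with labeled leaves and uses the standard double-factorial counts, (2n-5)!! unrooted and (2n-3)!! rooted trees for n taxa. The function names are illustrative.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted(n_taxa):
    """Number of unrooted bifurcating trees for n labeled taxa (n >= 3)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def count_rooted(n_taxa):
    """Number of rooted bifurcating trees for n labeled taxa (n >= 2)."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


for n in (3, 4, 5, 10):
    print(n, count_unrooted(n), count_rooted(n))
```

For 10 taxa this gives 2,027,025 unrooted and 34,459,425 rooted trees, which is why exhaustive search is only practical for small data sets.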
13Paralogs and Orthologs
Figure labels: 1A, 2A, 3A, 1B, 2B, 3B
14Genes vs. Species
- Sequences show gene relationships, but
phylogenetic histories may be different for gene
and species
- Genes evolve at different speeds
- Horizontal gene transfer
17Methods for Phylogenetic Analysis
- Character-State: Maximum Parsimony, Maximum
Likelihood
- Genetic Distance: Fitch-Margoliash,
Neighbor-Joining, Unweighted Pair Group (UPGMA)
18Phylogenetic Software
- PHYLIP
- PAUP (Available in GCG)
- TREE-PUZZLE
- PhyloBLAST
- Felsenstein maintains an extensive list of
programs on the PHYLIP site
19PHYLIP Programs
- dnapars/protpars
- dnadist/protdist
- dnaml (use fastDNAml instead)
- neighbor
- fitch/kitsch
- drawtree/drawgram
20Maximum Parsimony
- Most common method
- Allows use of all evolutionary information
- Build and score all possible trees
- Each node is a transformation in a character
state
- Minimize tree length
- Best tree requires the fewest changes to derive
all sequences
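Scoring one candidate tree by parsimony can be sketched with Fitch's algorithm, which counts the minimum number of state changes per site. This is an illustration only: the tree is assumed to be rooted and binary, written as nested tuples of taxon names, and the four-taxon alignment is made up. Searching over all trees is a separate step not shown here.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):  # a leaf is just a taxon name
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one change is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site minimum changes over the whole alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


# Hypothetical aligned sequences for four taxa.
alignment = {"A": "AAGT", "B": "AAGC", "C": "ACTT", "D": "ACTC"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment))  # 4
print(parsimony_score(tree_2, alignment))  # 5
```

Here the first tree needs four changes and the second needs five, so the first is the more parsimonious of the two.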
21Which is the more parsimonious tree?
9 Node Crossings
8 Node Crossings
22Maximum Likelihood
- Reconstruction using an explicit evolutionary
model
- The likelihood is calculated separately for each
nucleotide site. The product of the likelihoods
for each site provides the overall likelihood of
the observed data.
- Demanding computationally
- Slowest method
- Use to test (or improve) an existing tree
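The "product of site likelihoods" idea can be shown on the smallest possible case: two aligned sequences separated by a single branch. The sketch below assumes the Jukes-Cantor (JC69) substitution model, which the slides do not name, and sums log-likelihoods instead of multiplying raw probabilities to avoid numerical underflow. The sequences and the grid search are illustrative, not how production programs optimize trees.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences at branch length d (JC69)."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # base unchanged after distance d
    p_diff = 0.25 - 0.25 * decay   # changed to one specific other base
    log_l = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the starting base.
        site_likelihood = 0.25 * (p_same if a == b else p_diff)
        log_l += math.log(site_likelihood)  # log of a product = sum of logs
    return log_l


seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTAA"   # differs from seq_a at 2 of 10 sites

# Crude grid search for the branch length that maximizes the likelihood.
candidates = [i / 1000 for i in range(1, 2001)]
best_d = max(candidates, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))
print(best_d)
```

On a real tree the same per-site calculation is repeated over every internal node and branch, which is where the computational cost comes from.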
23Clustering Algorithms
- Use distances to calculate phylogenetic trees
- Trees are based on the relative numbers of
similarities and differences between sequences
- A distance matrix is constructed by computing
pairwise distances for all sequences
- Clustering links successively more distant taxa
24DNA Distances
- Distances between pairs of DNA sequences are
relatively simple to compute as the sum of all
base pair differences between the two sequences
- Can only work for pairs of sequences that are
similar enough to be aligned
- All base changes are considered equal
- Insertion/deletions are generally given a larger
weight than replacements (gap penalties).
- Possible to correct for multiple substitutions at
a single site, which is common in distant
relationships and for rapidly evolving sites.
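A minimal sketch of these two steps: the observed proportion of differing sites (often called the p-distance), followed by a correction for multiple substitutions. The correction used here is the Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3), chosen as one common option; the slides do not say which correction is intended. Gaps are not handled, and the sequences are made up.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor(p):
    """Correct an observed p-distance for multiple hits (JC69 assumption)."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for this correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, round(jukes_cantor(p), 4))  # 0.2 0.2326
```

The corrected distance is always at least as large as the observed one, and the gap widens as sequences diverge.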
25Amino Acid Distances
- More difficult to compute
- Substitutions have differing effects on structure
- Some substitutions require more than one DNA
mutation
- Use replacement frequencies (PAM, BLOSUM)
26Fitch-Margoliash
- 3 sequences are combined at a time to define
branches and calculate their length
- Additive branch lengths
- Accurate for short branches
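The three-sequence step can be sketched as follows. With three taxa joined at a single internal node and additive branch lengths, each branch follows directly from the three pairwise distances. The distances below are illustrative values, and the function name is hypothetical.

```python
def three_point_branches(d_ab, d_ac, d_bc):
    """Branch lengths a, b, c from the internal node to taxa A, B, C.

    Solves a + b = d_ab, a + c = d_ac, b + c = d_bc.
    """
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


a, b, c = three_point_branches(22, 39, 41)
print(a, b, c)  # 10.0 12.0 29.0
```

For more than three sequences, the method groups the remaining taxa into a composite third "sequence" using averaged distances and repeats this calculation.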
27Neighbor Joining
- Most common method of tree construction
- Distance matrix adjusted for each taxon depending
on its rate of evolution
- Good for simulation studies
- Most efficient computationally
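The rate adjustment can be illustrated with the pair-selection step of neighbor joining. Each raw distance is corrected by the total distance of both taxa to everything else, Q(i,j) = (n-2) d(i,j) - sum_k d(i,k) - sum_k d(j,k), and the pair with the smallest Q is joined. The sketch covers only this one step, not the full algorithm, and the matrix holds illustrative values.

```python
def nj_pick_pair(names, dist):
    """Return (Q value, name_i, name_j) for the pair NJ would join first."""
    n = len(names)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Subtracting the row sums compensates for fast-evolving taxa.
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, names[i], names[j])
    return best


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(names, dist))  # (-50, 'a', 'b')
```

Note that the smallest raw distance in this matrix is between d and e (3), yet the adjusted criterion joins a and b first. That difference is what separates neighbor joining from plain closest-pair clustering.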
28UPGMA (Unweighted Pair Group Method Using Arithmetic Averages)
- Simplest method
- Calculates branch lengths between most closely
related sequences
- Averages distance to next sequence or cluster
- Predicts a position for the root
- Predicts a position for the root
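The steps above can be sketched in a few lines. This is a simplified illustration, not PHYLIP's implementation: the tree is returned as nested tuples `(left, right, height)`, the distance matrix contains illustrative values, and ties are broken arbitrarily.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA; return the tree as (left, right, height)."""
    n = len(names)
    # cluster id -> (subtree, number of taxa in the cluster)
    clusters = {i: (names[i], 1) for i in range(n)}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (i, j), d_ij = min(d.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        height = d_ij / 2  # the new node sits halfway between the two
        new = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            # Size-weighted average = plain average over all member taxa.
            new[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new)
        clusters[next_id] = ((tree_i, tree_j, height), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 17, 21, 31, 23],
    [17, 0, 30, 34, 21],
    [21, 30, 0, 28, 39],
    [31, 34, 28, 0, 43],
    [23, 21, 39, 43, 0],
]
tree = upgma(names, dist)
print(tree)
```

The last pair of clusters to be joined defines the root, here at height 16.5. That placement is only meaningful if all lineages evolve at roughly the same rate, which UPGMA implicitly assumes.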
29Phylogenetic Complications
- Errors
- Loss of function
- Convergent evolution
- Lateral gene transfer
30Validation
- Use several different algorithms and data sets
- NJ methods generate one tree, possibly supporting
a tree built by parsimony or maximum likelihood
- Bootstrapping
- Perturb data and note effect on tree
- Repeat many times
- If the tree is unchanged in 90% or more of the
replicates, its correctness is supported
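The perturbation used in bootstrapping is resampling alignment columns with replacement. The sketch below is an illustration under stated assumptions: `build_tree` is a caller-supplied function that returns a comparable result for an alignment, and `closest_pair` is a deliberately trivial stand-in for a real tree-building method. The alignment is made up.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one replicate)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, build_tree, n_replicates=100, seed=0):
    """Fraction of replicates whose result equals the original result."""
    rng = random.Random(seed)
    reference = build_tree(alignment)
    same = sum(
        build_tree(bootstrap_alignment(alignment, rng)) == reference
        for _ in range(n_replicates)
    )
    return same / n_replicates


def closest_pair(alignment):
    """Toy 'tree builder': the pair of taxa with the fewest differences."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(
        pairs,
        key=lambda p: sum(x != y for x, y in zip(alignment[p[0]], alignment[p[1]])),
    )


alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "TCGAACCTAG", "D": "TCGAACCTTG"}
print(bootstrap_support(alignment, closest_pair))
```

In practice support is usually reported per branch rather than for the whole tree, and PHYLIP provides its own resampling program for this purpose.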
31Are there bugs in our genome?
N-acetylneuraminate lyase
32The End