Phylogenetic trees

Description:

In a 'binary' tree, all nodes have degree 1 or 3, except the ... Evolutionary linguistics. How to estimate distances? T. Jukes and C. Cantor. Berkeley, 1969 ...

Slides: 21

Provided by: ianho9

Transcript and Presenter's Notes

1
Phylogenetic trees

BioE131/231

HIV phylogeny: Simmonds et al. (left), Yamamoto et al. (right)
2
Star vs. hierarchical phylogenies
[Figure: a hierarchical phylogeny and a star phylogeny]
3
Rooted and unrooted trees
Degree of a node: the number of neighbors of that node. In a binary tree, all nodes have degree 1 or 3, except the root, which has degree 2 (i.e. each node has 0 or 2 children). A rooted binary tree with N leaf nodes has N-1 internal nodes (cf. a knockout table-tennis tournament: N players need N-1 matches, since each match eliminates exactly one player).
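The tournament argument can be checked mechanically. The Python sketch below (function names and the internal labels "n1", "n2", ... are hypothetical) builds a random rooted binary tree by repeatedly joining two parentless nodes, then counts each node's neighbors.

```python
import random

def random_rooted_binary_tree(leaves, seed=0):
    """Build a random rooted binary tree by repeatedly joining two nodes.

    Returns (children, root), where children maps each internal node to its
    two children. Internal node names "n1", "n2", ... are hypothetical labels.
    """
    rng = random.Random(seed)
    pool = list(leaves)              # nodes that do not have a parent yet
    children = {}
    while len(pool) > 1:
        # like one match in a knockout tournament: two players in, one out
        a, b = rng.sample(pool, 2)
        pool.remove(a)
        pool.remove(b)
        parent = f"n{len(children) + 1}"
        children[parent] = (a, b)
        pool.append(parent)
    return children, pool[0]

def degrees(children):
    """Degree of every node: number of neighbors (children plus parent)."""
    deg = {}
    for parent, kids in children.items():
        for kid in kids:
            deg[kid] = deg.get(kid, 0) + 1        # edge from kid up to parent
            deg[parent] = deg.get(parent, 0) + 1  # same edge, counted at parent
    return deg

leaves = list("ABCDEFGH")
children, root = random_rooted_binary_tree(leaves, seed=42)
deg = degrees(children)
print(len(children))   # 7, i.e. N - 1 internal nodes
print(deg[root])       # 2
```

Whatever the seed, every join consumes two parentless nodes and creates one, so exactly N-1 joins occur. Removing the root and merging its two edges gives an unrooted binary tree with N-2 internal nodes, all of degree 3.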
4
Rooted (directed) tree
[Figure labels: root node; internal node; clade (subtree); leaf node, or taxon (pl. taxa)]
Synonyms: phylogeny, tree, dendrogram, cladogram
5
Rooting via outgroups
6
Root can be ambiguous
No outgroup exists for the tree of life as a whole. The earliest studies placed LUCA (the last universal common ancestor) on the eukarya-bacteria branch; later studies suggested the bacteria-archaea branch.
7
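Ultrametric trees and molecular clocks
Branch lengths are typically in units of the average number of substitutions per site. Thus, branch lengths > 1 have large estimation errors, because multiple substitutions at the same site overwrite one another and cannot be counted directly.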
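[Figure: an ultrametric tree (every leaf the same distance from the root, as under a molecular clock) beside a non-ultrametric tree; horizontal axis is distance; the height of an internal node X is marked.]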
Q. Why are non-ultrametric trees necessary?
A. The mutation rate per unit time varies between lineages: it is roughly proportional to 1/(generation time), and is also correlated with other physiological variables (e.g. metabolic rate). Longitudinal data (e.g. serial viral sequencing from the same host) can also generate non-ultrametric trees, since the leaf nodes are not contemporaneous.
Wen-Hsiung Li, 1985 (2003 Balzan Prize)
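A tree is ultrametric exactly when every root-to-leaf path has the same length. The Python sketch below tests this for a tree stored as a dict from each internal node to its (child, branch length) pairs; that representation and the function names are assumptions made for illustration.

```python
import math

def leaf_depths(tree, node, depth=0.0, out=None):
    """Path length from the root to every leaf.

    tree maps each internal node to a list of (child, branch_length) pairs;
    nodes that do not appear as keys are leaves.
    """
    if out is None:
        out = {}
    if node not in tree:
        out[node] = depth
    for child, length in tree.get(node, []):
        leaf_depths(tree, child, depth + length, out)
    return out

def is_ultrametric(tree, root, rel_tol=1e-9):
    """True if every leaf is (numerically) the same distance from the root."""
    depths = list(leaf_depths(tree, root).values())
    return all(math.isclose(d, depths[0], rel_tol=rel_tol) for d in depths)

# ((A:2,B:2)X:1,C:3): every leaf is 3 units below the root
clock_tree = {"R": [("X", 1.0), ("C", 3.0)], "X": [("A", 2.0), ("B", 2.0)]}
# same topology, but lineage B evolved more slowly
no_clock = {"R": [("X", 1.0), ("C", 3.0)], "X": [("A", 2.0), ("B", 0.5)]}
print(is_ultrametric(clock_tree, "R"), is_ultrametric(no_clock, "R"))  # True False
```

In the clock tree, node X sits 2 units above each of its leaves, which is its height in the slide's sense; in the second tree X has no single well-defined height.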
8
Newick format, a.k.a. New Hampshire format

Rooted tree topologies: (A,B,(C,D));
Branch lengths: (A:.1,B:.2,(C:.3,D:.4):.5);
Internal node names: (A:.1,B:.2,(C:.3,D:.4)E:.5)F;
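Reading this notation back in is a small recursive-descent job. The Python sketch below (hypothetical function name) handles nesting, names and branch lengths only; quoted labels, bracketed comments and whitespace between tokens are deliberately not supported.

```python
def parse_newick(text):
    """Parse Newick into nested dicts {"name", "length", "children"}.

    Minimal sketch: handles nesting, node names and branch lengths, but not
    quoted labels, [comments] or whitespace between tokens.
    """
    text = text.strip()
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def read_token():
        # a name or a number: everything up to the next Newick delimiter
        nonlocal pos
        start = pos
        while peek() and peek() not in "(),:;":
            pos += 1
        return text[start:pos]

    def read_subtree():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1
            children.append(read_subtree())
            while peek() == ",":
                pos += 1
                children.append(read_subtree())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        name = read_token()
        length = None
        if peek() == ":":
            pos += 1
            length = float(read_token())   # e.g. ".5" -> 0.5
        return {"name": name, "length": length, "children": children}

    tree = read_subtree()
    if peek() != ";":
        raise ValueError("a Newick tree must end with ';'")
    return tree

tree = parse_newick("(A:.1,B:.2,(C:.3,D:.4)E:.5)F;")
print(tree["name"], [c["name"] for c in tree["children"]])  # F ['A', 'B', 'E']
```

Unnamed internal nodes, as in the first two examples, come back with an empty name.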

9
Algorithms for phylogenetic reconstruction

Start with a multiple alignment
use substitutions to evaluate trees
indels are informative, but harder to model
Parsimony
find the tree with the fewest substitutions
Likelihood
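find the tree with the most likely substitutions (transition/transversion bias, long branches, ...)
sum probabilities over unseen ancestral states
enumerating all possible tree topologies is sloooooow: there are $(2n-3)!! = 1 \cdot 3 \cdot 5 \cdots (2n-3)$ rooted binary topologies for n taxa (34,459,425 for n = 10)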
Distance matrix
Start by computing all pairwise distances
Quick approximation to likelihood methods
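Two of the ingredients above can be made concrete in a few lines. The Python sketch below is an illustration, not the course code: `fitch_score` applies Fitch's small-parsimony algorithm to count the minimum substitutions one alignment column needs on a fixed rooted binary tree (tree search is not included), and `jukes_cantor_distance` applies the Jukes-Cantor correction to an observed mismatch fraction to get a pairwise distance. All function names and the nested-tuple tree encoding are assumptions.

```python
import math

def fitch_score(tree, column):
    """Fitch's small-parsimony count of substitutions for one alignment column.

    tree: rooted binary tree as nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    column: dict mapping leaf name -> character in this column
    """
    def visit(node):
        if isinstance(node, str):                  # leaf: its observed state
            return {column[node]}, 0
        (left, lcost), (right, rcost) = visit(node[0]), visit(node[1])
        if left & right:                           # children can agree: no change needed
            return left & right, lcost + rcost
        return left | right, lcost + rcost + 1     # disagreement costs one substitution
    return visit(tree)[1]

def parsimony_score(tree, alignment):
    """Total Fitch score over all columns; alignment maps name -> aligned sequence."""
    length = len(next(iter(alignment.values())))
    return sum(fitch_score(tree, {name: seq[c] for name, seq in alignment.items()})
               for c in range(length))

def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)  # observed mismatch fraction
    if p >= 0.75:
        return math.inf                            # saturated: correction undefined
    return -0.75 * math.log(1 - 4 * p / 3)

aln = {"A": "AAG", "B": "AAG", "C": "ATG", "D": "ATC"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))  # 2
print(parsimony_score((("A", "C"), ("B", "D")), aln))  # 3
print(round(jukes_cantor_distance("AAAA", "AACC"), 4))  # 0.824
```

Parsimony prefers ((A,B),(C,D)) here. The corrected distance (0.824) exceeds the raw mismatch fraction (0.5), and it diverges as p approaches 0.75, which is one way to see why long branches are estimated poorly.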

10
UPGMA algorithm

Creates ultrametric trees
Basic idea
Two closest nodes must be siblings
Parent is equidistant between siblings
Distance from the parent to any other node is the average of the siblings' distances to that node

11
UPGMA algorithm

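Input: a distance matrix, $D_{ij}$
Let N be the set of nodes to be joined
Let the height of node i be $H_i$
Initialize $H_i = 0$ for all the leaf nodes in N
While N contains more than one node:
Find i, j, the two closest nodes in N: $(i,j) = \arg\min_{i,j} D_{ij}$
Create a new node, k, the parent of (i,j)
Set $H_k = \tfrac{1}{2} D_{ij}$
Branch length k→i is $H_k - H_i$, and similarly for k→j
For all nodes n in N (excluding i, j): set $D_{kn} = \tfrac{1}{2}(D_{in} + D_{jn})$
(Strictly, this equal-weight average is WPGMA; UPGMA proper weights by the number of leaves $s_i$, $s_j$ below i and j: $D_{kn} = (s_i D_{in} + s_j D_{jn}) / (s_i + s_j)$.)
Add k to N; remove i, j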

$N^2$ entries in the distance matrix
$N-1$ join steps
each step scans $N^2$ entries to find the closest pair
$O(N^3)$ time. If we maintain $\arg\min_j D_{ij}$ for each i, then it is $O(N^2)$
$O(N^2)$ memory
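The steps above can be sketched in Python as a naive $O(N^3)$ loop over a list-of-lists matrix, with nodes identified by number. The sketch assumes D holds average leaf-to-leaf distances, so the new node's height is $H_k = D_{ij}/2$; by default it uses the leaf-count-weighted update (UPGMA proper), and `size_weighted=False` switches to the equal-weight average. The function name and return format are hypothetical.

```python
def upgma(D, size_weighted=True):
    """Naive O(N^3) UPGMA on a square list-of-lists distance matrix D over leaves 0..N-1.

    New internal nodes are numbered N, N+1, ...; returns (height, parent, branch),
    where branch[c] is the length of the edge from parent[c] down to c.
    size_weighted=False uses the equal-weight .5*(D_in + D_jn) update (WPGMA).
    """
    n = len(D)
    D = [list(row) for row in D]         # copy: one row/column is appended per join
    height = [0.0] * n                   # H_i = 0 for the leaves
    size = [1] * n                       # number of leaves below each node
    parent, branch = {}, {}
    active = set(range(n))               # the set N of nodes still to be joined
    while len(active) > 1:
        # (i, j) = argmin D_ij over pairs of distinct active nodes
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda pair: D[pair[0]][pair[1]])
        k = len(D)                       # number for the new parent node
        height.append(0.5 * D[i][j])     # H_k = D_ij / 2 (ultrametric assumption)
        size.append(size[i] + size[j])
        for child in (i, j):
            parent[child] = k
            branch[child] = height[k] - height[child]   # H_k - H_child
        new_row = []
        for m in range(k):
            if m in active and m not in (i, j):
                if size_weighted:
                    d = (size[i] * D[i][m] + size[j] * D[j][m]) / (size[i] + size[j])
                else:
                    d = 0.5 * (D[i][m] + D[j][m])
            else:
                d = float("inf")         # joined/removed nodes are never looked up again
            new_row.append(d)
        for m in range(k):
            D[m].append(new_row[m])
        D.append(new_row + [0.0])
        active -= {i, j}
        active.add(k)
    return height, parent, branch

D = [[0, 2, 6],
     [2, 0, 6],
     [6, 6, 0]]
height, parent, branch = upgma(D)
print(height[-1], branch)  # 3.0 {0: 1.0, 1: 1.0, 2: 3.0, 3: 2.0}
```

On an ultrametric input like this one both averaging rules recover the same tree, ((0:1,1:1):2,2:3); they differ once the data depart from a clock.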
12
UPGMA in Perl

Questions
How to represent a tree?
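For each node we need its children, its parent, or both; its name; its branch length to its parent...
How to print a tree in Newick format?
Recursively (print a particular node by printing its subtree)
Depth-first traversal: the recursion reaches a parent before its children, but writes the parent's own label after theirs, e.g. (C:.3,D:.4)E:.5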
How to represent a distance matrix?
Can side-step some of these...
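One possible answer to the first two questions, sketched in Python rather than the course's Perl: a hypothetical `Node` record that stores children only (no parent pointer), plus a recursive printer that writes each node's children in parentheses before its own name and branch length.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One tree node; this sketch stores children only (no parent pointer)."""
    name: str = ""
    length: Optional[float] = None                 # branch length to parent; None at the root
    children: List["Node"] = field(default_factory=list)

def to_newick(node):
    """Print one node: its children in parentheses (recursively), then its own label."""
    text = ""
    if node.children:
        text = "(" + ",".join(to_newick(child) for child in node.children) + ")"
    text += node.name
    if node.length is not None:
        text += f":{node.length:g}"
    return text

tree = Node("F", children=[
    Node("A", 0.1),
    Node("B", 0.2),
    Node("E", 0.5, [Node("C", 0.3), Node("D", 0.4)]),
])
print(to_newick(tree) + ";")   # (A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;
```

The `:g` format keeps output short but rounds to six significant digits; a real implementation might prefer `repr` for lossless branch lengths.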

13
Identify nodes by name, not by number

Entry $D_{ij}$ of the distance matrix is `$distance{$iname}->{$jname}`, where `$iname` is the name of node i

14
Accessing the distance matrix

Set of all nodes, N: `keys(%distance)`
Removing a node from the set: `delete $distance{$iname}`

15
Construct the Newick representation on-the-fly

Siblings (i, j) → (`$iname`, `$jname`)
Branch lengths: branch k→i has length `$ki`; branch k→j has length `$kj`
Name of new node (k): `"($iname:$ki,$jname:$kj)"`
Then the Newick-format tree is just the name of the root node (plus a semicolon)
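The name-keyed design from these slides translates almost line for line into Python, sketched below under stated assumptions: a dict of dicts stands in for the Perl hash `%distance`, `del` stands in for `delete`, each new node's key is its own Newick string, and the update is the equal-weight average. The function name `upgma_newick` is hypothetical.

```python
def upgma_newick(distance):
    """UPGMA where each node is identified by its Newick string.

    distance: dict of dicts keyed by leaf name, distance[a][b] == distance[b][a].
    Uses the equal-weight update .5 * (D_in + D_jn) (i.e. WPGMA averaging).
    """
    distance = {a: dict(row) for a, row in distance.items()}  # work on a copy
    height = {name: 0.0 for name in distance}                  # H_i = 0 for leaves
    while len(distance) > 1:
        names = sorted(distance)                               # the set N = the keys
        iname, jname = min(((a, b) for a in names for b in names if a < b),
                           key=lambda p: distance[p[0]][p[1]])
        hk = 0.5 * distance[iname][jname]
        ki, kj = hk - height[iname], hk - height[jname]        # branch lengths k->i, k->j
        kname = f"({iname}:{ki:g},{jname}:{kj:g})"             # the new node's name is its subtree
        height[kname] = hk
        distance[kname] = {}
        for n in names:
            if n not in (iname, jname):
                d = 0.5 * (distance[iname][n] + distance[jname][n])
                distance[kname][n] = distance[n][kname] = d
        for old in (iname, jname):                             # like Perl's delete
            del distance[old]
            for row in distance.values():
                row.pop(old, None)
    (root,) = distance
    return root + ";"

D = {"A": {"B": 2, "C": 6}, "B": {"A": 2, "C": 6}, "C": {"A": 6, "B": 6}}
print(upgma_newick(D))   # ((A:1,B:1):2,C:3);
```

Because the tree is never stored as linked nodes, no separate printing step is needed; the trade-off is that the keys grow long and branch lengths are rounded by the `:g` format.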

16
Other phylogeny algorithms

Neighbor-joining (e.g. the neighbor program)
Parents need not be equidistant from the siblings they join (no molecular clock assumed)
Weighted neighbor-joining (e.g. the weighbor program)
Corrects for the large estimation errors of long branches
Quartet puzzling (e.g. the tree-puzzle program)
Looks at sets of 4 nodes, instead of pairs
MCMC sampling (e.g. the MrBayes program)
Stochastically explores tree space
Slow, but provides much more information (confidence limits, etc.)
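To see how neighbor-joining drops the equidistance assumption, here is one join step in Python using the standard Saitou-Nei criterion: pick the pair minimizing $Q(i,j) = (n-2)D_{ij} - r_i - r_j$, where $r_i$ is row i's sum, then split $D_{ij}$ unevenly between the two new branches. The function name is hypothetical, and a full implementation would repeat this step while updating the matrix.

```python
def nj_join_step(names, D):
    """One neighbor-joining step (Saitou-Nei criterion).

    names: list of >= 3 active node names; D: symmetric dict of dicts.
    Returns (i, j, length_i, length_j): the pair to join and the branch
    lengths from the new parent node to i and to j.
    """
    n = len(names)
    r = {a: sum(D[a][b] for b in names if b != a) for a in names}  # row sums

    def q(pair):
        a, b = pair
        return (n - 2) * D[a][b] - r[a] - r[b]

    i, j = min(((a, b) for a in names for b in names if a < b), key=q)
    length_i = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
    length_j = D[i][j] - length_i   # the parent is generally NOT midway between i and j
    return i, j, length_i, length_j

labels = "abcde"
M = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
D = {x: {y: M[row][col] for col, y in enumerate(labels)} for row, x in enumerate(labels)}
print(nj_join_step(list(labels), D))   # ('a', 'b', 2.0, 3.0)
```

Note that d and e (distance 3) are the closest pair, yet neighbor-joining joins a and b, because Q corrects for how far each taxon is from all the others.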

17
Long branch attraction

Arises because sequences on long branches share
chance similarities
Some methods (esp. parsimony) interpret this
incorrectly as relatedness
Solutions
add more taxa to break up the branches
use more realistic likelihood models

18
Confidence estimates

Bootstrap
Sample a random subset of alignment columns (with
replacement) and build a tree from those
Repeat a large number of times
Support for a branch
defined as the % of bootstrap trees that include that branch
identify a branch by its partitioning of the taxa (the split it induces)
MCMC is a more statistically rigorous way to get
confidence estimates for trees
because it samples directly from the posterior
distribution of trees
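The bootstrap recipe can be sketched in Python as two pieces, with tree building left to whichever method above is in use. `bootstrap_alignment` resamples whole columns with replacement; `branch_support` scores each branch by the percentage of replicate trees containing its split. The function names, and the choice to normalize each split to the side without the alphabetically first taxon, are assumptions of this sketch.

```python
import random
from collections import Counter

def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: draw columns with replacement, same alignment length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    # each column is copied whole, so every taxon gets the same resampled sites
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def branch_support(replicate_splits, taxa):
    """Percentage of replicate trees containing each branch.

    replicate_splits: one collection of splits per replicate tree; a split is
    the set of taxa on one side of a branch. Both sides describe the same
    branch, so each split is normalised to the side without the first taxon.
    """
    taxa = frozenset(taxa)
    first = min(taxa)
    counts = Counter()
    for splits in replicate_splits:
        for side in splits:
            side = frozenset(side)
            if first in side:
                side = taxa - side
            counts[side] += 1
    return {split: 100.0 * n / len(replicate_splits) for split, n in counts.items()}

rng = random.Random(0)
aln = {"A": "ACGTAC", "B": "ACGTTC", "C": "AGGTTA", "D": "AGCTTA"}
replicate = bootstrap_alignment(aln, rng)   # feed this to any tree builder
reps = [[{"A", "B"}], [{"C", "D"}], [{"A", "C"}], [{"A", "B"}]]
support = branch_support(reps, "ABCD")
```

In this toy example, {A,B} and {C,D} name the same internal branch of a four-taxon tree, so that branch gets 75% support and the competing {B,D} split gets 25%.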
