Phylogenetic Tree Reconstruction - PowerPoint PPT Presentation

Provided by: stat57

1
Phylogenetic Tree Reconstruction
2
Phylogenetic Tree

A tree that represents the relationship of a set
of species or genetic sequences is called a
phylogenetic tree.

3
Phylogenetic tree

Phylogeny: the relationship of species.
Leaves: species or sequences (OTUs, operational
taxonomic units).
Internal nodes: ancestors of particular groups of
the OTUs.
Branch length: the degree of relatedness between
the species or sequences corresponding to the
nodes at the endpoints of the branch.

4
Orthologues / Paralogues

Genes which diverged because of speciation are
called orthologues. Above: a tree of orthologues
based on a set of alpha haemoglobins.
Genes which diverged by gene duplication are
called paralogues. Below: a tree of paralogues,
the alpha, beta, gamma, delta, epsilon, zeta and
theta chains of human haemoglobins, and
myoglobin.

5
Rooted / Unrooted Tree
6
Counting Trees
7
Counting Trees
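The formulas from the two Counting Trees slides are not preserved in this transcript. As a hedged sketch, the standard counts for bifurcating trees on n labelled leaves are (2n-5)!! unrooted and (2n-3)!! rooted topologies. The function names below are illustrative, not from the slides.

```python
def count_unrooted_trees(n: int) -> int:
    """Unrooted bifurcating topologies on n labelled leaves: (2n-5)!!"""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # 3 * 5 * ... * (2n - 5)
        count *= k
    return count


def count_rooted_trees(n: int) -> int:
    """Rooted bifurcating topologies on n labelled leaves: (2n-3)!!

    Rooting a tree is equivalent to attaching one extra leaf, so this
    equals the unrooted count for n + 1 leaves.
    """
    return count_unrooted_trees(n + 1)


for n in (4, 5, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

The rapid growth of these counts is why exhaustive search over topologies is only feasible for a handful of OTUs.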
8
Rooting the tree
To root a tree mentally, imagine that the tree is
made of string. Grab the string at the root
and tug on it until the ends of the string (the
taxa) fall opposite the root.
9
Steps of phylogenetic tree reconstruction

Choosing a family of homologous sequences
Aligning the sequences and obtaining a reduced
multiple alignment by discarding the columns that
contain gaps.
Inferring a phylogenetic tree from the reduced
multiple alignment.

10
Methods of Phylogenetic tree reconstruction

Maximum parsimony methods
Distance methods
Probabilistic methods arising from the maximum
likelihood approach

11
Maximum Parsimony Method

Site
OTU 1 2 3 4 5 6 7 8 9
-----------------------
1 T C A G A T C A A
2 T T A G A A C A A
3 T T C G A T C G A
4 T T C T A A G G A

Target: find rooted tree topologies, not branch
lengths.
Principle: search for the tree that requires the
smallest number of character state changes
between the OTUs (the sequences are assumed to
evolve in the most economical way).
Operation: use only the informative sites, that
is, sites with at least two different kinds of
residues, each of which is found in at least two
of the OTU sequences.
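The site criterion above can be checked directly on the nine-site table. This is a minimal sketch. The function name `informative_sites` is illustrative, and the four sequences are the rows of the table on this slide.

```python
from collections import Counter

# Rows of the OTU/site table from this slide.
alignment = {
    "1": "TCAGATCAA",
    "2": "TTAGAACAA",
    "3": "TTCGATCGA",
    "4": "TTCTAAGGA",
}


def informative_sites(seqs):
    """Return 1-based positions where at least two different residues
    each occur in at least two sequences."""
    result = []
    for pos, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        residues_seen_twice = sum(1 for c in counts.values() if c >= 2)
        if residues_seen_twice >= 2:
            result.append(pos)
    return result


print(informative_sites(list(alignment.values())))
```

For this table only sites 3, 6 and 8 qualify. The other sites are either constant or have a single deviating sequence, so they cost the same on every topology.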

12
Maximum Parsimony
13
Maximum Parsimony
14
Traditional Parsimony

If N is moderate, Fitch's algorithm is realistic.
If N is large, the branch and bound algorithm
should be used.
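As a rough sketch of what Fitch's algorithm computes, the code below counts the minimum number of state changes for a fixed tree, one site at a time. The nested-tuple tree encoding and the function names are assumptions made for this example. The alignment is the four-OTU table from the Maximum Parsimony Method slide.

```python
alignment = {
    "1": "TCAGATCAA",
    "2": "TTAGAACAA",
    "3": "TTCGATCGA",
    "4": "TTCTAAGGA",
}


def fitch_site(tree, states):
    """Return (candidate state set, number of changes) for one site.

    tree is a leaf name or a pair (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:  # children agree on a state: no extra change
        return common, left_cost + right_cost
    # children disagree: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, aln):
    """Sum the Fitch change counts over all alignment columns."""
    length = len(next(iter(aln.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in aln.items()}
        total += fitch_site(tree, column)[1]
    return total


topologies = [(("1", "2"), ("3", "4")),
              (("1", "3"), ("2", "4")),
              (("1", "4"), ("2", "3"))]
for topology in topologies:
    print(topology, parsimony_score(topology, alignment))
```

On this data the grouping of OTUs 1 with 2 and 3 with 4 needs the fewest changes. Searching over topologies (exhaustively, or with branch and bound) is a separate step layered on top of this per-tree score.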

15
Distance Methods

Reconstruct trees (rooted or unrooted, depending
on the method) from a set of pairwise distances,
d = (d_ij), between the sequences in a fixed
reduced multiple alignment.
Given a tree relating the OTUs, obtain a
tree-generated distance matrix d'.
Question: is d = d'? Or, is the distance function
additive?

16
Additivity

Theorem (four-point condition): Let d be a
distance function on a set M of N OTUs, with
N ≥ 4. Then d is additive if and only if the
following condition holds for every set of four
distinct indices 1 ≤ i, j, k, l ≤ N: two of the sums
d_ij + d_kl, d_ik + d_jl, d_il + d_jk
coincide and are greater than or equal to the
third one. (Saitou and Nei, 1987, Mol. Biol.
Evol.)
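The four-point condition can be tested mechanically. The sketch below assumes the distances are given as a symmetric list-of-lists matrix. The function name `is_additive`, the tolerance parameter, and the two example matrices are illustrative values chosen for this example.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Check the four-point condition on a symmetric distance matrix."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # After sorting, the two largest sums must coincide; they are
        # then automatically greater than or equal to the third.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


additive = [[0, 3, 7, 8],
            [3, 0, 6, 7],
            [7, 6, 0, 3],
            [8, 7, 3, 0]]
perturbed = [[0, 3, 7, 9],
             [3, 0, 6, 7],
             [7, 6, 0, 3],
             [9, 7, 3, 0]]
print(is_additive(additive), is_additive(perturbed))
```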

17
Additivity
18
Distance function

The distance score is counted as either
the number of mismatched positions in the
alignment, or
the number of sequence positions that must be
changed to generate the second sequence.
Success depends on the degree to which the
distances among a set of sequences can be made
additive on a predicted evolutionary tree.

19
Example of Distance Analysis

Distances can be shown as a table
A ACGCGTTGGGCGATGGCAAC
B ACGCGTTGGGCGACGGTAAT
C ACGCATTGAATGATGATAAT
D ACACATTGAGTGATAATAAT
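The distance table itself is not preserved in the transcript. A minimal sketch that recomputes it as mismatch counts from the four sequences above (the helper name `mismatch_count` is illustrative):

```python
from itertools import combinations

sequences = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def mismatch_count(s, t):
    """Number of aligned positions at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))


distances = {
    (x, y): mismatch_count(sequences[x], sequences[y])
    for x, y in combinations(sequences, 2)
}
for pair, dist in distances.items():
    print(pair, dist)
```

The resulting counts are 3 for A-B and C-D, 6 for B-C, 7 for A-C and B-D, and 8 for A-D.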

20
Neighbour joining

Very popular method
Produces unrooted tree
Assumes additivity: the distance between a pair
of leaves equals the sum of the lengths of the
edges connecting them, that is, d = d'.
Constructs tree by sequentially joining subtrees

21
Neighbor Joining: once we know the correct (i,j)
pair
22
Neighbour Joining
23
Neighbor joining algorithm
24
Neighbour Joining: why not pick the smallest
(i,j) pair?
25
Example of Distance Analysis

Using this information, a tree can be drawn
A ACGCGTTGGGCGATGGCAAC
B ACGCGTTGGGCGACGGTAAT
C ACGCATTGAATGATGATAAT
D ACACATTGAGTGATAATAAT

26
Drawbacks of neighbor joining

In practice, the distance function is often a
pseudodistance function that does not satisfy
the four-point condition. (The triangle
inequality is often not satisfied either.)
The algorithm may produce more than one tree,
these trees may have branches of negative
length, and the matrix d' may not coincide with
the original distance matrix d.

27
Special distance function: ultrametric distance

Definition: A distance function d on a set M of
OTUs is called ultrametric if, for any three
distinct elements x_i, x_j, x_k, two of the
distances d_ij, d_ik, d_jk coincide and are
greater than or equal to the third.
It satisfies the four-point condition.
It is additive, and can be recovered from the
generated phylogenetic tree.
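The three-point definition above translates directly into a check. A minimal sketch, with an illustrative function name and two illustrative matrices:

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Check that in every triple the two largest distances coincide."""
    for i, j, k in combinations(range(len(d)), 3):
        smallest, middle, largest = sorted([d[i][j], d[i][k], d[j][k]])
        if abs(largest - middle) > tol:
            return False
    return True


clock_like = [[0, 2, 6],
              [2, 0, 6],
              [6, 6, 0]]
additive_only = [[0, 3, 7, 8],
                 [3, 0, 6, 7],
                 [7, 6, 0, 3],
                 [8, 7, 3, 0]]
print(is_ultrametric(clock_like), is_ultrametric(additive_only))
```

The second matrix satisfies the four-point condition but fails this test, which illustrates that ultrametricity is strictly stronger than additivity.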

28
UPGMA -- Unweighted Pair Group Method with
Arithmetic mean (sequential clustering method)
29
UPGMA distance function between two clusters
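The formula on this slide is not preserved. In the usual definition of UPGMA, the distance between two clusters is the average of the pairwise distances between their members. The sketch below assumes that definition. The function name, the concatenated cluster labels and the four-OTU matrix are hypothetical and are not the example from the slides.

```python
def upgma(names, matrix):
    """Return the merges as (cluster_a, cluster_b, node_height)."""
    dist = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i != j:
                dist[a, b] = matrix[i][j]
    size = {a: 1 for a in names}
    active = list(names)
    merges = []
    while len(active) > 1:
        pairs = [(a, b) for i, a in enumerate(active) for b in active[i + 1:]]
        a, b = min(pairs, key=lambda p: dist[p])  # closest pair of clusters
        new = a + b  # label of the merged cluster
        merges.append((a, b, dist[a, b] / 2))  # node sits at half the distance
        size[new] = size[a] + size[b]
        for c in active:
            if c not in (a, b):
                # The size-weighted mean equals the average over all
                # leaf pairs between the merged cluster and c.
                avg = (size[a] * dist[a, c] + size[b] * dist[b, c]) / size[new]
                dist[new, c] = dist[c, new] = avg
        active = [c for c in active if c not in (a, b)] + [new]
    return merges


# Hypothetical ultrametric distances for OTUs A, B, C, D.
example = [[0, 2, 6, 10],
           [2, 0, 6, 10],
           [6, 6, 0, 10],
           [10, 10, 10, 0]]
print(upgma(["A", "B", "C", "D"], example))
```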
30
UPGMA
31
UPGMA step 1: combine B and C
32
UPGMA step 2: combine BC and D
(10+12)/2
(4+6)/2
33
UPGMA step 3: combine A and E
34
UPGMA step 4: combine AE and BCD
35
UPGMA Result
37
When UPGMA fails
38
Maximum Likelihood method

Assumption: Maximum likelihood supposes a model
of evolution along tree branches.
Strategy: Find the parameters (tree, branch
lengths, substitution rate) that maximize the
likelihood assigned to the data.
Note: The model of evolution does not include
insertion and deletion of nucleotides.
In the Phylip package: program PROTML

39
Probabilistic Methods

The phylogenetic tree represents a generative
probabilistic model (like HMMs) for the observed
sequences.
Background probabilities q(a)
Mutation probabilities P(a|b, t)
Models for evolutionary mutations
Jukes Cantor
Kimura model
Felsenstein model
Hasegawa-Kishino-Yano model

40
Jukes Cantor model

A model for mutation rates

Mutation occurs at a constant rate
Each nucleotide is equally likely to mutate into
any other nucleotide with rate alpha.

41
Kimura 2-parameter model

Allows a different rate for transitions and
transversions.
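For illustration, the substitution probabilities of the Kimura 2-parameter model have a well-known closed form. The sketch below assumes a transition rate `alpha` and a rate `beta` for each of the two possible transversions. These parameter names and this parameterisation are assumptions, since the slide's own formula is not in the transcript.

```python
import math


def kimura_probs(t, alpha, beta):
    """Return (P(no change), P(transition), P(one given transversion))
    after time t under the Kimura 2-parameter model."""
    transversion = 0.25 * (1 - math.exp(-4 * beta * t))
    transition = 0.25 * (1 + math.exp(-4 * beta * t)
                         - 2 * math.exp(-2 * (alpha + beta) * t))
    same = 1 - transition - 2 * transversion
    return same, transition, transversion


same, transition, transversion = kimura_probs(t=0.1, alpha=2.0, beta=0.5)
print(same, transition, transversion)
```

With alpha equal to beta the three values reduce to the Jukes-Cantor case, where every change is equally likely.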

42
Mutation Probabilities

The rate matrix R is used to derive the mutation
probability matrix S.
S is obtained by integration. For Jukes-Cantor,
S(t) has (1 + 3 exp(-4 alpha t))/4 on the
diagonal and (1 - exp(-4 alpha t))/4 elsewhere.
q can be obtained by letting t go to infinity.
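A small numerical sketch of the Jukes-Cantor case, assuming the common parameterisation in which each nucleotide changes into each other nucleotide at rate alpha. The function name and the values of t and alpha are illustrative.

```python
import math


def jukes_cantor_matrix(t, alpha):
    """4x4 matrix of substitution probabilities after time t."""
    e = math.exp(-4.0 * alpha * t)
    same = 0.25 * (1 + 3 * e)  # probability of observing the same base
    diff = 0.25 * (1 - e)      # probability of each different base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


short = jukes_cantor_matrix(t=0.1, alpha=1.0)
long_time = jukes_cantor_matrix(t=100.0, alpha=1.0)
print(short[0])
print(long_time[0])  # every entry approaches 1/4
```

As t grows, every row converges to the uniform distribution, which is the background distribution q(a) = 1/4 for this model.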

43
Mutation Probabilities

All models satisfy the following properties:
Markov property
Reversibility
Existence of stationary probabilities P_a

44
Probabilistic Approach

Given P, q, the tree topology and the branch
lengths, we can compute the likelihood of the
observed sequences.

45
Computing the Tree Likelihood
46
Tree Likelihood Computation

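Define P(L_k | a): the probability of the leaves
below node k given that x_k = a.
Init: for leaves, P(L_k | a) = 1 if x_k = a, and
0 otherwise.
Iteration: if k is a node with children i and j
on branches of length t_i and t_j, then
P(L_k | a) = sum over b, c of
P(b | a, t_i) P(L_i | b) P(c | a, t_j) P(L_j | c).
Termination: the likelihood is the sum over a of
q(a) P(L_root | a).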
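The recursion above is usually known as Felsenstein's pruning algorithm. The sketch below runs it for a single alignment column, assuming Jukes-Cantor substitution probabilities with rate alpha = 1 and uniform background probabilities. The tuple encoding of the tree and all function names are assumptions made for this example.

```python
import math

NUCLEOTIDES = "ACGT"


def jc_prob(a, b, t, alpha=1.0):
    """Jukes-Cantor probability of b given a after time t."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 * (1 + 3 * e) if a == b else 0.25 * (1 - e)


def partial_likelihoods(node, site):
    """Return {a: P(L_k | a)} for the subtree rooted at node.

    node is a leaf name (str) or a tuple (left, t_left, right, t_right).
    site maps each leaf name to its observed residue.
    """
    if isinstance(node, str):  # initialisation at a leaf
        return {a: 1.0 if site[node] == a else 0.0 for a in NUCLEOTIDES}
    left, t_left, right, t_right = node
    p_left = partial_likelihoods(left, site)
    p_right = partial_likelihoods(right, site)
    result = {}
    for a in NUCLEOTIDES:
        sum_left = sum(jc_prob(a, b, t_left) * p_left[b] for b in NUCLEOTIDES)
        sum_right = sum(jc_prob(a, c, t_right) * p_right[c] for c in NUCLEOTIDES)
        result[a] = sum_left * sum_right
    return result


def site_likelihood(tree, site, q=None):
    """Termination step: sum over root states weighted by q(a)."""
    q = q or {a: 0.25 for a in NUCLEOTIDES}
    root = partial_likelihoods(tree, site)
    return sum(q[a] * root[a] for a in NUCLEOTIDES)


tree = (("x", 0.1, "y", 0.2), 0.05, "z", 0.3)
print(site_likelihood(tree, {"x": "A", "y": "A", "z": "G"}))
```

Under the assumption of independent positions, the likelihood of a whole alignment is the product of such per-column values, in practice computed as a sum of logarithms.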

47
Maximum Likelihood (ML)

Score each tree by its likelihood, computed as a
product over alignment columns under the
assumption of independent positions.
Branch lengths t can be optimized by gradient
ascent or EM.
We look for the highest-scoring tree, by
exhaustive search or by sampling methods
(Metropolis).

48
Optimal Tree Search

Perform search over possible topologies

49
Computational Problem

Such procedures are computationally expensive!
Computation of optimal parameters, per candidate,
requires non-trivial optimization step.
Spend non-negligible computation on a candidate,
even if it is a low scoring one.
In practice, such learning procedures can only
consider small sets of candidate structures

50
Max Likelihood versus Parsimony

(Example from BSA p. 225)
Choose tree T, with unequal branch lengths.
Generate 1000 sequences of length N according to
probabilistic model
(A) Reconstruction by ML
(B) Reconstruction by parsimony

51
Max Likelihood versus NJ

(Example from BSA p. 225)
Choose tree T, with unequal branch lengths.
Generate 1000 sequences of length N according to
probabilistic model
(A) Reconstruction by ML
(B) Reconstruction by NJ

Conclusion: ML infers the right tree as N gets
larger. If the probabilistic model is correct,
the ML distances should be very close to
additive, and therefore the NJ method predicts
the correct tree.
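As one concrete instance of an ML distance, the sketch below computes the Jukes-Cantor corrected distance, which is the maximum likelihood estimate of the expected number of substitutions per site under that model. Choosing Jukes-Cantor is an assumption made for this example, and the function name is illustrative. The two sequences are A and B from the Example of Distance Analysis slides.

```python
import math


def jukes_cantor_distance(s, t):
    """ML distance under Jukes-Cantor: -(3/4) ln(1 - 4p/3),
    where p is the fraction of mismatched positions."""
    if len(s) != len(t) or not s:
        raise ValueError("sequences must be non-empty and aligned")
    p = sum(a != b for a, b in zip(s, t)) / len(s)
    if p >= 0.75:
        raise ValueError("distance undefined: sequences are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)


seq_a = "ACGCGTTGGGCGATGGCAAC"
seq_b = "ACGCGTTGGGCGACGGTAAT"
print(jukes_cantor_distance(seq_a, seq_b))
```

The corrected distance is always at least the raw mismatch fraction, because it accounts for multiple substitutions at the same site.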
52
Phylip - practicalities