Phylogenetic Tree Reconstruction - PowerPoint PPT Presentation

Provided by: stat57
1
Phylogenetic Tree Reconstruction
2
Phylogenetic Tree
  • A tree that represents the relationship of a set
    of species or genetic sequences is called a
    phylogenetic tree.

3
Phylogenetic tree
  • Phylogeny: the relationship of species
  • Leaves: species or sequences (OTUs, operational
    taxonomic units)
  • Internal nodes: ancestors of particular groups of
    the OTUs.
  • Branch length: the degree of relatedness between
    the species or sequences corresponding to the
    nodes at the endpoints of the branch.

4
Orthologues / Paralogues
  • Genes which diverged because of speciation are
    called orthologues. Above: a tree of orthologues
    based on a set of alpha haemoglobins.
  • Genes which diverged by gene duplication are
    called paralogues. Below: a tree of paralogues,
    the alpha, beta, gamma, delta, epsilon, zeta and
    theta chains of human haemoglobins, and
    myoglobin.

5
Rooted / Unrooted Tree
6
Counting Trees
7
Counting Trees
8
Rooting the tree
To root a tree mentally, imagine that the tree is
made of string. Grab the string at the root
and tug on it until the ends of the string (the
taxa) fall opposite the root.
9
Steps of phylogenetic tree reconstruction
  • Choosing a family of homologous sequences
  • Aligning the sequences and obtaining a reduced
    multiple alignment by discarding the columns that
    contain gaps.
  • Inferring a phylogenetic tree from the reduced
    multiple alignment.

10
Methods of Phylogenetic tree reconstruction
  • Maximum parsimony methods
  • Distance methods
  • Probabilistic methods arising from the maximum
    likelihood approach

11
Maximum Parsimony Method
  • Example alignment (rows are OTUs, columns are
    sites):

        Site
    OTU 1 2 3 4 5 6 7 8 9
    -----------------------
    1   T C A G A T C A A
    2   T T A G A A C A A
    3   T T C G A T C G A
    4   T T C T A A G G A

  • Target: find rooted tree topologies, not branch
    lengths.
  • Principle: search for the tree that requires the
    smallest number of character state changes
    between the OTUs (the sequences are assumed to
    evolve in the most economical way).
  • Operation: use only informative sites, that is,
    sites with at least two different kinds of
    residues, each of which is found in at least two
    of the OTU sequences.
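
The informative-site rule above can be checked mechanically on the four-OTU alignment from this slide. The following is a minimal sketch; the function name `informative_sites` and the dictionary layout are illustrative choices, not part of the original slides.

```python
from collections import Counter

# The four OTU sequences from the slide, one character per site.
alignment = {
    "1": "TCAGATCAA",
    "2": "TTAGAACAA",
    "3": "TTCGATCGA",
    "4": "TTCTAAGGA",
}

def informative_sites(seqs):
    """Return the 1-based indices of parsimony-informative columns."""
    sites = []
    for pos, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        # Residues that occur in at least two of the sequences.
        shared = [residue for residue, n in counts.items() if n >= 2]
        # Informative: at least two such residues at this site.
        if len(shared) >= 2:
            sites.append(pos)
    return sites

print(informative_sites(list(alignment.values())))  # -> [3, 6, 8]
```

Only sites 3, 6 and 8 can distinguish between the possible topologies; the other sites cost the same number of changes on every tree.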

12
Maximum Parsimony
13
Maximum Parsimony
14
Traditional Parsimony
  • If N is moderate, Fitch's algorithm is realistic.
  • If N is large, the branch and bound algorithm
    should be used.
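
Scoring one fixed tree with Fitch's algorithm can be sketched as follows. This is an illustration under stated assumptions: trees are written as nested 2-tuples of leaf names (a hypothetical encoding chosen here), the alignment is the four-OTU example from the Maximum Parsimony slide, and the three candidate topologies are simply enumerated by hand rather than searched with branch and bound.

```python
alignment = {
    "1": "TCAGATCAA",
    "2": "TTAGAACAA",
    "3": "TTCGATCGA",
    "4": "TTCTAAGGA",
}

def fitch_cost(tree, states):
    """Minimum number of state changes at one site on a rooted binary tree.

    tree:   a leaf name, or a 2-tuple (left_subtree, right_subtree)
    states: dict mapping leaf name -> residue observed at this site
    """
    def visit(node):
        if not isinstance(node, tuple):          # leaf
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                               # no change needed here
            return common, left_cost + right_cost
        # Disjoint candidate sets: one change is charged at this node.
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]

def tree_length(tree, alignment):
    """Sum of Fitch costs over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_cost(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )

topologies = [
    (("1", "2"), ("3", "4")),
    (("1", "3"), ("2", "4")),
    (("1", "4"), ("2", "3")),
]
lengths = {t: tree_length(t, alignment) for t in topologies}
for t, n in lengths.items():
    print(t, n)
```

For this alignment the grouping of OTUs 1 and 2 against 3 and 4 needs 7 changes, while the other two groupings need 8 and 9, so parsimony prefers the first one.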

15
Distance Methods
  • Reconstruct trees (rooted or unrooted, depending
    on the method) from a set of pairwise distances,
    d = (d_ij), between the sequences in a fixed
    reduced multiple alignment.
  • Given a tree relating the OTUs, obtain a
    tree-generated distance matrix d'.
  • Question: is d = d'? Or, is the distance function
    additive?

16
Additivity
  • Theorem (four-point condition): Let d be a
    distance function on a set M of N OTUs, with
    N ≥ 4. Then d is additive if and only if the
    following condition holds for every set of four
    distinct indices 1 ≤ i, j, k, l ≤ N: two of the
    sums
      d_ij + d_kl,  d_ik + d_jl,  d_il + d_jk
    coincide and are greater than or equal to the
    third one. (Saitou and Nei, 1987, Mol. Biol.
    Evol.)
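
The four-point condition translates directly into a check over all quadruples of OTUs. The sketch below assumes the distances are given as a symmetric list-of-lists matrix; the example matrix is hypothetical, generated from a tree with leaf branches of length 1, 2, 3, 4 and an internal branch of length 5.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Four-point condition on a symmetric distance matrix d."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([
            d[i][j] + d[k][l],
            d[i][k] + d[j][l],
            d[i][l] + d[j][k],
        ])
        # The two largest sums must coincide; they are then
        # automatically >= the third (smallest) one.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Hypothetical tree-generated distances for leaves a, b, c, d.
tree_d = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(is_additive(tree_d))  # -> True

# Perturbing a single entry breaks additivity.
noisy = [row[:] for row in tree_d]
noisy[0][2] = noisy[2][0] = 8
print(is_additive(noisy))   # -> False
```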

17
Additivity
18
Distance function
  • Distance score counted as either:
      • the number of mismatched positions in the
        alignment, or
      • the number of sequence positions that must be
        changed to generate the second sequence
  • Success depends on the degree to which the
    distances among a set of sequences can be made
    additive on a predicted evolutionary tree

19
Example of Distance Analysis
  • Distances can be shown as a table
  • A ACGCGTTGGGCGATGGCAAC
  • B ACGCGTTGGGCGACGGTAAT
  • C ACGCATTGAATGATGATAAT
  • D ACACATTGAGTGATAATAAT
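
The distance table for these four sequences can be recomputed by counting mismatched positions. This is a small sketch; the helper name `hamming` and the dictionary-based table are illustrative choices.

```python
seqs = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}

def hamming(s, t):
    """Number of mismatched positions between two aligned sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))

names = list(seqs)
dist = {(x, y): hamming(seqs[x], seqs[y]) for x in names for y in names}

# Print the table with one row per sequence.
print("  " + " ".join(names))
for x in names:
    print(x, *(dist[x, y] for y in names))
```

The off-diagonal entries are A-B 3, A-C 7, A-D 8, B-C 6, B-D 7 and C-D 3, so A pairs with B and C pairs with D.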

20
Neighbour joining
  • Very popular method
  • Produces unrooted tree
  • Assumes additivity: the distance between a pair
    of leaves equals the sum of the lengths of the
    edges connecting them, that is, d = d'.
  • Constructs tree by sequentially joining subtrees
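
A compact sketch of neighbour joining is given below. It uses the commonly stated selection criterion Q(i,j) = (n-2) d_ij - r_i - r_j, where r_i is the sum of distances from i to all other active nodes; the function name, the `node0`, `node1`, ... labels for internal nodes and the edge-list output are assumptions made for this illustration, and the input matrix is a hypothetical additive one.

```python
def neighbor_joining(names, d):
    """Return edges (node, neighbour, branch_length) of an unrooted tree.

    names: list of leaf labels
    d:     dict of dicts with symmetric pairwise distances
    """
    d = {a: dict(row) for a, row in d.items()}    # work on a copy
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Choose the pair minimizing Q(i,j) = (n-2) d_ij - r_i - r_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = active[x], active[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, i, j = best
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        edges += [(i, u, li), (j, u, lj)]
        # Distances from u to every remaining node.
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges

labels = ["a", "b", "c", "d"]
matrix = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
dist = {x: {y: matrix[i][j] for j, y in enumerate(labels)}
        for i, x in enumerate(labels)}
edges = neighbor_joining(labels, dist)
for edge in edges:
    print(edge)
```

Because the example matrix is additive, the sketch recovers leaf branches of length 1, 2, 3 and 4 and an internal branch of length 5. Subtracting r_i and r_j is what keeps a pair of leaves on short branches from being joined merely because their raw distance is the smallest.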

21
Neighbor Joining: once we know the correct (i,j)
pair
22
Neighbour Joining
23
Neighbor joining algorithm
24
Neighbour Joining: why not pick the smallest
(i,j) pair?
25
Example of Distance Analysis
  • Using this information, a tree can be drawn
  • A ACGCGTTGGGCGATGGCAAC
  • B ACGCGTTGGGCGACGGTAAT
  • C ACGCATTGAATGATGATAAT
  • D ACACATTGAGTGATAATAAT

26
Drawbacks of neighbor joining
  • In practice, the distance function is often a
    pseudodistance function that does not satisfy
    the four-point condition. (The triangle
    inequality may fail as well.)
  • The algorithm may produce more than one tree,
    the trees may have branches of negative
    length, and the matrix d' may not coincide with
    the original distance matrix d.

27
Special distance function: ultrametric distance
  • Definition: a distance function d on a set M of
    OTUs is called ultrametric if, for any three
    distinct elements x_i, x_j, x_k, two of the
    distances d_ij, d_ik, d_jk coincide and are
    greater than or equal to the third.
  • It satisfies the four-point condition.
  • It is additive, and can be recovered from the
    generated phylogenetic tree.
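
The three-point definition above can be tested in the same style as the four-point condition. This is a minimal sketch with hypothetical matrices: `clock_d` comes from a clock-like tree in which x and y join at height 1 and z joins at height 3, while `tree_d` is additive but not ultrametric.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition on a symmetric distance matrix d."""
    for i, j, k in combinations(range(len(d)), 3):
        smallest, middle, largest = sorted([d[i][j], d[i][k], d[j][k]])
        # The two largest distances of every triple must coincide.
        if abs(largest - middle) > tol:
            return False
    return True

# Hypothetical clock-like tree on x, y, z.
clock_d = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]
# Hypothetical additive distances from a tree with unequal leaf branches.
tree_d = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(is_ultrametric(clock_d))  # -> True
print(is_ultrametric(tree_d))   # -> False
```

The second matrix shows that additivity alone does not imply the ultrametric property.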

28
UPGMA -- Unweighted Pair Group Method with
Arithmetic mean (sequential clustering method)
29
UPGMA: distance function between two clusters
30
UPGMA
31
UPGMA step 1: combine B and C
32
UPGMA step 2: combine BC and D
(10 + 12)/2
(4 + 6)/2
33
UPGMA step 3: combine A and E
34
UPGMA step 4: combine AE and BCD
35
UPGMA Result
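
The clustering steps can be sketched in a few lines. The distance matrix below is hypothetical (the slide's own table did not survive in the transcript); it was chosen only so that the merge order matches the slide titles: B with C, then BC with D, then A with E, then AE with BCD. The function name and the `(cluster, cluster, height)` output are likewise illustrative.

```python
def upgma(names, dist):
    """Sequential clustering with size-weighted average distances.

    names: list of OTU labels
    dist:  dict mapping a pair (a, b) to the distance between a and b
    Returns a list of merges (cluster_a, cluster_b, node_height).
    """
    sizes = {name: 1 for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(sizes) > 1:
        a, b = sorted(min(d, key=d.get))          # closest pair of clusters
        height = d[frozenset((a, b))] / 2         # new node sits halfway
        new = a + b
        for k in sizes:
            if k not in (a, b):
                # Average over all leaf pairs between the merged cluster and k.
                d[frozenset((new, k))] = (
                    sizes[a] * d[frozenset((a, k))]
                    + sizes[b] * d[frozenset((b, k))]
                ) / (sizes[a] + sizes[b])
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((a, b, height))
    return merges

# Hypothetical pairwise distances for five OTUs.
example = {
    ("A", "B"): 10, ("A", "C"): 12, ("A", "D"): 11, ("A", "E"): 8,
    ("B", "C"): 2,  ("B", "D"): 4,  ("B", "E"): 10,
    ("C", "D"): 6,  ("C", "E"): 12,
    ("D", "E"): 11,
}
merges = upgma(["A", "B", "C", "D", "E"], example)
for step in merges:
    print(step)
```

With this matrix the distance from BC to D after the first merge is (4 + 6)/2 = 5, and the final merge of AE with BCD happens at height 5.5.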
37
When UPGMA fails
38
Maximum Likelihood method
  • Assumption: maximum likelihood supposes a model
    of evolution along tree branches.
  • Strategy: find the parameters (tree, branch
    lengths, substitution rate) that maximize the
    likelihood assigned to the data.
  • Note: the model of evolution does not include
    insertion and deletion of nucleotides.
  • In the Phylip package: program PROTML

39
Probabilistic Methods
  • The phylogenetic tree represents a generative
    probabilistic model (like HMMs) for the observed
    sequences.
  • Background probabilities: q(a)
  • Mutation probabilities: P(a|b, t)
  • Models for evolutionary mutations:
      • Jukes Cantor
      • Kimura model
      • Felsenstein model
      • Hasegawa-Kishino-Yano model

40
Jukes Cantor model
  • A model for mutation rates
  • Mutation occurs at a constant rate
  • Each nucleotide is equally likely to mutate into
    any other nucleotide with rate alpha.

41
Kimura 2-parameter model
  • Allows a different rate for transitions and
    transversions.

42
Mutation Probabilities
  • The rate matrix R is used to derive the mutation
    probability matrix S
  • S is obtained by integration. For Jukes Cantor
  • q can be obtained by setting t to infinity
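
For the Jukes Cantor model with rate alpha, the standard closed form of the mutation probability matrix has diagonal entries 1/4 + (3/4) exp(-4 alpha t) and off-diagonal entries 1/4 - (1/4) exp(-4 alpha t). The sketch below evaluates it; the function name and the list-of-lists layout in the order A, C, G, T are assumptions made for illustration.

```python
import math

BASES = "ACGT"

def jukes_cantor_matrix(alpha, t):
    """Mutation probability matrix S(t) for the Jukes Cantor model."""
    decay = math.exp(-4 * alpha * t)
    same = 0.25 + 0.75 * decay      # probability of observing the same base
    diff = 0.25 - 0.25 * decay      # probability of each other base
    return [[same if a == b else diff for b in BASES] for a in BASES]

s_short = jukes_cantor_matrix(alpha=1.0, t=0.1)
s_long = jukes_cantor_matrix(alpha=1.0, t=50.0)

print([round(x, 4) for x in s_short[0]])
print([round(x, 4) for x in s_long[0]])   # every entry approaches 0.25
```

Each row sums to one, t = 0 gives the identity matrix, and for large t every entry tends to 0.25, which is the background distribution q of this model.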

43
Mutation Probabilities
  • All models satisfy the following properties
  • Markovian property
  • Reversibility
  • Exist stationary probabilities Pa s.t.

44
Probabilistic Approach
  • Given P,q, the tree topology and branch lengths,
    we can compute

45
Computing the Tree Likelihood
46
Tree Likelihood Computation
  • Define P(L_k | a) = prob. of leaves below node k
    given that x_k = a
  • Init: for leaves, P(L_k | a) = 1 if x_k = a,
    0 otherwise
  • Iteration: if k is a node with children i and j,
    then
  • Termination: likelihood is
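
This recursion is usually written as P(L_k | a) = [sum over b of P(b | a, t_i) P(L_i | b)] × [sum over c of P(c | a, t_j) P(L_j | c)], with the likelihood of one site obtained at the root as the sum over a of q(a) P(L_root | a). The sketch below implements that standard form for a single site under the Jukes Cantor model; the nested-tuple tree encoding `(left, t_left, right, t_right)` and all function names are assumptions made for this illustration.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(a, b, t, alpha=1.0):
    """Jukes Cantor probability of observing b after time t, starting from a."""
    decay = math.exp(-4 * alpha * t)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay

def partial_likelihood(node):
    """Return {a: P(L_k | a)} for the subtree rooted at node.

    A leaf is a single residue such as "A"; an internal node is a tuple
    (left_subtree, t_left, right_subtree, t_right).
    """
    if isinstance(node, str):                       # initialisation at leaves
        return {a: 1.0 if a == node else 0.0 for a in BASES}
    left, t_left, right, t_right = node
    pl, pr = partial_likelihood(left), partial_likelihood(right)
    return {
        a: sum(jc_prob(a, b, t_left) * pl[b] for b in BASES)
           * sum(jc_prob(a, c, t_right) * pr[c] for c in BASES)
        for a in BASES
    }

def site_likelihood(root, q=None):
    """Termination: sum over root states weighted by background q."""
    q = q or {a: 0.25 for a in BASES}
    p = partial_likelihood(root)
    return sum(q[a] * p[a] for a in BASES)

# Three leaves: (A, C) joined first, then G.
tree = (("A", 0.1, "C", 0.2), 0.3, "G", 0.4)
print(site_likelihood(tree))
```

Under the assumption of independent positions, the likelihood of a whole alignment is the product of such per-site values, usually accumulated as a sum of logarithms.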

47
Maximum Likelihood (ML)
  • Score each tree by
  • Assumption of independent positions
  • Branch lengths t can be optimized
      • Gradient ascent
      • EM
  • We look for the highest scoring tree
      • Exhaustive
      • Sampling methods (Metropolis)

48
Optimal Tree Search
  • Perform search over possible topologies

49
Computational Problem
  • Such procedures are computationally expensive!
  • Computation of optimal parameters, per candidate,
    requires a non-trivial optimization step.
  • Non-negligible computation is spent on each
    candidate, even if it is a low scoring one.
  • In practice, such learning procedures can only
    consider small sets of candidate structures.

50
Max Likelihood versus Parsimony
  • (Example from BSA p. 225)
  • Choose tree T, with unequal branch lengths.
  • Generate 1000 sequences of length N according to
    probabilistic model
  • (A) Reconstruction by ML (B)
    Reconstruction by Parsimony

51
Max Likelihood versus NJ
  • (Example from BSA p. 225)
  • Choose tree T, with unequal branch lengths.
  • Generate 1000 sequences of length N according to
    probabilistic model
  • (A) Reconstruction by ML (B)
    Reconstruction by NJ

Conclusion: ML infers the right tree as N gets
larger. If the probabilistic model is correct,
the ML distances should be very close to additive,
and therefore the NJ method predicts the correct
tree.
52
Phylip - practicalities
  • Menu-driven, no command line
  • Input file format:
      • First line: <number of sequences> <number of
        letters per sequence>
      • Next lines: sequences
      • The first ten characters are the sequence name
      • Then the sequence follows. Spaces and newlines
        are allowed.
      • Dashes (-) signify gaps
  • Example
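
A small input file in the layout described above can be produced with a helper like the one below. This is a sketch based only on the rules listed on this slide (count line, ten-character name field, sequence after the name); the function name `to_phylip` is hypothetical, and the four sequences are reused from the distance-analysis slide.

```python
def to_phylip(seqs):
    """Format a dict {name: aligned sequence} as Phylip-style input text."""
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all sequences must have the same length")
    # First line: number of sequences and number of letters per sequence.
    lines = [f"{len(names)} {length}"]
    for name in names:
        # The name occupies exactly ten characters (truncated or padded).
        lines.append(name[:10].ljust(10) + seqs[name])
    return "\n".join(lines) + "\n"

seqs = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}
text = to_phylip(seqs)
print(text, end="")
```

The printed text starts with the line `4 20`, followed by one line per sequence with the name padded to ten characters.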

53
The End