Molecular phylogenetics 3

Slides: 17

Provided by: jimpr8

Transcript and Presenter's Notes

1
Molecular phylogenetics 3

Level 3 Molecular Evolution and Bioinformatics
Jim Provan

Page and Holmes Sections 6.5-6
2
Maximum likelihood

Principle of likelihood suggests that the
explanation that makes the observed outcome most
probable is preferred
More formally
$L_D = \Pr(D \mid H)$
In a phylogenetic context
D is the set of sequences being compared
H is a phylogenetic tree
The tree that makes the data the most probable
evolutionary outcome is the maximum likelihood
estimate of the phylogeny

3
Models, data and hypotheses

Maximum likelihood requires three elements
A model of sequence evolution
A tree
A data set
ML methods of tree building must solve two
problems
For a given tree topology, what set of branch
lengths makes the observed data most likely?
Which tree has the greatest likelihood?

4
Models, data and hypotheses

Suppose we have two sequences, 1 and 2, separated
by an average of d substitutions per site
$d = \mu t$
Given a model of substitution for each site we
can compute the probability $P_{ij}(d)$ that two
sequences separated by d would have nucleotides i
and j
For example, if sequence 1 had nucleotide A then
$P_{AG}(d)$ is the probability that sequence 2 has a G
in the corresponding position
The log likelihood of obtaining the observed
sequences is the sum of the log likelihoods of
each individual site
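
The per-site calculation above can be sketched in a few lines of Python. This is a minimal illustration that assumes the Jukes-Cantor model (equal base frequencies, one substitution rate); the function names and the choice of model are assumptions made for the example, not part of the slides. The two test sequences are the Human and Chimp rows from the spectral analysis table later in the transcript.

```python
import math

def jc69_prob(i, j, d):
    """P_ij(d) under Jukes-Cantor: probability of nucleotide j in one
    sequence given nucleotide i in the other, d substitutions per site apart."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def log_likelihood(seq1, seq2, d):
    """Sum of per-site log likelihoods; 0.25 is the assumed equilibrium
    frequency of the nucleotide observed in sequence 1."""
    return sum(math.log(0.25 * jc69_prob(a, b, d)) for a, b in zip(seq1, seq2))

def ml_distance(seq1, seq2):
    """Closed-form ML estimate of d under Jukes-Cantor
    (valid only when fewer than 75% of sites differ)."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

human = "GTCATCATCC"
chimp = "ATTACCATTC"
d_hat = ml_distance(human, chimp)
print(round(d_hat, 4), round(log_likelihood(human, chimp, d_hat), 4))
```

Evaluating `log_likelihood` over a range of d values and picking the largest gives the same estimate as `ml_distance`, which is the sense in which d is a maximum likelihood estimate.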

5
Models, data and hypotheses

What model?
Transition/transversion ratio
Base composition
Variation in rate across sites
In all but the simplest models (e.g. Jukes-Cantor),
differences in transition / transversion rates
can be taken into account
Keeping other parameters constant, it is possible
to calculate ML estimates of individual parameters

6
Likelihood ratio tests

We can test alternative hypotheses concerning the
same data using a likelihood ratio test
The likelihood ratio statistic ($\Delta$) is the ratio of
the likelihood of the alternative hypothesis (H1) to
that of the null hypothesis (H0)
Because likelihoods are often very small, it is
more convenient to use log likelihoods
$\Delta = \log L_1 - \log L_0$
where
$L_1$ is the maximum likelihood of the alternative
hypothesis H1
$L_0$ is the maximum likelihood of the null
hypothesis H0
Can be used to test various hypotheses such as
whether a particular model of evolution is valid,
whether a molecular clock adequately describes
the data or whether one phylogenetic hypothesis
is better than another
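
A likelihood ratio test of this kind can be sketched as follows. The sketch assumes the two hypotheses are nested, so that twice the log-likelihood difference can be compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters; that assumption, the function names, and the log-likelihood values are illustrative and do not come from the slides.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df,
    built from the recurrence Q(a+1, x) = Q(a, x) + x^a e^-x / Gamma(a+1)."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    half = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, x/2)
    else:
        a, q = 1.0, math.exp(-half)              # Q(1, x/2)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(log_l1, log_l0, df):
    """Return (difference in log likelihood, test statistic, p-value)."""
    delta = log_l1 - log_l0
    stat = 2.0 * delta
    return delta, stat, chi2_sf(stat, df)

# Hypothetical log likelihoods, for illustration only.
delta, stat, p = likelihood_ratio_test(log_l1=-1000.0, log_l0=-1003.5, df=1)
print(delta, stat, p)
```

A small p-value would suggest that the extra parameters of the alternative hypothesis give a significantly better fit than the null hypothesis.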

7
Testing models

A model can be tested to measure how well it fits
the observed data by comparing the likelihood that
a tree and a model confer on the data (Ltree) with
the theoretical best (Lmax)
Likelihood ratio test can be performed to test
the adequacy of the HKY85 model to describe the
hominid mtDNA data set

8
Testing rate variation

If sequences are evolving at different rates,
then an ultrametric tree will give a poor
representation of relationships between taxa
Likelihood ratio statistic: $2\Delta = 2(\log L_{\text{no clock}} - \log L_{\text{clock}})$

9
Comparing phylogenetic hypotheses

If two trees are not significantly different then
the sum of the differences in log likelihood
between them at each site
will not be significantly different from zero
10
Objections to likelihood

Requires an explicit model of evolution
This is a strength, since it makes us aware of
the assumptions being made
However, dependence on a model raises question of
which model to use
Computationally expensive
Finding the best combination of model and tree is
technically difficult
Computing likelihood is also time consuming and
it may be that there is more than one maximal
likelihood value for a given tree
It has been suggested that likelihood is better for
testing models than as an all-purpose phylogenetic
tool

11
Splits

In the above example, the split {gorilla,
orang-utan, gibbon}, {human, chimp} can be
written as 00011 in binary notation, or 3 in
decimal notation
One advantage is that we can refer to any split
by a single number
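
The numbering scheme can be sketched in Python. The bit order used here (human as the lowest bit, gibbon as the highest, with the side that excludes gibbon being the one encoded) is an assumption inferred from the 00011 = 3 example; the function names are hypothetical.

```python
TAXA = ["human", "chimp", "gorilla", "orang-utan", "gibbon"]

def split_to_int(subset, taxa=TAXA):
    """Encode one side of a split as an integer: taxon k contributes 2**k."""
    return sum(1 << taxa.index(t) for t in subset)

def split_to_binary(subset, taxa=TAXA):
    """Same split as a binary string, last taxon on the left."""
    return format(split_to_int(subset, taxa), "0{}b".format(len(taxa)))

print(split_to_binary({"human", "chimp"}), split_to_int({"human", "chimp"}))
```

Because a split and its complement describe the same bipartition, fixing one taxon (here gibbon) on the unencoded side keeps every split's number unique.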

12
Spectral analysis

Provides a means of visualising support for each
split
In simple terms, consists of plotting the
frequencies of each split in the data set
Straightforward if there are two states for each
character

Taxon       Sequence             Binary coding
Human       G T C A T C A T C C    1  1  0  1  1  0  1  1  0  1
Chimp       A T T A C C A T T C    0  1  1  1  0  0  1  1  1  1
Gorilla     G T T G T T A T T A    1  1  1  0  1  1  1  1  1  0
Orang-utan  A C C A C T C C C A    0  0  0  1  0  1  0  0  0  0
Gibbon      A C C G C C C C C A    0  0  0  0  0  0  0  0  0  0
Split number                       5  7  6 11  5 12  7  7  6  3
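
The binary coding and split numbers in the table can be reproduced with a short script. It is a sketch that assumes every site has exactly two states and that gibbon is the reference taxon coded as 0, which is how the table appears to be built; the function name is hypothetical.

```python
from collections import Counter

ALIGNMENT = {
    "Human":      "GTCATCATCC",
    "Chimp":      "ATTACCATTC",
    "Gorilla":    "GTTGTTATTA",
    "Orang-utan": "ACCACTCCCA",
    "Gibbon":     "ACCGCCCCCA",
}

def site_splits(alignment, reference="Gibbon"):
    """For each site, set bit k when taxon k differs from the reference."""
    taxa = list(alignment)
    ref = alignment[reference]
    splits = []
    for site in range(len(ref)):
        code = 0
        for bit, taxon in enumerate(taxa):
            if alignment[taxon][site] != ref[site]:
                code |= 1 << bit
        splits.append(code)
    return splits

splits = site_splits(ALIGNMENT)
spectrum = Counter(splits)   # split number -> number of supporting sites
print(splits)
print(spectrum.most_common())
```

Plotting the counts in `spectrum` against split number gives the kind of frequency plot that spectral analysis describes.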
13
Spectral analysis
14
Spectral analysis

Since not all splits can coexist in the same tree,
some method is needed to decide which splits to
use to construct the tree
The five trivial splits will be in every tree
One possible solution is to choose the two
mutually compatible, non-trivial splits which
have the best support
In this case, the best non-trivial split is
{Orang-utan, Gibbon}
The next best supported split is {Human, Chimp},
which is compatible with this split
This gives the basic topology ((Human, Chimp),
Gorilla, (Orang-utan, Gibbon))
Problems with spectral analysis
Computationally expensive (half a million splits
for 20 sequences)
Potential for more than two character states
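
Two of the points above, split compatibility and the number of possible splits, can be checked with a small sketch. It assumes the integer encoding of splits used earlier (human as the lowest bit, gibbon as the highest) and the standard definition that two splits are compatible when at least one of the four pairwise intersections of their sides is empty; the function names are hypothetical.

```python
def n_splits(n_taxa):
    """Number of ways to divide n taxa into two non-empty groups."""
    return 2 ** (n_taxa - 1) - 1

def compatible(a, b, n_taxa):
    """True if splits a and b (bitmask-encoded) can occur in the same tree."""
    full = (1 << n_taxa) - 1
    sides_a = (a, full ^ a)   # one side and its complement
    sides_b = (b, full ^ b)
    return any((x & y) == 0 for x in sides_a for y in sides_b)

print(n_splits(20))          # the "half a million" figure for 20 sequences
print(compatible(7, 3, 5))   # {Orang-utan, Gibbon} split vs {Human, Chimp} split
print(compatible(5, 6, 5))   # {Human, Gorilla} split vs {Chimp, Gorilla} split
```

Here 7 encodes {Human, Chimp, Gorilla}, whose complement is {Orang-utan, Gibbon}, and 3 encodes {Human, Chimp}.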

15
Split decomposition
Site        1 2 3 4 5 6 7 8 9
Human       T C C T T A A A A
Chimp       T T C T A T A A A
Gorilla     T T A C A A T A A
Orang-utan  C C A C A A A T A
Gibbon      C C A C A A A A T
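
Split decomposition is usually computed from pairwise distances between sequences rather than from the characters directly. As a hedged first step only, the sketch below derives simple mismatch counts from the table above; using raw mismatch counts as the distance, and the function names, are assumptions for illustration, and the decomposition itself is not implemented here.

```python
from itertools import combinations

SEQS = {
    "Human":      "TCCTTAAAA",
    "Chimp":      "TTCTATAAA",
    "Gorilla":    "TTACAATAA",
    "Orang-utan": "CCACAAATA",
    "Gibbon":     "CCACAAAAT",
}

def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def distance_table(seqs):
    """Pairwise mismatch counts keyed by (taxon, taxon)."""
    return {(p, q): hamming(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)}

distances = distance_table(SEQS)
for pair, dist in distances.items():
    print(pair, dist)
```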
16
Split decomposition
