Likelihood Methods - PowerPoint PPT Presentation

Slides: 18

Provided by: cat86

Transcript and Presenter's Notes

Title: Likelihood Methods

1
Likelihood Methods
2
Likelihood maximization is a very common approach
to inference
3
A coin is known to be biased. The coin is tossed
three times: two heads and one tail. Use the
principle of ML to estimate the probability of
throwing a head.
Try P = 0.2
L(D) = 0.2 * 0.2 * 0.8 = 0.032
Try P = 0.4
L(D) = 0.4 * 0.4 * 0.6 = 0.096
Try P = 0.6
L(D) = 0.6 * 0.6 * 0.4 = 0.144
Try P = 0.8
L(D) = 0.8 * 0.8 * 0.2 = 0.128
(Plot: likelihood of the data against probability of a head)
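
The trial-and-error search on this slide can be extended to a fine grid of candidate values. The following is a minimal sketch in Python; the function name `coin_likelihood` and the grid resolution are illustrative choices, not part of the original slides.

```python
def coin_likelihood(p, heads=2, tails=1):
    """Likelihood of one particular sequence with `heads` heads and `tails` tails."""
    return p ** heads * (1 - p) ** tails

# The four trial values used on the slide.
for p in (0.2, 0.4, 0.6, 0.8):
    print(f"P = {p:.1f}  L(D) = {coin_likelihood(p):.3f}")

# A finer grid places the maximum close to P = 2/3 (two heads in three tosses).
grid = [i / 1000 for i in range(1001)]
best_p = max(grid, key=coin_likelihood)
print(f"ML estimate of P is about {best_p:.3f}")
```

Setting the derivative of P * P * (1 - P) to zero gives the same answer analytically, P = 2/3.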
4
Genetic Distance
Given a model of the way sequences evolve we can
derive a function that describes the likelihood
of observing a particular pair of sequences as a
function of the inferred genetic distance between
them
5
Consider a simplified Jukes and Cantor model for
the sequence pair
A C G
G A G
D = 0.3: L = 0.3 * 0.3 * 0.7 = 0.063
D = 0.6: L = 0.6 * 0.6 * 0.4 = 0.144
D = 0.9: L = 0.9 * 0.9 * 0.1 = 0.081
Note this is a major simplification
(Plot: likelihood against distance)
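
The arithmetic on this slide treats each mismatched site as contributing D and each matching site as contributing (1 - D). A short Python sketch of that simplified calculation follows; the function name is hypothetical, and reading the slide's six letters as the pair ACG / GAG is an assumption (it gives two mismatches and one match, which is what the slide's products imply).

```python
def simplified_likelihood(seq_a, seq_b, d):
    """Product over sites: d for a mismatch, (1 - d) for a match."""
    likelihood = 1.0
    for x, y in zip(seq_a, seq_b):
        likelihood *= d if x != y else (1 - d)
    return likelihood

seq_a, seq_b = "ACG", "GAG"  # assumed reading of the slide's sequence pair
for d in (0.3, 0.6, 0.9):
    print(f"D = {d:.1f}  L = {simplified_likelihood(seq_a, seq_b, d):.3f}")
```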
6
Genetic Distance using Maximum Likelihood

Require a model of evolution
Optimise all parameters of the model
Each evolutionary event has an associated
likelihood given an inferred genetic distance
The likelihood of the sequence-pair is a function
of the genetic distance (just the product of the
likelihoods of each of the inferred events at
each sequence position)
Function is maximised (equivalently, the negative
log-likelihood is minimised)
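
As an illustration of these steps, the sketch below estimates a pairwise distance under the full Jukes and Cantor (JC69) model by maximising the log-likelihood numerically. The site counts are hypothetical, the function names are illustrative, and a simple ternary search stands in for whatever optimiser a real package would use. The constant factor for the base frequency at each site is omitted because it does not affect where the maximum lies.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of a sequence pair at distance d (substitutions per site)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # site shows the same base in both sequences
    p_diff = 0.25 - 0.25 * e   # site shows one specific different base
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def ml_distance(n_sites, n_diff, lo=1e-6, hi=10.0, iterations=200):
    """Ternary search for the distance that maximises the log-likelihood."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if jc69_log_likelihood(m1, n_sites, n_diff) < jc69_log_likelihood(m2, n_sites, n_diff):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

n_sites, n_diff = 100, 20  # hypothetical alignment: 20 of 100 sites differ
d_hat = ml_distance(n_sites, n_diff)
closed_form = -0.75 * math.log(1 - 4 * (n_diff / n_sites) / 3)
print(f"numerical ML distance: {d_hat:.4f}")
print(f"JC69 closed form:      {closed_form:.4f}")
```

For JC69 the numerical optimum should agree with the well-known closed-form distance; richer models generally have no closed form, which is why numerical optimisation is needed.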

7
Phylogenetic trees using Maximum Likelihood

Require a model of evolution
Each substitution has an associated likelihood
given a branch of a certain length
A function is derived to represent the likelihood
of the data given the tree, branch-lengths and
additional parameters
Optimise over parameters of the model
Optimise over branch lengths
Sum the likelihood over all possible sequences at
ancestral nodes
Search for the best tree (using heuristics such
as TBR, tree bisection and reconnection)
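
The step of summing over all possible sequences at ancestral nodes is usually done with Felsenstein's pruning algorithm. The sketch below computes the likelihood of a single alignment column on a fixed tree under JC69. The tree encoding (a tip is a base, an internal node is a list of (child, branch length) pairs), the branch lengths, and the function names are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability that base i becomes base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node):
    """Map each base b to P(tips below node | node has base b)."""
    if isinstance(node, str):  # a tip carrying an observed base
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, length in node:
        child_partial = partial(child)
        for b in BASES:
            # Sum over every possible base at the child node.
            result[b] *= sum(jc69_prob(b, c, length) * child_partial[c] for c in BASES)
    return result

def site_likelihood(tree):
    """Likelihood of one site, averaging over the base at the root."""
    root = partial(tree)
    return sum(0.25 * root[b] for b in BASES)

def example_tree(tip1, tip2, tip3):
    """((tip1:0.1, tip2:0.2):0.05, tip3:0.3) with hypothetical branch lengths."""
    return [([(tip1, 0.1), (tip2, 0.2)], 0.05), (tip3, 0.3)]

print(site_likelihood(example_tree("A", "A", "G")))
```

The likelihood of a whole alignment is the product of the site likelihoods (or the sum of their logarithms), and that is the quantity optimised over branch lengths and model parameters.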

8
Models can be made more parameter rich to
increase their realism

The most common additional parameters are
A correction to allow different rates for each
type of nucleotide change
A correction for the proportion of sites which
are unable to change
A correction for variable rates at those sites
which can change
The values of the additional parameters will be
estimated in the process
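
As one example of such an extra parameter, the sketch below adds a proportion of invariable sites to the JC69 probability of observing a pair of bases at one site. The parameter name `p_inv` and the function name are illustrative, and `d` here is taken to be the distance at the sites that are able to change, which is a simplifying assumption.

```python
import math

BASES = "ACGT"

def pair_site_prob(x, y, d, p_inv):
    """P(bases x and y at one site of a sequence pair) under JC69 plus invariable sites."""
    e = math.exp(-4.0 * d / 3.0)
    p_change = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e
    variable_part = (1 - p_inv) * 0.25 * p_change
    # An invariable site can only ever show the same base in both sequences.
    invariable_part = p_inv * 0.25 if x == y else 0.0
    return variable_part + invariable_part

print(pair_site_prob("A", "A", d=0.5, p_inv=0.0))
print(pair_site_prob("A", "A", d=0.5, p_inv=0.3))
```

In a likelihood analysis `p_inv` would be estimated together with the distance rather than fixed in advance.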

9
A gamma distribution can be used to model site
rate heterogeneity
Can be iterative.

Rates of evolution can be worked out accurately
if the tree is known
Accurate rates of evolution for each sequence
position improve the accuracy of the tree

(Diagram: the rates programme passes rates to the
tree programme, which passes a tree back to the
rates programme)
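
To illustrate gamma-distributed site rates, the sketch below draws relative rates with mean 1 and averages the JC69 probability that a site differs between two sequences over those rates. The shape value, distance, and function names are hypothetical choices for the example. The simulated average is compared with the closed form obtained by integrating the JC69 expression over the gamma distribution.

```python
import math
import random

def gamma_rates(alpha, n, seed=0):
    """Draw n relative site rates from a gamma distribution with mean 1."""
    rng = random.Random(seed)
    # random.gammavariate takes shape and scale; scale = 1 / alpha gives mean 1.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n)]

def p_diff_jc69(d):
    """Probability that a site differs between two sequences at distance d."""
    return 0.75 * (1 - math.exp(-4.0 * d / 3.0))

def p_diff_jc69_gamma(d, alpha):
    """Same probability averaged over gamma-distributed rates (closed form)."""
    return 0.75 * (1 - (1 + 4.0 * d / (3.0 * alpha)) ** (-alpha))

alpha, d = 0.5, 0.5  # hypothetical shape parameter and distance
rates = gamma_rates(alpha, 100_000)
simulated = sum(p_diff_jc69(r * d) for r in rates) / len(rates)
print(f"mean rate:   {sum(rates) / len(rates):.3f}")
print(f"simulated:   {simulated:.4f}")
print(f"closed form: {p_diff_jc69_gamma(d, alpha):.4f}")
print(f"equal rates: {p_diff_jc69(d):.4f}")
```

A small shape parameter means strong rate heterogeneity: many sites barely change while a few change quickly, so fewer differences are observed than equal rates would predict for the same distance.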
10
Likelihood and the number of parameters

More parameters always lead to a better fit to
the data

11
Adding parameters never lowers the maximised
likelihood and usually raises it, whether or not
the additional parameters are providing a
significantly better fit to the data
12

Are the extra parameters justified?
- Likelihood ratio test (applies to nested
models)
- Akaike Information Criterion

dof = number of additional parameters
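
Both criteria can be sketched in a few lines of Python. The log-likelihood values and parameter counts below are invented for illustration, and the helper `chi2_sf` is a small hand-written upper-tail function for integer degrees of freedom (an assumption made to keep the example free of third-party libraries). The likelihood ratio statistic is compared with a chi-square distribution, which is an asymptotic approximation.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with integer dof."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        q, a = math.exp(-half), 1.0              # exact tail for dof = 2
    else:
        q, a = math.erfc(math.sqrt(half)), 0.5   # exact tail for dof = 1
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < dof:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return q

def likelihood_ratio_test(lnl_simple, lnl_complex, extra_params):
    """Return the LRT statistic and its approximate p-value."""
    statistic = 2.0 * (lnl_complex - lnl_simple)
    return statistic, chi2_sf(statistic, extra_params)

def aic(lnl, n_params):
    """Akaike Information Criterion; lower values are preferred."""
    return 2.0 * n_params - 2.0 * lnl

# Hypothetical maximised log-likelihoods for a simpler and a richer nested model.
lnl_simple, lnl_complex = -2345.6, -2339.1
statistic, p_value = likelihood_ratio_test(lnl_simple, lnl_complex, extra_params=1)
print(f"LRT statistic = {statistic:.2f}, p = {p_value:.4g}")
print(f"AIC simple = {aic(lnl_simple, 0):.1f}, AIC complex = {aic(lnl_complex, 1):.1f}")
```

A small p-value, or a lower AIC for the richer model, suggests the extra parameter is justified by the data.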
13
One model is nested in another if it is a special
case of the more general model e.g. the Jukes
and Cantor model is a special case of the Kimura
2P model (equal transition and transversion rates)
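
The nesting can be checked numerically: setting the Kimura 2P transition rate equal to its transversion rate should reproduce the Jukes and Cantor probabilities. In the sketch below the rate and time values are arbitrary, and `alpha` (transition rate) and `beta` (rate of each of the two transversions) follow a common textbook parameterisation, which is an assumption rather than something stated in the slides.

```python
import math

def jc69_probs(d):
    """JC69 probabilities at distance d (expected substitutions per site)."""
    e = math.exp(-4.0 * d / 3.0)
    changed = 0.25 - 0.25 * e
    return {"same": 0.25 + 0.75 * e, "transition": changed, "transversion": changed}

def k2p_probs(alpha, beta, t):
    """Kimura 2P probabilities after time t."""
    e4 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    return {
        "same": 0.25 + 0.25 * e4 + 0.5 * e2,
        "transition": 0.25 + 0.25 * e4 - 0.5 * e2,
        "transversion": 0.25 - 0.25 * e4,  # each of the two possible transversions
    }

rate, t = 0.1, 2.0                  # arbitrary illustrative values
d = (rate + 2 * rate) * t           # expected substitutions per site when alpha = beta
print(k2p_probs(rate, rate, t))
print(jc69_probs(d))
```

Because the simpler model is recovered by fixing one parameter of the richer one, the two can be compared with a likelihood ratio test with one degree of freedom.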
14

Modeltest
- Uses PAUP
- Tries out many nested models of nucleotide
substitution
- Decides how many parameters are justified by
the data

GTR does not overfit the data for at least some
HIV sequences
15
Likelihood-based tests of topologies

Kishino-Hasegawa test
Trees specified a priori
KH can be used to test whether two competing
hypotheses have significantly different
likelihoods
Involves non-parametric bootstrap to get an idea
of by how much the likelihoods vary
NB should not be used to test trees that have
been chosen on the basis of the data!
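
One common way to approximate the bootstrap step is to resample per-site log-likelihoods rather than re-optimising each replicate (often called RELL). The sketch below follows that idea; it is not the exact procedure of any particular program, the per-site values are randomly generated stand-ins, and the function name is hypothetical.

```python
import random

def kh_test_rell(site_lnl_a, site_lnl_b, replicates=2000, seed=0):
    """Two-sided test of the log-likelihood difference between two trees."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    observed = sum(diffs)
    # Resample sites with replacement to see how much the difference varies.
    boot = [sum(rng.choice(diffs) for _ in range(n)) for _ in range(replicates)]
    mean_boot = sum(boot) / replicates
    # Centre the bootstrap distribution so that it represents "no difference".
    extreme = sum(1 for b in boot if abs(b - mean_boot) >= abs(observed))
    return observed, extreme / replicates

# Invented per-site log-likelihoods for 200 sites under two trees.
data_rng = random.Random(1)
tree_a = [-2.0 + data_rng.gauss(0, 0.3) for _ in range(200)]
tree_b = [a + data_rng.gauss(0, 0.2) for a in tree_a]  # a tree with a similar fit
delta, p_value = kh_test_rell(tree_a, tree_b)
print(f"lnL difference = {delta:.2f}, p = {p_value:.3f}")
```

A large p-value means the difference between the two trees is within the range expected from sampling variation across sites.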

16
Likelihood-based tests of topologies

Shimodaira-Hasegawa test
Can be used to test confidence of ML tree
compared to related trees (e.g. second most
likely tree from the data)
PAUP, Andrew Rambaut
http://evolve.zoo.ox.ac.uk/software/shtests

17
Inferring Sequences at Ancestral Nodes

Maximum likelihood estimates of tree topologies
also provide inferred sequences at ancestral
nodes
Analysis of sequences at ancestral nodes and
sequence changes along ancestral branches can
provide information about the timing of the
acquisition of a novel trait or mutation
PAML (Phylogenetic Analysis using Maximum
Likelihood)
Confidence intervals provided
Selection can be inferred
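
As a small illustration of inferring an ancestral state with a measure of confidence, the sketch below computes the posterior probability of each base at a single ancestor whose descendants are attached directly to it, under JC69. The star-shaped tree, branch lengths, and function names are simplifying assumptions for the example and do not reflect how PAML is implemented.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability that base i becomes base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def root_posterior(tips):
    """tips: list of (observed base, branch length) attached to the ancestor."""
    joint = {}
    for b in BASES:
        prob = 0.25  # equal base frequencies under JC69
        for base, t in tips:
            prob *= jc69_prob(b, base, t)
        joint[b] = prob
    total = sum(joint.values())
    return {b: joint[b] / total for b in BASES}

posterior = root_posterior([("A", 0.1), ("A", 0.1), ("G", 0.3)])
best = max(posterior, key=posterior.get)
print(f"most probable ancestral base: {best} (posterior {posterior[best]:.3f})")
```

The posterior probability attached to the chosen base indicates how much confidence the reconstruction deserves at that site.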