Likelihood Methods - PowerPoint PPT Presentation

Slides: 18
Provided by: cat86
1
Likelihood Methods
2
Likelihood maximization is a very common approach
to inference
3
A coin is known to be biased. The coin is tossed
three times, giving two heads and one tail. Use the
principle of maximum likelihood (ML) to estimate the
probability P of throwing a head.
Try P = 0.2: L(D) = 0.2 × 0.2 × 0.8 = 0.032
Try P = 0.4: L(D) = 0.4 × 0.4 × 0.6 = 0.096
Try P = 0.6: L(D) = 0.6 × 0.6 × 0.4 = 0.144
Try P = 0.8: L(D) = 0.8 × 0.8 × 0.2 = 0.128
[Plot: likelihood of the data against probability of a head]
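The trial values in the coin example can be reproduced with a short script. This is a minimal sketch: the function name `likelihood` and the grid step of 0.001 are illustrative choices, not part of the original slides.

```python
def likelihood(p_head, heads=2, tails=1):
    """Likelihood of one particular toss sequence with the given counts."""
    return p_head ** heads * (1.0 - p_head) ** tails

# Values tried on the slide.
for p in (0.2, 0.4, 0.6, 0.8):
    print(f"P = {p:.1f}  L(D) = {likelihood(p):.3f}")

# A finer grid (assumed step 0.001) locates the maximum near
# heads / tosses = 2/3.
grid = [i / 1000 for i in range(1001)]
p_hat = max(grid, key=likelihood)
print(f"ML estimate of P on the grid: {p_hat:.3f}")
```

The grid search is only for illustration; for this simple case the maximum can also be found analytically as the observed proportion of heads.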
4
Genetic Distance
Given a model of the way sequences evolve we can
derive a function that describes the likelihood
of observing a particular pair of sequences as a
function of the inferred genetic distance between
them
5
A C G
G A G
Consider a simplified Jukes and Cantor model:
D = 0.3: likelihood = 0.3 × 0.3 × 0.7 = 0.063
D = 0.6: likelihood = 0.6 × 0.6 × 0.4 = 0.144
D = 0.9: likelihood = 0.9 × 0.9 × 0.1 = 0.081
Note: this is a major simplification.
[Plot: likelihood against distance]
6
Genetic Distance using Maximum Likelihood
  • Require a model of evolution
  • Optimise all parameters of the model
  • Each evolutionary event has an associated
    likelihood given an inferred genetic distance
  • The likelihood of the sequence-pair is a function
    of the genetic distance (just the product of the
    likelihoods of each of the inferred events at
    each sequence position)
  • The function is maximized
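The steps in this list can be sketched for the standard (unsimplified) Jukes and Cantor model, where the only parameter is the distance d in substitutions per site. The function names, the search bounds and the golden-section optimiser are assumptions made for illustration; the slides do not say which optimiser is used.

```python
import math

def jc69_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of a sequence pair at distance d (substitutions per site)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # site shows the same nucleotide in both sequences
    p_diff = 0.25 - 0.25 * e   # site shows one specific different nucleotide
    # The factor 0.25 is the equilibrium frequency of the first nucleotide.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)

def ml_distance(n_same, n_diff, lo=1e-6, hi=5.0, tol=1e-9):
    """Find the d in [lo, hi] with the highest log-likelihood (golden-section search)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        e = a + ratio * (b - a)
        if jc69_log_likelihood(c, n_same, n_diff) > jc69_log_likelihood(e, n_same, n_diff):
            b = e      # the maximum lies in [a, e]
        else:
            a = c      # the maximum lies in [c, b]
    return (a + b) / 2.0

# Hypothetical data: 100 aligned sites, 20 of which differ.
d_hat = ml_distance(n_same=80, n_diff=20)
p = 20 / 100
d_closed_form = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
print(d_hat, d_closed_form)
```

For this one-parameter model the numerical optimum should agree with the well-known closed-form Jukes and Cantor distance, which is a useful sanity check before moving to models that have no closed form.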

7
Phylogenetic trees using Maximum Likelihood
  • Require a model of evolution
  • Each substitution has an associated likelihood
    given a branch of a certain length
  • A function is derived to represent the likelihood
    of the data given the tree, branch-lengths and
    additional parameters
  • Optimise over parameters of the model
  • Optimise over branch lengths
  • Sum the likelihood over all possible sequences at
    ancestral nodes
  • Search for the best tree (using heuristics such
    as tree bisection and reconnection, TBR)
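The step "sum the likelihood over all possible sequences at ancestral nodes" is usually carried out one site at a time with Felsenstein's pruning algorithm. The sketch below assumes the Jukes and Cantor model and a hypothetical tree encoding (a tip is a nucleotide string; an internal node is a tuple of `(child, branch_length)` pairs); neither the encoding nor the function names come from the slides.

```python
import math

NUCS = "ACGT"

def jc69_prob(i, j, t):
    """P(nucleotide i becomes j along a branch of length t) under Jukes and Cantor."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial_likelihoods(node):
    """Return {state: P(tips below node | node is in that state)}."""
    if isinstance(node, str):                       # tip with an observed nucleotide
        return {x: 1.0 if x == node else 0.0 for x in NUCS}
    result = {x: 1.0 for x in NUCS}
    for child, branch_length in node:               # internal node
        child_partial = partial_likelihoods(child)
        for x in NUCS:
            # Sum over every possible state y at the child end of the branch.
            result[x] *= sum(jc69_prob(x, y, branch_length) * child_partial[y]
                             for y in NUCS)
    return result

def site_likelihood(tree):
    """Likelihood of one alignment column, summed over all ancestral states."""
    partial = partial_likelihoods(tree)
    return sum(0.25 * partial[x] for x in NUCS)     # 0.25 = root state frequency

# Hypothetical three-taxon tree for a column with tips A, A and G.
inner = (("A", 0.1), ("A", 0.1))
tree = ((inner, 0.05), ("G", 0.2))
print(site_likelihood(tree))
```

The likelihood of a whole alignment is the product of the site likelihoods (in practice the sum of their logarithms); branch lengths and model parameters are then optimised for each candidate tree.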

8
Models can be made more parameter rich to
increase their realism
  • The most common additional parameters are:
  • - A correction to allow different rates for each
    type of nucleotide change
  • - A correction for the proportion of sites which
    are unable to change
  • - A correction for variable rates at those sites
    which can change
  • The values of the additional parameters will be
    estimated in the process

9
A gamma distribution can be used to model site
rate heterogeneity
The estimation can be iterative:
  • Rates of evolution can be worked out accurately
    if the tree is known
  • Accurate rates of evolution for each sequence
    position improve the accuracy of the tree

[Diagram: cycle linking Rates programme, Rates, Tree programme and Tree]
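The gamma model of site rate heterogeneity on this slide can be illustrated by averaging a per-site probability over rates drawn from a gamma distribution with mean 1. This is a sketch under stated assumptions: the Jukes and Cantor model, a shape parameter `alpha` chosen arbitrarily, and Monte Carlo draws instead of the discrete rate categories that phylogenetics programs commonly use.

```python
import math
import random

def p_same_jc69(d):
    """P(a site shows the same nucleotide in both sequences) at distance d."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

def p_same_gamma_mc(d, alpha, n_draws=100_000, seed=1):
    """Average P(same) over site rates drawn from a gamma with shape alpha, mean 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        rate = rng.gammavariate(alpha, 1.0 / alpha)   # scale 1/alpha gives mean 1
        total += p_same_jc69(rate * d)
    return total / n_draws

def p_same_gamma_exact(d, alpha):
    """Closed form obtained from the gamma moment-generating function."""
    return 0.25 + 0.75 * (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha)

d, alpha = 0.5, 0.5   # hypothetical distance and shape parameter
print(p_same_jc69(d), p_same_gamma_mc(d, alpha), p_same_gamma_exact(d, alpha))
```

A small `alpha` means strong rate variation: many sites barely change, so more sites look identical than an equal-rates model would predict at the same average distance.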
10
Likelihood and the number of parameters
  • More parameters always lead to a fit to the data
    that is at least as good

11
More parameters always lead to a likelihood that is
at least as high, whether or not the additional
parameters provide a significantly better fit to
the data
12
  • Are the extra parameters justified?
  • - Likelihood ratio test (applies to nested
    models)
  • - Akaike Information Criterion

dof = number of additional parameters
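Both criteria on this slide reduce to simple arithmetic on the maximized log-likelihoods. The sketch below uses made-up log-likelihood values and parameter counts purely for illustration; the chi-square tail probability is computed with a standard recursion so that no external library is needed.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square distribution with integer dof."""
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)                  # exact result for dof = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))       # exact result for dof = 1
    while a < dof / 2.0 - 1e-9:                   # add one term per two extra dof
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(lnl_simple, lnl_general, extra_params):
    """Return the test statistic and p-value for two nested models."""
    stat = max(0.0, 2.0 * (lnl_general - lnl_simple))
    return stat, chi2_sf(stat, extra_params)

def aic(lnl, n_params):
    """Akaike Information Criterion; lower values are preferred."""
    return 2.0 * n_params - 2.0 * lnl

# Hypothetical values: the general model has one extra parameter.
stat, p_value = likelihood_ratio_test(lnl_simple=-1000.0, lnl_general=-995.0,
                                      extra_params=1)
print(stat, p_value)
print(aic(-1000.0, n_params=10), aic(-995.0, n_params=11))
```

With these invented numbers the extra parameter would be judged worthwhile by both criteria; with a smaller gain in log-likelihood the simpler model would be kept.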
13
One model is nested in another if it is a special
case of the more general model, e.g. the Jukes
and Cantor model is nested in the Kimura
two-parameter (K2P) model
14
  • Modeltest
  • - Uses PAUP
  • - Tries out many nested models of nucleotide
    substitution
  • - Decides how many parameters are justified by
    the data

GTR does not overfit the data for at least some
HIV sequences
15
Likelihood-based tests of topologies
  • Kishino-Hasegawa test
  • Trees specified a priori
  • KH can be used to test whether two competing
    hypotheses have significantly different
    likelihoods
  • Involves a non-parametric bootstrap to estimate
    how much the likelihoods vary
  • NB: should not be used to test trees that have
    been chosen on the basis of the data!
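The bootstrap idea behind the test can be sketched as follows. This is a simplified resampling variant that assumes per-site log-likelihoods are already available for both trees; the function name, the centring step and the two-sided p-value are illustrative assumptions, and implementations in real packages may differ in detail.

```python
import random

def kh_test(site_lnl_a, site_lnl_b, n_boot=1000, seed=1):
    """Bootstrap test of the log-likelihood difference between two fixed trees."""
    rng = random.Random(seed)
    deltas = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(deltas)
    observed = sum(deltas)                     # total log-likelihood difference
    mean = observed / n
    centered = [d - mean for d in deltas]      # impose the null of no difference
    extreme = 0
    for _ in range(n_boot):
        # Resample sites with replacement and sum the centred differences.
        boot = sum(rng.choice(centered) for _ in range(n))
        if abs(boot) >= abs(observed):
            extreme += 1
    return observed, extreme / n_boot

# Hypothetical per-site log-likelihoods for two a priori trees.
tree_a = [-2.1, -3.0, -1.8, -2.5, -2.2, -2.9, -1.7, -2.4]
tree_b = [-2.3, -2.9, -2.0, -2.6, -2.1, -3.2, -1.9, -2.4]
diff, p_value = kh_test(tree_a, tree_b)
print(diff, p_value)
```

Resampling sites shows how much the difference would fluctuate by chance, which is why the test is only valid when neither tree was chosen because it fits these same data best.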

16
Likelihood-based tests of topologies
  • Shimodaira-Hasegawa test
  • Can be used to test confidence of ML tree
    compared to related trees (e.g. second most
    likely tree from the data)
  • PAUP, Andrew Rambaut
    http://evolve.zoo.ox.ac.uk/software/shtests

17
Inferring Sequences at Ancestral Nodes
  • Maximum likelihood estimates of tree topologies
    also provide inferred sequences at ancestral
    nodes
  • Analysis of sequences at ancestral nodes and
    sequence changes along ancestral branches can
    provide information about the timing of the
    acquisition of a novel trait or mutation
  • PAML (Phylogenetic Analysis using Maximum
    Likelihood)
  • Confidence intervals provided
  • Selection can be inferred
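How a sequence at an ancestral node is inferred can be illustrated for the smallest possible case: one ancestor with two descendant tips at a single site. The sketch assumes the Jukes and Cantor model and hypothetical branch lengths; it is not a description of how PAML itself performs the reconstruction.

```python
import math

NUCS = "ACGT"

def jc69_prob(i, j, t):
    """P(nucleotide i becomes j along a branch of length t) under Jukes and Cantor."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def ancestral_posterior(tip_a, t_a, tip_b, t_b):
    """Posterior probability of each nucleotide at the ancestor of two tips."""
    weights = {
        # prior frequency 0.25 times the probability of reaching both tips
        x: 0.25 * jc69_prob(x, tip_a, t_a) * jc69_prob(x, tip_b, t_b)
        for x in NUCS
    }
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# Hypothetical site: tips A and G, with the A lineage on a shorter branch.
posterior = ancestral_posterior("A", 0.05, "G", 0.30)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

The posterior probabilities give a measure of confidence in each inferred ancestral nucleotide; on a full tree the same calculation uses the partial likelihoods from all descendants of the node.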