Title: An introduction to maximum likelihood
1. An introduction to maximum likelihood
2. What does parsimony assume?
- Traditional view: just character independence; with enough characters you should converge to the true phylogeny
- Felsenstein (1978) used a simple example to show that parsimony assumes more
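3. Four-taxon case
- Two states: 0 and 1
- Changes in each direction are equally probable
- The probability of a change of state on a branch is P (branches to A and C) or Q (branches to B and D, and the internal branch)
- Which data patterns favor the true tree, (A,B)(C,D)?
A B C D
0 0 1 1
1 1 0 0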
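4. How can the 1100 pattern arise?
(taking the node that joins A and B to be in state 0)
- Change on the branches to A and B ($PQ$), and either
  - no change on the other three branches: $(1-Q)^2(1-P)$, or
  - change on all of the other three: $PQ^2$
- $PQ\,[(1-Q)^2(1-P) + Q^2P]$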
5. What is the probability of those outcomes?
- $\mathrm{Prob}_{1100} = PQ\,[(1-Q)^2(1-P) + Q^2P]$
- $\mathrm{Prob}_{0011} = (1-P)(1-Q)\,[Q(1-Q)(1-P) + (1-Q)QP] = Q(1-Q)^2(1-P)$
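The history counting above can be checked by brute force. This is a minimal sketch, assuming (as the slide formulas imply) that the branches to A and C change with probability P, the branches to B and D and the internal branch change with probability Q, and that the node joining A and B is fixed in state 0. The function names `step` and `pattern_prob` are hypothetical.

```python
def step(start: int, end: int, p_change: float) -> float:
    """Probability of ending in `end` after one branch that starts in `start`."""
    return p_change if start != end else 1.0 - p_change


def pattern_prob(pattern: str, P: float, Q: float) -> float:
    """Probability of the tip pattern 'ABCD', given the (A,B) node is in state 0.

    Hypothetical branch assignment read off the slide formulas:
    A and C change with probability P; B, D and the internal branch with Q.
    """
    a, b, c, d = (int(s) for s in pattern)
    u = 0  # state at the node joining A and B
    total = 0.0
    for v in (0, 1):  # sum over the unobserved state at the node joining C and D
        total += (step(u, a, P) * step(u, b, Q)      # branches to A and B
                  * step(u, v, Q)                    # internal branch
                  * step(v, c, P) * step(v, d, Q))   # branches to C and D
    return total


P, Q = 0.3, 0.1  # hypothetical values
closed_1100 = P * Q * ((1 - Q) ** 2 * (1 - P) + Q ** 2 * P)
closed_0011 = (1 - P) * (1 - Q) * (Q * (1 - Q) * (1 - P) + (1 - Q) * Q * P)
print(pattern_prob("1100", P, Q), closed_1100)
print(pattern_prob("0011", P, Q), closed_0011)
```

Each printed pair should agree. Calling `pattern_prob("1010", P, Q)` or `pattern_prob("0101", P, Q)` gives the probabilities of the patterns that favor (A,C)(B,D).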
6. Consider the probability of data favoring the tree (A,C)(B,D)
- $\mathrm{Prob}_{1010} = P(1-Q)\,[Q^2(1-P) + (1-Q)^2P]$
- $\mathrm{Prob}_{0101} = (1-P)Q\,[Q(1-Q)P + Q(1-Q)(1-P)] = Q^2(1-Q)(1-P)$
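7. Probability of consistency
- Parsimony will be consistent if $\mathrm{Prob}_{1010} + \mathrm{Prob}_{0101} < \mathrm{Prob}_{1100} + \mathrm{Prob}_{0011}$
- The right-hand side minus the left-hand side simplifies to $(1-2Q)\,[Q(1-Q) - P^2]$
- So if we assume Q is less than 0.5, consistency requires that $P^2 < Q(1-Q)$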
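The consistency condition can also be checked numerically. This is a minimal sketch, assuming the four pattern probabilities from the slides and using hypothetical function names (`support`, `parsimony_consistent`). It compares the two sides directly on a grid of (P, Q) values with Q < 0.5 and checks that the result agrees with $P^2 < Q(1-Q)$ everywhere except on the boundary itself.

```python
def support(P: float, Q: float) -> tuple[float, float]:
    """(support for (A,B)(C,D), support for (A,C)(B,D)) from the slide formulas."""
    p1100 = P * Q * ((1 - Q) ** 2 * (1 - P) + Q ** 2 * P)
    p0011 = (1 - P) * (1 - Q) * (Q * (1 - Q) * (1 - P) + (1 - Q) * Q * P)
    p1010 = P * (1 - Q) * (Q ** 2 * (1 - P) + (1 - Q) ** 2 * P)
    p0101 = (1 - P) * Q * (Q * (1 - Q) * P + Q * (1 - Q) * (1 - P))
    return p1100 + p0011, p1010 + p0101


def parsimony_consistent(P: float, Q: float) -> bool:
    """True when patterns favoring the true tree are the more probable ones."""
    true_tree, wrong_tree = support(P, Q)
    return true_tree > wrong_tree


# Grid of hypothetical values; skip points lying on the boundary P^2 = Q(1-Q).
grid = [i / 50 for i in range(1, 50)]
mismatches = [
    (P, Q)
    for P in grid
    for Q in grid
    if Q < 0.5
    and abs(P * P - Q * (1 - Q)) > 1e-9
    and parsimony_consistent(P, Q) != (P * P < Q * (1 - Q))
]
print(len(mismatches))                 # 0
print(parsimony_consistent(0.5, 0.1))  # False: long branches in the Felsenstein zone
```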
8. Consistency is not guaranteed
9. Inconsistency
- When a method is inconsistent, the estimated tree converges on the wrong answer as you add more data
- Long branch attraction (LBA): the long branches to A and C are wrongly grouped together
10. Possible responses
- It only applies to four taxa and two states
  - But it still applies to 4-state data
  - And it gets worse with more taxa
- Consistency is not so important
- Real data are not in the Felsenstein zone
11. Maximum likelihood
- A general approach to estimating parameters in statistics
- Has many desirable statistical properties
- Felsenstein suggested it could be applied to phylogenetic inference and that it should avoid LBA
12. The maximum likelihood criterion
- The best estimate of a parameter is the value
that would be most likely to generate the
observed data
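Here is a minimal sketch of the criterion for a single parameter in a toy setting that is not taken from the slides. Suppose k of n aligned sites differ between two sequences, and each site differs independently with unknown probability p. A grid search over p keeps the value that makes the observed data most probable. For this model the answer is known analytically to be k/n. The function names are hypothetical.

```python
import math


def log_likelihood(p: float, k: int, n: int) -> float:
    """Binomial log-likelihood of k differences in n sites (constant log C(n, k) dropped)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)


def ml_estimate(k: int, n: int, steps: int = 10_000) -> float:
    """Return the grid value of p in (0, 1) with the highest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, k, n))


print(ml_estimate(20, 100))  # 0.2, i.e. k / n
```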
13. Application to phylogeny
- Assume a model of evolution
- Find the tree that would be most likely to give the observed data, given the model
- Branch lengths are taken into account
- Uses all data (variant and invariant sites)
14. An example (from Swofford et al. 1996)
- What can we say about the placement of another
taxon with state C?
15. An example (from Swofford et al. 1996)
- Parsimony: the new taxon could attach in several places
16. An example (from Swofford et al. 1996)
- ML: one place is favored
- The state at the node marked ? is most likely A
17. An outline of the ML approach
Consider one character, i
(It is useful to arbitrarily root the tree)
18. Sum across all possible histories for i
There are $4^{n-2}$ possible arrangements of states at the n-2 internal nodes for n taxa
19. For each tree we calculate the likelihood of getting the observed states, L(i)
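[Figure: four-taxon tree rooted at an internal node; tip states A, G, G, G; both internal nodes in state A; branch lengths t1 to t5]
For the history shown:
$L(i) = P_A \times P_{A \to A}(t_1) \times P_{A \to G}(t_2) \times P_{A \to G}(t_3) \times P_{A \to A}(t_4) \times P_{A \to G}(t_5)$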
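Slides 18 and 19 can be combined into a brute-force sketch. Root the tree at one internal node, enumerate all $4^{n-2} = 16$ state assignments for the two internal nodes, multiply the root probability by the branch transition probabilities for each assignment, and sum. Several details are assumptions not given on the slides: Jukes-Cantor transition probabilities (branch lengths in expected substitutions per site), equiprobable root states, the particular tree shape, and the branch lengths. `jc_prob` and `site_likelihood` are hypothetical names.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor probability of going from base i to base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def site_likelihood(tips: str, t: list[float]) -> float:
    """Sum over all histories for one site on a hypothetical four-taxon tree.

    Root r joins tip 0 (branch t[1]), tip 1 (t[2]) and internal node s (t[0]);
    s joins tip 2 (t[3]) and tip 3 (t[4]).
    """
    total = 0.0
    for r, s in product(BASES, repeat=2):  # 4**(n-2) = 16 histories for n = 4
        total += (0.25                                               # P_r: equiprobable root
                  * jc_prob(r, s, t[0])                              # internal branch
                  * jc_prob(r, tips[0], t[1]) * jc_prob(r, tips[1], t[2])
                  * jc_prob(s, tips[2], t[3]) * jc_prob(s, tips[3], t[4]))
    return total


branch_lengths = [0.1, 0.2, 0.05, 0.3, 0.15]  # hypothetical values
print(site_likelihood("AGGG", branch_lengths))
```

As a sanity check, summing `site_likelihood` over all 256 possible tip patterns should give 1.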
20. Multiply across all sites (assume independence)
$L = \prod_i L(i)$, so L will be very small; $\ln L = \sum_i \ln L(i)$ will be a large negative number
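The reason for working with lnL is numerical. This short sketch uses hypothetical per-site likelihoods (10,000 sites, each with L(i) = 0.01). The direct product underflows double-precision floating point to zero, while the sum of logs stays representable.

```python
import math

site_likelihoods = [0.01] * 10_000  # hypothetical L(i) values

L = math.prod(site_likelihoods)                   # 1e-20000 underflows to 0.0
lnL = sum(math.log(x) for x in site_likelihoods)  # stays representable

print(L)    # 0.0
print(lnL)  # about -46051.7
```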
21. Tree searching
- Search for the set of branch lengths that maximizes L (a lower -lnL score), and record that score
- Search for the tree topology with the best score
- Time consuming
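The inner step, optimizing branch lengths on a fixed topology, is easiest to see in the smallest case: two sequences joined by a single branch under Jukes-Cantor. This is a minimal sketch that assumes hypothetical counts (k differing sites out of n) and uses golden-section search as the one-dimensional optimizer. Golden-section search is one simple choice, not necessarily what any given program uses. For this case the optimum has the closed form $d = -\tfrac34 \ln(1 - \tfrac43 p)$ with p = k/n, and the search should reproduce it.

```python
import math


def neg_log_likelihood(d: float, k: int, n: int) -> float:
    """-lnL for two sequences separated by distance d under Jukes-Cantor, equal base frequencies."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # pi_i * P_ii(d)
    p_diff = 0.25 * (0.25 - 0.25 * e)  # pi_i * P_ij(d) for one specific j != i
    return -((n - k) * math.log(p_same) + k * math.log(p_diff))


def golden_section_min(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = lo, hi
    x1, x2 = b - invphi * (b - a), a + invphi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:  # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - invphi * (b - a)
            f1 = f(x1)
        else:        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (b - a)
            f2 = f(x2)
    return (a + b) / 2.0


k, n = 20, 100  # hypothetical: 20 differing sites out of 100
d_ml = golden_section_min(lambda d: neg_log_likelihood(d, k, n), 1e-6, 5.0)
p_hat = k / n
d_closed = -0.75 * math.log(1 - 4 * p_hat / 3)
print(d_ml, d_closed)
```

On a real tree the same one-dimensional search would be repeated branch by branch until -lnL stops improving, and that whole procedure is repeated for every candidate topology.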
22. Critical issues glossed over
- Where do we get $P_n$, the probability of state n at the arbitrary root node?
  - Equiprobable (25% each)
  - Empirical (frequency in the entire matrix)
  - Estimated (optimized by ML on each tree)
- Where do we get $P_{i \to j}(t)$, the probability of going from state i to state j in time t?
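The first two options for the root probabilities are simple to compute. This is a minimal sketch using a hypothetical alignment; `empirical_frequencies` is a hypothetical helper that ignores gaps and ambiguity codes. The third option needs a full likelihood optimization and is not shown.

```python
from collections import Counter

BASES = "ACGT"


def empirical_frequencies(alignment: list[str]) -> dict[str, float]:
    """Frequency of each base across every sequence in the matrix (gaps/ambiguities ignored)."""
    counts = Counter(base for seq in alignment for base in seq.upper() if base in BASES)
    total = sum(counts.values())
    return {base: counts[base] / total for base in BASES}


equiprobable = {base: 0.25 for base in BASES}

alignment = ["ACGTAC", "ACGTTC", "ACGAAC", "GCGTAC"]  # hypothetical matrix
freqs = empirical_frequencies(alignment)
print(freqs)  # A: 7/24, C: 8/24, G: 5/24, T: 4/24
```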
23. Typical Simplifying Assumptions
- Stationarity
- Reversibility
- Site independence
- Markovian process (no memory)
24. The simplest model of molecular evolution
Jukes-Cantor
Instantaneous rate matrix (Q-matrix)
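The matrix itself does not survive in this text. In the usual single-rate form, which is an assumption about the slide's notation, every substitution occurs at the same rate α and each diagonal entry makes its row sum to zero (rows and columns ordered A, C, G, T):

```latex
Q = \begin{pmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{pmatrix}
```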
25. The simplest model of molecular evolution
Jukes-Cantor
Instantaneous rate matrix (Q-matrix)
26. Calculating probabilities of change
- To convert the Q matrix into a matrix P(t) giving the probability of starting in state i and ending in state j t time units later, use the formula
$P(t) = e^{Qt}$
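The matrix exponential can be computed directly. This is a minimal pure-Python sketch that assumes the single-rate Jukes-Cantor Q-matrix with hypothetical values of α and t. It uses scaling and squaring of a truncated Taylor series, one standard method among several (SciPy, for example, provides `scipy.linalg.expm`), and compares the result with the closed-form Jukes-Cantor probabilities. `mat_mul` and `mat_exp` are hypothetical helpers.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(M, terms: int = 20, squarings: int = 10):
    """e^M = (e^{M / 2^s})^(2^s), with a truncated Taylor series for e^{M / 2^s}."""
    n = len(M)
    scale = 2 ** squarings
    X = [[m / scale for m in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, X)]  # X^k / k!
        result = [[r + v for r, v in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


alpha, t = 0.1, 2.0  # hypothetical rate and time
Q = [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]
P = mat_exp([[q * t for q in row] for row in Q])

# Closed-form Jukes-Cantor: P_ii = 1/4 + 3/4 e^{-4 alpha t}, P_ij = 1/4 - 1/4 e^{-4 alpha t}.
e = math.exp(-4 * alpha * t)
print(P[0][0], 0.25 + 0.75 * e)
print(P[0][1], 0.25 - 0.25 * e)
```

Each row of P(t) should sum to 1, and as t grows every entry approaches 1/4.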
27. The simplest model of molecular evolution
Jukes-Cantor
Substitution probability matrix (P-matrix)
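The matrix image is missing here too. Exponentiating the single-rate Jukes-Cantor Q-matrix gives the familiar closed form below, which again assumes a rate α per substitution type. Here p_0(t) is the probability of ending in the starting base and p_1(t) is the probability of ending in one specific other base:

```latex
P(t) = \begin{pmatrix}
p_0(t) & p_1(t) & p_1(t) & p_1(t) \\
p_1(t) & p_0(t) & p_1(t) & p_1(t) \\
p_1(t) & p_1(t) & p_0(t) & p_1(t) \\
p_1(t) & p_1(t) & p_1(t) & p_0(t)
\end{pmatrix},
\qquad
p_0(t) = \tfrac14 + \tfrac34 e^{-4\alpha t},
\qquad
p_1(t) = \tfrac14 - \tfrac14 e^{-4\alpha t}
```

Each row sums to $p_0(t) + 3p_1(t) = 1$.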
28. More complicated (realistic) models for DNA
- Allow deviation from equiprobable base frequencies
  - F81, HKY85, GTR
- Allow two substitution types (transitions and transversions)
  - K2P, HKY85
- Allow for six substitution types
  - GTR
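The following sketch shows how the extra parameters enter the rate matrix, using HKY85 as the example. The rate of substitution into base j is proportional to its frequency π_j, and transitions (A and G, C and T) are additionally multiplied by κ. The frequencies and κ below are hypothetical, and the matrix is left unscaled (programs usually rescale it so that the mean rate is 1). The checks correspond to the assumptions listed on slide 23: each row sums to zero, π is stationary (πQ = 0), and the process is reversible (π_i Q_ij = π_j Q_ji). `hky85_q` is a hypothetical helper.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_q(pi: dict[str, float], kappa: float) -> list[list[float]]:
    """Unscaled HKY85 rate matrix, rows and columns ordered A, C, G, T."""
    Q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                Q[i][j] = pi[b] * (kappa if (a, b) in TRANSITIONS else 1.0)
        Q[i][i] = -sum(Q[i])  # diagonal makes the row sum to zero
    return Q


pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # hypothetical frequencies
Q = hky85_q(pi, kappa=4.0)                     # hypothetical ti/tv ratio
p = [pi[b] for b in BASES]

row_sums = [sum(row) for row in Q]
stationary = [sum(p[i] * Q[i][j] for i in range(4)) for j in range(4)]
reversible = all(abs(p[i] * Q[i][j] - p[j] * Q[j][i]) < 1e-12
                 for i in range(4) for j in range(4))
print(row_sums, stationary, reversible)
```

Setting κ = 1 gives F81. Equal frequencies give K2P, and equal frequencies with κ = 1 give Jukes-Cantor (up to scaling).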
29. Relationship among models
30. Relationship between MP and ML
- One argument: MP is inherently nonparametric, so no direct comparison is possible
- Another: MP is an ML model that makes particular assumptions
31. The Goldman (1990) model (see Lewis 1998 for more)
- We force all branch lengths to be equal
- The likelihood for a character only considers the set of ancestral states that maximizes the likelihood
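This sketch shows the contrast for a single character. It reuses a four-taxon setup with several assumptions that are not from the slides: the tree shape, the tip states, Jukes-Cantor probabilities, and the value of the single branch length t shared by all five branches (the slide only requires that the lengths be equal). Standard ML sums the probability over all internal-state assignments, while this model keeps only the single best assignment. `history_probs` is a hypothetical helper.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor transition probability for a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def history_probs(tips: str, t: float):
    """Yield ((r, s), probability) for every assignment of states to the two internal nodes.

    Hypothetical tree: r joins tips 0 and 1 and node s; s joins tips 2 and 3.
    Every branch has the same length t.
    """
    for r, s in product(BASES, repeat=2):
        yield (r, s), (0.25 * jc_prob(r, s, t)
                       * jc_prob(r, tips[0], t) * jc_prob(r, tips[1], t)
                       * jc_prob(s, tips[2], t) * jc_prob(s, tips[3], t))


tips, t = "AAGG", 0.1  # hypothetical character and branch length
histories = dict(history_probs(tips, t))
summed = sum(histories.values())                # standard ML: sum over histories
best_states = max(histories, key=histories.get)  # keep only the best history
print(summed, best_states, histories[best_states])
```

Here the best assignment is ("A", "G"), which requires a single change on the internal branch. That is the same reconstruction parsimony would choose for this character.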
32. Why use MP
- The model is clearly less realistic, but
- We can do more thorough searches and data exploration (computational efficiency)
- Robust results will usually still be supported
33. Why use ML
- The model's assumptions are explicit
- We can statistically compare alternative models
- We can conduct parametric statistical tests (under the assumption that we have used the correct model)
- But even the most complex model is still unrealistically simple