Title: Phylogenetic inference
1 Phylogenetic inference
Minicourse "Molecular Phylogenetics: an update of new methodological developments", given during the 48th Annual Meeting of the Sociedade Brasileira de Genetica, Aguas de Lindoia, Sao Paulo, Brasil (Sept 2002)
- Many methods available, using different techniques; many software packages
- For molecular data, the trend is towards using methods based on explicit models with realistic assumptions
- New, improved methods and tests appear in the literature constantly
2 Phylogenetic inference
- This minicourse will review some of the widely used (traditional) approaches and introduce two recent developments
  - Bayesian inference
  - Genetic algorithms (MetaGA)
- Review
  - Algorithmic vs. optimality-criterion approaches (parsimony, distance methods and ML)
  - Tree searching (heuristic search)
  - Models (using Modeltest to choose one)
3 Classification of phylogenetic methods
4 Distance and discrete data
5 Algorithms versus optimality criteria
- Phylogenetic inference is an estimation procedure (best estimate)
- We only have information about the contemporary molecules (and organisms)
- How do we choose a tree from the set of all possible trees?
- Two basic approaches
  - Algorithmic: just follow a sequence of steps
  - Optimality criterion: defines how to compare trees
6 Algorithmic methods
- Combine tree inference and the definition of a preferred tree into a single statement
- Include UPGMA and all forms of pair-group cluster analysis, and neighbor joining
- Computationally fast because they go straight to the final solution
- The task of finding an optimal tree cannot be separated from that of evaluating a specific tree
7 Optimality criteria
- Two logical steps
  - Define an optimality criterion (objective function for evaluating trees)
  - Find the tree(s) with the best value for the objective function (may use algorithms)
- Evolutionary assumptions made in the first step are decoupled from the computations involved in the second step
- The price for logical clarity is that these methods can be very slow
8 Use of algorithms
- Algorithms play a different role in the two approaches
  - In purely algorithmic methods, the algorithm defines the tree selection criterion and is fundamental
  - In criterion-based methods, algorithms are merely tools used in evaluating and searching for optimal trees
- Reliability of the tree?
9 Optimality criteria
- Parsimony: select the tree that minimizes the total tree length (number of steps or character transformations required to explain a given set of data)
- Some methods are based on models of evolutionary change: assumptions are made explicit
- Is parsimony's model-free nature an advantage or a disadvantage?
- Parsimony does make assumptions (consistency)
10 Optimality methods (cont.)
- Maximum likelihood: evaluates the probability that a proposed model of evolution and the hypothesized history could give rise to the observed data (attempts to estimate the actual amount of change)
- Usually gives more consistent estimates, with lower variance, than other methods; robust to violations of assumptions
11 Optimality criteria (cont.)
- Pairwise distance methods also minimize the effect of multiple hits when an appropriate model is used to estimate the true evolutionary distance between two sequences (less desirable than full ML)
- Additive and ultrametric distances can be fitted to a tree such that every pairwise distance equals the sum of the branch lengths along the path connecting the pair in the tree
12
- Observed distances are obtained directly from the sequences themselves, and patristic distances from a tree
- For additive and ultrametric distances, the observed and tree distances match exactly
[Figure panels: Additive; Ultrametric]
For real data this is rarely the case, indicating that observed distances cannot be represented with complete accuracy by a tree.
13 Classification of phylogenetic methods
14 UPGMA: an algorithmic method
- Cluster analysis: unweighted pair group method using arithmetic averages (Sneath and Sokal 1973)
- Assumes ultrametricity
15 UPGMA example
1. Given a matrix of pairwise distances, find the clusters (taxa) i and j such that d_ij is the minimum value in the table
2. Define the depth of the branching between i and j (l_ij) to be d_ij/2
3. If i and j were the last two clusters, the tree is complete. Otherwise, create a new cluster called u
4. Define the distance from u to each other cluster k (with k ≠ i or j) to be the average of the distances d_ki and d_kj
5. Go back to step 1 with one less cluster: clusters i and j have been eliminated, and cluster u has been added
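The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular package: the function name `upgma`, the nested-tuple tree representation, and the example distances are all assumptions made for this sketch. One deliberate choice: the average in step 4 is weighted by cluster size, which is how UPGMA is usually defined once clusters hold more than one taxon; for two single-taxon clusters it reduces to the simple average described above.

```python
from itertools import combinations

def upgma(names, dist):
    """Return (tree, heights): a nested-tuple tree and the depth of each node.

    names: list of taxon labels (strings)
    dist:  dict mapping a pair (a, b) of labels to their distance
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    sizes = {name: 1 for name in names}   # cluster -> number of taxa inside it
    heights = {}
    while len(sizes) > 1:
        # Step 1: pick the pair of clusters with the smallest distance.
        i, j = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        # Steps 2-3: the new cluster u sits at depth d_ij / 2.
        u = (i, j)
        heights[u] = d[frozenset((i, j))] / 2
        ni, nj = sizes.pop(i), sizes.pop(j)
        # Step 4: distance from u to every remaining cluster k
        # (size-weighted average; an assumption, see the note above).
        for k in sizes:
            d_ik, d_jk = d[frozenset((i, k))], d[frozenset((j, k))]
            d[frozenset((u, k))] = (ni * d_ik + nj * d_jk) / (ni + nj)
        # Step 5: repeat with one cluster fewer.
        sizes[u] = ni + nj
    (root,) = sizes
    return root, heights

# Hypothetical ultrametric distances for four taxa.
example = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
           ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}
tree, heights = upgma(["A", "B", "C", "D"], example)
print(tree, heights[tree])
```

Because the example distances are ultrametric, the node depths recovered by the sketch reproduce the input distances exactly (each pairwise distance is twice the depth of the node joining the pair).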
16 Distance Matrices and phenogram
17 Classification of phylogenetic methods
18 Parsimony methods
- The most widely used method; a familiar notion in science (simplicity)
- Shared attributes among taxa are inherited from common ancestors
- When character conflicts occur, ad hoc hypotheses cannot be avoided if you want to explain all the data, and assumptions of homoplasy must be invoked
19 Parsimony
- From the set of all possible trees, find all trees τ such that L(τ) is minimal:
  L(\tau) = \sum_{k=1}^{B} \sum_{j=1}^{N} w_j \cdot \mathrm{diff}(x_{k'j}, x_{k''j})
- B is the number of branches
- N is the number of characters
- k' and k'' are the two nodes incident to each branch k
- x_{k'j} and x_{k''j} represent either elements of the input matrix or optimal character-state assignments made to internal nodes
- diff(y,z) is a function specifying the cost of transformation from y to z along any branch; for unrooted trees diff(y,z) = diff(z,y). diff may be defined by a cost matrix
- The coefficient w_j is the weight assigned to each character j (a priori or a posteriori)
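As an illustration of the tree-length calculation, the sketch below counts steps for unordered characters (every change costs 1) with Fitch's set-based pass and sums them over characters with optional weights. It is a simplified example under stated assumptions: the function names, the nested-tuple tree format, and the toy data matrix are all hypothetical, and the tree is rooted arbitrarily, which does not change the length because the cost is symmetric.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one unordered character.

    tree:   a taxon label (str) or a pair (left, right) of subtrees
    states: dict mapping taxon label -> observed state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:                       # no change needed at this node
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, matrix, weights=None):
    """Sum the weighted step counts over all characters (columns)."""
    n_chars = len(next(iter(matrix.values())))
    weights = weights or [1] * n_chars
    total = 0
    for j in range(n_chars):
        column = {taxon: seq[j] for taxon, seq in matrix.items()}
        total += weights[j] * fitch_steps(tree, column)[1]
    return total

# Toy data matrix: 4 taxa, 3 characters (invented for illustration).
matrix = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(tree_length(tree_ab_cd, matrix), tree_length(tree_ac_bd, matrix))
```

Under the parsimony criterion the tree with the smaller length is preferred; here that is the tree grouping A with B.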
20 Other parsimony variants
- Dollo parsimony: every derived character must be uniquely derived (originate only once in the tree)
- Homoplasy: only reversals are allowed (no parallelism or convergence)
- In practice, Dollo parsimony does not require inclusion of hypothetical ancestors, just character polarity (unrooted Dollo)
- Convenient for restriction-site characters (a site is easier to lose than to gain)
21 Dollo parsimony and RFLP data
The relaxed Dollo criterion may be applied using generalized parsimony
22 Generalized Parsimony
- All parsimony variants can be subsumed into a generalized method that assigns a cost to each possible transformation
- Costs are represented in an m-by-m cost matrix S, where each element S_ij represents the increase in tree length due to a transformation from state i to state j
- The cost (weight) of each transformation can be determined a priori (e.g. for RFLPs or for transition/transversion changes) or a posteriori (using the same data, e.g. by the successive approximations method)
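One common way to compute tree length under an arbitrary cost matrix is Sankoff's dynamic-programming algorithm; the slides do not name the algorithm, so treat this as one possible illustration. In the sketch below, the function names, the nested-tuple tree format, and the 1:2 transition/transversion costs are assumptions chosen for the example.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}

def ts_tv_cost(a, b, ts=1, tv=2):
    """Cost of changing a into b: 0 if equal, ts for transitions, tv otherwise."""
    if a == b:
        return 0
    return ts if (a in PURINES) == (b in PURINES) else tv

def sankoff(tree, states, cost):
    """Return {state: minimum cost of the subtree if its root has that state}.

    tree:   a taxon label (str) or a tuple of subtrees
    states: dict mapping taxon label -> observed state
    cost:   nested dict, cost[i][j] = cost of a transformation from i to j
    """
    if isinstance(tree, str):
        return {s: (0 if s == states[tree] else math.inf) for s in BASES}
    child_tables = [sankoff(child, states, cost) for child in tree]
    return {
        s: sum(min(cost[s][t] + table[t] for t in BASES) for table in child_tables)
        for s in BASES
    }

# Cost matrix S with transversions twice as costly as transitions (an assumption).
S = {a: {b: ts_tv_cost(a, b) for b in BASES} for a in BASES}
tree = (("A", "B"), ("C", "D"))
site = {"A": "A", "B": "G", "C": "C", "D": "C"}   # one invented character
print(min(sankoff(tree, site, S).values()))
```

With all off-diagonal costs set to 1 the same routine reproduces ordinary unordered (Fitch) parsimony, which is the sense in which the variants are special cases of the generalized method.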
23 Generalized Parsimony: cost matrices
24 Protein parsimony
- A 20x20 matrix specifies the cost for each possible transformation
- The matrix may be based on the genetic code (PROTPARS matrix) and/or the biochemical properties of the amino acids themselves (Dayhoff matrices)
25 Difference in perspective: MP and ML
- Parsimony seeks solutions that minimize the amount of change required to explain the data (underestimates superimposed changes)
- ML attempts to estimate the actual amount of change (by specifying the evolutionary model that will account for the data with the highest likelihood)
- Methods that incorporate models of evolutionary change can make more efficient use of the data
26 Classification of phylogenetic methods
27 Distance methods
- Experimentally derived distances are assumed to be estimates of true distances
- We want to fit them to a mathematical model (additive tree) and find the optimal values for the adjustable parameters
  - Branching pattern
  - Branch lengths
- Some methods: Fitch-Margoliash, minimum evolution (ME)
28 Distance Methods
- An alternative to ML for minimizing the impact of the underestimation problem, provided corrected distances are used
- Corrected distances are assumed to be estimates of the true evolutionary distance (between a pair of sequences)
- Distance methods are less desirable approximations to a full ML approach, but much faster
- Drawbacks of transforming character data into distances include loss of information and difficulty in combining two or more data sets
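To make the idea of a corrected distance concrete, the sketch below applies the Jukes-Cantor (JC) correction, d = -(3/4) ln(1 - 4p/3), to the observed proportion p of differing sites. The function names and the two short sequences are invented for illustration; other models (K2P, HKY85) use different formulas.

```python
import math

def p_distance(seq1, seq2):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

def jc69_distance(p):
    """Jukes-Cantor estimate of substitutions per site from the observed p."""
    if p >= 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

# Two invented aligned sequences differing at 2 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, jc69_distance(p))
```

The corrected distance is always at least as large as the observed one, and the gap widens as p grows, reflecting the superimposed changes that the raw count misses.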
29 The problem...
- We have uncertain data (distance estimates) that we want to fit to a particular mathematical model (an additive tree) and find optimal values for the adjustable parameters
  - The branching pattern
  - The branch lengths
30 An additive distance measure defines a tree...
For any 2 sequences, the value in the distance matrix should correspond to the sum of the branch lengths along the path between the 2 sequences on the tree.
31 When distances are not ultrametric but only metric, they can still be represented by a tree: an additive tree
Additive trees also represent additive distances exactly...
32
- While this tree is additive, it is not ultrametric
- Notice that sequences b and c are the most similar (3), but ARE NOT the most closely related
- Similarity and evolutionary relationship will only coincide exactly if the distances are ultrametric
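Whether a distance matrix is additive or ultrametric can be checked directly with the standard four-point and three-point conditions. The sketch below is an illustration only: the function names and the distances are invented (they are not the values from the slide's figure), but they were chosen so that b and c are the most similar pair (distance 3) on a tree where b's closest relative is a.

```python
from itertools import combinations

def symmetric(pairs):
    """Turn {(a, b): value} into a nested dict usable as d[a][b] and d[b][a]."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d

def is_ultrametric(names, d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for a, b, c in combinations(names, 3):
        _, middle, largest = sorted([d[a][b], d[a][c], d[b][c]])
        if abs(largest - middle) > tol:
            return False
    return True

def is_additive(names, d, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for a, b, c, e in combinations(names, 4):
        sums = sorted([d[a][b] + d[c][e], d[a][c] + d[b][e], d[a][e] + d[b][c]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Path lengths on a hypothetical tree ((a:4, b:1):1, (c:1, d:4)).
names = ["a", "b", "c", "d"]
dist = symmetric({("a", "b"): 5, ("a", "c"): 6, ("a", "d"): 9,
                  ("b", "c"): 3, ("b", "d"): 6, ("c", "d"): 5})
print(is_additive(names, dist), is_ultrametric(names, dist))
```

A clustering method that simply joins the most similar pair would group b with c on these data, which is why similarity-based methods such as UPGMA can mislead when rates differ among lineages.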
33 Additive-tree methods
- Due to the finite amount of available data, stochastic variation will cause deviations of the estimated evolutionary distances from perfect tree additivity, even when evolution proceeds according to the model used for distance correction (JC, K2P, HKY85, etc.)
- Many methods exist that derive a tree (with branch lengths) from a distance matrix so that it comes as close as possible to being additive
- The discrepancy (distortion) between observed and tree distances can be used as an indicator (optimality criterion) of how well observed distances fit a tree-like representation (but beware of confusing the criterion with the algorithm)
34 Fitch-Margoliash and related methods
- E: definition of the disagreement between data and tree
- Alpha (α) and the weights (w) must be defined
- If α = 1, this is an absolute-difference criterion
- If α = 2, this is a least-squares criterion
- The most commonly used weighting schemes (w) are 1, 1/d_ij, 1/d_ij^2, and 1/variance(d_ij)
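The criterion is usually written as E = sum over pairs of w_ij |d_ij - p_ij|^α, where d_ij is the observed distance and p_ij the path-length (patristic) distance on the tree; the symbol p_ij is an assumption here, since the equation itself did not survive in these notes. A minimal sketch, with a hypothetical function name and invented distances:

```python
def fit_error(observed, patristic, alpha=2, weight=lambda d_ij: 1.0):
    """E = sum over pairs of w_ij * |d_ij - p_ij| ** alpha.

    observed:  dict mapping a pair (i, j) -> observed distance d_ij
    patristic: dict mapping the same pairs -> tree path length p_ij
    weight:    function of d_ij giving the weight w_ij
    """
    return sum(
        weight(d_ij) * abs(d_ij - patristic[pair]) ** alpha
        for pair, d_ij in observed.items()
    )

# Invented observed and tree distances for three taxa.
observed = {("a", "b"): 5.0, ("a", "c"): 6.5, ("b", "c"): 3.0}
patristic = {("a", "b"): 5.0, ("a", "c"): 6.0, ("b", "c"): 3.5}

least_squares = fit_error(observed, patristic)                 # w = 1, alpha = 2
absolute = fit_error(observed, patristic, alpha=1)             # w = 1, alpha = 1
weighted = fit_error(observed, patristic, weight=lambda d: 1 / d ** 2)
print(least_squares, absolute, weighted)
```

The 1/d_ij^2 weighting down-weights the fit of large distances, which tend to be estimated with more error.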
35 Minimum Evolution Method
- Uses the unweighted least-squares criterion (w = 1, α = 2) to fit branch lengths, but a different criterion to evaluate and compare trees
Optimality criterion: the sum of the absolute values of the branch lengths (BL) that minimize the squared deviations between observed and path-length distances
2T-3 is the number of independent branches in an unrooted tree with T taxa
36 Computation and tree-search problems
- Sometimes negative branch lengths will be assigned to optimize the fit (E in the equation). Some solutions:
  - Outright rejection of the tree with negative branch lengths (too drastic)
  - Constrain the optimization process to disallow negative branch lengths (set them to zero)
- Least-squares and minimum-absolute-deviation methods assume that each pairwise distance is independent (not generally true because of the common evolutionary history of the molecules)
- Also remember the loss of information when summarizing discrete data as a distance matrix
37 Classification of phylogenetic methods
38 Maximum Likelihood methods
- Evaluate a hypothesis about evolutionary history in terms of the probability that a proposed model of the evolutionary process and the hypothesized history would give rise to the observed data
- L = Pr(D|H)
- A history with a higher probability of giving rise to the current state of affairs is preferred
- Cavalli-Sforza and Edwards (1967), Felsenstein (1981, 1993) and others
39 ML Objective
- Data are observed sequences (DNA or protein)
- Unknowns are the branching order (topology) and branch lengths of a tree
- A concrete model of evolution that transforms one sequence into another needs to be specified (fully defined, or with uncertain parameters that need to be estimated from the data)
- L = Pr(D|H)
- Trees with higher likelihoods are preferred
40 Calculating L for a tree
- Aligned sequences for 4 taxa
- We want to evaluate the tree shown
- What is the probability that this tree generated the data?
41 Calculating L for a tree
- Root the tree at any internal node (models are time-reversible)
- The assumption of independence among sites allows us to calculate L for each site separately
- Then combine the likelihoods into a total value at the end
42 Calculating L for a tree
- To calculate L for some site j, we must consider all possible scenarios by which the tip sequences could have evolved
- Specifically, the root (6) may have had A, C, T, or G
- For each of these possibilities, the other internal node (5) also might have possessed any of the 4 nucleotides
43 Calculating L for a tree
- Thus, there are 4 x 4 = 16 possibilities to consider
44 Calculating L for a tree
- Calculate the probability of each and sum them to obtain the total probability for site j
- Assume that the changes along each branch are independent (Markov model)
- Thus, the probability of any single scenario is equal to the product of the probabilities of the changes required by that scenario
45 Calculating L for a tree
- Because the probability of any single observation is an extremely small number, we evaluate the log of the likelihood instead
- Probabilities are accumulated as the sum of the logs of the single-site likelihoods
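The calculation described on the preceding slides can be sketched as follows. Several things here are assumptions made for illustration: the Jukes-Cantor model is used for the change probabilities, the tree shape (tips 1 and 2 attached to node 5, tips 3 and 4 attached to root node 6) is a guess at the figure, and the function names, branch lengths, and sequences are invented.

```python
import math
from itertools import product

BASES = "ACGT"

def jc69_prob(x, y, t):
    """Probability of ending in state y after a branch of length t, starting in x."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def site_likelihood(tips, t):
    """Likelihood of one site; tips are the states of tips 1-4.

    t maps a node number (1-5) to the length of the branch above that node.
    """
    s1, s2, s3, s4 = tips
    total = 0.0
    for root, inner in product(BASES, repeat=2):      # the 4 x 4 = 16 scenarios
        total += (0.25                                  # equal base frequencies at the root
                  * jc69_prob(root, inner, t[5])
                  * jc69_prob(inner, s1, t[1])
                  * jc69_prob(inner, s2, t[2])
                  * jc69_prob(root, s3, t[3])
                  * jc69_prob(root, s4, t[4]))
    return total

def log_likelihood(alignment, t):
    """Sum of the logs of the single-site likelihoods over all columns."""
    return sum(math.log(site_likelihood(column, t)) for column in zip(*alignment))

branch_lengths = {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.2, 5: 0.05}   # invented values
alignment = ["ACGTA", "ACGTC", "ACTTA", "GCTTA"]              # tips 1-4, invented
print(log_likelihood(alignment, branch_lengths))
```

Enumerating every scenario is only practical for tiny trees; real programs reach the same value far more efficiently, but the brute-force sum makes the logic of the slides explicit.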
46 Difference in perspective: MP and ML
- Parsimony seeks solutions that minimize the amount of change required to explain the data (underestimates superimposed changes)
- ML attempts to estimate the actual amount of change (by specifying the evolutionary model that will account for the data with the highest likelihood)
- Methods that incorporate models of evolutionary change can make more efficient use of the data
47 Difference in perspective: ML vs MP
48 Parsimony and likelihood
ML and MP scores for all 15 unrooted trees for mtDNA sequence data
49 MP and Inconsistency
50 Long Branch Attraction
- The Felsenstein Zone
- What are the assumptions for MP?
- How can we tell if there's LBA in our data?
51 Searching for optimal trees
- Methods with explicit optimality criteria
  - Parsimony
  - Maximum likelihood
  - Additive-tree distance
- Separate the problem of
  - evaluating the tree
  - finding the optimal tree(s)
- Can we evaluate all possible trees for a particular problem?
52 Searching for optimal trees
- For small to moderate data sets, with as many as 8-20 taxa, we can use exact methods
- Exact methods guarantee the discovery of all optimal trees
- Exact methods include
  - Exhaustive search
  - Branch-and-bound search
53 How many trees?
54 And for more than 10 taxa?
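The number of possible trees can be computed from the standard result that n taxa have (2n-5)!! = 3 x 5 x ... x (2n-5) unrooted bifurcating trees, and (2n-3)!! rooted ones. The function names below are hypothetical; the sketch simply evaluates that product.

```python
def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # 3 * 5 * ... * (2n - 5)
        count *= k
    return count

def num_rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 taxa: (2n - 3)!!."""
    return num_unrooted_trees(n + 1)

for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

Five taxa give 15 unrooted trees and ten taxa already give about two million, which is why exhaustive evaluation quickly becomes impractical as taxa are added.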
55 Exhaustive search: enumerate all possible trees
56 Branch-and-bound: does not require exhaustive search and yet provides an exact solution.
1. Traverse a search tree in a depth-first sequence
2. Select an upper bound (L) on the optimal value of the chosen criterion
3. Move along a path towards the tips and evaluate trees. If a tree's score is > L, then dispense with the rest of that path
57 Approximate methods
- For larger data sets computing time becomes prohibitive and we only explore some subset of all possible trees (hoping that the optimal trees will be found in the subset explored)
- Heuristic approaches sacrifice the guarantee of optimality in favor of reduced computer time
- Use hill-climbing methods: an initial tree starts the process, then we seek to improve its score
- When we can find no way to further improve the score, we stop. We don't know if we reached a local or a global optimum
58 Initial trees
- May be obtained by stepwise addition, the most commonly used method
- Similar to exhaustive search, but trees are evaluated at every step, each time a new taxon is added, and only the path derived from the optimal tree is followed
- Which taxa do you choose first? Which do you connect next?
- These are greedy algorithms
59 Stepwise addition
60
- Initial trees also may be obtained by star decomposition, another greedy algorithm
61 Branch swapping
- To improve the initial estimate we can perform sets of predefined rearrangements on the tree
- Any of these rearrangements amounts to a stab in the dark
- Globally optimal trees may be several rearrangements away from the starting tree
- If a better tree is found, a new round of rearrangements is then performed on the new tree
- Several branch-swapping algorithms are available
62 Branch swapping by tree bisection and reconnection (TBR)
1. The tree is bisected along a branch, yielding two disjoint subtrees
2. The subtrees are reconnected by joining a pair of branches, one from each subtree
3. All possible bisections and pairwise reconnections are evaluated
63 Branch swapping by subtree pruning and regrafting (SPR)
1. A subtree is pruned from the tree (e.g. A,B)
2. The subtree is then regrafted to a different location on the tree
3. All possible subtree removals and reattachment points are evaluated
64 Branch swapping by nearest-neighbor interchanges (NNI)
1. Each interior branch of the tree defines a local region of four subtrees
2. Interchanging a subtree on one side of the branch with one from the other constitutes an NNI
3. Two such rearrangements are possible for each interior branch (all interior branches are swapped)
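The two rearrangements around a single interior branch can be written down directly. In this minimal sketch the function names and the nested-tuple notation are assumptions: the four subtrees a, b, c, d are arranged as (a, b) on one side of the branch and (c, d) on the other, and each subtree may itself be a nested tuple.

```python
def nni_neighbors(a, b, c, d):
    """Return the two NNI rearrangements of the arrangement ((a, b), (c, d))."""
    return [
        ((a, c), (b, d)),   # b exchanged with c across the interior branch
        ((a, d), (b, c)),   # b exchanged with d across the interior branch
    ]

def nni_neighbor_count(n_taxa):
    """An unrooted bifurcating tree has n - 3 interior branches, 2 NNIs each."""
    return 2 * (n_taxa - 3)

print(nni_neighbors("A", "B", "C", "D"))
print(nni_neighbor_count(10))
```

Because only 2(n - 3) trees are examined per round, NNI explores a much smaller neighborhood than SPR or TBR, trading thoroughness for speed.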
65 Landscapes and the problem of islands of trees