Title: Parsimony continued
1Parsimony continued
2Many issues glossed over
- What if characters disagree?
- How is the tree score determined?
- How can we root the trees?
- How do we find the optimal tree?
- How can we evaluate the robustness of our
conclusions?
3Tree score calculation
$$L_{tot} = \sum_{n=1}^{N} L_n W_n$$
The tree score is the sum, over all N characters, of the minimum number of weighted steps for each character (L_n) multiplied by the weight of that character (W_n)
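The weighted sum above can be sketched in a few lines of Python. The function name `tree_score` and the example step counts and weights are hypothetical, chosen only to illustrate the arithmetic.

```python
def tree_score(steps, weights):
    """Weighted parsimony length: sum of L_n * W_n over all characters."""
    if len(steps) != len(weights):
        raise ValueError("need exactly one weight per character")
    return sum(l * w for l, w in zip(steps, weights))


# Hypothetical example: three characters needing 1, 2 and 1 steps,
# with the second character down-weighted to 0.5.
print(tree_score([1, 2, 1], [1, 0.5, 1]))  # 3.0
```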
4How is the minimum number of steps calculated?
- Postorder traversal algorithm
- The tree is arbitrarily rooted
- Each internal node is inspected to see if there is an intersection in the possible states of its descendant nodes
  - If not, length is increased
- It is not necessary to identify all ancestral state reconstructions (this requires a preorder traversal)
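A minimal sketch of the postorder pass for a single unordered character, assuming the tree is stored as nested 2-tuples of tip names (a representation chosen here for brevity, not taken from the slides). Only the length is returned; no ancestral states are reconstructed.

```python
def fitch_length(tree, states):
    """Minimum number of steps for one unordered character.

    tree   -- nested 2-tuples of tip names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each tip name to its observed state
    """
    def postorder(node):
        if not isinstance(node, tuple):              # a tip: its state set, no steps
            return {states[node]}, 0
        lset, lsteps = postorder(node[0])
        rset, rsteps = postorder(node[1])
        common = lset & rset
        if common:                                   # intersection: no extra step
            return common, lsteps + rsteps
        return lset | rset, lsteps + rsteps + 1      # no intersection: one more step

    return postorder(tree)[1]


tree = (("A", "B"), ("C", "D"))                      # arbitrary rooting
print(fitch_length(tree, {"A": "G", "B": "G", "C": "T", "D": "T"}))  # 1
print(fitch_length(tree, {"A": "G", "B": "T", "C": "G", "D": "T"}))  # 2
```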
5Why weight characters?
- If we think some characters are less prone to homoplasy, we can upweight them
- Character weights are multiplied by the character length
6We can also weight character state transitions
- Unordered, flat-weighted Fitch parsimony
[Figure: step matrix; rows = from state, columns = to state]
7We can also weight character state transitions
- Common examples
- Ordered character states (morphology)
[Figure: step matrix; rows = from state, columns = to state]
8We can also weight character state transitions
- Common examples
- Transitions vs. transversions
[Figure: step matrix; rows = from state, columns = to state]
9We can also weight character state transitions
- Common examples
- Gains less likely than loss (restriction sites)
[Figure: asymmetric step matrix; rows = from state, columns = to state]
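The four kinds of step matrix named on these slides can be written out as follows. This is an illustrative sketch: the function names are hypothetical, and the specific costs for transversions (2) and gains (2) are assumed values, since the matrices shown on the slides are not recoverable. Rows are the from state, columns the to state.

```python
PURINES = {"A", "G"}


def unordered(states):
    """Flat-weighted (Fitch) matrix: every change costs one step."""
    return {a: {b: 0 if a == b else 1 for b in states} for a in states}


def ordered(n_states):
    """Ordered states 0..n-1: cost is the number of intermediate steps crossed."""
    return [[abs(i - j) for j in range(n_states)] for i in range(n_states)]


def ti_tv(ti_cost=1, tv_cost=2):
    """Transitions (purine<->purine, pyrimidine<->pyrimidine) vs. transversions."""
    matrix = {}
    for a in "ACGT":
        matrix[a] = {}
        for b in "ACGT":
            if a == b:
                matrix[a][b] = 0
            elif (a in PURINES) == (b in PURINES):
                matrix[a][b] = ti_cost
            else:
                matrix[a][b] = tv_cost
    return matrix


def gain_loss(gain_cost=2, loss_cost=1):
    """Asymmetric matrix for states 0 (absent) and 1 (present)."""
    return [[0, gain_cost],
            [loss_cost, 0]]


print(ordered(3))          # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(ti_tv()["A"]["G"])   # 1 (transition)
print(ti_tv()["A"]["C"])   # 2 (transversion)
print(gain_loss())         # [[0, 2], [1, 0]]
```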
10The weighting game
- When should you weight characters/character-states?
  - If you think that they differ in evidential power
- How much should you modify weights?
  - There is no simple formula
  - It is probably better to err on the side of less extreme weights
  - Often sensible to try a range of weights
11Many issues glossed over
- What if characters disagree?
- How is the tree score determined?
- How can we root the trees?
- How do we find the optimal tree?
- How can we evaluate the robustness of our
conclusions?
12Rooting methods for parsimony
- Outgroup method (time reversible models)
  - Include one or more taxa that are known from prior information to be outside the ingroup (the group that is the focus of study)
- Assume near clock-like behavior
  - Midpoint rooting
- Include characters with asymmetric step-matrices (time irreversible models)
  - Pick the root that results in the shortest tree
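14[Figure: a tree of taxa A to F shown with two alternative roots. With one root a single 1 -> 0 change is needed (L = 1); with the other, two 1 -> 0 changes are needed (L = 2).]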
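To illustrate how an asymmetric step matrix makes tree length depend on the root, here is a sketch that scores one character on two rootings of the same unrooted tree. It uses Sankoff-style dynamic programming, a technique the slides do not name. The four-taxon data and the costs (gain = 2, loss = 1) are hypothetical.

```python
INF = float("inf")


def sankoff_length(tree, states, cost):
    """Minimum weighted length of one character on a rooted tree.

    tree   -- nested tuples of tip names
    states -- dict: tip name -> state index (0, 1, ...)
    cost   -- step matrix, cost[i][j] = cost of a change from state i to state j
    """
    k = len(cost)

    def postorder(node):
        if not isinstance(node, tuple):
            # A tip can only have its observed state.
            return [0 if s == states[node] else INF for s in range(k)]
        child_costs = [postorder(child) for child in node]
        # Cost of assigning state s here: cheapest change to each child, summed.
        return [
            sum(min(cost[s][t] + vec[t] for t in range(k)) for vec in child_costs)
            for s in range(k)
        ]

    return min(postorder(tree))


# Assumed asymmetric matrix: a gain (0 -> 1) costs 2, a loss (1 -> 0) costs 1.
cost = [[0, 2], [1, 0]]
states = {"A": 0, "B": 0, "C": 1, "D": 1}

# Two rootings of the same unrooted tree AB|CD.
print(sankoff_length((("A", "B"), ("C", "D")), states, cost))  # 1
print(sankoff_length(("A", ("B", ("C", "D"))), states, cost))  # 2
```

Under these assumed costs the first rooting would be preferred, because it explains the data with a single loss.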
15Many issues glossed over
- What if characters disagree?
- How is the tree score determined?
- How can we root the trees?
- How do we find the optimal tree?
- How can we evaluate the robustness of our
conclusions?
16Rooted trees
[Figure: rooted trees. Labels: polytomy; binary/dichotomous/fully-resolved; polytomous/unresolved]
17Unrooted trees
[Figure: unrooted trees, with a polytomy labelled]
18Number of unrooted fully resolved trees for t taxa
$$\prod_{i=3}^{t} (2i - 5)$$
19How many places can you add another taxon?
- Two taxa
- Three taxa
- Four taxa
- Five taxa
20Number of rooted trees?
- The root is just one more taxon: same formula, but with t = number of taxa + 1
21The number of trees gets big
Number of tips    Number of binary unrooted trees
3                 1
4                 3
5                 15
6                 105
10                2,027,025
20                2.2 x 10^20
50                2.8 x 10^74
500               1 x 10^1277
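The counts above follow from the product of (2i - 5) for i = 3 to t, and can be reproduced with exact integer arithmetic. A short sketch; the function names `n_unrooted` and `n_rooted` are hypothetical.

```python
def n_unrooted(t):
    """Number of fully resolved unrooted trees for t >= 3 taxa."""
    if t < 3:
        raise ValueError("formula applies to three or more taxa")
    count = 1
    for i in range(3, t + 1):
        count *= 2 * i - 5        # number of branches the i-th taxon can join
    return count


def n_rooted(t):
    """Rooted trees: the root acts as one more taxon."""
    return n_unrooted(t + 1)


for t in (3, 4, 5, 6, 10):
    print(t, n_unrooted(t))       # last line printed: 10 2027025

print(f"{n_unrooted(20):.1e}")    # 2.2e+20
```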
22How do you find the optimal tree?
- Exhaustive (<12 taxa)
- Branch-and-bound (<18 taxa)
  - Obtain the length of a random tree (initial upper bound)
  - As trees are built, determine their length
  - If length exceeds upper bound then that tree and all its descendant trees are ignored
- Heuristic search (unlimited?)
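The branch-and-bound procedure described above can be sketched as follows. It is a minimal illustration, not an optimized implementation. It assumes unordered (Fitch) characters, stores trees as nested 2-tuples rooted on the first taxon, and takes its initial upper bound from one arbitrary complete tree rather than a truly random one. The function names and the five-taxon matrix are hypothetical.

```python
def fitch_length(tree, states):
    """Minimum steps for one unordered character (postorder pass)."""
    def postorder(node):
        if not isinstance(node, tuple):
            return {states[node]}, 0
        lset, lsteps = postorder(node[0])
        rset, rsteps = postorder(node[1])
        common = lset & rset
        if common:
            return common, lsteps + rsteps
        return lset | rset, lsteps + rsteps + 1
    return postorder(tree)[1]


def tree_length(tree, matrix):
    """Sum of Fitch lengths over all characters (equal weights)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch_length(tree, {t: seq[i] for t, seq in matrix.items()})
               for i in range(n_chars))


def insertions(tree, tip):
    """Yield every tree obtained by attaching tip to one branch of tree."""
    yield (tree, tip)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, tip):
            yield (new_left, right)
        for new_right in insertions(right, tip):
            yield (left, new_right)


def branch_and_bound(taxa, matrix):
    # Initial upper bound: the length of one arbitrary complete tree.
    start = taxa[1]
    for tip in taxa[2:]:
        start = (start, tip)
    best = {"length": tree_length((taxa[0], start), matrix), "trees": []}

    def extend(sub, next_index):
        tree = (taxa[0], sub)
        length = tree_length(tree, matrix)
        if length > best["length"]:
            return                    # prune this tree and all its descendants
        if next_index == len(taxa):   # complete tree
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append(tree)
            return
        for new_sub in insertions(sub, taxa[next_index]):
            extend(new_sub, next_index + 1)

    extend(taxa[1], 2)
    return best["length"], best["trees"]


matrix = {"A": "10", "B": "10", "C": "00", "D": "01", "E": "01"}
length, trees = branch_and_bound(["A", "B", "C", "D", "E"], matrix)
print(length)   # 2
print(trees)
```

Pruning is safe here because adding a taxon to a partial tree can never shorten it, so a partial tree that already exceeds the bound cannot lead to an optimal complete tree.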
25Heuristic searches
- Search for optimal trees by finding good trees
and then rearranging them in the hopes of finding
an even better tree
26Heuristic search
[Figure: treespace, showing starting trees, a suboptimal island of trees, and the global optimum]
27Getting starting trees
- Random tree: not done
- User tree (e.g., a NJ tree)
- Build a tree by adding taxa to the location that is optimal
  - Can hold more than one tree at each step
28Taxon addition order
- As-is
  - In the order of the matrix (not done for parsimony)
- Simple taxon addition
  - Use a distance algorithm to decide order
- Closest taxon addition
  - Add the taxon that makes the optimal tree
- Random taxon addition order
  - Repeat many times
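As a rough illustration of building starting trees by stepwise addition with random addition order, the sketch below adds each taxon at the position that gives the shortest tree, holding a single tree at each step, and repeats over several shuffled orders. It assumes equally weighted Fitch characters and nested-tuple trees rooted on the first taxon in the order. All function names, the replicate count, the seed and the small matrix are hypothetical.

```python
import random


def fitch_length(tree, states):
    """Minimum steps for one unordered character (postorder pass)."""
    def postorder(node):
        if not isinstance(node, tuple):
            return {states[node]}, 0
        lset, lsteps = postorder(node[0])
        rset, rsteps = postorder(node[1])
        common = lset & rset
        if common:
            return common, lsteps + rsteps
        return lset | rset, lsteps + rsteps + 1
    return postorder(tree)[1]


def tree_length(tree, matrix):
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch_length(tree, {t: seq[i] for t, seq in matrix.items()})
               for i in range(n_chars))


def insertions(tree, tip):
    """Yield every tree obtained by attaching tip to one branch of tree."""
    yield (tree, tip)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, tip):
            yield (new_left, right)
        for new_right in insertions(right, tip):
            yield (left, new_right)


def stepwise_addition(order, matrix):
    """Add taxa in the given order, each at its currently best position."""
    sub = order[1]
    for tip in order[2:]:
        # Ties are broken by taking the first best placement found.
        sub = min(insertions(sub, tip),
                  key=lambda s: tree_length((order[0], s), matrix))
    tree = (order[0], sub)
    return tree_length(tree, matrix), tree


def random_addition_search(matrix, replicates=10, seed=0):
    """Repeat stepwise addition with shuffled orders; keep the shortest tree."""
    rng = random.Random(seed)
    taxa = sorted(matrix)
    best = None
    for _ in range(replicates):
        order = taxa[:]
        rng.shuffle(order)
        result = stepwise_addition(order, matrix)
        if best is None or result[0] < best[0]:
            best = result
    return best


matrix = {"A": "10", "B": "10", "C": "00", "D": "01", "E": "01"}
print(stepwise_addition(["A", "B", "C", "D", "E"], matrix)[0])  # 2 (as-is order)
print(random_addition_search(matrix)[0])                        # 2
```

The trees returned this way are only starting points; a heuristic search would go on to rearrange them by branch swapping.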
29Branch swapping
- Nearest-neighbour interchange (NNI)
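A small sketch of how NNI rearrangements can be enumerated, assuming a binary tree stored as nested 2-tuples. For each internal branch, the subtree on one side is exchanged with one of the two subtrees on the other side. The function name `nni_neighbors` is hypothetical. For an unrooted tree held as (outgroup tip, subtree), the function would be applied to the subtree only, since swaps across the root node merely re-root the tree.

```python
def nni_neighbors(tree):
    """Yield every tree one NNI away from a binary tree of nested 2-tuples."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    if isinstance(left, tuple):          # internal branch leading to left
        a, b = left
        yield ((right, b), a)            # exchange right with a
        yield ((a, right), b)            # exchange right with b
    if isinstance(right, tuple):         # internal branch leading to right
        a, b = right
        yield (a, (left, b))             # exchange left with a
        yield (b, (a, left))             # exchange left with b
    # Rearrangements on branches deeper in the tree.
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


# Unrooted five-taxon tree held as (outgroup tip, subtree).
outgroup, sub = "A", ("B", ("C", ("D", "E")))
neighbors = [(outgroup, s) for s in nni_neighbors(sub)]
print(len(neighbors))   # 4, i.e. two per internal branch
for t in neighbors:
    print(t)
```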
30Branch swapping
- Subtree pruning and regrafting (SPR)
31Branch swapping
- Tree bisection and reconnection (TBR)
32Many issues glossed over
- What if characters disagree?
- How is the tree score determined?
- How can we root the trees?
- How do we find the optimal tree?
- How can we evaluate the robustness of our
conclusions?
33Even if the shortest tree is the best estimate of the true tree, the true tree might not be the shortest
We should consider suboptimal trees
We should use statistical tests to help us determine what to actually believe
34Questions we can ask
- Are the data random or do they have signal?
- How much homoplasy is there?
- To what extent are particular elements of the trees (clades) supported?
- What alternative results can we reject?
More later.