Title: Tests for phylogenetic signal
1. Tests for phylogenetic signal
2. CI and RI
[Figure: nucleotide character data for taxa A to F, with the labels Obs L, Min L and Max L]
3. CI and RI
[Figure: nucleotide character data for taxa A to F (continued), with the labels Obs L, Min L and Max L]
4. General points
- CI and RI apply to a particular set of characters (even one character) and a specified tree
- For a given set of data, all trees of the same length will have the same CI and RI
- Length is the most basic measure of signal
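The two indices can be written in terms of the Obs L, Min L and Max L quantities labelled on the earlier slides, using the standard definitions CI = Min L / Obs L and RI = (Max L - Obs L) / (Max L - Min L). A minimal sketch follows; the function names and the example lengths are illustrative assumptions, not values from the slides.

```python
def consistency_index(min_len, obs_len):
    """CI = Min L / Obs L (1.0 means no homoplasy on this tree)."""
    if obs_len <= 0:
        raise ValueError("observed length must be positive")
    return min_len / obs_len


def retention_index(min_len, obs_len, max_len):
    """RI = (Max L - Obs L) / (Max L - Min L)."""
    if max_len == min_len:
        raise ValueError("RI is undefined when Max L equals Min L")
    return (max_len - obs_len) / (max_len - min_len)


# Hypothetical lengths, chosen only to illustrate the arithmetic.
min_len, obs_len, max_len = 10, 12, 18
ci = consistency_index(min_len, obs_len)
ri = retention_index(min_len, obs_len, max_len)
print(f"CI = {ci:.3f}, RI = {ri:.3f}")
```

Because Min L and Max L depend only on the data, two trees with the same Obs L necessarily give the same CI and RI.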
5. How can we evaluate the significance of the signal?
- We can compare the observed tree length (or CI) with what we would obtain if there were no phylogenetic signal
- The permutation tail probability (PTP) test
6. Permuting data removes phylogenetic signal
- Taxon 1 ACATTTA
- Taxon 2 ACGATTA
- Taxon 3 AGGATAG
- Taxon 4 GAAAAC?
- Taxon 5 GATA?CG
Permuted data sets
Taxon 1 GAAA?AA
Taxon 2 ACAATC?
Taxon 3 GAGTATG
Taxon 4 AGTATCG
Taxon 5 ACGATTA
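The permutation step can be sketched as follows: the states of each character (column) are shuffled independently among the taxa, so every column keeps its state frequencies while the covariation between characters is destroyed. This is a minimal sketch; the function name `permute_characters` and the dict-of-strings layout are assumptions made for illustration.

```python
import random


def permute_characters(matrix, rng=None):
    """Shuffle each character (column) independently among the taxa."""
    rng = rng or random.Random()
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    columns = []
    for i in range(n_chars):
        column = [matrix[taxon][i] for taxon in taxa]
        rng.shuffle(column)  # reassign this character's states to taxa at random
        columns.append(column)
    # Rebuild one sequence per taxon from the shuffled columns.
    return {
        taxon: "".join(column[row] for column in columns)
        for row, taxon in enumerate(taxa)
    }


# The original data set from the slide.
original = {
    "Taxon 1": "ACATTTA",
    "Taxon 2": "ACGATTA",
    "Taxon 3": "AGGATAG",
    "Taxon 4": "GAAAAC?",
    "Taxon 5": "GATA?CG",
}
permuted = permute_characters(original, random.Random(1))
for taxon, sequence in permuted.items():
    print(taxon, sequence)
```

Each call with a different seed yields one permuted data set; a PTP analysis would repeat this many times and find the shortest tree for each replicate.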
7. Example with signal
Tree length   Number of replicates
-----------   --------------------
1222          1
1669          1
1671          1
1672          1
1673          1
1674          1
1675          2
1676          2
1678          1
1679          2
1680          4
1681          5
1682          8
1683          4
1684          4
1685          2
1686          8
1687          7
1688          6
1689          8
1690          6
1691          3
1692          2
1693          3
1694          3
1695          3
1696          3
1697          2
1699          2
1702          1
1704          2
1705          1
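The PTP value can be read off such a table as the proportion of data sets (the original plus the permuted replicates) whose shortest tree is at least as short as the one for the original data. The sketch below assumes that the length 1222 belongs to the original data and that the other 99 entries are permuted replicates; the slide does not mark which row is which, and the function name `ptp` is illustrative.

```python
def ptp(observed_length, permuted_lengths):
    """Proportion of data sets (original included) with length <= observed."""
    as_short = sum(1 for length in permuted_lengths if length <= observed_length)
    return (as_short + 1) / (len(permuted_lengths) + 1)


# Tree length -> number of replicates, copied from the table,
# leaving out 1222 (assumed to be the original, unpermuted data).
replicate_counts = {
    1669: 1, 1671: 1, 1672: 1, 1673: 1, 1674: 1, 1675: 2, 1676: 2,
    1678: 1, 1679: 2, 1680: 4, 1681: 5, 1682: 8, 1683: 4, 1684: 4,
    1685: 2, 1686: 8, 1687: 7, 1688: 6, 1689: 8, 1690: 6, 1691: 3,
    1692: 2, 1693: 3, 1694: 3, 1695: 3, 1696: 3, 1697: 2, 1699: 2,
    1702: 1, 1704: 2, 1705: 1,
}
permuted_lengths = [
    length for length, count in replicate_counts.items() for _ in range(count)
]

print(ptp(1222, permuted_lengths))  # 0.01
```

With 99 permuted replicates, 0.01 is the smallest PTP value that can be obtained, so more replicates are needed to resolve smaller probabilities.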
8. Example without signal
Tree length   Number of replicates
-----------   --------------------
1924          3
1926          1
1927          4
1928          1
1929          2
1930          8
1931          6
1932          5
1933          4
1934          4
1935          5
1936          1
1937          8
1938          11
1939          7
1940          6
1941          7
1942          4
1943          2
1944          1
1945          1
1946          1
1947          1
1950          3
1952          1
1953          1
1955          1
1958          1
9. The PTP test is slow
- Hillis and Huelsenbeck (1991) observed that the shape of the tree-length distribution varies with the amount of phylogenetic signal
10. A data set without signal
mean = 599.182107   sd = 4.944738   g1 = -0.150922

           /------------------------------
 582.00000
 583.80000   (5)
 585.60000   (25)
 587.40000   (71)
 589.20000   (209)
 591.00000   (161)
 592.80000   (521)
 594.60000   (883)
 596.40000   (1132)
 598.20000   (1469)
 600.00000   (788)
 601.80000   (1631)
 603.60000   (1486)
 605.40000   (1047)
 607.20000   (567)
 609.00000   (157)
 610.80000   (171)
 612.60000   (57)
 614.40000   (11)
 616.20000   (3)
 618.00000   (1)
           \------------------------------
11. A data set with signal
mean = 611.572872   sd = 31.049455   g1 = -0.942643

           /------------------------------
 501.00000
 508.65000   (15)
 516.30000   (60)
 523.95000   (84)
 531.60000   (135)
 539.25000   (21)
 546.90000   (26)
 554.55000   (96)
 562.20000   (166)
 569.85000   (290)
 577.50000   (737)
 585.15000   (1118)
 592.80000   (665)
 600.45000   (120)
 608.10000   (268)
 615.75000   (497)
 623.40000   (796)
 631.05000   (1337)
 638.70000   (2031)
 646.35000   (1610)
 654.00000   (323)
           \------------------------------
12. Skewness test for phylogenetic signal
- Hillis and Huelsenbeck (1991) generated random data for different numbers of taxa/characters to find the null distribution of g1 scores
- One can compare observed g1 statistics with this null distribution
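The g1 statistic itself is the usual moment-based skewness coefficient, g1 = m3 / m2^(3/2), computed over the lengths of all trees (or a random sample of trees). A minimal sketch follows; the function name and the example lengths are illustrative assumptions, and the critical values needed to judge significance are not reproduced here.

```python
def g1_skewness(lengths):
    """Moment-based skewness: third central moment over (second)^(3/2)."""
    n = len(lengths)
    if n < 2:
        raise ValueError("need at least two tree lengths")
    mean = sum(lengths) / n
    m2 = sum((x - mean) ** 2 for x in lengths) / n
    m3 = sum((x - mean) ** 3 for x in lengths) / n
    if m2 == 0:
        raise ValueError("all tree lengths are identical; g1 is undefined")
    return m3 / m2 ** 1.5


# Hypothetical tree lengths: a few trees much shorter than the rest
# give a long left tail and therefore a negative g1.
left_skewed = [501, 520, 590, 630, 640, 640, 645, 650]
symmetric = [596, 598, 600, 602, 604]

print(g1_skewness(left_skewed) < 0)  # True
print(abs(g1_skewness(symmetric)) < 1e-12)  # True
```

A strongly negative g1, as in the data set with signal on the earlier slide, means that a small number of trees are much shorter than the bulk of the distribution.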
13. Tests for phylogenetic signal (g1 and PTP)
- Are sensitive to any signal in the data
- For example
  - g1 of permuted data = -0.04 (ns)
  - Duplicate one taxon and g1 = -1.56
- Useful for identifying truly useless data (very rare)
- But otherwise do not tell you much about data quality
14. Tests of signal
- PTP and g1 tests seek to determine overall data quality
- We can, instead, evaluate particular results
  - Clade support measures: bootstrap, decay
  - Statistical tests of alternative hypotheses