Title: A phylogenetic application of the combinatorial graph Laplacian
- Eric A. Stone
- Department of Statistics
- Bioinformatics Research Center
- North Carolina State University
My motivation for this project
- Trees in statistics or biology
  - Often a latent branching structure relating some observed data
- Trees in mathematics
  - Always a connected graph with no cycles
My motivation for this project
- Trees in statistics or biology
  - PROBLEM: Recover properties of latent branching structure
- Trees in mathematics
  - Always a connected graph with no cycles
My motivation for this project
- Trees in statistics or biology
  - PROBLEM: Recover properties of latent branching structure
- Trees in mathematics
  - Characterization of observed structure by spectral graph theory
Bridging the gap
- Rectifying trees and trees
  - Can we use some powerful tools of spectral graph theory to recover latent structure?
  - Natural relationship between trees and complete graphs?!?
Tree and distance matrices
- The tree with vertex set {1, ..., 8} has distance matrix D
- The phylogenetic tree can only be observed at {1, ..., 5}
  - We can only observe (estimate) the phylogenetic portion of D, its 5×5 leaf block
[Figure: the phylogenetic portion of D]
More motivation for this project
- Trees in statistics or biology
  - PROBLEM: Recover properties of latent branching structure
- Given D only, recover the latent branching structure
  - This is the problem of phylogenetic reconstruction (without error!)
[Figure: the phylogenetic portion of D]
NJ finds (2, n−2) splits from D
- A split is a bipartition of the leaf set (e.g. {1,2,3,4,5}) that can be induced by cutting a branch of the tree
  - e.g. {1,2} | {3,4,5} or {1,2,5} | {3,4}
- The neighbor-joining criterion identifies (2, n−2) splits through the Q matrix
[Figure: the (2, n−2) splits {1,2} | {3,4,5} and {1,2,5} | {3,4}]
A recipe for tree reconstruction from D
- Find a split
  - NJ relies on a theorem that guarantees a (2, n−2) split from the Q matrix
- Use knowledge of the split to reduce dimension
  - NJ prunes the cherry (neighboring taxa) to reduce the number of leaves by one
- Iterate until the tree has been fully reconstructed
  - The tree topology is specified by its split set
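The NJ recipe above can be sketched in a few lines of NumPy. This is a minimal, topology-only version of textbook neighbor joining: Q_ij = (n − 2)D_ij − Σ_k D_ik − Σ_k D_jk, then the standard update d(u, k) = (D_ik + D_jk − D_ij)/2 for the new node u. The function name `nj_topology` is illustrative, and the 5-leaf distance matrix is hypothetical: it comes from a made-up tree in which node 6 joins leaves 1 and 2, node 7 joins leaves 3 and 4, and node 8 joins 6, 7 and leaf 5. It is not the slides' data.

```python
import numpy as np


def nj_topology(D, labels):
    """Neighbor joining, topology only; returns a Newick-like string."""
    D = np.asarray(D, dtype=float)
    labels = list(labels)
    while len(labels) > 3:
        n = len(labels)
        r = D.sum(axis=1)
        # NJ criterion: the minimizing pair is guaranteed to be a cherry
        Q = (n - 2) * D - r[:, None] - r[None, :]
        np.fill_diagonal(Q, np.inf)          # never pair a taxon with itself
        i, j = sorted(int(k) for k in np.unravel_index(np.argmin(Q), Q.shape))
        # Distance from the new internal node u to every taxon k
        d_u = 0.5 * (D[i] + D[j] - D[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        # Prune the cherry: the number of leaves drops by one
        D = np.block([[D[np.ix_(keep, keep)], d_u[keep][:, None]],
                      [d_u[keep][None, :], np.zeros((1, 1))]])
        labels = [labels[k] for k in keep] + [f"({labels[i]},{labels[j]})"]
    return "(" + ",".join(labels) + ")"


# Hypothetical tree-additive distances between leaves 1..5
D = np.array([[0.0, 1.5, 3.5, 4.0, 3.0],
              [1.5, 0.0, 3.0, 3.5, 2.5],
              [3.5, 3.0, 0.0, 2.5, 2.5],
              [4.0, 3.5, 2.5, 0.0, 3.0],
              [3.0, 2.5, 2.5, 3.0, 0.0]])

print(nj_topology(D, ["1", "2", "3", "4", "5"]))  # (5,(1,2),(3,4))
```

On this exact tree-additive input the first minimum of Q is the cherry {1,2}, i.e. the (2, n−2) split {1,2} | {3,4,5}. With four taxa left, Q always ties between the two complementary cherries, and `np.argmin` simply takes the first.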
Our narrow goal
- Find a split
  - NJ relies on a theorem that guarantees a (2, n−2) split from the Q matrix
  - Hypothesize a criterion that identifies deeper splits
  - ...and prove that it actually works
Our solution
[Figure: the phylogenetic portion of D]
- Let H = I − J/n be the centering matrix (J is the n×n all-ones matrix)
- Find the eigenvector Y of HDH with the smallest eigenvalue
- The signs of the entries of Y identify a split of the tree
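A minimal NumPy sketch of these three steps, assuming H = I − J/n. The distance matrix is a hypothetical tree-additive example on leaves 1 to 5 (made-up branch lengths, not the slide's data), indexed 0 to 4 in the code; `hdh_split` is an illustrative name.

```python
import numpy as np


def hdh_split(D):
    """Split the leaves by the signs of the eigenvector of HDH with the smallest eigenvalue."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    _, evecs = np.linalg.eigh(H @ D @ H)          # eigenvalues come back in ascending order
    Y = evecs[:, 0]                               # eigenvector of the smallest eigenvalue
    side = frozenset(int(i) for i in np.flatnonzero(Y > 0))
    return side, frozenset(range(n)) - side


# Hypothetical tree-additive distances between leaves 1..5 (indices 0..4)
D = np.array([[0.0, 1.5, 3.5, 4.0, 3.0],
              [1.5, 0.0, 3.0, 3.5, 2.5],
              [3.5, 3.0, 0.0, 2.5, 2.5],
              [4.0, 3.5, 2.5, 0.0, 3.0],
              [3.0, 2.5, 2.5, 3.0, 0.0]])

A, B = hdh_split(D)
print(sorted(A), sorted(B))
```

An eigenvector is defined only up to sign, so which side prints first is arbitrary; only the bipartition matters. For this hypothetical tree the possible splits are {1,2} | {3,4,5}, {1,2,5} | {3,4} and the five trivial one-leaf splits.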
About the matrix HDH
- Entries of HDH are D_ij − D̄_i· − D̄_·j + D̄_··, where D̄_i·, D̄_·j and D̄_·· are the row, column and grand means of D
- HDH is negative semidefinite
  - Zero is a simple eigenvalue, with eigenvector 1 (the all-ones vector)
  - Eigenvectors of the remaining eigenvalues have both + and − entries
- HDH appears prominently in
  - Multidimensional scaling
  - Principal coordinate analysis
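These properties are easy to confirm numerically. A minimal sketch, assuming H = I − J/n and a hypothetical tree-additive 5-leaf distance matrix (made-up branch lengths, not the slide's data):

```python
import numpy as np

# Hypothetical tree-additive distances between leaves 1..5
D = np.array([[0.0, 1.5, 3.5, 4.0, 3.0],
              [1.5, 0.0, 3.0, 3.5, 2.5],
              [3.5, 3.0, 0.0, 2.5, 2.5],
              [4.0, 3.5, 2.5, 0.0, 3.0],
              [3.0, 2.5, 2.5, 3.0, 0.0]])
n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
M = H @ D @ H

# Entry formula: D_ij minus row mean, minus column mean, plus grand mean
M_formula = D - D.mean(axis=1)[:, None] - D.mean(axis=0)[None, :] + D.mean()
print(np.allclose(M, M_formula))                  # True

evals, evecs = np.linalg.eigh(M)                  # ascending; zero is the largest
print(np.all(evals < 1e-10))                      # True: negative semidefinite
print(np.allclose(M @ np.ones(n), 0.0))           # True: all-ones vector in the kernel
# Every other eigenvector is orthogonal to the all-ones vector, so it has mixed signs
print(all(evecs[:, k].min() < 0 < evecs[:, k].max() for k in range(n - 1)))  # True
```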
Example of our solution
- Find the eigenvector Y of HDH with the smallest eigenvalue
  - The signs of Y identify the split {1,2} | {3,4,5}
[Figure: entries of Y shown on the leaves: −0.0564, 0.5793, −0.5011, −0.4636, 0.4418]
A real example (data from ToL)
Our solution
- Find a split
  - NJ relies on a theorem that guarantees a (2, n−2) split from the Q matrix
  - Hypothesize a criterion that identifies deep splits
  - ...and prove that it actually works
Affinity and distance
- In phylogenetics, it is common to consider pairwise distances
- In graph theory, it is common to consider pairwise affinities
[Figure: affinity-based vs. distance-based representations]
Distance matrix ↔ Laplacian matrix
The genius of Miroslav Fiedler
- G connected ⇒ the smallest eigenvalue of L, zero, is simple
- The smallest positive eigenvalue, λ, is called the algebraic connectivity of G
- Fiedler vectors Y satisfy LY = λY
- The Fiedler cut is the sign-induced bipartition
  - The Fiedler cut here is {1,2,6} | {3,4,5,7,8}
  - Note that the cut implies a leaf split: {1,2} | {3,4,5}
[Figure: the tree labeled with its Fiedler vector entries −0.4277, −0.0223, 0.4840, −0.0158, −0.3653, 0.3449, 0.4038, −0.4047]
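The Fiedler cut of a tree Laplacian can be computed directly with NumPy. This sketch uses a hypothetical tree shaped like the one on these slides (6 joins 1, 2 and 8; 8 joins 5 and 7; 7 joins 3 and 4) with made-up branch lengths, and assumes edge weights equal to reciprocal branch lengths; it is not the slide's data, and `tree_laplacian` is an illustrative helper.

```python
import numpy as np

# Hypothetical tree (made-up branch lengths): (u, v, branch length)
EDGES = [(1, 6, 1.0), (2, 6, 0.5), (6, 8, 1.0), (5, 8, 1.0),
         (8, 7, 0.5), (3, 7, 1.0), (4, 7, 1.5)]


def tree_laplacian(edges, n=8):
    """Weighted Laplacian; assumes edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        i, j, w = u - 1, v - 1, 1.0 / length
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L


L = tree_laplacian(EDGES)
evals, evecs = np.linalg.eigh(L)       # ascending; evals[0] is (numerically) zero
Y = evecs[:, 1]                        # a Fiedler vector: L @ Y = evals[1] * Y
pos = {int(i) + 1 for i in np.flatnonzero(Y > 0)}
neg = {int(i) + 1 for i in np.flatnonzero(Y < 0)}
print("algebraic connectivity:", round(float(evals[1]), 4))
print("Fiedler cut:", sorted(pos), "|", sorted(neg))
```

With these branch lengths the cut should separate {1,2,6} from {3,4,5,7,8} (which set prints first depends on the arbitrary sign of Y), giving the leaf split {1,2} | {3,4,5}. Other branch lengths can move the cut to a different edge.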
Is this relevant here?
- We do not observe an 8×8 Laplacian matrix L
- All we get is a 5×5 matrix D of between-leaf pairwise distances
  - Where is the connection to graph theory?
[Figure: the phylogenetic portion of D]
Recall: our solution
- Let H be the centering matrix
- Find the eigenvector Y of HDH with the smallest eigenvalue
- The signs of the entries of Y identify a split of the tree
[Figure: the phylogenetic portion of D]
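An extremely useful relationship
- Recall the centering matrix H
- HDH is in fact −2 times the (Moore–Penrose) pseudoinverse of a graph Laplacian L̃: HDH = −2L̃⁺, equivalently (HDH)⁺ = −½L̃
  - We have shown this in the context of the formula below
- Principal submatrices of D relate to Schur complements of L
  - Partition the tree Laplacian L (edge weights = reciprocal branch lengths) as L = [W X; Y Z], with W indexed by the leaves and Z by the interior nodes
  - In particular, HDH = −2L̃⁺ with L̃ = L/Z = W − XZ⁻¹Y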
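The relationship can be checked numerically on a small example. The sketch assumes edge weights equal to reciprocal branch lengths and uses a hypothetical tree (leaves 1 to 5, internal nodes 6, 7, 8, made-up branch lengths; not the slide's data). It forms L/Z = W − XZ⁻¹Y and compares it with HDH built from the same tree's leaf-to-leaf path lengths. The helper `pinv_ones_kernel` is illustrative: for a symmetric matrix A whose null space is exactly span(1), the pseudoinverse equals inv(A + J/n) − J/n, which avoids choosing a cutoff for `np.linalg.pinv`.

```python
import numpy as np

# Hypothetical tree (made-up branch lengths): (u, v, branch length)
EDGES = [(1, 6, 1.0), (2, 6, 0.5), (6, 8, 1.0), (5, 8, 1.0),
         (8, 7, 0.5), (3, 7, 1.0), (4, 7, 1.5)]


def tree_laplacian(edges, n=8):
    """Weighted Laplacian; assumes edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        i, j, w = u - 1, v - 1, 1.0 / length
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L


def pinv_ones_kernel(A):
    """Pseudoinverse of a symmetric matrix whose null space is exactly span(1)."""
    n = A.shape[0]
    J = np.ones((n, n)) / n
    return np.linalg.inv(A + J) - J


L = tree_laplacian(EDGES)
leaves, interior = [0, 1, 2, 3, 4], [5, 6, 7]      # vertices 1-5 and 6-8
W = L[np.ix_(leaves, leaves)]
X = L[np.ix_(leaves, interior)]
Y = L[np.ix_(interior, leaves)]
Z = L[np.ix_(interior, interior)]
L_tilde = W - X @ np.linalg.solve(Z, Y)            # Schur complement L/Z

# Leaf-to-leaf path lengths of the same hypothetical tree
D = np.array([[0.0, 1.5, 3.5, 4.0, 3.0],
              [1.5, 0.0, 3.0, 3.5, 2.5],
              [3.5, 3.0, 0.0, 2.5, 2.5],
              [4.0, 3.5, 2.5, 0.0, 3.0],
              [3.0, 2.5, 2.5, 3.0, 0.0]])
n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
HDH = H @ D @ H

print(np.allclose(HDH, -2.0 * pinv_ones_kernel(L_tilde)))   # True: HDH = -2 (L/Z)^+
print(np.allclose(pinv_ones_kernel(HDH), -0.5 * L_tilde))   # True: (HDH)^+ = -(1/2) L/Z
```

Because HDH = −2L̃⁺, an eigenvalue λ > 0 of L̃ corresponds to the eigenvalue −2/λ of HDH with the same eigenvector, so the most negative eigenvalue of HDH pairs with the algebraic connectivity of L̃.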
Recall: our solution
- Find the eigenvector Y of HDH with the smallest eigenvalue
- The signs of the entries of Y identify a split of the tree
  - The smallest eigenvalue of HDH (which is negative semidefinite) corresponds to the smallest positive eigenvalue of L̃ = L/Z, with the same eigenvector
  - In fact, L̃ can be seen as a graph Laplacian
  - And our solution, Y, is the Fiedler vector of that graph!
  - But what does this graph look like?
Schur complementation of a vertex
- The vertices adjacent to 8 become adjacent to each other
Schur complementation of the interior
- The graph described by L̃ = L/Z is fully connected (a complete graph on the leaves)
- All cuts yield connected subgraphs ⇒ no help from Fiedler
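Both Schur complementation steps can be reproduced with a small helper. The sketch assumes edge weights equal to reciprocal branch lengths and uses a hypothetical tree (leaves 1 to 5; internal nodes 6, 7, 8; made-up branch lengths). `schur_complement` is an illustrative name: it eliminates the listed vertices, and since L is symmetric the lower-left block is taken as the transpose of the upper-right one.

```python
import numpy as np

# Hypothetical tree (made-up branch lengths): (u, v, branch length)
EDGES = [(1, 6, 1.0), (2, 6, 0.5), (6, 8, 1.0), (5, 8, 1.0),
         (8, 7, 0.5), (3, 7, 1.0), (4, 7, 1.5)]


def tree_laplacian(edges, n=8):
    """Weighted Laplacian; assumes edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        i, j, w = u - 1, v - 1, 1.0 / length
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L


def schur_complement(L, drop):
    """Eliminate the 0-based vertices in `drop`; return the Schur complement and kept indices."""
    keep = [k for k in range(L.shape[0]) if k not in drop]
    W = L[np.ix_(keep, keep)]
    X = L[np.ix_(keep, drop)]
    Z = L[np.ix_(drop, drop)]
    return W - X @ np.linalg.solve(Z, X.T), keep


L = tree_laplacian(EDGES)

# Eliminate vertex 8 only: its neighbours 5, 6 and 7 become pairwise adjacent
L8, keep = schur_complement(L, [7])
at = {v: keep.index(v - 1) for v in (5, 6, 7)}
print(all(L8[at[a], at[b]] < 0 for a, b in [(5, 6), (5, 7), (6, 7)]))  # True

# Eliminate all interior vertices 6, 7, 8: every pair of leaves becomes adjacent
L_leaf, _ = schur_complement(L, [5, 6, 7])
off_diagonal = L_leaf[~np.eye(5, dtype=bool)]
print(np.all(off_diagonal < 0))              # True: complete graph on the leaves
print(np.allclose(L_leaf.sum(axis=1), 0))    # True: still a Laplacian (zero row sums)
```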
Recap thus far
- Given a matrix D of pairwise distances between leaves
- Find the eigenvector Y of HDH with the smallest eigenvalue
- Claim: the signs of the entries of Y identify a split of the tree
- Y is shown to be a Fiedler vector of the Laplacian L̃ = L/Z
  - But the graph of L̃ is fully connected and has no apparent structure
  - Thus Fiedler says nothing about the signs of the entries of Y
  - But the claim requires the signs to be consistent with the structure of the tree
Recap thus far
- Thus Fiedler says nothing about the signs of the entries of Y
- But the claim requires the signs to be consistent with the structure of the tree
- How does L̃ = L/Z inherit the structure of the tree?
[Figure labels: NO, NO, YES]
The quotient rule inspires a Schur tower
Cutpoints and connected components
- A point of articulation (or cutpoint) is a point r ∈ G whose deletion yields a subgraph with ≥ 2 connected components
- Cutpoints: 6, 7, 8
  - Shown: {1}, {2} and {3,4,5,7,8} are the connected components at 6
- The cutpoints of a tree are its internal nodes
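Cutpoints can be found by deleting each vertex in turn and counting connected components. A plain-Python sketch on a hypothetical tree with the shape used on these slides (6 joins 1, 2 and 8; 8 joins 5 and 7; 7 joins 3 and 4); the function names are illustrative.

```python
# Hypothetical tree, topology only: (u, v) edges
EDGES = [(1, 6), (2, 6), (6, 8), (5, 8), (8, 7), (3, 7), (4, 7)]


def components_without(edges, removed):
    """Connected components of the graph after deleting vertex `removed`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    unseen = set(adj) - {removed}
    comps = []
    while unseen:
        start = unseen.pop()
        comp, stack = {start}, [start]
        while stack:                       # depth-first search over unseen vertices
            for w in adj[stack.pop()]:
                if w in unseen:
                    unseen.remove(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def cutpoints(edges):
    """Vertices whose deletion leaves at least two connected components."""
    vertices = {x for e in edges for x in e}
    return sorted(v for v in vertices if len(components_without(edges, v)) >= 2)


print(cutpoints(EDGES))                                          # [6, 7, 8]
print(sorted(sorted(c) for c in components_without(EDGES, 6)))  # [[1], [2], [3, 4, 5, 7, 8]]
```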
The key observation (i.e. theorem)
- Let L be the Laplacian of a graph G with some cutpoint v
- Let L_v be the Laplacian of G_v, obtained by Schur complement at v
- Then the Fiedler cut of G_v identifies a split of G
  - Here the Fiedler cut of G_6 is {1,2,5,8} | {3,4,7}
  - Including 6 in {1,2,5,8} defines two connected components in G
[Figure: G and G_6, with Fiedler vector entries 0.0570, −0.4129, 0.5828, 0.0380, −0.3439, 0.4660, −0.3870 on the vertices of G_6]
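A numerical check of the observation at cutpoint 6, under stated assumptions: a hypothetical tree (leaves 1 to 5; internal nodes 6, 7, 8; made-up branch lengths, not the slide's data), edge weights equal to reciprocal branch lengths, and an illustrative helper `is_connected`. The sketch Schur-complements vertex 6 out of L, takes the Fiedler vector of the resulting Laplacian of G_6, and tests whether adding 6 to one side of the sign cut leaves two connected pieces of G.

```python
import numpy as np

# Hypothetical tree (made-up branch lengths): (u, v, branch length)
EDGES = [(1, 6, 1.0), (2, 6, 0.5), (6, 8, 1.0), (5, 8, 1.0),
         (8, 7, 0.5), (3, 7, 1.0), (4, 7, 1.5)]


def tree_laplacian(edges, n=8):
    """Weighted Laplacian; assumes edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        i, j, w = u - 1, v - 1, 1.0 / length
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L


def is_connected(edges, vertices):
    """True if the subgraph of the tree induced on `vertices` is connected."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for u, v, _ in edges:
            for a, b in ((u, v), (v, u)):
                if a == x and b in vertices and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen == vertices


L = tree_laplacian(EDGES)
v = 6                                              # the cutpoint
keep, drop = [k for k in range(8) if k != v - 1], [v - 1]
W, X, Z = L[np.ix_(keep, keep)], L[np.ix_(keep, drop)], L[np.ix_(drop, drop)]
L_v = W - X @ np.linalg.solve(Z, X.T)              # Laplacian of G_6
y = np.linalg.eigh(L_v)[1][:, 1]                   # Fiedler vector of G_6
S = {keep[i] + 1 for i in np.flatnonzero(y > 0)}
T = {keep[i] + 1 for i in np.flatnonzero(y <= 0)}
print("Fiedler cut of G_6:", sorted(S), "|", sorted(T))

# Sides that, once 6 is added, leave two connected pieces of G
print([sorted(A | {v}) for A, B in ((S, T), (T, S))
       if is_connected(EDGES, A | {v}) and is_connected(EDGES, B)])
```

The particular cut depends on the branch lengths; what the observation predicts is that the last line prints at least one set.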
The quotient rule inspires a Schur tower
[Figure: Schur tower]
- How does this help?
  - ⇒ Look at Schur paths to the graph with Laplacian L/Z
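The tower can be checked with the quotient formula for Schur complements (Crabtree-Haynsworth): eliminating interior vertices in stages gives the same Laplacian as eliminating them all at once. A sketch on a hypothetical tree (leaves 1 to 5; internal nodes 6, 7, 8; made-up branch lengths, edge weights assumed to be reciprocal branch lengths); `eliminate` is an illustrative helper that tracks vertex labels.

```python
import numpy as np

# Hypothetical tree (made-up branch lengths): (u, v, branch length)
EDGES = [(1, 6, 1.0), (2, 6, 0.5), (6, 8, 1.0), (5, 8, 1.0),
         (8, 7, 0.5), (3, 7, 1.0), (4, 7, 1.5)]


def tree_laplacian(edges, n=8):
    """Weighted Laplacian; assumes edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        i, j, w = u - 1, v - 1, 1.0 / length
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L


def eliminate(L, labels, drop):
    """Schur complement removing the vertices whose labels are in `drop`."""
    keep = [k for k, lab in enumerate(labels) if lab not in drop]
    gone = [k for k, lab in enumerate(labels) if lab in drop]
    W, X, Z = L[np.ix_(keep, keep)], L[np.ix_(keep, gone)], L[np.ix_(gone, gone)]
    return W - X @ np.linalg.solve(Z, X.T), [labels[k] for k in keep]


L = tree_laplacian(EDGES)
labels = list(range(1, 9))
L_678, leaf_labels = eliminate(L, labels, {6, 7, 8})      # direct route to G_{6,7,8}

# Reach G_{6,7,8} via G_{6,7}, G_{6,8} or G_{7,8} and compare with the direct route
results = {}
for pair in ((6, 7), (6, 8), (7, 8)):
    L_pair, lab_pair = eliminate(L, labels, set(pair))
    L_last, lab_last = eliminate(L_pair, lab_pair, {6, 7, 8} - set(pair))
    results[pair] = lab_last == leaf_labels and bool(np.allclose(L_last, L_678))
print(results)  # {(6, 7): True, (6, 8): True, (7, 8): True}
```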
The punch line
- The graph with Laplacian L/Z can be obtained in three ways
- The Fiedler cut of G_{6,7,8} must split G_{6,7} and G_{6,8} and G_{7,8}
Recall: example
- Find the eigenvector Y of HDH with the smallest eigenvalue
  - The signs of Y identify the split {1,2} | {3,4,5}
[Figure: entries of Y shown on the leaves: −0.0564, 0.5793, −0.5011, −0.4636, 0.4418]
The punch line
- The graph with Laplacian L/Z can be obtained in three ways
- The Fiedler cut of G_{6,7,8} must split G_{6,7} and G_{6,8} and G_{7,8}
  - This implies that the cut splits the progenitor graph G!
[Figure: the cut {1,2,6} | {3,4,5,7,8}]
Our solution actually works
- Let H be the centering matrix
- Find the eigenvector Y of HDH with the smallest eigenvalue
- The signs of the entries of Y identify a split of the tree
[Figure: the phylogenetic portion of D]
A recipe for tree reconstruction
- Find a split
  - NJ relies on a theorem that guarantees a (2, n−2) split from the Q matrix
  - We have a theorem that guarantees splits from the HDH matrix
- Use knowledge of the split to reduce dimension
  - NJ prunes the cherry (neighboring taxa) to reduce the number of leaves by one
  - We use a divisive method that reduces the problem to pairs of subtrees
- Iterate until the tree has been fully reconstructed
  - The tree topology is specified by its split set
Reconstruction from the inside out
Connections with classical MDS and PCoA
- Classical solution to multidimensional scaling
  - a.k.a. principal coordinate analysis
- Recipe for dimension reduction given a distance matrix D
  - Construct matrix A from D entrywise: x ↦ −x²/2
  - Double centering: B = HAH
  - Find the k largest eigenvalues λ_i of B, with corresponding eigenvectors X_i
  - Coordinates of point P_r are given by row r of the eigenvectors (each X_i scaled by √λ_i)
- k = 1 with the square root of tree distance is equivalent to our approach
  - Then A = −D/2 and B = −½HDH, so the top eigenvector of B is the eigenvector of HDH with the smallest eigenvalue
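The recipe and its k = 1 special case can be compared directly. A minimal sketch of classical MDS (`classical_mds` is an illustrative name), applied to the square roots of a hypothetical tree-additive 5-leaf distance matrix (made-up branch lengths, not the slide's data) and compared, up to an overall sign, with the sign pattern of the HDH eigenvector.

```python
import numpy as np


def classical_mds(Delta, k):
    """Classical MDS / PCoA: top-k coordinates from a matrix of distances Delta."""
    n = Delta.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    A = -0.5 * Delta**2                         # entrywise x -> -x^2 / 2
    B = H @ A @ H                               # double centering
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:k]           # indices of the k largest eigenvalues
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))


# Hypothetical tree-additive distances between leaves 1..5
D = np.array([[0.0, 1.5, 3.5, 4.0, 3.0],
              [1.5, 0.0, 3.0, 3.5, 2.5],
              [3.5, 3.0, 0.0, 2.5, 2.5],
              [4.0, 3.5, 2.5, 0.0, 3.0],
              [3.0, 2.5, 2.5, 3.0, 0.0]])

coords = classical_mds(np.sqrt(D), k=1)         # k = 1 on square-root tree distances

n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Y = np.linalg.eigh(H @ D @ H)[1][:, 0]          # eigenvector of HDH, smallest eigenvalue

s_mds, s_hdh = np.sign(coords[:, 0]), np.sign(Y)
print(np.array_equal(s_mds, s_hdh) or np.array_equal(s_mds, -s_hdh))  # True
```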
Phylogenetic ordination
- PCoA on sequence data with k = 3
- For an appropriate distance, C1 (x-axis) is guaranteed to split the taxa at 0
- Our results support the popular use of PCoA
  - Provided that the right distance is considered
Conclusion I
- Natural connection between the matrix of pairwise distances and the Laplacian of a complete graph
Conclusion II
- The structure of the tree is embedded in the complete graph and recoverable via spectral theory
- The notion of a Fiedler cut extends to that of a Fiedler split
- Inheritance is propagated through the Schur tower
[Figure labels: NO, NO, YES]
Conclusion III
- Results inspire a fast divisive tree reconstruction method
Conclusion IV
- Provides guidance and justification for the ordination approach
Acknowledgements
- Alex Griffing (NCSU Bioinformatics)
- Carl Meyer (NCSU Math)
- Amy Langville (CoC Math)
Cutpoints and Perron components
- Each connected component at a cutpoint identifies a principal submatrix of L
- Each such principal submatrix is inverse positive
  - This implies that the inverse has a simple Perron value
- The Perron component is the one with the largest Perron value
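The Perron values can be computed directly from these definitions. A sketch on a hypothetical tree (leaves 1 to 5; internal nodes 6, 7, 8; made-up branch lengths, edge weights assumed to be reciprocal branch lengths) at cutpoint 6, whose components here are {1}, {2} and {3,4,5,7,8}; the component list is hard-coded for this example tree.

```python
import numpy as np

# Hypothetical tree (made-up branch lengths): (u, v, branch length)
EDGES = [(1, 6, 1.0), (2, 6, 0.5), (6, 8, 1.0), (5, 8, 1.0),
         (8, 7, 0.5), (3, 7, 1.0), (4, 7, 1.5)]


def tree_laplacian(edges, n=8):
    """Weighted Laplacian; assumes edge weight = 1 / branch length."""
    L = np.zeros((n, n))
    for u, v, length in edges:
        i, j, w = u - 1, v - 1, 1.0 / length
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L


L = tree_laplacian(EDGES)
components_at_6 = [(1,), (2,), (3, 4, 5, 7, 8)]   # components of G - 6 for this tree

perron = {}
for comp in components_at_6:
    idx = [v - 1 for v in comp]
    M = np.linalg.inv(L[np.ix_(idx, idx)])        # inverse of the principal submatrix
    assert np.all(M > 0)                          # inverse positive
    perron[comp] = float(np.linalg.eigvalsh(M).max())  # Perron value (spectral radius)

for comp, rho in perron.items():
    print(comp, round(rho, 3))
print("Perron component:", max(perron, key=perron.get))  # (3, 4, 5, 7, 8)
```

For a one-vertex component the inverse principal submatrix is just the reciprocal of that vertex's edge weight, i.e. its branch length, so the first two Perron values are 1 and 0.5 for this example.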
Cutpoints and Perron components
[Figure: inverse principal submatrices with Perron values 1 and .5, and the Perron component, with Perron value 7.49]
The key observation
- Take the Schur complement of L at a cutpoint, e.g. 6
- Consider the Fiedler vector of the derived Laplacian
- Signs of entries outside the Perron component are positive (+)
- Signs of entries inside the Perron component are indeterminate (+/−)
[Figure: Schur graph at 6, with entries inside the Perron component marked +/−; inverse principal submatrices with Perron values 1 and .5; Perron component with Perron value 7.49]