A phylogenetic application of the combinatorial graph Laplacian

1
A phylogenetic application of the combinatorial
graph Laplacian
  • Eric A. Stone
  • Department of Statistics
  • Bioinformatics Research Center
  • North Carolina State University

2
My motivation for this project
  • Trees in statistics or biology
  • Often a latent branching structure relating some
    observed data
  • Trees in mathematics
  • Always a connected graph with no cycles

3
My motivation for this project
  • Trees in statistics or biology
  • PROBLEM: Recover properties of latent branching
    structure
  • Trees in mathematics
  • Always a connected graph with no cycles

4
My motivation for this project
  • Trees in statistics or biology
  • PROBLEM: Recover properties of latent branching
    structure
  • Trees in mathematics
  • Characterization of observed structure by
    spectral graph theory

6
Bridging the gap
  • Reconciling trees (biology) with trees (mathematics)
  • Can we use some powerful tools of spectral graph
    theory to recover latent structure?
  • Natural relationship between trees and complete
    graphs?!?

7
Tree and distance matrices
  • The tree with vertex set {1,…,8} has distance
    matrix D
  • The phylogenetic tree can only be observed at
    its leaves {1,…,5}
  • We can only observe (estimate) the phylogenetic
    portion of D, the 5×5 block of leaf-to-leaf distances

The phylogenetic portion D
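
A minimal sketch of the two matrices on this slide. The tree topology is inferred from the splits named later in the talk, and the edge lengths are hypothetical; `EDGES` and `tree_distances` are illustrative names, not part of the original slides.

```python
import numpy as np

# Topology inferred from the splits named on the slides (leaves 1-5, internal
# vertices 6-8); the edge lengths are hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 2.0, (6, 8): 1.5, (5, 8): 1.0,
         (7, 8): 1.0, (3, 7): 2.5, (4, 7): 0.5}

def tree_distances(edges, n):
    """All-pairs path lengths by Floyd-Warshall (adequate for small trees)."""
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    for (u, v), length in edges.items():
        D[u - 1, v - 1] = D[v - 1, u - 1] = length
    for k in range(n):
        # Allow paths that pass through vertex k + 1.
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

D_full = tree_distances(EDGES, 8)  # 8x8 matrix over all vertices
D_leaf = D_full[:5, :5]            # 5x5 phylogenetic portion (leaves only)
print(D_leaf)
```

Only `D_leaf` would be available in practice; `D_full` is computed here just to show where the observable block sits.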
8
More motivation for this project
  • Trees in statistics or biology
  • PROBLEM: Recover properties of latent branching
    structure
  • Given D only, recover latent branching structure
  • This is the problem of phylogenetic
    reconstruction (without error!)

The phylogenetic portion D
9
Neighbor-joining (NJ) finds (2,n-2) splits from D
  • A split is a bipartition of the leaf set (e.g.
    {1,2,3,4,5}) that can be induced by cutting a
    branch on the tree
  • e.g. {1,2} | {3,4,5} or {1,2,5} | {3,4}
  • Neighbor-joining criterion identifies (2,n-2)
    splits through the Q matrix computed from D

[Figure: tree with the splits {1,2} | {3,4,5} and {1,2,5} | {3,4}]
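
As a rough illustration of how NJ picks a (2,n-2) split, the sketch below uses the Q criterion in its commonly stated form, Q(i,j) = (n-2)·d(i,j) − Σ_k d(i,k) − Σ_k d(j,k). The slide's own formula did not survive in the transcript, so treating it as this form is an assumption; the matrix `D` holds hypothetical leaf distances on a tree with cherries {1,2} and {3,4}.

```python
import numpy as np

# Hypothetical leaf-to-leaf distances for a 5-leaf tree (leaves 1..5).
D = np.array([[0.0, 3.0, 6.0, 4.0, 3.5],
              [3.0, 0.0, 7.0, 5.0, 4.5],
              [6.0, 7.0, 0.0, 3.0, 4.5],
              [4.0, 5.0, 3.0, 0.0, 2.5],
              [3.5, 4.5, 4.5, 2.5, 0.0]])

def nj_cherry(D):
    """Return the 1-based leaf pair minimizing the neighbor-joining Q criterion."""
    n = D.shape[0]
    r = D.sum(axis=1)                         # total distance from each leaf
    Q = (n - 2) * D - r[:, None] - r[None, :]
    np.fill_diagonal(Q, np.inf)               # a leaf cannot pair with itself
    i, j = np.unravel_index(np.argmin(Q), Q.shape)
    return int(i) + 1, int(j) + 1

print(nj_cherry(D))
```

The returned pair is one side of a (2,n-2) split; the remaining n-2 leaves form the other side.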
10
A recipe for tree reconstruction from D
  • Find a split
  • NJ relies on theorem that guarantees (2,n-2)
    split from Q matrix
  • Use knowledge of split to reduce dimension
  • NJ prunes the cherry (neighboring taxa) to reduce
    leaves by one
  • Iterate until tree has been fully reconstructed
  • Tree topology specified by its split set

11
Our narrow goal
  • Find a split
  • NJ relies on theorem that guarantees (2,n-2)
    split from Q matrix
  • Hypothesize criterion that identifies deeper
    splits
  • and prove that it actually works

12
Our solution
The phylogenetic portion D
13
Our solution
The phylogenetic portion D
  • Let H be the centering matrix
  • Find eigenvector Y of HDH with the smallest
    eigenvalue
  • The signs of the entries of Y identify a split of
    the tree
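
The three steps above can be sketched as follows. The distance matrix is hypothetical (a 5-leaf tree whose nontrivial splits are {1,2} | {3,4,5} and {1,2,5} | {3,4}), and `spectral_split` is an illustrative name rather than anything from the talk.

```python
import numpy as np

# Hypothetical leaf-to-leaf distances for a 5-leaf tree (leaves 1..5).
D = np.array([[0.0, 3.0, 6.0, 4.0, 3.5],
              [3.0, 0.0, 7.0, 5.0, 4.5],
              [6.0, 7.0, 0.0, 3.0, 4.5],
              [4.0, 5.0, 3.0, 0.0, 2.5],
              [3.5, 4.5, 4.5, 2.5, 0.0]])

def spectral_split(D):
    """Split leaves by the signs of the eigenvector of HDH with smallest eigenvalue."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    eigvals, eigvecs = np.linalg.eigh(H @ D @ H)   # eigenvalues in ascending order
    y = eigvecs[:, 0]                              # eigenvector of the smallest one
    if y[0] < 0:                                   # eigenvector sign is arbitrary;
        y = -y                                     # put leaf 1 on the first side
    side_a = [i + 1 for i in range(n) if y[i] >= 0]
    side_b = [i + 1 for i in range(n) if y[i] < 0]
    return side_a, side_b

print(spectral_split(D))
```

Fixing the sign of `y` only makes the output reproducible; the bipartition itself does not depend on it.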

14
About the matrix HDH
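  • Entries of HDH are $D_{ij} - \bar{D}_{i\cdot} - \bar{D}_{\cdot j} + \bar{D}_{\cdot\cdot}$
    (D minus its row and column means, plus its grand mean)
  • HDH is negative semidefinite
  • Zero is a simple eigenvalue, with the constant (all-ones) eigenvector
  • Remaining eigenvectors have both + and
    - entries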
  • HDH appears prominently in
  • Multidimensional scaling
  • Principal coordinate analysis
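
A quick numerical check of the properties listed on this slide, under the assumption that D is a tree distance matrix. The matrix below is a hypothetical 5-leaf example.

```python
import numpy as np

# Hypothetical leaf-to-leaf distances for a 5-leaf tree (leaves 1..5).
D = np.array([[0.0, 3.0, 6.0, 4.0, 3.5],
              [3.0, 0.0, 7.0, 5.0, 4.5],
              [6.0, 7.0, 0.0, 3.0, 4.5],
              [4.0, 5.0, 3.0, 0.0, 2.5],
              [3.5, 4.5, 4.5, 2.5, 0.0]])

n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
B = H @ D @ H

# Entrywise form: subtract row and column means, add back the grand mean.
entrywise = D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()

eigvals, eigvecs = np.linalg.eigh(B)     # ascending, so the last one is ~0
print(np.allclose(B, entrywise))         # the two forms agree
print(eigvals)                           # all <= 0 up to rounding
print(B @ np.ones(n))                    # the all-ones vector is mapped to ~0
```

Every eigenvector other than the constant one is orthogonal to the all-ones vector, which is why it must carry entries of both signs.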

15
Example of our solution
  • Find eigenvector Y of HDH with the smallest
    eigenvalue
  • Signs of Y identify the split {1,2} | {3,4,5}

[Figure: entries of Y shown at the leaves: -0.0564, 0.5793, -0.5011, -0.4636, 0.4418]
16
A real example (data from ToL)
  • Two iterations

17
Our solution
  • Find a split
  • NJ relies on theorem that guarantees (2,n-2)
    split from Q matrix
  • Hypothesize criterion that identifies deep splits
  • and prove that it actually works

18
Affinity and distance
  • In phylogenetics, common to consider pairwise
    distances
  • In graph theory, common to consider pairwise
    affinities

Affinity-based
Distance-based
19
Distance matrix ↔ Laplacian matrix
20
The genius of Miroslav Fiedler
  • G connected ⇒ smallest eigenvalue of L, zero, is
    simple
  • Smallest positive eigenvalue, λ, called algebraic
    connectivity of G
  • Fiedler vectors Y satisfy LY = λY
  • Fiedler cut is the sign-induced bipartition

[Figure: tree G with Fiedler vector entries at its vertices: -0.4277, -0.0223, 0.4840, -0.0158, -0.3653, 0.3449, 0.4038, -0.4047]
21
The genius of Miroslav Fiedler
  • G connected ⇒ smallest eigenvalue of L, zero, is
    simple
  • Smallest positive eigenvalue, λ, called algebraic
    connectivity of G
  • Fiedler vectors Y satisfy LY = λY
  • Fiedler cut is the sign-induced bipartition
  • Fiedler cut here is
  • {1,2,6} | {3,4,5,7,8}
  • Note that the cut implies a leaf split
  • {1,2} | {3,4,5}

[Figure: tree G with the same Fiedler vector entries as on the previous slide]
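
A minimal sketch of a Fiedler cut on an 8-vertex tree. The topology is inferred from the splits named in the talk; the edge lengths are hypothetical, and taking each edge weight as the reciprocal of its length is an assumption of this sketch.

```python
import numpy as np

# Topology inferred from the slides; edge lengths are hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 2.0, (6, 8): 1.5, (5, 8): 1.0,
         (7, 8): 1.0, (3, 7): 2.5, (4, 7): 0.5}

def laplacian(edges, n):
    """Weighted graph Laplacian; each edge weight is 1 / length (assumed)."""
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

eigvals, eigvecs = np.linalg.eigh(laplacian(EDGES, 8))  # ascending eigenvalues
fiedler = eigvecs[:, 1]        # eigenvector of the smallest positive eigenvalue
positive = {v for v in range(1, 9) if fiedler[v - 1] > 0}
negative = set(range(1, 9)) - positive

# Edges whose endpoints fall on opposite sides of the sign-induced bipartition.
crossing = [e for e in EDGES if (e[0] in positive) != (e[1] in positive)]
print(sorted(positive), sorted(negative), crossing)
```

On a tree with no zero entries in the Fiedler vector, exactly one edge crosses the cut, so the cut corresponds to cutting a single branch.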
22
Is this relevant here?
  • We do not observe an 8x8 Laplacian matrix L
  • All we get is a 5x5 matrix of between-leaf
    pairwise distances D
  • Where is the connection to graph theory?

The phylogenetic portion D
23
Recall: Our solution
  • Let H be the centering matrix
  • Find eigenvector Y of HDH with the smallest
    eigenvalue
  • The signs of the entries of Y identify a split of
    the tree

The phylogenetic portion D
24
An extremely useful relationship
  • Recall the centering matrix H
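  • For the full tree, $HDH = -2L^{+}$, where $L^{+}$ is the
    (Moore-Penrose) pseudoinverse of the tree Laplacian L;
    equivalently, $(HDH)^{+} = -\tfrac{1}{2}L$
  • We have shown in the context of this formula
  • Principal submatrices of D relate to Schur
    complements of L
  • In particular, for the phylogenetic portion of D,
    $(HDH)^{+} = -\tfrac{1}{2}(L/Z) = -\tfrac{1}{2}(W - XZ^{-1}Y)$, where
    $L = \begin{pmatrix} W & X \\ Y & Z \end{pmatrix}$, with W indexed by the leaves
    and Z by the interior vertices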
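
The relationship can be checked numerically. The sketch below assumes edge weights equal to reciprocal edge lengths and uses a hypothetical 8-vertex tree; it verifies the identity in the form $HDH = -2L^{+}$ for the full tree, and the same identity for the leaf block of D with L replaced by the Schur complement $W - XZ^{-1}Y$. Helper names are illustrative.

```python
import numpy as np

# Topology inferred from the slides; edge lengths are hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 2.0, (6, 8): 1.5, (5, 8): 1.0,
         (7, 8): 1.0, (3, 7): 2.5, (4, 7): 0.5}

def tree_distances(edges, n):
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    for (u, v), length in edges.items():
        D[u - 1, v - 1] = D[v - 1, u - 1] = length
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def laplacian(edges, n):
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length                      # assumed: weight = 1 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

def centering(n):
    return np.eye(n) - np.ones((n, n)) / n

def laplacian_pinv(L):
    """Pseudoinverse of a connected-graph Laplacian: (L + J)^-1 - J, J = 11^T / n."""
    J = np.ones(L.shape) / L.shape[0]
    return np.linalg.inv(L + J) - J

L = laplacian(EDGES, 8)
D_full = tree_distances(EDGES, 8)
full_ok = np.allclose(centering(8) @ D_full @ centering(8), -2 * laplacian_pinv(L))

# Leaves are vertices 1..5 (block W); interior vertices 6..8 form block Z.
W, X, Y, Z = L[:5, :5], L[:5, 5:], L[5:, :5], L[5:, 5:]
schur = W - X @ np.linalg.inv(Z) @ Y
leaf_ok = np.allclose(centering(5) @ D_full[:5, :5] @ centering(5),
                      -2 * laplacian_pinv(schur))
print(full_ok, leaf_ok)
```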
25
Recall: Our solution
  • Find eigenvector Y of HDH with the smallest
    eigenvalue
  • The signs of the entries of Y identify a split of
    the tree
  • The smallest eigenvalue of HDH (negative
    semidefinite) corresponds to the smallest positive
    eigenvalue of L, with the same eigenvector
  • In fact, L can be seen as a graph Laplacian
  • And our solution, Y, is the Fiedler vector of
    that graph!
  • But what does this graph look like?

26
Schur complementation of a vertex
  • The vertices adjacent to 8 become adjacent to
    each other
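
A small sketch of Schur complementation at a single vertex, on a hypothetical weighted tree in which vertex 8 is adjacent to 5, 6, and 7. `schur_at` is an illustrative helper name.

```python
import numpy as np

# Topology inferred from the slides; edge lengths are hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 2.0, (6, 8): 1.5, (5, 8): 1.0,
         (7, 8): 1.0, (3, 7): 2.5, (4, 7): 0.5}

def laplacian(edges, n):
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length                      # assumed: weight = 1 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

def schur_at(L, labels, v):
    """Schur complement of L at the vertex labeled v; returns (matrix, labels)."""
    k = labels.index(v)
    keep = [i for i in range(len(labels)) if i != k]
    reduced = L[np.ix_(keep, keep)] - np.outer(L[keep, k], L[k, keep]) / L[k, k]
    return reduced, [labels[i] for i in keep]

L = laplacian(EDGES, 8)
L_8, labels_8 = schur_at(L, list(range(1, 9)), 8)

# Pairs that were not adjacent in the tree but are adjacent after removing 8.
new_edges = [(a, b) for i, a in enumerate(labels_8) for j, b in enumerate(labels_8)
             if i < j and L[a - 1, b - 1] == 0 and L_8[i, j] < 0]
print(new_edges)
```

Here `new_edges` comes out as the three pairs among 5, 6, and 7, the former neighbors of vertex 8; the reduced matrix still has zero row sums, so it is again a Laplacian.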

27
Schur complementation of the interior
  • The graph described by L is fully connected
  • All cuts yield connected subgraphs ⇒ No help from
    Fiedler

28
Recap thus far
  • Given matrix D of pairwise distances between
    leaves
  • Find eigenvector Y of HDH with the smallest
    eigenvalue
  • Claim: The signs of the entries of Y identify a
    split of the tree
  • Y shown to be a Fiedler vector of the Laplacian
    L
  • But graph of L is fully connected, has no
    apparent structure
  • Thus Fiedler says nothing about signs of entries
    of Y
  • But claim requires signs to be consistent with
    structure of the tree

29
Recap thus far
  • Thus Fiedler says nothing about signs of entries
    of Y
  • But claim requires signs to be consistent with
    structure of the tree
  • How does L inherit the structure of the tree?

[Figure: three panels labeled NO, NO, YES]
30
The quotient rule inspires a Schur tower
31
The quotient rule inspires a Schur tower
  • How does this help?

32
Cutpoints and connected components
  • A point of articulation (or cutpoint) is a point
    r ∈ G whose deletion yields a subgraph with ≥ 2
    connected components
  • Cutpoints: 6, 7, 8
  • Shown: {1}, {2}, {3,4,5,7,8} are the
    connected components at 6
  • The cutpoints of a tree are its internal nodes

33
The key observation (i.e. theorem)
  • Let L be the Laplacian of a graph G with some
    cutpoint v
  • Let L_v be the Laplacian of G_v obtained by
    Schur complement at v
  • Then the Fiedler cut of G_v identifies a split of G
  • Here the Fiedler cut of G_6 is
    {1,2,5,8} | {3,4,7}
  • Including 6 in {1,2,5,8} defines two connected
    components in G

[Figure: G and G_6, with Fiedler vector entries of G_6 at its vertices: 0.0570, -0.4129, 0.5828, 0.0380, -0.3439, 0.4660, -0.3870]
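
The theorem can be tried out numerically. The sketch below takes the Schur complement at cutpoint 6 of a hypothetical weighted tree (edge lengths are assumptions, so the particular cut may differ from the one on the slide), computes the Fiedler cut of the reduced graph, and checks that putting 6 back on one side yields two connected pieces of the original tree.

```python
import numpy as np

# Topology inferred from the slides; edge lengths are hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 2.0, (6, 8): 1.5, (5, 8): 1.0,
         (7, 8): 1.0, (3, 7): 2.5, (4, 7): 0.5}

def laplacian(edges, n):
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length                      # assumed: weight = 1 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

def connected(vertices, edges):
    """True if `vertices` induce a connected subgraph of the tree."""
    vertices = set(vertices)
    if not vertices:
        return False
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        for a, b in edges:
            if a == u and b in vertices:
                stack.append(b)
            elif b == u and a in vertices:
                stack.append(a)
    return seen == vertices

v = 6                                          # the cutpoint to eliminate
labels = [u for u in range(1, 9) if u != v]
keep = [u - 1 for u in labels]
L = laplacian(EDGES, 8)
L_v = L[np.ix_(keep, keep)] - np.outer(L[keep, v - 1], L[v - 1, keep]) / L[v - 1, v - 1]

fiedler = np.linalg.eigh(L_v)[1][:, 1]         # Fiedler vector of G_v
side_a = {u for u, y in zip(labels, fiedler) if y >= 0}
side_b = set(labels) - side_a

# The cut is a split of G if adding v to one side leaves both sides connected.
is_split = any(connected(a | {v}, EDGES) and connected(b, EDGES)
               for a, b in ((side_a, side_b), (side_b, side_a)))
print(sorted(side_a), sorted(side_b), is_split)
```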
34
The quotient rule inspires a Schur tower
  • How does this help?
  • ⇒ Look at Schur paths to graph with Laplacian L

35
The punch line
  • The graph with Laplacian L can be obtained in
    three ways: from G_{6,7} by Schur complement at 8,
    from G_{6,8} at 7, or from G_{7,8} at 6
  • The Fiedler cut of G_{6,7,8} must split G_{6,7} and
    G_{6,8} and G_{7,8}
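
The "three ways" rest on the quotient property of Schur complements: eliminating the interior vertices one at a time, in any order, gives the same matrix as eliminating them all at once. A minimal check on a hypothetical weighted tree (edge lengths and helper names are assumptions of this sketch):

```python
import numpy as np

# Topology inferred from the slides; edge lengths are hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 2.0, (6, 8): 1.5, (5, 8): 1.0,
         (7, 8): 1.0, (3, 7): 2.5, (4, 7): 0.5}

def laplacian(edges, n):
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length                      # assumed: weight = 1 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

def schur_at(L, labels, v):
    """Schur complement of L at the vertex labeled v; returns (matrix, labels)."""
    k = labels.index(v)
    keep = [i for i in range(len(labels)) if i != k]
    reduced = L[np.ix_(keep, keep)] - np.outer(L[keep, k], L[k, keep]) / L[k, k]
    return reduced, [labels[i] for i in keep]

def eliminate(order):
    """Eliminate the listed vertices one at a time, in the given order."""
    M, labels = laplacian(EDGES, 8), list(range(1, 9))
    for v in order:
        M, labels = schur_at(M, labels, v)
    return M

# Last step at 8 (from G_{6,7}), at 7 (from G_{6,8}), or at 6 (from G_{7,8}).
routes = [eliminate(order) for order in [(6, 7, 8), (6, 8, 7), (7, 8, 6)]]

L = laplacian(EDGES, 8)
W, X, Y, Z = L[:5, :5], L[:5, 5:], L[5:, :5], L[5:, 5:]
all_at_once = W - X @ np.linalg.inv(Z) @ Y
print(all(np.allclose(r, all_at_once) for r in routes))
```

The resulting 5×5 matrix has every off-diagonal entry negative, which is the fully connected leaf graph described earlier in the talk.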

37
Recall: Example
  • Find eigenvector Y of HDH with the smallest
    eigenvalue
  • Signs of Y identify the split {1,2} | {3,4,5}

[Figure: entries of Y shown at the leaves: -0.0564, 0.5793, -0.5011, -0.4636, 0.4418]
38
The punch line
  • The graph with Laplacian L can be obtained in
    three ways
  • The Fiedler cut of G_{6,7,8} must split G_{6,7} and
    G_{6,8} and G_{7,8}
  • This implies that the cut splits the progenitor
    graph G!

{1,2,6} | {3,4,5,7,8}
39
Our solution actually works
  • Let H be the centering matrix
  • Find eigenvector Y of HDH with the smallest
    eigenvalue
  • The signs of the entries of Y identify a split of
    the tree

The phylogenetic portion D
40
A recipe for tree reconstruction
  • Find a split
  • NJ relies on theorem that guarantees (2,n-2)
    split from Q matrix
  • We have a theorem that guarantees splits from
    HDH matrix
  • Use knowledge of split to reduce dimension
  • NJ prunes the cherry (neighboring taxa) to reduce
    leaves by one
  • We use a divisive method that reduces to pairs of
    subtrees
  • Iterate until tree has been fully reconstructed
  • Tree topology specified by its split set

41
Reconstruction from the inside out
42
Connections with Classical MDS and PCoA
  • Classical solution to multidimensional scaling
  • a.k.a. Principal coordinate analysis
  • Recipe for dimension reduction given distance
    matrix D
  • Construct matrix A from D entrywise: x ↦ -x²/2
  • Double centering: B = HAH
  • Find k largest eigenvalues λ_i of B with
    corresponding eigenvectors X_i
  • Coordinates of point P_r given by row r of
    eigenvector entries
  • ⇒ k = 1 with sqrt of tree distance is equivalent to
    our approach
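
A sketch of the equivalence in the last bullet, on a hypothetical 5-leaf tree distance matrix. Scaling the eigenvector by the square root of its eigenvalue is the usual classical-MDS convention and is an assumption here; it does not affect the signs.

```python
import numpy as np

# Hypothetical leaf-to-leaf tree distances (leaves 1..5).
D = np.array([[0.0, 3.0, 6.0, 4.0, 3.5],
              [3.0, 0.0, 7.0, 5.0, 4.5],
              [6.0, 7.0, 0.0, 3.0, 4.5],
              [4.0, 5.0, 3.0, 0.0, 2.5],
              [3.5, 4.5, 4.5, 2.5, 0.0]])

n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n

# Classical MDS applied to the square roots of the tree distances.
A = -np.sqrt(D) ** 2 / 2                 # entrywise x -> -x^2 / 2
B = H @ A @ H                            # double centering
lam, X = np.linalg.eigh(B)               # ascending eigenvalues
coords = X[:, -1] * np.sqrt(lam[-1])     # k = 1: first principal coordinate

# The approach from the talk: smallest eigenvector of HDH.
y = np.linalg.eigh(H @ D @ H)[1][:, 0]

same_direction = abs(abs(X[:, -1] @ y) - 1.0) < 1e-8
print(coords, same_direction)
```

Because B equals -HDH/2 here, its largest eigenvalue pairs with the smallest eigenvalue of HDH, so splitting the first coordinate at 0 reproduces the sign-based split.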

43
Phylogenetic ordination
  • PCoA on sequence data with k = 3
  • For appropriate distance, C1 (x-axis) guaranteed
    to split taxa at 0
  • Our results support popular use of PCoA
  • Provided that the right distance is considered

44
Conclusion I
  • Natural connection between matrix of pairwise
    distances and the Laplacian of a complete graph

45
Conclusion II
  • Structure of tree embedded in complete graph and
    recoverable via spectral theory
  • Notion of Fiedler cut extends concept to
    Fiedler split
  • Inheritance propagated through Schur tower

[Figure: three panels labeled NO, NO, YES]
46
Conclusion III
  • Results inspire fast divisive tree reconstruction
    method

47
Conclusion IV
  • Provides guidance and justification for
    ordination approach

48
Acknowledgements
  • Alex Griffing (NCSU Bioinformatics)
  • Carl Meyer (NCSU Math)
  • Amy Langville (CoC Math)

49
Cutpoints and Perron components
  • Each connected component identifies a principal
    submatrix
  • Each such principal submatrix is inverse positive
  • Implies that the inverse has a Perron value that
    is simple
  • The Perron component is that with the largest
    Perron value
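
A minimal sketch of these definitions at cutpoint 6, whose connected components are {1}, {2}, and {3,4,5,7,8}. The edge lengths are hypothetical, so the Perron values printed here are not the ones from the talk's figure.

```python
import numpy as np

# Topology inferred from the slides; edge lengths are hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 2.0, (6, 8): 1.5, (5, 8): 1.0,
         (7, 8): 1.0, (3, 7): 2.5, (4, 7): 0.5}

def laplacian(edges, n):
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length                      # assumed: weight = 1 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

# Connected components left after deleting cutpoint 6.
COMPONENTS = [(1,), (2,), (3, 4, 5, 7, 8)]

L = laplacian(EDGES, 8)
perron = {}
inverse_positive = []
for comp in COMPONENTS:
    idx = [u - 1 for u in comp]
    inverse = np.linalg.inv(L[np.ix_(idx, idx)])   # inverse of principal submatrix
    inverse_positive.append(bool((inverse > 0).all()))
    perron[comp] = np.linalg.eigvalsh(inverse)[-1]  # largest eigenvalue = Perron value

perron_component = max(perron, key=perron.get)
print(perron, perron_component)
```

With a single-vertex component the principal submatrix is 1×1, so its Perron value is just the length of the edge joining that leaf to the cutpoint.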

50
Cutpoints and Perron components
[Figure: inverse principal submatrices with Perron values 1, 0.5, and 7.49; the component with value 7.49 is the Perron component]
51
The key observation
  • Take Schur complement of L at cutpoint, e.g. 6
  • Consider Fiedler vector of derived Laplacian
  • Signs of entries outside Perron component are
    positive (+)
  • Signs of entries inside Perron component
    indeterminate (+/-)

[Figure: Schur graph at 6 with inverse principal submatrices; Perron values 1, 0.5, and 7.49, the last for the Perron component, whose five vertices are marked +/-]