Efficiently Answering Reachability Queries on Large Directed Graphs

1
Efficiently Answering Reachability Queries on
Large Directed Graphs
  • Ruoming Jin
  • Kent State University
  • Joint work with Yang Xiang (KSU), Ning Ruan
    (KSU), and Haixun Wang (IBM T.J. Watson)

2
Reachability Query
The problem: Given two vertices u and v in a
directed graph G, is there a path from u to v?
  • Query(1, 11): Yes
  • Query(3, 9): No
[Figure: example directed graph with vertices 1 to 15]
Directed graph → DAG (directed acyclic graph) by
coalescing the strongly connected components
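For reference, here is a minimal Python sketch of the two operations this slide relies on. The first coalesces strongly connected components into a DAG; we use Kosaraju's algorithm because the slide does not name one. The second answers a single query online by breadth-first search, the O(n + m)-per-query baseline that index-based methods aim to beat. Vertices are assumed to be numbered 0..n-1, and the function names are our own.

```python
from collections import defaultdict, deque


def condense(n, edges):
    """Coalesce SCCs with Kosaraju's algorithm.

    Returns (comp, dag_edges): comp[v] is the SCC id of vertex v and
    dag_edges lists the distinct edges between different SCCs.
    """
    graph, rgraph = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        rgraph[v].append(u)

    # Pass 1: iterative DFS on G, recording vertices in finishing order.
    visited, order = [False] * n, []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                stack.pop()
                order.append(node)
            elif not visited[nxt]:
                visited[nxt] = True
                stack.append((nxt, iter(graph[nxt])))

    # Pass 2: DFS on reversed G in decreasing finishing time; each tree is one SCC.
    comp, c = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s], stack = c, [s]
        while stack:
            node = stack.pop()
            for w in rgraph[node]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1

    dag_edges = sorted({(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]})
    return comp, dag_edges


def bfs_reachable(edges, u, v):
    """Online reachability: breadth-first search from u, O(n + m) per query."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in graph[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```

After condensation, u reaches v in G exactly when comp[u] == comp[v] or comp[u] reaches comp[v] in the DAG, so every index can be built on the DAG alone.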
3
Applications
  • XML
  • Biological networks
  • Ontology
  • Knowledge representation (Lattice operation)
  • Object-oriented programming (class relationships)
  • Distributed systems (Reachable states)

Graph Databases
4
Prior Work
2-HOP (O(n m^(1/2)) and O(n^4)), HOPI, and heuristic
algorithms
5
Limitations of Tree-Based Approaches
  • Finding a good tree cover is expensive
  • A tree cover cannot represent some common types
    of DAGs, such as grids
  • Compression limitations
      • Chain (1 parent, 1 child)
      • Tree (1 parent, multiple children)
  • Most existing methods that use a tree cover are
    strongly affected by how many edges are left
    uncovered

6
Overview of Path-Tree
  • Chain → Tree → Path-Tree (2 parents / multiple
    children)
  • A path-tree cover is a spanning subgraph of G in
    the shape of a tree (T)
  • A node in the tree T corresponds to a path in G,
    and an edge in T corresponds to the edges between
    two paths in G
  • A 3-tuple labeling exists for any path-tree that
    answers reachability queries in O(1)

7
Path-Tree in a Nutshell
[Figure: example DAG with vertices 1 to 15 covered by paths P1 to P4, and the corresponding path-graph]
The path-graph is not necessarily planar. The
reachability between any two nodes can be
answered in O(1)
8
Key Problems
  • How do we construct a path-tree?
      • Algorithm
  • How can a path-tree help with reachability
    queries?
      • Labeling
      • Transitive closure compression
  • How does the path-tree compare with existing
    methods?
      • Optimality

9
Constructing Path-Tree
  • Step 1: Path-Decomposition of the DAG
  • Step 2: Minimal Equivalent Edge Set between any
    two paths
  • Step 3: Path-Graph Construction
  • Step 4: Path-Tree Cover Extraction

10
Step 1: Path-Decomposition
Each vertex is labeled (PID, SID): the ID of its
path and its position along that path, e.g., (2, 5)
[Figure: DAG with vertices 1 to 15 decomposed into paths P1 to P4]
For any two nodes u and v on the same path, u → v
if and only if u.sid ≤ v.sid
A simple linear-time algorithm based on topological
sort can compute a path-decomposition
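Here is a minimal sketch of one such linear-time pass. It assumes vertices 0..n-1 and an acyclic edge list. It walks the vertices in topological order and, from each vertex not yet on a path, greedily extends a new path along edges to unassigned successors. This greedy rule is our own illustration, not necessarily the authors' algorithm, and it makes no attempt to minimize the number of paths.

```python
from collections import defaultdict, deque


def topological_order(n, edges):
    """Kahn's algorithm; returns (order, successor lists)."""
    succ = defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != n:
        raise ValueError("graph has a cycle; coalesce SCCs first")
    return order, succ


def path_decomposition(n, edges):
    """Greedy path-decomposition in O(n + m): label[v] = (pid, sid)."""
    order, succ = topological_order(n, edges)
    label = [None] * n
    pid = 0
    for start in order:
        if label[start] is not None:
            continue
        # Start a new path and extend it along edges to unassigned vertices;
        # every successor list is scanned once, when its vertex joins a path.
        v, sid = start, 0
        while v is not None:
            label[v] = (pid, sid)
            sid += 1
            v = next((w for w in succ[v] if label[w] is None), None)
        pid += 1
    return label


def same_path_reaches(label, u, v):
    """Within one path, u reaches v iff u.sid <= v.sid."""
    (pu, su), (pv, sv) = label[u], label[v]
    return pu == pv and su <= sv
```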
11
Step 2: Minimal Equivalent Edge Set
  • The reachability between any two paths can be
    captured by a unique minimal set of edges
[Figure: edges from P1 to P2 before and after reduction to the minimal equivalent edge set]
The edges in the minimal equivalent edge set do
not cross (they are always parallel)!
12
Step 3: Path-Graph Construction
Weight reflects the cost we pay in transitive
closure computation if this edge is excluded from
the path-tree
[Figure: DAG with vertices 1 to 15 on paths P1 to P4 and the resulting weighted directed path-graph]
13
Step 4: Extracting the Path-Tree Cover
[Figure: weighted directed path-graph over P1 to P4 (left) and its maximal directed spanning tree (right)]
Chu-Liu/Edmonds algorithm, O(m + k log k)
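Here is a compact, unoptimized Python sketch of Chu-Liu/Edmonds for a maximum-weight spanning arborescence. It runs in O(k·m) time and does not attempt the more efficient implementations. The following assumptions are ours, not the slides': path-graph nodes are 0..k-1, and edges are (u, v, weight) triples. To get a spanning forest, add a virtual root with zero-weight edges to every path. With nonnegative weights, the paths the virtual root adopts become the roots of the path-tree cover.

```python
def max_arborescence(n, edges, root):
    """Chu-Liu/Edmonds for a maximum-weight spanning arborescence.

    n: number of nodes (0..n-1); edges: list of (u, v, w); root: root node.
    Returns a list of indices into `edges`, or None if some node is unreachable.
    """
    # 1. Every non-root node greedily takes its heaviest incoming edge.
    best_in = [None] * n
    for i, (u, v, w) in enumerate(edges):
        if v != root and u != v and (best_in[v] is None or w > edges[best_in[v]][2]):
            best_in[v] = i
    if any(best_in[v] is None for v in range(n) if v != root):
        return None

    # 2. Follow parent pointers to look for a cycle among the chosen edges.
    visit, cycle = [-1] * n, None
    for s in range(n):
        v = s
        while v != root and visit[v] == -1:
            visit[v] = s
            v = edges[best_in[v]][0]
        if v != root and visit[v] == s:      # walked back into this walk: cycle
            cycle, u = [v], edges[best_in[v]][0]
            while u != v:
                cycle.append(u)
                u = edges[best_in[u]][0]
            break
    if cycle is None:                        # the greedy choice is already a tree
        return [best_in[v] for v in range(n) if v != root]

    # 3. Contract the cycle into one node; an edge entering it is re-weighted by
    #    the weight lost when it replaces the cycle edge into the same node.
    in_cycle = [False] * n
    for v in cycle:
        in_cycle[v] = True
    new_id, k = [0] * n, 0
    for v in range(n):
        if not in_cycle[v]:
            new_id[v], k = k, k + 1
    for v in cycle:
        new_id[v] = k                        # the contracted node gets id k
    new_edges, origin = [], []
    for i, (u, v, w) in enumerate(edges):
        if in_cycle[u] and in_cycle[v]:
            continue                         # edges inside the cycle disappear
        if in_cycle[v]:
            w -= edges[best_in[v]][2]
        new_edges.append((new_id[u], new_id[v], w))
        origin.append(i)

    # 4. Solve the smaller problem, then expand: keep every cycle edge except
    #    the one into the node where the chosen entering edge lands.
    sub = max_arborescence(k + 1, new_edges, new_id[root])
    if sub is None:
        return None
    chosen = [origin[j] for j in sub]
    entry = next(edges[i][1] for i in chosen if in_cycle[edges[i][1]])
    return chosen + [best_in[v] for v in cycle if v != entry]
```

For example, with node 0 as the virtual root and made-up weights [(0, 1, 0), (0, 2, 0), (0, 3, 0), (1, 2, 5), (2, 3, 4), (3, 2, 6), (1, 3, 1)], the greedy choice forms the cycle 2 ↔ 3. Contraction then resolves it into the tree 0 → 1 → 2 → 3 with total weight 9.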
14
Key Problems
  • How do we construct a path-tree?
      • Algorithm
  • How can a path-tree help with reachability
    queries?
      • Labeling
      • Transitive closure compression
  • How does the path-tree compare with existing
    methods?
      • Optimality

15
3-Tuple Labeling for Reachability
[Figure: path-tree over paths P1 to P4 with interval labels, e.g., P4 [1,4] and P1 [1,1]]
Interval labeling (2-tuple): a high-level
description of whether path Pi can reach path Pj
DFS labeling (1-tuple)
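The interval (2-tuple) part can be sketched as a standard post-order interval labeling of the tree of paths. Each path gets the pair [smallest post-order number in its subtree, its own post-order number], so Pj lies in Pi's subtree exactly when Pj's interval is nested inside Pi's. This is a sketch under our own assumptions. The children dictionary in the usage lines is a hypothetical tree, chosen so that P4 and P1 receive the labels [1,4] and [1,1] that appear in the slide's example. Recursion is fine here because the tree has only k nodes.

```python
def interval_labels(children, root):
    """Post-order interval labeling of a rooted tree.

    children: dict mapping a node to its list of children.
    Returns {node: (low, post)}: post is the node's 1-based post-order number,
    low is the smallest post-order number in its subtree.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        low = counter + 1                  # first number handed out in this subtree
        for child in children.get(node, []):
            visit(child)
        counter += 1                       # the node is numbered after its children
        labels[node] = (low, counter)

    visit(root)
    return labels


def contains(outer, inner):
    """True if interval `inner` is nested inside interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


labels = interval_labels({"P4": ["P2"], "P2": ["P1", "P3"]}, "P4")
print(labels)  # P1 (1, 1), P3 (2, 2), P2 (1, 3), P4 (1, 4)
```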
16
DFS labeling
[Figure: DFS labels assigned to the vertices of paths P1 to P4]
  • Start from the first vertex of the root path
  • Always try to visit the next vertex on the same
    path first
  • Label a node once all its neighbors have been
    visited
  • L(v) = N - x, where x is the number of nodes
    already labeled
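Here is a hypothetical sketch of the traversal in these bullets. It assumes vertices 0..n-1, the path-tree given as a dictionary `succ` of successor lists, a path ID `pid[v]` for every vertex, and the first vertex of the root path as `root_start`. All of these names are ours. Vertices that the traversal from the root path never reaches are picked up afterwards in index order. That fallback is also our assumption, since the slide does not say how such vertices are handled.

```python
def dfs_labels(n, succ, pid, root_start):
    """DFS labeling of a path-tree with L(v) = n - x.

    x is the number of vertices labeled before v, so the first vertex to be
    labeled gets n and the last gets 1.
    """
    label = [None] * n
    visited = [False] * n
    x = 0

    def ordered(v):
        # Successors on the same path come first (sorted() is stable).
        return sorted(succ.get(v, ()), key=lambda w: pid[w] != pid[v])

    # Assumption: unreached vertices are picked up afterwards in index order.
    for s in [root_start] + list(range(n)):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(ordered(s)))]
        while stack:
            v, children = stack[-1]
            w = next(children, None)
            if w is None:                  # all neighbors visited: label v now
                stack.pop()
                label[v] = n - x
                x += 1
            elif not visited[w]:
                visited[w] = True
                stack.append((w, iter(ordered(w))))
    return label


# Paths: P0 = 0 -> 1 and P1 = 2 -> 3, plus cross edges 0 -> 2 and 1 -> 3.
print(dfs_labels(4, {0: [2, 1], 1: [3], 2: [3], 3: []}, [0, 0, 1, 1], 0))
# [1, 3, 2, 4]
```

Because a vertex is labeled only after everything it reaches has been labeled, u → v always implies L(u) ≤ L(v). The interval labels of the paths rule out the remaining false positives.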

17
3-Tuple Labeling for Reachability
[Figure: path-tree over paths P1 to P4 with interval labels (e.g., P4 [1,4], P1 [1,1]) and DFS labels]
u → v if and only if 1) I(u) ⊇ I(v) and
2) L(u) ≤ L(v)
Query(9,15): P4 [1,4] ⊇ P1 [1,1] and 5 < 15, so Yes
Query(9,2)? Query(5,9)?
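The two conditions can be written as a constant-time test. The label values below are our reading of the slide's Query(9,15) example: I(9) = [1,4], I(15) = [1,1], L(9) = 5, L(15) = 15. Storing the labels in plain dictionaries is an illustrative assumption, not the authors' data layout.

```python
def reachable(interval, dfs_label, u, v):
    """Path-tree reachability: u reaches v iff I(u) contains I(v) and L(u) <= L(v).

    interval[x] = (low, high) for x's path; dfs_label[x] = L(x).
    """
    (lo_u, hi_u), (lo_v, hi_v) = interval[u], interval[v]
    return lo_u <= lo_v and hi_v <= hi_u and dfs_label[u] <= dfs_label[v]


# Values as read from the slide's Query(9,15) example.
interval = {9: (1, 4), 15: (1, 1)}
dfs_label = {9: 5, 15: 15}
print(reachable(interval, dfs_label, 9, 15))  # True
print(reachable(interval, dfs_label, 15, 9))  # False
```

The test covers only reachability along path-tree edges. Pairs connected solely through edges left out of the path-tree still need the compressed transitive closure.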
18
Transitive Closure Compression
Path-tree cover (including labeling) can be
constructed in O(m + n log n)
[Figure: example DAG with vertices 1 to 15]
An efficient procedure can compute and compress
the transitive closure in O(mk), where k is the
number of paths in the path-tree
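Here is a simplified sketch of path-based transitive closure compression in O((n + m)·k) time. For every vertex u and path j, it keeps only the smallest SID on path j that u reaches, because that vertex reaches the rest of its path. The sketch assumes vertices 0..n-1, an acyclic edge list, and label[v] = (pid, sid) from a path-decomposition with k paths. It does not use the path-tree labeling to drop further entries, so it illustrates the idea rather than the presented procedure.

```python
from collections import defaultdict, deque

INF = float("inf")


def compressed_closure(n, edges, label, k):
    """first[u][j] = smallest SID on path j reachable from u (INF if none)."""
    succ = defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while queue:                               # Kahn's topological sort
        u = queue.popleft()
        order.append(u)
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)

    first = [[INF] * k for _ in range(n)]
    for u in reversed(order):                  # successors are finished before u
        row = first[u]
        pid, sid = label[u]
        row[pid] = sid                         # u reaches itself
        for w in succ[u]:                      # each edge merges one k-entry row
            for j, s in enumerate(first[w]):
                if s < row[j]:
                    row[j] = s
    return first


def reaches(first, label, u, v):
    """u reaches v iff u reaches some vertex on v's path at or before v."""
    pid, sid = label[v]
    return first[u][pid] <= sid
```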
19
Key Problems
  • How do we construct a path-tree?
      • Algorithm
  • How can a path-tree help with reachability
    queries?
      • Labeling
      • Transitive closure compression
  • How does the path-tree compare with existing
    methods?
      • Optimality

20
Theoretical Analysis
  • Optimal Path-Tree Cover (OPTC) problem
      • Given a path-decomposition, what is the
        optimal path-tree cover to maximally compress
        the transitive closure?
      • OptIndex: weight assignment based on
        computing the predecessor set
  • Optimal Path-Decomposition (OPD) problem
      • Assuming we use only the path-decomposition
        to compress the transitive closure, what is
        the optimal path-decomposition to maximally
        compress it?
      • Minimum-cost flow problem
  • What is the overall optimal path-decomposition?

21
Superiority of Path-Tree Cover
  • The optimal tree cover is a special case of the
    path-tree cover in which each vertex forms its
    own single-vertex path and the weights are based
    on OptIndex
  • The path-tree cover approach compresses the
    transitive closure to a size no larger than that
    of the optimal tree cover approach (and hence no
    larger than that of the optimal chain cover
    approach)

22
Experimental Evaluation
  • Implementation in C
  • 12 real datasets used in the Dual-Labeling and
    GRIPP papers
  • Synthetic datasets
      • Sparse DAGs with edge density 2
  • AMD Opteron 2.0 GHz, 2 GB, Linux
  • PTree1 (OptIndex) and PTree2
  • Mainly compared with the optimal tree cover

23
Real Datasets
24
Experimental Result (Real Data)
On average 10 times better than Tree
On average 3 times better than Tree
25
Experimental Result (Synthetic Data)
26
Experimental Result (Synthetic Data)
27
Experimental Result (Synthetic Data)
28
Conclusion
  • A novel path-tree structure is proposed to
    compress the transitive closure and answer
    reachability queries
  • The path-tree can potentially be integrated with
    other existing methods to further improve the
    efficiency of reachability query processing

29
Thanks!!
30
Step 3: Path-Graph Construction
Weight reflects the penalty incurred if this edge
is excluded from the path-tree
[Figure: DAG with vertices 1 to 15 on paths P1 to P4 and the resulting weighted directed path-graph]
31
Step 2: Constructing the Minimal Equivalent Edge Set
(Pi → Pj)
  • Order the vertices in Pi and Pj in decreasing
    order
  • Find the first vertex v in Pj that Pi can reach
  • Find the last vertex u in Pi that reaches v
  • Remove all the edges that cross (u, v)
  • Repeat steps 2 to 4
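Here is a sketch of this procedure as a single sweep, using our own encoding. Each edge from Pi to Pj is a pair (a, b): a is the source's SID on Pi and b is the target's SID on Pj. We believe it keeps the same edges as the steps above. Scanning targets in increasing SID finds the first vertex v on Pj that Pi reaches, and the largest source into v is the last vertex u on Pi that reaches v. An edge is dropped when an already kept edge leaves Pi no earlier and enters Pj no later, because its reachability is then implied. This is an illustration, not the authors' code.

```python
def minimal_equivalent_edges(cross_edges):
    """Minimal equivalent edge set between two paths Pi -> Pj.

    cross_edges: iterable of (a, b), a = SID of the source on Pi,
    b = SID of the target on Pj (SIDs are non-negative).
    Returns the kept edges sorted by b; both a and b strictly increase,
    so the kept edges never cross.
    """
    # For every target on Pj, only the last (largest-SID) source on Pi matters.
    last_source = {}
    for a, b in cross_edges:
        if b not in last_source or a > last_source[b]:
            last_source[b] = a

    kept, max_a = [], -1
    for b in sorted(last_source):          # earliest reachable target first
        a = last_source[b]
        if a > max_a:                      # not implied by an edge kept earlier
            kept.append((a, b))
            max_a = a
    return kept


print(minimal_equivalent_edges([(0, 2), (1, 1), (2, 3), (0, 3), (3, 3)]))
# [(1, 1), (3, 3)]
```

For instance, (0, 2) is dropped because source 0 reaches source 1 along Pi, (1, 1) crosses to Pj, and target 1 reaches target 2 along Pj.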
