Efficiently Answering Reachability Queries on Large Directed Graphs - PowerPoint PPT Presentation

Slides: 33

Provided by: ksu7

Transcript and Presenter's Notes

Title: Efficiently Answering Reachability Queries on Large Directed Graphs

1
Efficiently Answering Reachability Queries on
Large Directed Graphs

Ruoming Jin
Kent State University
Joint work with Yang Xiang (KSU), Ning Ruan
(KSU), and Haixun Wang (IBM T.J. Watson)

2
Reachability Query
The problem: Given two vertices u and v in a directed graph G, is there a path from u to v?
?Query(1,11): Yes
?Query(3,9): No
[Figure: example directed graph with vertices 1-15]
Directed Graph → DAG (directed acyclic graph) by coalescing the strongly connected components
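A minimal sketch of the two ideas on this slide, assuming vertices are numbered 0..n-1 and the graph is given as an edge list. The function names `condense` and `reachable` are hypothetical, and the plain DFS query is only the baseline that index structures such as path-tree aim to beat. The condensation uses Kosaraju's two-pass method, which is one standard choice and is not named in the slides.

```python
from collections import defaultdict


def condense(n, edges):
    """Return (comp, dag_edges); comp[v] is the SCC id of vertex v."""
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: record vertices in order of DFS completion (iterative DFS).
    seen, order = set(), []
    for s in range(n):
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:
                # Iterator exhausted: v is finished.
                order.append(v)
                stack.pop()

    # Pass 2: sweep the reversed graph in reverse completion order.
    comp, count = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = count
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1

    # Keep only edges that connect two different components.
    dag_edges = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, dag_edges


def reachable(dag_edges, src, dst):
    """Baseline query: DFS over the condensed DAG."""
    adj = defaultdict(list)
    for u, v in dag_edges:
        adj[u].append(v)
    stack, seen = [src], {src}
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False
```

A query on the original graph then becomes `reachable(dag_edges, comp[u], comp[v])`.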
3
Applications

XML
Biological networks
Ontology
Knowledge representation (Lattice operation)
Object programming (Class relationship)
Distributed systems (Reachable states)

Graph Databases
4
Prior Work
2-HOP (O(nm^(1/2)) and O(n^4)), HOPI, and heuristic algorithms
5
Limitation of Tree-based approaches

Finding a good tree cover is expensive
Tree cover cannot represent some common types of
DAGs, like Grid
Compression limitations
Chain (1-parent, 1-child)
Tree (1-parent, multiple children)
Most existing methods which utilize the tree
cover are greatly affected by how many edges are
left uncovered

6
Overview of Path-Tree

Chain -> Tree -> Path-Tree (2 parents / multiple children)
Path-tree cover is a spanning subgraph of G in a
tree shape (T)
A node in the tree T corresponds to a path in G
and an edge in T corresponds to the edges between
two paths in G
3-tuple labeling exists for any path-tree to
answer reachability query in O(1)

7
Path-Tree in a Nutshell
[Figure: example DAG with vertices 1-15 and paths P1-P4]
Path-Graph is not necessarily a planar graph
The reachability between any two nodes can be answered in O(1)
8
Key Problems

How to construct a path-tree?
Algorithm
How can a path-tree help with reachability
queries?
Labeling
Transitive Closure Compression
How does path-tree compare with the existing
methods?
Optimality

9
Constructing Path-Tree

Step 1: Path-Decomposition of DAG
Step 2: Minimal Equivalent Edge Set between any two paths
Step 3: Path-Graph Construction
Step 4: Path-Tree Cover Extraction

10
Step 1: Path-Decomposition
[Figure: example DAG with vertices 1-15 decomposed into paths P1-P4; each vertex carries a (PID, SID) label, e.g. (2, 5)]
For any two nodes (u, v) in the same path, u → v if and only if (u.sid ≤ v.sid)
Simple linear algorithm based on topological sort can achieve a path-decomposition
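A minimal sketch of one greedy decomposition driven by a topological order, assuming vertices are numbered 0..n-1. The function name `path_decomposition` and the greedy rule (extend the current path to the first unassigned successor) are assumptions, and the slides do not say which rule the authors use.

```python
from collections import deque


def path_decomposition(n, edges):
    """Return label[v] = (pid, sid) for a DAG with vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    # Topological order via Kahn's algorithm.
    order = []
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)

    # Start a new path at each unassigned vertex and extend it greedily.
    label = [None] * n
    pid = 0
    for start in order:
        if label[start] is not None:
            continue
        pid += 1
        v, sid = start, 1
        while v is not None:
            label[v] = (pid, sid)
            sid += 1
            # Hypothetical rule: follow the first successor not yet on a path.
            v = next((w for w in adj[v] if label[w] is None), None)
    return label
```

Because consecutive vertices on a path are joined by an edge, two vertices with the same pid satisfy the sid test stated on the slide.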
11
Step 2: Minimal equivalent edge set

The reachability between any two paths can be
captured by a unique minimal set of edges

[Figure: two panels showing the edges P1 → P2 between paths P1 and P2]
The edges in the minimal equivalent edge set do
not cross (always parallel)!
12
Step 3: Path-Graph Construction
Weight reflects the cost we have to pay for the transitive closure computation if we exclude this path-tree edge
[Figure: example DAG with paths P1-P4 and the corresponding weighted directed path-graph]
13
Step 4: Extracting Path-Tree Cover
[Figure: weighted directed path-graph over P1-P4 and its maximal directed spanning tree]
Chu-Liu/Edmonds algorithm, O(m + k log k)
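A minimal sketch of the Chu-Liu/Edmonds idea for a fixed root, assuming paths are numbered 0..n-1. It is the simple contraction variant, which runs in roughly O(n * m) and is not the faster implementation behind the bound quoted above, and it returns only the total weight of the tree. The function name `max_arborescence_weight` and the weights in the usage note are hypothetical.

```python
def max_arborescence_weight(n, root, edges):
    """Total weight of a maximum spanning arborescence rooted at `root`.

    `edges` is a list of (u, v, w) triples over vertices 0..n-1.
    Returns None if some vertex has no usable incoming edge.
    """
    INF = float("inf")
    # Maximizing weight is the same as minimizing negated weight.
    edges = [(u, v, -w) for u, v, w in edges]
    total = 0
    while True:
        # Cheapest incoming edge for every vertex.
        best = [INF] * n
        pre = [-1] * n
        for u, v, w in edges:
            if u != v and w < best[v]:
                best[v], pre[v] = w, u
        if any(best[v] == INF for v in range(n) if v != root):
            return None
        best[root] = 0

        # Walk the chosen edges backwards to detect cycles.
        comp = [-1] * n
        mark = [-1] * n
        count = 0
        for v in range(n):
            total += best[v]
            x = v
            while x != root and mark[x] != v and comp[x] == -1:
                mark[x] = v
                x = pre[x]
            if x != root and comp[x] == -1:
                # x lies on a new cycle: give the whole cycle one id.
                y = pre[x]
                while y != x:
                    comp[y] = count
                    y = pre[y]
                comp[x] = count
                count += 1
        if count == 0:
            return -total

        # Contract each cycle into one vertex and reweight entering edges.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = count
                count += 1
        edges = [(comp[u], comp[v], w - best[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root, n = comp[root], count
```

For example, with four paths and the made-up weights `[(0, 1, 2), (0, 2, 4), (1, 2, 5), (2, 1, 3), (1, 3, 1), (2, 3, 2)]`, the locally best incoming edges of vertices 1 and 2 form a cycle, so the call exercises the contraction step.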
14
Key Problems

How to construct a path-tree?
Algorithm
How can path-tree help with reachability queries?
Labeling
Transitive Closure Compression
How does path-tree compare with the existing
methods?
Optimality

15
3-Tuple Labeling for Reachability
[Figure: example DAG with paths P1-P4, annotated with the interval labels [1,1], [2,2], [1,3], [1,4]]
Interval labeling (2-tuple): high-level description about paths (Pi → Pj?)
DFS labeling (1-tuple)
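A minimal sketch of interval labeling on a tree, using post-order numbers: each node gets [lowest post-order number in its subtree, its own post-order number]. The function name `interval_labels` and the tree shape in the usage note are assumptions. The shape was chosen so that the output agrees with the interval labels [1,1], [2,2], [1,3], [1,4] that appear in the figure.

```python
def interval_labels(children, root):
    """Return {node: (low, high)} from a post-order traversal of a tree."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        # Label all children first and collect the low end of each subtree.
        lows = [visit(child) for child in children.get(node, [])]
        counter += 1  # post-order number of this node
        low = min(lows) if lows else counter
        labels[node] = (low, counter)
        return low

    visit(root)
    return labels
```

With the hypothetical tree `{"P4": ["P2"], "P2": ["P1", "P3"]}` rooted at `"P4"`, an ancestor's interval contains the interval of every descendant, which is the containment test used for the path-level check.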
16
DFS labeling
[Figure: example DAG with paths P1-P4 and a DFS label on each vertex]

Starting from the first vertex in the root-path
Always try to visit the next vertex in the same
path
Label a node when all its neighbors have been visited
L(v) = N - x, where x is the number of nodes that have already been labeled
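A minimal sketch of the labeling rules above, assuming N is the total number of vertices. The inputs `next_in_path` (same-path successor of each vertex, or None) and `cross` (neighbors on other paths) are hypothetical names, and the order in which cross-path neighbors are visited is an assumption that the slides do not specify. The recursion is fine for small examples only.

```python
def dfs_labels(next_in_path, cross, start):
    """Label every vertex reachable from `start` with L(v) = N - x."""
    n = len(next_in_path)
    labels = {}
    visited = set()

    def visit(v):
        visited.add(v)
        # Try the next vertex on the same path first, then other neighbors.
        same_path = [next_in_path[v]] if next_in_path[v] is not None else []
        for w in same_path + cross[v]:
            if w not in visited:
                visit(w)
        # All neighbors are done; x is the number of vertices labeled so far.
        labels[v] = n - len(labels)

    visit(start)
    return labels
```

Since a vertex is labeled only after everything it reaches, a vertex always receives a smaller label than the vertices it can reach.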

17
3-Tuple Labeling for Reachability
[Figure: example DAG with paths P1-P4, DFS labels on the vertices, and interval labels on the paths]
u → v if and only if
1) Interval label: I(u) ⊇ I(v)
2) DFS label: L(u) ≤ L(v)
?Query(9,15): P4[1,4] ⊇ P1[1,1] and 5 < 15, so Yes
?Query(9,2)
?Query(5,9)
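A minimal sketch of the constant-time test, assuming each vertex stores a 3-tuple made of its path's interval and its own DFS label. The `Label` type and `reaches` function are hypothetical names. The example values are the ones given for Query(9,15) on this slide.

```python
from typing import NamedTuple


class Label(NamedTuple):
    lo: int   # interval start of the vertex's path
    hi: int   # interval end of the vertex's path
    dfs: int  # DFS label of the vertex


def reaches(u: Label, v: Label) -> bool:
    """True if u's interval contains v's interval and L(u) <= L(v)."""
    contains = u.lo <= v.lo and v.hi <= u.hi
    return contains and u.dfs <= v.dfs


# Vertex 9 lies on P4 [1,4] with DFS label 5; vertex 15 on P1 [1,1] with label 15.
v9 = Label(1, 4, 5)
v15 = Label(1, 1, 15)
```

`reaches(v9, v15)` performs three integer comparisons, which is where the O(1) query time comes from.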
18
Transitive Closure Compression
[Figure: example DAG with vertices 1-15]
Path-tree cover (including labeling) can be constructed in O(m + n log n)
An efficient procedure can compute and compress the transitive closure in O(mk), where k is the number of paths in the path-tree
19
Key Problems

How to construct a path-tree?
Algorithm
How can path-tree help with reachability query?
Labeling
Transitive Closure Compression
How does path-tree compare with the existing
methods?
Optimality

20
Theoretical Analysis

Optimal Path-Tree Cover (OPTC) Problem
Given a path-decomposition, what is the optimal
path-tree cover to maximally compress the
transitive closure?
OptIndex weight assignment based on computing the
predecessor set
Optimal Path-Decomposition (OPD) Problem
Assuming we only use path-decomposition to
compress the transitive closure, what is the
optimal path-decomposition to maximally compress
the transitive closure?
Minimal-cost flow problem
What is the overall optimal path-decomposition?

21
Superiority of Path-Tree Cover

The optimal tree cover is a special case of
path-tree cover when each vertex corresponds to a
single path and the weight is based on OptIndex.
The path-tree cover approach can compress the
transitive closure with size being smaller than
or equal to the optimal tree cover approach (and
consequently optimal chain cover approach).

22
Experimental Evaluation

Implementation in C
12 Real datasets used in Dual-labeling paper and
GRIPP paper
Synthetic datasets
Sparse DAG with edge density 2
AMD Opteron 2.0GHz/ 2GB/ Linux
PTree1 (OptIndex) and PTree2
Mainly compare with Optimal Tree Cover

23
Real Datasets
24
Experimental Result (Real Data)
On average 10 times better than Tree
On average 3 times better than Tree
25
Experimental Result (Synthetic Data)
26
Experimental Result (Synthetic Data)
27
Experimental Result (Synthetic Data)
28
Conclusion

A novel Path-Tree structure is proposed to assist
the compression of transitive closure and
answering reachability query
Path-tree has potential to integrate with other
existing methods to further improve the
efficiency of reachability query processing

29
Thanks!!
30
Step 3: Path-Graph Construction
Weight reflects the penalty if we exclude this path-tree edge
[Figure: example DAG with paths P1-P4 and the corresponding weighted directed path-graph]
31
Step 2: Constructing Minimal Equivalent Edge Set (Pi → Pj)

1. Ordering the vertices in Pi and Pj in decreasing order
2. Finding the first vertex v in Pj that Pi can reach
3. Finding the last vertex u in Pi that reaches v
4. Removing all the edges that cross (u, v) and repeating steps 2-4
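A minimal sketch of the reduction as a single sweep, assuming only the direct edges from Pi to Pj matter and each edge is given as (s, t), the positions of its endpoints on Pi and Pj. The function name `minimal_equivalent_edges` is hypothetical. Under that assumption an edge is redundant when another edge starts at the same or a later position on Pi and ends at the same or an earlier position on Pj, and the sweep keeps exactly the edges that are not redundant.

```python
def minimal_equivalent_edges(edges):
    """Reduce the edges from path Pi to path Pj to a minimal equivalent set.

    Each edge is (s, t): from position s on Pi to position t on Pj.
    """
    kept = []
    best_t = float("inf")
    # Scan sources from last to first; for equal sources, earliest target first.
    for s, t in sorted(set(edges), key=lambda e: (-e[0], e[1])):
        # Keep the edge only if it reaches Pj earlier than every edge seen so far.
        if t < best_t:
            kept.append((s, t))
            best_t = t
    kept.reverse()  # report in increasing order of position on Pi
    return kept
```

In the result both s and t increase strictly from one edge to the next, which matches the observation that the edges of the minimal equivalent edge set never cross.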

32
3-Tuple Labeling for Reachability
[Figure: example DAG with paths P1-P4, annotated with the interval labels [1,1], [2,2], [1,3], [1,4]]
Interval labeling (2-tuple): high-level description about paths (Pi → Pj?)
DFS labeling (1-tuple)
