Title: Distance-Constraint Reachability Computation in Uncertain Graphs
1. Distance-Constraint Reachability Computation in Uncertain Graphs
- Ruoming Jin, Lin Liu (Kent State University)
- Bolin Ding (UIUC)
- Haixun Wang (MSRA)
2. Why Uncertain Graphs?
- Increasing importance of graph/network data
- Social networks, biological networks, traffic/transportation networks, peer-to-peer networks
- The probabilistic perspective has been getting more and more attention recently.
- Uncertainty is ubiquitous!
- Protein-protein interaction networks: false positive rate > 45%
- Social networks: probabilistic trust/influence model
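3. Uncertain Graph Model
- Edge independence
- Each edge carries an existence probability
- Example possible worlds: G1, G2
- Weight of G2: $\Pr(G_2) = 0.5 \times (1-0.5) \times 0.2 \times 0.6 \times 0.7 \times (1-0.3) \times (1-0.1) \times (1-0.4) \times (1-0.9) = 0.0007938$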
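The weight computation above can be sketched in a few lines of Python. This is a minimal illustration, assuming edge independence as stated on the slide; the edge labels `e1` to `e9` and the helper name `world_probability` are hypothetical, and the probabilities are simply read off the nine factors of the product.

```python
from math import prod

def world_probability(edge_probs, present):
    """Pr(G) under edge independence: multiply p(e) for every edge that
    exists in the possible world and 1 - p(e) for every edge that does not."""
    return prod(p if e in present else 1.0 - p for e, p in edge_probs.items())

# Hypothetical labels; the probabilities mirror the factors on the slide.
edge_probs = {"e1": 0.5, "e2": 0.5, "e3": 0.2, "e4": 0.6, "e5": 0.7,
              "e6": 0.3, "e7": 0.1, "e8": 0.4, "e9": 0.9}
present_in_g2 = {"e1", "e3", "e4", "e5"}  # edges assumed to exist in G2

pr_g2 = world_probability(edge_probs, present_in_g2)
print(pr_g2)
```

With these inputs the product agrees with the slide's value of 0.0007938 up to floating-point rounding.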
4. Distance-Constraint Reachability (DCR) Problem
- Given a distance constraint d and two vertices, a source s and a target t:
- What is the probability that s can reach t within distance d?
- A generalization of the two-terminal network reliability problem, which has no distance constraint.
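One way to write the query formally is sketched below. The notation is an assumption, not taken from the slides: $\mathcal{G}$ is the uncertain graph, $G \sqsubseteq \mathcal{G}$ ranges over its possible worlds, and $\mathrm{dis}(s,t \mid G)$ is the shortest-path distance from s to t in G.

```latex
R^{d}_{s,t}(\mathcal{G}) \;=\; \sum_{G \sqsubseteq \mathcal{G}} I^{d}_{s,t}(G)\,\Pr(G),
\qquad
I^{d}_{s,t}(G) =
\begin{cases}
1, & \text{if } \mathrm{dis}(s,t \mid G) \le d,\\
0, & \text{otherwise.}
\end{cases}
```

Dropping the condition on d (letting d grow without bound) recovers two-terminal network reliability.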
5. Important Applications
- Peer-to-Peer (P2P) Networks
- Communication happens only when node distance is limited.
- Social Networks
- Trust/influence can be propagated only through a small number of hops.
- Traffic Networks
- Travel distance (travel time) query
- What is the probability that we can reach the airport within one hour?
6. Example: Exact Computation
- First step: enumerate all possible worlds (2^9): Pr(G1), Pr(G2), Pr(G3), Pr(G4), ...
- Second step: check each possible world for distance-constraint connectivity: Pr(G1) × 0 + Pr(G2) × 1 + Pr(G3) × 0 + Pr(G4) × 1 + ...
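The two steps above can be sketched as a brute-force enumeration. This is an illustrative sketch only: it assumes an undirected graph with unit-length edges (so distance means hop count), and the toy graph and the names `reaches_within` and `exact_dcr` are hypothetical. The cost grows as 2^|E|, so it is only usable on very small graphs.

```python
from collections import deque
from itertools import product

def reaches_within(edges, s, t, d):
    """BFS over an undirected edge list; True if t is at most d hops from s."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:  # do not expand beyond the distance constraint
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def exact_dcr(edge_probs, s, t, d):
    """Sum Pr(G) over every possible world G in which s reaches t within d."""
    edges = list(edge_probs)
    total = 0.0
    for mask in product([False, True], repeat=len(edges)):  # step 1: enumerate
        weight = 1.0
        for e, keep in zip(edges, mask):
            weight *= edge_probs[e] if keep else 1.0 - edge_probs[e]
        world = [e for e, keep in zip(edges, mask) if keep]
        if reaches_within(world, s, t, d):                   # step 2: check
            total += weight
    return total

# Hypothetical 3-edge toy graph.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
print(exact_dcr(toy, "s", "t", 2))
print(exact_dcr(toy, "s", "t", 1))
```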
7. Approximating Distance-Constraint Reachability Computation
- Hardness
- Two-terminal network reliability is #P-Complete.
- DCR is a generalization.
- Our goal is to approximate through Sampling
- Unbiased estimator
- Minimal variance
- Low computational cost
8. Start from the most intuitive estimators, right?
9. Direct Sampling Approach
- Sampling Process
- Sample n graphs
- Sample each graph according to edge probability
10. Direct Sampling Approach (Cont)
- Estimator
- Unbiased
- Variance
- Indicator function: 1 if s can reach t within distance d, 0 otherwise.
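A direct sampling estimator of this kind is usually the sample mean of the indicator over n sampled possible worlds; for such a Bernoulli mean the variance is R(1 - R)/n, where R is the true reachability. The sketch below assumes that form, plus an undirected graph with unit-length edges; the toy graph and function names are hypothetical.

```python
import random
from collections import deque

def reaches_within(edges, s, t, d):
    """BFS over an undirected edge list; True if t is at most d hops from s."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def direct_sampling(edge_probs, s, t, d, n, seed=0):
    """Average the indicator over n possible worlds, each drawn by keeping
    every edge independently with its existence probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        world = [e for e, p in edge_probs.items() if rng.random() < p]
        hits += reaches_within(world, s, t, d)
    return hits / n

# Hypothetical 3-edge toy graph; its exact answer for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
print(direct_sampling(toy, "s", "t", 2, n=20000))
```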
11. Path-Based Approach
- Generate Path Set
- Enumerate all paths from s to t with length at most d
- Enumeration methods
- E.g., DFS
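The DFS enumeration mentioned above might look like the following sketch. It assumes an undirected graph with unit-length edges and simple paths only; `enumerate_paths` and the example graph are hypothetical names for illustration.

```python
def enumerate_paths(edges, s, t, d):
    """Return every simple s-t path with at most d edges, found by DFS.
    Each path is a list of (u, v) steps in traversal order."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    paths = []

    def dfs(u, path, visited):
        if u == t:
            paths.append(list(path))
            return
        if len(path) == d:  # length budget exhausted before reaching t
            return
        for v in adj.get(u, []):
            if v not in visited:
                visited.add(v)
                path.append((u, v))
                dfs(v, path, visited)
                path.pop()
                visited.remove(v)

    dfs(s, [], {s})
    return paths

# Hypothetical example graph.
graph = [("s", "a"), ("a", "t"), ("s", "t"), ("s", "b"), ("b", "a")]
for p in enumerate_paths(graph, "s", "t", 3):
    print(p)
```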
12. Path-Based Approach (Cont)
- Path set
- Exactly computed by the Inclusion-Exclusion principle
- Approximated by the Monte-Carlo algorithm of R. M. Karp and M. G. Luby
- Unbiased
- Variance
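The exact inclusion-exclusion computation over a path set can be sketched as follows. The sketch assumes edge independence, so the probability that a group of paths is simultaneously present is the product of p(e) over the union of their edges. The function name and toy data are hypothetical, and the number of terms grows as 2^L for L paths, which is why the slide also lists a Monte-Carlo alternative (not implemented here).

```python
from itertools import combinations
from math import prod

def path_union_probability(paths, edge_probs):
    """Pr(at least one path has all of its edges present), by
    inclusion-exclusion. Each path is a set of edge keys."""
    total = 0.0
    for k in range(1, len(paths) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(paths, k):
            union_edges = frozenset().union(*subset)
            total += sign * prod(edge_probs[e] for e in union_edges)
    return total

# Hypothetical toy data: two s-t paths within the distance constraint.
edge_probs = {"sa": 0.5, "at": 0.5, "st": 0.2}
paths = [{"sa", "at"}, {"st"}]
print(path_union_probability(paths, edge_probs))
```

For the toy data the three terms are 0.25 + 0.2 - 0.05.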
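14. Divide-and-Conquer Methodology
[Figure: enumeration tree whose branches are labeled by edge decisions: (s,a) included or excluded, (s,b) included or excluded, (a,t) included or excluded.]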
15. Divide and Conquer (Cont)
Summary
- The number of leaf nodes is smaller than 2^|E|.
- Each possible world exists in only one leaf node.
- Reachability is the sum of the weights of the blue nodes.
- Leaf nodes form a nice sample space.
[Figure: enumeration tree. The root holds all possible worlds and splits into graphs having e1 and graphs not having e1. Blue leaf: s can reach t. Red leaf: s cannot reach t.]
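The enumeration tree summarized above can be sketched as a recursion that branches on one edge at a time and stops as soon as the outcome is decided. This is a simplified sketch, not the authors' algorithm: it picks edges in a fixed order, and it assumes an undirected graph with unit-length edges. All names are hypothetical.

```python
from collections import deque

def reaches_within(edges, s, t, d):
    """BFS over an undirected edge list; True if t is at most d hops from s."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def dc_reachability(edge_probs, s, t, d):
    """Exact reachability as the total weight of the 'blue' leaves."""
    edges = list(edge_probs)

    def recurse(i, included):
        # edges[:i] are decided (kept ones are in `included`); edges[i:] are open.
        if reaches_within(included, s, t, d):
            return 1.0   # blue leaf: every world in this node reaches t
        if not reaches_within(included + edges[i:], s, t, d):
            return 0.0   # red leaf: no world in this node reaches t
        e = edges[i]
        p = edge_probs[e]
        return p * recurse(i + 1, included + [e]) + (1.0 - p) * recurse(i + 1, included)

    return recurse(0, [])

# Hypothetical toy graphs.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
diamond = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "b"): 0.5, ("b", "t"): 0.5}
print(dc_reachability(toy, "s", "t", 2))
print(dc_reachability(diamond, "s", "t", 2))
```

Because a branch ends as soon as the included edges already connect s to t, or the remaining edges can no longer do so, the tree has fewer leaves than the full set of possible worlds.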
16. How do we sample?
[Figure: enumeration tree; sampling starts from the root ("Start from here") and ends at a leaf node, which is the sample unit.]
- Pr_i: sample unit weight, the sum of the possible worlds' probabilities in the node.
- q_i: sampling probability, determined by the properties of the coins along the way.
- Unequal probability sampling
- Hansen-Hurwitz (HH) estimator
- Horvitz-Thompson (HT) estimator
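17. Hansen-Hurwitz (HH) Estimator
- Estimator (formula annotations: sample size; weight Pr_i; sampling probability q_i; indicator equal to 1 for a blue node and 0 for a red node)
- Unbiased
- Variance
- To minimize the variance above, we have Pr_i = q_i.
- Example: Pr_i = p(e1) p(e2) (1 - p(e3)), where Pr_i is the leaf node weight and q_i is the sampling probability.
[Figure: enumeration tree with branch weights P(e1), 1-P(e1), P(e2), 1-P(e2), P(e3), 1-P(e3), P(e4), 1-P(e4); the coins are tossed with probabilities p(e1), 1-p(e1), p(e2), 1-p(e2), p(e3), 1-p(e3).]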
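In its textbook form, the Hansen-Hurwitz estimator averages Pr_i × indicator / q_i over the n sampled units. The sketch below assumes that form and applies it to leaves of an enumeration tree: each sample walks from the root, tossing one coin per open edge. It assumes a fixed edge order, an undirected graph with unit-length edges, and hypothetical names throughout. The coin probabilities must give every blue leaf a positive chance of being sampled.

```python
import random
from collections import deque

def reaches_within(edges, s, t, d):
    """BFS over an undirected edge list; True if t is at most d hops from s."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def sample_leaf(edge_probs, coin_probs, s, t, d, rng):
    """Walk from the root to one leaf. Returns (indicator, Pr_i, q_i)."""
    edges = list(edge_probs)
    included, weight, q = [], 1.0, 1.0
    for i, e in enumerate(edges):
        if reaches_within(included, s, t, d):
            return 1, weight, q                      # blue leaf
        if not reaches_within(included + edges[i:], s, t, d):
            return 0, weight, q                      # red leaf
        if rng.random() < coin_probs[e]:             # coin says: include e
            included.append(e)
            weight *= edge_probs[e]
            q *= coin_probs[e]
        else:                                        # coin says: exclude e
            weight *= 1.0 - edge_probs[e]
            q *= 1.0 - coin_probs[e]
    return int(reaches_within(included, s, t, d)), weight, q

def hh_estimate(edge_probs, coin_probs, s, t, d, n, seed=0):
    """Average of indicator * Pr_i / q_i over n sampled leaves."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        indicator, weight, q = sample_leaf(edge_probs, coin_probs, s, t, d, rng)
        total += indicator * weight / q
    return total / n

# Hypothetical toy graph; its exact answer for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
uniform = {e: 0.5 for e in toy}
print(hh_estimate(toy, toy, "s", "t", 2, n=20000))      # coins equal to p(e), so Pr_i = q_i
print(hh_estimate(toy, uniform, "s", "t", 2, n=20000))  # fair coins
```

When the coins use the edge probabilities themselves, Pr_i / q_i is 1 for every leaf, which is the variance-minimizing choice named on the slide.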
18. Horvitz-Thompson (HT) Estimator
- Estimator (formula annotation: number of unique sample units)
- Unbiased
- Variance
- To minimize variance, we find Pr_i = q_i.
- Smaller variance than the HH estimator
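In its textbook form for n draws with replacement, the Horvitz-Thompson estimator sums Pr_i × indicator / π_i over the unique sampled units, where π_i = 1 - (1 - q_i)^n is the probability that unit i appears at least once. The sketch below assumes that form; the function name, the tuple layout, and the sample data are hypothetical.

```python
def ht_estimate(samples, n):
    """samples: (leaf_id, indicator, Pr_i, q_i) tuples obtained from n draws
    with replacement. Repeated draws of the same leaf are counted once."""
    unique = {}
    for leaf_id, indicator, weight, q in samples:
        unique[leaf_id] = (indicator, weight, q)
    total = 0.0
    for indicator, weight, q in unique.values():
        inclusion = 1.0 - (1.0 - q) ** n  # chance the leaf is drawn at least once
        total += indicator * weight / inclusion
    return total

# Hypothetical draws: leaf A was sampled twice, B is a red leaf.
draws = [("A", 1, 0.25, 0.25), ("B", 0, 0.2, 0.2),
         ("A", 1, 0.25, 0.25), ("C", 1, 0.05, 0.05)]
print(ht_estimate(draws, n=len(draws)))
```

Here every sampled unit has Pr_i = q_i, matching the choice on the slide.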
19. Can we further reduce the variance and computational cost?
20. Recursive Estimator
- Unbiased
- Variance
- n1 + n2 = n
- Sample the entire space n times.
- Sample the first sub-space n1 times.
- Sample the second sub-space n2 times.
- We cannot minimize the variance without knowing t1 and t2. Then what can we do?
21. Sample Allocation
- We guess: what if
- n1 = n p(e)
- n2 = n (1 - p(e))?
- We find: variance reduced!
- HH Estimator
- HT Estimator
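A short worked derivation illustrates why this allocation helps in the simplest setting. It rests on assumptions not stated on the slides: t_1 and t_2 are read as the reachability probabilities of the two sub-spaces, each sub-space is estimated independently by a plain sample mean of the indicator, and R = p(e) t_1 + (1 - p(e)) t_2 is the overall reachability.

```latex
\begin{aligned}
\hat{R} &= p(e)\,\hat{R}_1 + (1-p(e))\,\hat{R}_2, \qquad n_1 + n_2 = n,\\
\mathrm{Var}(\hat{R}) &= p(e)^2\,\frac{t_1(1-t_1)}{n_1} + (1-p(e))^2\,\frac{t_2(1-t_2)}{n_2}.\\
\text{With } n_1 = n\,p(e),\; n_2 = n\,(1-p(e)):\quad
\mathrm{Var}(\hat{R}) &= \frac{p(e)\,t_1(1-t_1) + (1-p(e))\,t_2(1-t_2)}{n},\\
\frac{R(1-R)}{n} - \mathrm{Var}(\hat{R}) &= \frac{p(e)\,(1-p(e))\,(t_1-t_2)^2}{n} \;\ge\; 0.
\end{aligned}
```

Under these assumptions the allocated estimator never has larger variance than sampling the entire space n times, and the guess requires no knowledge of t_1 or t_2.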
22. Sample Allocation (Cont)
- Sample size: n
- Directly allocate samples:
- n1 = n p(e1)
- n2 = n (1 - p(e1))
- n3 = n1 p(e2)
- n4 = n1 (1 - p(e2))
- Toss a coin when the sample size is small.
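The allocation scheme above can be sketched as a recursive estimator. This is an illustrative sketch under several assumptions: edges are chosen in a fixed order, the graph is undirected with unit-length edges, allocations are rounded and clamped so that each branch receives at least one sample, and the cutoff `small` below which coins are tossed is a hypothetical parameter.

```python
import random
from collections import deque

def reaches_within(edges, s, t, d):
    """BFS over an undirected edge list; True if t is at most d hops from s."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def recursive_estimate(edge_probs, s, t, d, n, small=4, seed=0):
    """Split n samples across the two branches of each edge in proportion to
    p(e) and 1 - p(e); fall back to coin tossing once n is small (n >= 1)."""
    rng = random.Random(seed)
    edges = list(edge_probs)

    def coin_toss(i, included, n):
        hits = 0
        for _ in range(n):
            world = included + [e for e in edges[i:] if rng.random() < edge_probs[e]]
            hits += reaches_within(world, s, t, d)
        return hits / n

    def recurse(i, included, n):
        if reaches_within(included, s, t, d):
            return 1.0
        if not reaches_within(included + edges[i:], s, t, d):
            return 0.0
        if n <= small:
            return coin_toss(i, included, n)
        e = edges[i]
        p = edge_probs[e]
        n1 = min(max(round(n * p), 1), n - 1)  # direct allocation, clamped
        return (p * recurse(i + 1, included + [e], n1)
                + (1.0 - p) * recurse(i + 1, included, n - n1))

    return recurse(0, [], n)

# Hypothetical toy graph; its exact answer for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
print(recursive_estimate(toy, "s", "t", 2, n=1000))
print(recursive_estimate(toy, "s", "t", 2, n=3))
```

On a graph this small, a large n lets the recursion reach decided leaves on every branch, so no coin is tossed at all; with a small n the estimate falls back to plain coin tossing.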
23. Experimental Setup
- Experiment setting
- Goal
- Relative Error
- Variance
- Computational Time
- System Specification
- 2.0GHz Dual Core AMD Opteron CPU
- 4.0GB RAM
- Linux
24. Experimental Results
- Synthetic datasets
- Erdős-Rényi random graphs
- Vertices: 5000, edge density: 10, sample size: 1000
- Categorized by extracted-subgraph size (number of edges)
- For each category, 1000 queries
25. Experimental Results
- Real datasets
- DBLP: 226,000 vertices, 1,400,000 edges
- Yeast PPIN: 5499 vertices, 63796 edges
- Fly PPIN: 7518 vertices, 51660 edges
- Extracted subgraph size: 20 to 50 edges
26. Conclusions
- We first propose a novel s-t distance-constraint reachability problem in uncertain graphs.
- An efficient exact computation algorithm is developed based on a divide-and-conquer scheme.
- Compared with two classic reachability estimators, we introduce two significant unequal-probability sampling estimators: the Hansen-Hurwitz (HH) estimator and the Horvitz-Thompson (HT) estimator.
- Based on the enumeration tree framework, two recursive estimators, Recursive HH and Recursive HT, are constructed to reduce estimation variance and time.
- Experiments demonstrate the accuracy and efficiency of our estimators.