Distance-Constraint Reachability Computation in Uncertain Graphs

Provided by: vldbOrg2
Learn more at: https://www.vldb.org
1
Distance-Constraint Reachability Computation in Uncertain Graphs
  • Ruoming Jin, Lin Liu (Kent State University)
  • Bolin Ding (UIUC)
  • Haixun Wang (MSRA)
2
Why Uncertain Graphs?
Increasing importance of graph/network data:
  • Social networks, biological networks, traffic/transportation
    networks, peer-to-peer networks
The probabilistic perspective has been getting more and more
attention recently.
Uncertainty is ubiquitous!
  • Protein-protein interaction networks: false positive rate > 45%
  • Social networks: probabilistic trust/influence model
3
Uncertain Graph Model
Edge Independence
Existence Probability
  • Possible worlds ($2^{\#\text{edges}}$ in total)

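[Figure: an uncertain graph with nine edges and two of its possible worlds, G1 and G2.]
Weight of G2:
$\Pr(G_2) = 0.5 \times (1-0.5) \times 0.2 \times 0.6 \times 0.7 \times (1-0.3) \times (1-0.1) \times (1-0.4) \times (1-0.9) = 0.0007938$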
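Under edge independence, the weight of a possible world is the product of p(e) over the edges that exist and 1 − p(e) over the edges that do not. A minimal sketch of that product follows; the edge labels e1 to e9 are hypothetical, and which edges are present is read off the factors of the G2 example (factors without "1 −" are taken to be present edges).

```python
from math import prod

def world_probability(edge_prob, present):
    """Weight Pr(G) of the possible world whose existing edges are `present`."""
    return prod(p if e in present else 1.0 - p for e, p in edge_prob.items())

# Hypothetical labels; probabilities chosen to match the factors in the example.
edge_prob = {"e1": 0.5, "e2": 0.5, "e3": 0.2, "e4": 0.6, "e5": 0.7,
             "e6": 0.3, "e7": 0.1, "e8": 0.4, "e9": 0.9}
present = {"e1", "e3", "e4", "e5"}  # the remaining five edges are absent

weight = world_probability(edge_prob, present)
```

With these inputs the product agrees with the 0.0007938 shown for G2.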




4
Distance-Constraint Reachability (DCR) Problem
Given a distance constraint d and two vertices, a source s and a
target t:
  • What is the probability that s can
    reach t within distance d?
  • This is a generalization of the two-terminal network
    reliability problem, which has no distance
    constraint.
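For a single deterministic graph, the question "can s reach t within distance d?" can be answered with a breadth-first search that stops expanding at depth d. The sketch below assumes directed edges of unit length (distance counted in hops); the function name `reaches_within` is illustrative, not taken from the slides.

```python
from collections import deque

def reaches_within(edges, s, t, d):
    """True if t is reachable from s using at most d directed, unit-length edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue  # do not expand beyond the distance constraint
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False
```

The DCR probability is then the total weight of the possible worlds for which this check succeeds.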

5
Important Applications
  • Peer-to-Peer (P2P) Networks
  • Communication happens only when node distance is
    limited.
  • Social Networks
  • Trust/influence can be propagated only
    through a small number of hops.
  • Traffic Networks
  • Travel distance (travel time) query
  • What is the probability that we can reach the
    airport within one hour?

6
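Example: Exact Computation
  • d = 2; what is the probability that s reaches t within distance d?

First step: enumerate all possible worlds ($2^9$ of them) with their
weights: Pr(G1), Pr(G2), Pr(G3), Pr(G4), ...
Second step: check each world for distance-constraint connectivity
(1 if s reaches t within d, 0 otherwise) and sum the weighted results:
Pr(G1) × 0 + Pr(G2) × 1 + Pr(G3) × 0 + Pr(G4) × 1 + ...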
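The two steps of the exact computation can be sketched as brute-force enumeration. This is an illustration only, assuming directed unit-length edges and a hypothetical three-edge graph; it visits every one of the 2^|E| worlds, so it is usable only on very small inputs.

```python
from collections import deque
from itertools import product

def reaches_within(edges, s, t, d):
    """True if t is reachable from s using at most d directed, unit-length edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def dcr_exact(edge_prob, s, t, d):
    """Sum Pr(G) over all possible worlds G in which s reaches t within d."""
    edges = list(edge_prob)
    total = 0.0
    for mask in product([False, True], repeat=len(edges)):
        weight, present = 1.0, []
        for e, keep in zip(edges, mask):
            weight *= edge_prob[e] if keep else 1.0 - edge_prob[e]
            if keep:
                present.append(e)
        if reaches_within(present, s, t, d):  # indicator is 1 for this world
            total += weight
    return total

# Hypothetical toy graph: a two-hop route s-a-t and a direct edge s-t.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
```

For the toy graph, `dcr_exact(toy, "s", "t", 2)` is 1 − (1 − 0.2)(1 − 0.25) = 0.4, and with d = 1 only the direct edge counts, giving 0.2.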




7
Approximating Distance-Constraint Reachability
Computation
  • Hardness
  • Two-terminal network reliability is #P-complete.
  • DCR is a generalization.
  • Our goal is to approximate through Sampling
  • Unbiased estimator
  • Minimal variance
  • Low computational cost

8
  • Start from the most intuitive estimators, right?

9
Direct Sampling Approach
  • Sampling Process
  • Sample n graphs
  • Sample each graph according to edge probability

10
Direct Sampling Approach (Cont)
  • Estimator
  • Unbiased
  • Variance

Indicator function: $I(G) = 1$ if s reaches t within distance d in G,
and $I(G) = 0$ otherwise.
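The direct sampling approach can be sketched as follows: draw n possible worlds by tossing one coin per edge, evaluate the indicator on each, and average. This is a minimal illustration assuming directed unit-length edges; the function names and the `seed` parameter are not from the slides. Because each sample is a Bernoulli indicator with success probability R, the sample mean is unbiased with variance R(1 − R)/n.

```python
import random
from collections import deque

def reaches_within(edges, s, t, d):
    """True if t is reachable from s using at most d directed, unit-length edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def dcr_direct_sampling(edge_prob, s, t, d, n, seed=0):
    """Average of the indicator over n independently sampled possible worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Keep each edge independently with its existence probability.
        world = [e for e, p in edge_prob.items() if rng.random() < p]
        hits += reaches_within(world, s, t, d)
    return hits / n

# Hypothetical toy graph whose exact answer for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
estimate = dcr_direct_sampling(toy, "s", "t", 2, n=20000)
```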
11
Path-Based Approach
  • Generate Path Set
  • Enumerate all paths from s to t with length at most d
  • Enumeration methods
  • E.g., DFS
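A depth-bounded DFS is one way to generate the path set. The sketch below assumes directed unit-length edges and enumerates simple paths only; the function name `paths_within` is illustrative.

```python
def paths_within(edges, s, t, d):
    """All simple s-t paths with at most d edges, each returned as a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    found = []

    def dfs(u, path, visited):
        if u == t:
            found.append(list(path))
            return
        if len(path) == d:
            return  # depth bound reached without hitting t
        for v in adj.get(u, []):
            if v not in visited:
                visited.add(v)
                path.append((u, v))
                dfs(v, path, visited)
                path.pop()
                visited.remove(v)

    dfs(s, [], {s})
    return found

# Hypothetical graph: a direct edge, a two-hop route and a three-hop route.
edges = [("s", "a"), ("a", "t"), ("s", "t"), ("s", "b"), ("b", "c"), ("c", "t")]
short_paths = paths_within(edges, "s", "t", 2)
```

With d = 2 the three-hop route through b and c is pruned, so only the direct edge and the route through a are returned.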

12
Path-Based Approach (Cont)
  • Path set
  • Exactly computed by Inclusion-Exclusion principle
  • Approximated by Monte-Carlo Algorithm by R. M.
    Karp and M. G. Luby ( )
  • Unbiased
  • Variance
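The exact computation from a path set can be sketched with the inclusion-exclusion principle: s reaches t within d exactly when at least one enumerated path has all of its edges present, and the probability that a group of paths is simultaneously present is the product of p(e) over the union of their edges. The code assumes the path set is already available as lists of edges; it is exponential in the number of paths, which is why the slides also mention a Monte-Carlo alternative.

```python
from itertools import combinations
from math import prod

def dcr_inclusion_exclusion(edge_prob, paths):
    """Probability that at least one path in `paths` has all of its edges present."""
    total = 0.0
    for k in range(1, len(paths) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(paths, k):
            union = set().union(*subset)  # edges needed for all k paths at once
            total += sign * prod(edge_prob[e] for e in union)
    return total

# Hypothetical toy graph and its two s-t paths of length at most 2.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
paths = [[("s", "a"), ("a", "t")], [("s", "t")]]
result = dcr_inclusion_exclusion(toy, paths)
```

Here the terms are 0.25 + 0.2 − 0.05 = 0.4.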

13
  • Can we do better?

14
Divide-and-Conquer Methodology
  • Example

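[Figure: enumeration tree that branches on the edges (s,a), (s,b) and (a,t); each branch either includes an edge, e.g. (s,a), or excludes it, written -(s,a).]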






15
Divide and Conquer (Cont)
Summary:
  1. The number of leaf nodes is smaller than $2^{|E|}$.
  2. Each possible world exists in only one leaf
    node.
  3. Reachability is the sum of the weights of the blue
    nodes.
  4. Leaf nodes form a nice sample space.

[Figure: enumeration tree. The root holds all possible worlds and splits into graphs having e1 and graphs not having e1. Blue leaf: s can reach t. Red leaf: s cannot reach t.]
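The divide-and-conquer idea can be sketched as a recursion that splits the possible worlds on one edge at a time and stops as soon as the outcome is decided. This is a simplified illustration: it assumes directed unit-length edges and branches on edges in a fixed order, which may differ from the edge selection used in the talk.

```python
from collections import deque

def reaches_within(edges, s, t, d):
    """True if t is reachable from s using at most d directed, unit-length edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def dcr_divide_and_conquer(edge_prob, s, t, d):
    """Exact DCR probability via an enumeration tree with early termination."""
    order = list(edge_prob)

    def solve(i, included):
        # included: edges decided to exist; order[i:]: edges still undecided.
        if reaches_within(included, s, t, d):
            return 1.0  # blue leaf: every world in this node reaches t
        if not reaches_within(included + order[i:], s, t, d):
            return 0.0  # red leaf: no world in this node can reach t
        e, p = order[i], edge_prob[order[i]]
        return p * solve(i + 1, included + [e]) + (1.0 - p) * solve(i + 1, included)

    return solve(0, [])

# Hypothetical toy graph whose exact answer for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
```

Each leaf stands for many possible worlds at once, which is why the number of leaves stays below the number of possible worlds.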
16
How do we sample?
[Figure: sampling starts from the root of the enumeration tree; each leaf node is a sample unit.]
  • $Pr_i$: sample unit weight, the sum of the possible-world
    probabilities in the node.
  • $q_i$: sampling probability, determined by the properties of the
    coins tossed along the way.
  • Unequal probability sampling
  • Hansen-Hurwitz (HH) estimator
  • Horvitz-Thompson (HT) estimator
17
Hansen-Hurwitz (HH) Estimator
  • Estimator, built from: the sample size n; the leaf node weight
    $Pr_i$; the sampling probability $q_i$; and an indicator that is 1
    for a blue node and 0 for a red node
  • Unbiased
  • Variance
  • To minimize the variance above, we have $Pr_i = q_i$
  • Example: $Pr_i = p(e_1)\,p(e_2)\,(1 - p(e_3))$

[Figure: enumeration tree over edges e1 to e4; at each level a coin includes edge e_k with probability p(e_k) and excludes it with probability 1 − p(e_k).]
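The formulas on this slide did not survive extraction. As a hedged reconstruction, the block below writes out the textbook Hansen-Hurwitz estimator in the slide's symbols and shows what the choice $Pr_i = q_i$ gives. It assumes N leaf nodes, true DCR probability R, and indicator $I_j$ equal to 1 for a blue node and 0 for a red node; the exact notation in the talk may differ.

```latex
% Standard Hansen-Hurwitz form (assumed). In the estimator, i runs over the
% n sampled leaves; in the variance, j runs over all N leaf nodes.
\hat{R}_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{Pr_i\, I_i}{q_i},
\qquad
\mathrm{Var}(\hat{R}_{HH}) = \frac{1}{n}\sum_{j=1}^{N} q_j\left(\frac{Pr_j\, I_j}{q_j} - R\right)^{2}.

% Setting q_j = Pr_j and using \sum_j Pr_j = 1, \sum_j Pr_j I_j = R, I_j^2 = I_j:
\hat{R}_{HH} = \frac{1}{n}\sum_{i=1}^{n} I_i,
\qquad
\mathrm{Var}(\hat{R}_{HH}) = \frac{1}{n}\sum_{j=1}^{N} Pr_j\,(I_j - R)^{2} = \frac{R(1-R)}{n}.
```

Under this choice each leaf is sampled by tossing the coins p(e_k) along its branch, so $q_i$ equals the product of the branch probabilities, as in the example $Pr_i = p(e_1)\,p(e_2)\,(1 - p(e_3))$.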
18
Horvitz-Thompson (HT) Estimator
  • Estimator, built from the number of unique sample units
  • Unbiased
  • Variance
  • To minimize variance, we find $Pr_i = q_i$
  • Smaller variance than the HH estimator

19
  • Can we further reduce the variance and
    computational cost?

20
Recursive Estimator
  1. Unbiased
  2. Variance

n1 + n2 = n
  • Sample the entire space n times
  • Sample one sub-space n1 times
  • Sample the other sub-space n2 times
We cannot minimize the variance without knowing
t1 and t2. Then what can we do?
21
Sample Allocation
  • We guess: what if
  • n1 = n·p(e)
  • n2 = n·(1 − p(e))?
  • We find: variance reduced!
  • HH Estimator
  • HT Estimator

22
Sample Allocation (Cont)
  • Sampling Time Reduced!!

Sample size: n
Directly allocate samples:
  • n1 = n·p(e1)
  • n2 = n·(1 − p(e1))
  • n3 = n1·p(e2)
  • n4 = n1·(1 − p(e2))
Toss coins when the sample size is small
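The allocation scheme can be sketched as a recursive estimator: at each tree node the sample budget n is split into roughly n·p(e) and n·(1 − p(e)) for the two sub-spaces, and coins are tossed only once the budget is small. Everything specific here is an assumption for illustration: directed unit-length edges, a fixed edge order, the `threshold` parameter (at least 1), rounding with at least one sample per branch, and n at least 1.

```python
import random
from collections import deque

def reaches_within(edges, s, t, d):
    """True if t is reachable from s using at most d directed, unit-length edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        if dist[u] == d:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False

def dcr_recursive(edge_prob, s, t, d, n, threshold=5, seed=0):
    """Recursive estimate with samples allocated in proportion to p(e)."""
    rng = random.Random(seed)
    order = list(edge_prob)

    def toss_coins(i, included):
        # Sample the undecided edges order[i:] and evaluate the indicator.
        world = included + [e for e in order[i:] if rng.random() < edge_prob[e]]
        return 1.0 if reaches_within(world, s, t, d) else 0.0

    def estimate(i, included, budget):
        if reaches_within(included, s, t, d):
            return 1.0  # outcome already decided: reachable
        if not reaches_within(included + order[i:], s, t, d):
            return 0.0  # outcome already decided: unreachable
        if budget <= threshold:
            return sum(toss_coins(i, included) for _ in range(budget)) / budget
        e, p = order[i], edge_prob[order[i]]
        n1 = min(max(round(budget * p), 1), budget - 1)  # samples for "e exists"
        n2 = budget - n1                                  # samples for "e absent"
        return (p * estimate(i + 1, included + [e], n1)
                + (1.0 - p) * estimate(i + 1, included, n2))

    return estimate(0, [], n)

# Hypothetical toy graph whose exact answer for d = 2 is 0.4.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}
estimate_large = dcr_recursive(toy, "s", "t", 2, n=2000)
```

Branches whose outcome is decided early consume no random draws at all, which is where the reduction in sampling time comes from.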
23
Experimental Setup
  • Experiment setting
  • Goal
  • Relative Error
  • Variance
  • Computational Time
  • System Specification
  • 2.0GHz Dual Core AMD Opteron CPU
  • 4.0GB RAM
  • Linux

24
Experimental Results
  • Synthetic datasets
  • Erdős-Rényi random graphs
  • Vertices: 5000; edge density: 10; sample size:
    1000
  • Categorized by extracted-subgraph size (number of edges)
  • For each category, 1000 queries

25
Experimental Results
  • Real datasets
  • DBLP: 226,000 vertices, 1,400,000 edges
  • Yeast PPIN: 5,499 vertices, 63,796 edges
  • Fly PPIN: 7,518 vertices, 51,660 edges
  • Extracted subgraph size: 20 to 50 edges

26
Conclusions
  • We propose a novel s-t distance-constraint
    reachability problem in uncertain graphs.
  • An efficient exact computation algorithm is
    developed based on a divide-and-conquer scheme.
  • Two classic reachability estimators are compared
    with two significant unequal probability
    sampling estimators: the Hansen-Hurwitz (HH) estimator
    and the Horvitz-Thompson (HT) estimator.
  • Based on the enumeration tree framework, two
    recursive estimators, Recursive HH and Recursive
    HT, are constructed to reduce estimation variance
    and time.
  • Experiments demonstrate the accuracy and
    efficiency of our estimators.

27
  • Thank you !
  • Questions?