Frequent Subgraph Pattern Mining on Uncertain Graph Data

Zhaonian Zou, Jianzhong Li, Hong Gao, Shuo Zhang. Harbin Institute of Technology, China. CIKM '09, Hong Kong.

1
Frequent Subgraph Pattern Mining on Uncertain Graph Data
  • Zhaonian Zou, Jianzhong Li, Hong Gao, Shuo Zhang
  • Harbin Institute of Technology, China
  • CIKM '09, Hong Kong
  • Nov 4, 2009

2
Outline
  • Background
  • Problem Definition
  • Algorithm
  • Experimental Results
  • Conclusions

3
Background
  • Graph mining has played an important role in a
    range of real-world applications.
  • medicine: structures of molecules
  • bioinformatics: biological networks
  • technology: WWW
  • social science: social networks
  • many others

4
Directions of Graph Mining
  • Models of graphs, e.g., Leskovec et al., KDD '05
  • Patterns of graphs, e.g., Yan et al., ICDM '02
  • Uncertainties of graphs
  • Privacy of graphs, e.g., Zou et al., VLDB '09
  • Evolution of graphs, e.g., Faloutsos et al., SIGMOD '07
5
Uncertainties of Graphs Example I
  • Protein-Protein Interaction (PPI) Networks
  • Vertices: proteins
  • Edges: interactions between proteins
  • Uncertainties: probabilities that the interactions
    really exist

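[Figure: a PPI network over the proteins TIF34, FET3, NTG1, SMT3, RAD59, and RPC40; edges are labeled with interaction probabilities such as 0.375, 0.639, 0.867, 0.651, 0.147, and 0.698.]
The data are taken from the STRING Database
(http://string-db.org).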
6
Uncertainties of Graphs Example II
  • Topologies of wireless sensor networks (WSNs)
  • Vertices: sensor nodes
  • Edges: wireless links between sensor nodes
  • Uncertainties: probabilities that the wireless links
    are functioning at any given time

[Figure: a WSN topology whose links are labeled with the probabilities 0.75, 0.95, 0.88, 0.92, and 0.69.]
7
The Goal of This Paper
  • Models of graphs, e.g., Leskovec et al., KDD '05
  • Patterns of graphs, e.g., Yan et al., ICDM '02
  • Uncertainties of graphs (the direction this paper addresses)
  • Privacy of graphs, e.g., Zou et al., VLDB '09
  • Evolution of graphs, e.g., Faloutsos et al., SIGMOD '07
8
Outline
  • Background
  • Problem Definition
  • Algorithm
  • Experimental Results
  • Conclusions

9
Preliminaries
[Figure: a graph database and two subgraph patterns, one with support 1.0 and one with support 0.5.]
The support of S = (the number of graphs containing S) / (the total number of graphs)
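
The support definition on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the graph encoding (a dict of vertex labels plus a list of edges), the function names `contains` and `support`, and the two toy graphs are all assumptions, and containment is checked by brute force, which is only practical for very small graphs. Edge labels are ignored.

```python
from itertools import permutations

def contains(graph, pattern):
    """Return True if `pattern` is subgraph-isomorphic to `graph` (brute force)."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    g_edge_set = {frozenset(e) for e in g_edges}
    # Try every injective mapping of pattern vertices onto graph vertices.
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[m[v]] for v in p_nodes):
            continue  # vertex labels must match
        if all(frozenset((m[a], m[b])) in g_edge_set for a, b in p_edges):
            return True
    return False

def support(database, pattern):
    """Fraction of graphs in `database` that contain `pattern`."""
    return sum(contains(g, pattern) for g in database) / len(database)

# Hypothetical toy database: a path A-B-C and a single edge A-B.
g1 = ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])
g2 = ({1: "A", 2: "B"}, [(1, 2)])
db = [g1, g2]

pattern_ab = ({"u": "A", "v": "B"}, [("u", "v")])
pattern_bc = ({"u": "B", "v": "C"}, [("u", "v")])
print(support(db, pattern_ab), support(db, pattern_bc))
```

In this toy database the A-B pattern occurs in both graphs and the B-C pattern in only one, mirroring the two support values shown on the slide.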
10
Frequent Subgraph Pattern Mining Problem
  • Input: a graph database D and a support
    threshold minsup
  • Output: all subgraph patterns with support no
    less than minsup
  • FSP mining on biological networks (e.g., PPI
    networks) is an important tool for discovering
    functional modules [Koyutürk et al.,
    Bioinformatics '04; Turanalp et al., BMC
    Bioinformatics '08].
  • PPI networks are subject to uncertainties.
  • How do we define support?

11
Model of Uncertain Graphs
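Uncertain Graph
Probability of one implicated graph: $(1 - 0.5) \times 0.6 \times 0.7 \times 0.8 = 0.168$
Probability of another implicated graph: $0.5 \times (1 - 0.6) \times 0.7 \times 0.8 = 0.112$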
12
Model of Uncertain Graphs (Cont'd)
Theorem: An uncertain graph represents a
probability distribution over all its implicated
graphs.
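
The theorem can be checked numerically on a small example. The sketch below assumes that edges exist independently, which is what the product form of the probabilities on the previous slide suggests; the edge names `e1` to `e4` and the function name `implicated_graphs` are made up for illustration.

```python
from itertools import product

def implicated_graphs(edge_probs):
    """Yield (present_edges, probability) for every implicated graph."""
    edges = list(edge_probs)
    for mask in product([False, True], repeat=len(edges)):
        prob = 1.0
        for e, keep in zip(edges, mask):
            # An existing edge contributes p, a missing edge contributes 1 - p.
            prob *= edge_probs[e] if keep else 1.0 - edge_probs[e]
        yield frozenset(e for e, keep in zip(edges, mask) if keep), prob

# Edge probabilities from the example; the edge names are hypothetical.
edge_probs = {"e1": 0.5, "e2": 0.6, "e3": 0.7, "e4": 0.8}
dist = dict(implicated_graphs(edge_probs))

print(len(dist))           # number of implicated graphs
print(sum(dist.values()))  # total probability mass
```

Dropping only `e1` reproduces the 0.168 from the example and dropping only `e2` reproduces the 0.112. The enumeration is exponential in the number of edges, so it serves as a sanity check rather than a practical method.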
13
Uncertain Graph Databases
Theorem: An uncertain graph DB represents a
probability distribution over all its implicated
graph DBs.
In total, $2^4 \times 2^3 = 128$ implicated graph databases.
Implicated Graph Database
$((1 - 0.5) \times 0.6 \times 0.7 \times 0.8) \times (0.8 \times 0.1 \times (1 - 0.7)) = 4.032 \times 10^{-3}$
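
The arithmetic on this slide can be reproduced as follows. The sketch assumes the uncertain graphs in a database are independent of one another, so the probability of an implicated database is the product of the probabilities of its implicated graphs. The edge names and the helper `implicated_prob` are hypothetical; only the probabilities come from the slide.

```python
from math import prod

def implicated_prob(edge_probs, present):
    """Probability that exactly the edges in `present` exist."""
    return prod(p if e in present else 1.0 - p for e, p in edge_probs.items())

# Two uncertain graphs with 4 and 3 uncertain edges (edge names are made up).
g1 = {"a": 0.5, "b": 0.6, "c": 0.7, "d": 0.8}
g2 = {"x": 0.8, "y": 0.1, "z": 0.7}

# Each uncertain edge is either present or absent in an implicated graph.
num_dbs = 2 ** len(g1) * 2 ** len(g2)

# The implicated database from the example: g1 without "a", g2 without "z".
p_db = implicated_prob(g1, {"b", "c", "d"}) * implicated_prob(g2, {"x", "y"})
print(num_dbs, p_db)
```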
14
Expected Support
D: an uncertain graph DB with implicated graph DBs d1, d2, ..., dn
pi = Pr(D implicates di), for i = 1, ..., n
si = support of S in di, for i = 1, ..., n
The expected support of S is $\mathrm{esup}(S) = \sum_{i=1}^{n} p_i s_i$.
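
Taking the expected support to be the sum of the supports s_i weighted by the probabilities p_i, it can be computed by brute force on a toy database. The sketch below is an illustration under several assumptions: edges and graphs are independent, each uncertain graph comes with a precomputed list of embeddings of S (sets of edge names, all hypothetical), and S occurs in an implicated graph when at least one embedding has all its edges present. By linearity of expectation the same value equals the average over the graphs of Pr(S occurs in Gi), which the code also computes for comparison.

```python
from itertools import product

def worlds(edge_probs):
    """Yield (present_edges, probability) for every implicated graph."""
    edges = list(edge_probs)
    for mask in product([False, True], repeat=len(edges)):
        present = {e for e, keep in zip(edges, mask) if keep}
        prob = 1.0
        for e in edges:
            prob *= edge_probs[e] if e in present else 1.0 - edge_probs[e]
        yield present, prob

def occurs(present, embeddings):
    """S occurs if all edges of at least one embedding are present."""
    return any(emb <= present for emb in embeddings)

def expected_support(db):
    """Sum p_i * s_i over all implicated databases of `db`."""
    total = 0.0
    for combo in product(*(list(worlds(ep)) for ep, _ in db)):
        p, hits = 1.0, 0
        for (present, prob), (_, embs) in zip(combo, db):
            p *= prob
            hits += occurs(present, embs)
        total += p * hits / len(db)
    return total

def occurrence_prob(edge_probs, embeddings):
    """Pr(S occurs in one uncertain graph), by enumeration."""
    return sum(pr for present, pr in worlds(edge_probs) if occurs(present, embeddings))

# Hypothetical database of two uncertain graphs with embeddings of S.
db = [
    ({"x1": 0.5, "x2": 0.6, "x3": 0.7, "x4": 0.8},
     [{"x1", "x2"}, {"x1", "x4"}, {"x2", "x3"}, {"x3", "x4"}]),
    ({"y1": 0.8, "y2": 0.1, "y3": 0.7}, [{"y1", "y2"}]),
]
esup = expected_support(db)
mean_occurrence = sum(occurrence_prob(ep, embs) for ep, embs in db) / len(db)
print(esup, mean_occurrence)
```

The enumeration visits every implicated database, so its cost grows exponentially with the number of uncertain edges; it is meant only to make the definition concrete.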
15
FSP Mining Problem on Uncertain Graphs
  • Input: an uncertain graph database D and an
    expected support threshold minsup
  • Output: all subgraph patterns with expected
    support no less than minsup
  • It is #P-hard to count the number of frequent
    subgraph patterns.
  • Reduction from the problem of counting the number
    of satisfying truth assignments of a monotone
    k-CNF formula.
  • The FSP mining problem on uncertain graphs is
    NP-hard.

16
Outline
  • Background
  • Problem Definition
  • Algorithm
  • Experimental Results
  • Conclusions

17
Approximation Method
  • It is #P-hard to compute the expected support of
    a subgraph pattern.
  • We develop an approximation method to find an
    approximate set of frequent subgraph patterns.
  • Let e (0 < e < 1) be a relative error tolerance.

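[Figure: the expected-support axis from 0 to 1, marked at (1-e)·minsup and minsup. Patterns with expected support at least minsup are output, patterns with expected support below (1-e)·minsup are discarded, and patterns in between may be handled arbitrarily.]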
18
Objective I
  • Difficulty I: The number of frequent subgraph
    patterns is exponentially large.
  • Objective I: Examine subgraph patterns as
    efficiently as possible to find all frequent ones.

19
Method for Objective I
  • Step 1: Build a search tree T of subgraph
    patterns.
  • Step 2: Examine subgraph patterns in T in
    depth-first order.
  • If S is infrequent, then all its descendants can
    be pruned.
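
The depth-first search with pruning can be sketched generically. This is not the authors' search tree: to keep the example self-contained, "patterns" here are sorted tuples of edge names and support is counted over a made-up database of edge sets, whereas a real miner would enumerate subgraph patterns with a canonical form to avoid duplicates. The names `mine`, `children`, and `is_frequent` are assumptions.

```python
def mine(root, children, is_frequent):
    """Depth-first traversal that skips the subtree of any infrequent pattern."""
    frequent = []
    stack = [root]
    while stack:
        pattern = stack.pop()
        if not is_frequent(pattern):
            continue  # prune: no descendant of an infrequent pattern is examined
        frequent.append(pattern)
        # Push children in reverse so they are popped in their natural order.
        stack.extend(reversed(children(pattern)))
    return frequent

# Toy stand-in for a graph database: each "graph" is just a set of edge names.
universe = ["a", "b", "c"]
toy_db = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
minsup = 0.5

def children(p):
    # Extend only with larger edge names so every pattern has one parent.
    return [p + (e,) for e in universe if not p or e > p[-1]]

def is_frequent(p):
    return sum(set(p) <= g for g in toy_db) / len(toy_db) >= minsup

result = mine((), children, is_frequent)
print(result)
```

Pruning is sound here because adding edges to a pattern can never increase its support, so every descendant of an infrequent pattern is also infrequent.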

20
Objective II
  • Difficulty II: It is #P-hard to compute the
    expected support esup(S) of a subgraph pattern S.
  • Objective II: Make the following judgments
    without computing esup(S) exactly.
  • If esup(S) is surely not in the green region,
    then discard.
  • If esup(S) is probably in the green region
    and surely not in the red region, then output.

21
Method for Objective II
  • Step 1: Approximate esup(S) by an interval [l, u]
    such that esup(S) ∈ [l, u].
  • Step 2: Decide whether S can be output or not by
    testing the following conditions.

[Figure: the conditions on [l, u] and their three outcomes: Output, Discard, Shrink.]
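
The exact conditions from this slide did not survive in the transcript. The sketch below shows one set of conditions that is consistent with the approximation guarantee stated earlier (patterns with expected support at least minsup must be output, patterns below (1-e)·minsup must be discarded); it is an assumption, not necessarily the test the authors use. The function name `decide` is hypothetical.

```python
def decide(l, u, minsup, e):
    """Classify a pattern given that esup(S) lies somewhere in [l, u].

    Assumed rules (not taken from the slide):
      - l >= (1 - e) * minsup: esup(S) cannot be below (1 - e) * minsup,
        so outputting S is safe.
      - u < minsup: esup(S) cannot reach minsup, so discarding S is safe.
      - otherwise the interval is too wide to decide and must be shrunk.
    """
    if l >= (1 - e) * minsup:
        return "output"
    if u < minsup:
        return "discard"
    return "shrink"

print(decide(0.50, 0.60, minsup=0.4, e=0.1))
print(decide(0.10, 0.20, minsup=0.4, e=0.1))
print(decide(0.20, 0.60, minsup=0.4, e=0.1))
```

Under these rules, an interval of width at most e·minsup never needs shrinking: if l is below (1-e)·minsup, then u is at most l + e·minsup, which is below minsup, so the pattern is discarded.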
22
Approximating esup(S) by [l, u]
A subgraph pattern S occurs in an uncertain graph
G if S is contained in at least one implicated
graph of G.

Algorithm: Approximate esup(S) by [l, u]
  • Step 1: For each uncertain graph Gi in D, approximate
    Pr(S occurs in Gi) by an interval [li, ui] of
    width at most e·minsup.
  • Step 2
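
The second step is not captured in the transcript. One natural way to combine the per-graph intervals, offered here only as an assumption, is to average them: if the expected support is the average over the graphs in D of Pr(S occurs in Gi), and each of those probabilities lies in its interval, then the expected support lies between the average of the lower ends and the average of the upper ends. The function name `combine` is hypothetical.

```python
def combine(intervals):
    """Average per-graph intervals [li, ui] into one interval [l, u]."""
    n = len(intervals)
    l = sum(lo for lo, _ in intervals) / n
    u = sum(hi for _, hi in intervals) / n
    return l, u

# Two hypothetical per-graph intervals, each of width 0.1.
l, u = combine([(0.7, 0.8), (0.0, 0.1)])
print(l, u)
```

Averaging preserves the width bound: if every per-graph interval has width at most e·minsup, so does the combined one.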
23
Approximate Pr(S occurs in Gi) by [li, ui]
  • Step 1: Find all embeddings of S in Gi
    (4 embeddings in the example).
  • Step 2: Assign Boolean variables to the edges in
    the embeddings: Pr(x1) = 0.5, Pr(x2) = 0.6,
    Pr(x3) = 0.7, Pr(x4) = 0.8.
  • Step 3: Construct a conjunctive formula for each
    embedding: C1 = (x1 ∧ x2), C2 = (x1 ∧ x4),
    C3 = (x2 ∧ x3), C4 = (x3 ∧ x4).
  • Step 4: Construct a DNF formula:
    F = C1 ∨ C2 ∨ C3 ∨ C4.
  • Step 5: Estimate Pr(F = TRUE) by p using Karp and
    Luby's Markov-Chain Monte-Carlo method with
    absolute error e·minsup/2 and confidence d
    (d ∈ [0, 1]).
  • Step 6: [li, ui] = [p - e·minsup/2, p + e·minsup/2].
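
The estimation step can be illustrated on the four-clause example. The sketch below is a Karp-Luby style coverage estimator for a DNF formula whose clauses contain only positive literals and whose variables are independent; it is an illustration under those assumptions, not the authors' code. The function names are made up, and the number of samples needed for a given error and confidence is not derived here; 20000 is an arbitrary choice. An exact enumeration is included for comparison.

```python
import random
from itertools import product

def clause_prob(clause, probs):
    """Probability that every variable in the clause is true."""
    p = 1.0
    for v in clause:
        p *= probs[v]
    return p

def exact_dnf_prob(clauses, probs):
    """Exact Pr(F = TRUE) by enumerating all assignments (tiny inputs only)."""
    names = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if any(all(world[v] for v in c) for c in clauses):
            w = 1.0
            for v in names:
                w *= probs[v] if world[v] else 1.0 - probs[v]
            total += w
    return total

def karp_luby(clauses, probs, samples, rng):
    """Coverage estimator: sample a clause, then an assignment satisfying it."""
    weights = [clause_prob(c, probs) for c in clauses]
    total = sum(weights)
    hits = 0
    for _ in range(samples):
        # Pick clause i with probability proportional to Pr(Ci).
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Force the variables of Ci to true, sample the rest independently.
        world = {v: (v in clauses[i]) or (rng.random() < probs[v]) for v in probs}
        # Count the sample only if Ci is the first clause the assignment satisfies.
        first = next(j for j, c in enumerate(clauses) if all(world[v] for v in c))
        hits += first == i
    return total * hits / samples

probs = {"x1": 0.5, "x2": 0.6, "x3": 0.7, "x4": 0.8}
clauses = [{"x1", "x2"}, {"x1", "x4"}, {"x2", "x3"}, {"x3", "x4"}]

exact = exact_dnf_prob(clauses, probs)
estimate = karp_luby(clauses, probs, samples=20000, rng=random.Random(0))
print(exact, estimate)
```

For this particular formula F is equivalent to (x1 ∨ x3) ∧ (x2 ∨ x4), so the exact value is 0.85 × 0.92 = 0.782, which gives a reference point for judging the estimate.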
24
Outline
  • Background
  • Problem Definition
  • Algorithm
  • Experimental Results
  • Conclusions

25
Experimental Results
  • Data
  • The STRING Database (http://string-db.org)

26
Time Efficiency
27
Approximation Quality
28
Scalability
29
Conclusions
  • A new model of uncertain graph data has been
    proposed.
  • The frequent subgraph pattern mining problem on
    uncertain graph data has been formalized.
  • The computational complexity of the problem has
    been formally proved to be NP-hard.
  • An approximate mining algorithm has been
    proposed.
  • The proposed algorithm has high efficiency, high
    approximation quality, and high scalability.

30
Thank you