Association Analysis (7) (Mining Graphs) - PowerPoint PPT Presentation

About This Presentation
Description:

Extend association rule mining to finding frequent subgraphs ... A graph G1 is isomorphic to another graph G2, if G1 is topologically equivalent to G2 ...

Slides: 24
Provided by: alext8

Transcript and Presenter's Notes

1
Association Analysis (7)(Mining Graphs)
2
Frequent Subgraph Mining
  • Extend association rule mining to finding
    frequent subgraphs
  • Useful for Web Mining, computational chemistry,
    spatial data sets, etc

[Figure: example web subgraph with nodes Homepage, Teaching, Databases, Data Mining]
3
Bio/Chem-Informatics
  • Each year, new chemical compounds are designed.
  • We know that the structure of a compound plays a big
    role in its chemical properties.
  • However, it is difficult to establish their exact
    relationship.
  • Frequent subgraph mining can aid by identifying
    the substructures commonly associated with
    certain properties of known compounds.

4
Web mining
  • E.g. Mining the DBLP Web Graph

Two examples of matches
5
Graph Definitions
6
Mining Subgraphs
7
The Exhaustive Way: Listing all...
8
Apriori-Like Approach
  • Support: the number of graphs that contain a particular
    subgraph
  • Apriori principle still holds: every subgraph of a
    frequent subgraph is also frequent
  • Level-wise (Apriori-like) approach
  • Vertex growing: k is the number of vertices
  • Edge growing: k is the number of edges

9
Apriori-Like Algorithm
  • Generate candidates: merge pairs of frequent (k - 1)-subgraphs
    to obtain a candidate k-subgraph.
  • Prune candidates: discard all candidate k-subgraphs that
    contain infrequent (k - 1)-subgraphs.
  • Count support: count the number of graphs in the DB that
    contain each candidate.
  • Discard all candidate subgraphs whose support counts are
    less than minsup.
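
The count-support and discard steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the graph encoding (a list of vertex labels plus a dict of labeled edges) and the names make_graph, contains, support and frequent are assumptions made for the example, and the containment test is plain brute force over vertex mappings, so it only suits very small graphs.

```python
from itertools import permutations

def make_graph(labels, edges):
    """Hypothetical encoding: vertex labels plus {frozenset({u, v}): edge label}."""
    return (labels, {frozenset((u, v)): lab for u, v, lab in edges})

def contains(graph, pattern):
    """True if `pattern` occurs as a subgraph of `graph` (brute force)."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    # Try every injective mapping of pattern vertices onto graph vertices.
    for image in permutations(range(len(g_labels)), len(p_labels)):
        if any(p_labels[i] != g_labels[image[i]] for i in range(len(p_labels))):
            continue  # vertex labels must match
        if all(g_edges.get(frozenset(image[v] for v in e)) == lab
               for e, lab in p_edges.items()):
            return True  # every pattern edge exists with the same label
    return False

def support(db, pattern):
    """Number of graphs in the database that contain the pattern."""
    return sum(contains(g, pattern) for g in db)

def frequent(db, candidates, minsup):
    """Keep only candidates whose support count reaches minsup."""
    return [c for c in candidates if support(db, c) >= minsup]

# Illustrative data (not from the slides).
db = [
    make_graph(["a", "b", "c"], [(0, 1, "p"), (1, 2, "q"), (0, 2, "p")]),
    make_graph(["a", "b"], [(0, 1, "p")]),
    make_graph(["b", "a", "c"], [(0, 1, "p"), (1, 2, "q")]),
]
p1 = make_graph(["a", "b"], [(0, 1, "p")])  # edge a -p- b
p2 = make_graph(["b", "c"], [(0, 1, "q")])  # edge b -q- c

print(support(db, p1), support(db, p2))
print(frequent(db, [p1, p2], minsup=2) == [p1])
```

Support here counts graphs, not occurrences: a pattern that appears several times inside one graph still contributes 1.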

10
Vertex Growing
The resulting matrix is the first matrix,
appended with the last row and last column of the
second matrix. The remaining entries of the new
matrix are either zero or replaced by all valid
edge labels connecting the pair of vertices.
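
The matrix merge described above can be sketched as follows. The sketch assumes that both (k-1)-vertex matrices list the shared (k-2)-vertex core first and in the same order, that 0 means "no edge", and that vertex labels are handled elsewhere; the function name vertex_grow and the example matrices are illustrative assumptions.

```python
def vertex_grow(m1, m2, edge_labels):
    """Merge two (k-1)x(k-1) adjacency matrices that share a (k-2)-vertex core.

    Returns one k x k candidate per choice for the entry between the two
    non-core vertices: 0 (no edge) or each valid edge label.
    """
    n = len(m1)      # k - 1 vertices in each input
    core = n - 1     # k - 2 shared core vertices
    if [r[:core] for r in m1[:core]] != [r[:core] for r in m2[:core]]:
        raise ValueError("inputs do not share the same core")
    candidates = []
    for new_entry in [0, *edge_labels]:
        m = [row[:] + [0] for row in m1]   # first matrix plus one new column
        m.append([0] * (n + 1))            # plus one new row
        for i in range(core):
            m[i][n] = m2[i][n - 1]         # last column of the second matrix
            m[n][i] = m2[n - 1][i]         # last row of the second matrix
        # Remaining entry: the pair formed by the two non-core vertices.
        m[n - 1][n] = m[n][n - 1] = new_entry
        candidates.append(m)
    return candidates

# Illustrative 4-vertex inputs sharing a 3-vertex core.
m1 = [[0, "p", "p", "q"],
      ["p", 0, "r", 0],
      ["p", "r", 0, 0],
      ["q", 0, 0, 0]]
m2 = [[0, "p", "p", 0],
      ["p", 0, "r", 0],
      ["p", "r", 0, "r"],
      [0, 0, "r", 0]]

cands = vertex_grow(m1, m2, ["p", "q", "r"])
print(len(cands))
for row in cands[0]:
    print(row)
```

With three edge labels the merge yields four candidates: one with no edge between the two new vertices and one per label.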
11
Edge Growing
Edge growing inserts a new edge into an existing
frequent subgraph during candidate
generation. It does not necessarily increase the
number of vertices in the original graphs.
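
To make the point about the vertex count concrete, here is a simplified sketch that lists the one-edge extensions of a single subgraph. It is an assumption-laden simplification: the slides generate candidates by merging pairs of frequent subgraphs, while this hypothetical edge_extensions helper simply enumerates every way to add one edge, and it does not remove candidates that are duplicates up to isomorphism.

```python
from itertools import combinations

def edge_extensions(labels, edges, vertex_labels, edge_labels):
    """List graphs obtained by adding exactly one edge.

    labels: list of vertex labels; edges: {frozenset({u, v}): edge label}.
    """
    out = []
    n = len(labels)
    # (a) New edge between two existing, non-adjacent vertices:
    #     the number of vertices stays the same.
    for u, v in combinations(range(n), 2):
        if frozenset((u, v)) not in edges:
            for lab in edge_labels:
                out.append((list(labels), {**edges, frozenset((u, v)): lab}))
    # (b) New edge to a brand-new vertex: the number of vertices grows by one.
    for u in range(n):
        for vl in vertex_labels:
            for lab in edge_labels:
                out.append((labels + [vl], {**edges, frozenset((u, n)): lab}))
    return out

# Illustrative path a - b - c with both edges labeled "p".
path_labels = ["a", "b", "c"]
path_edges = {frozenset((0, 1)): "p", frozenset((1, 2)): "p"}

exts = edge_extensions(path_labels, path_edges, ["a", "b"], ["p"])
same_size = [g for g in exts if len(g[0]) == len(path_labels)]
print(len(exts), len(same_size))
```

Every extension has one more edge than the input, but only the extensions of kind (b) have one more vertex.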
12
Topological equivalence
  • Two vertices are topologically equivalent if they
    have the same label and the same number and labels
    of edges incident to them.
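
The definition above can be checked mechanically by giving each vertex a signature made of its label and the sorted labels of its incident edges. This is a minimal sketch of that reading of the definition; the function name equivalence_classes and the example graphs are assumptions for illustration.

```python
from collections import defaultdict

def equivalence_classes(labels, edges):
    """Group vertices by (vertex label, sorted labels of incident edges).

    labels: list of vertex labels; edges: list of (u, v, edge_label).
    """
    incident = defaultdict(list)
    for u, v, lab in edges:
        incident[u].append(lab)
        incident[v].append(lab)
    classes = defaultdict(list)
    for v, vlabel in enumerate(labels):
        signature = (vlabel, tuple(sorted(incident[v])))
        classes[signature].append(v)
    return sorted(classes.values())

# Path of four vertices, all labeled "a", all edges labeled "p".
path = equivalence_classes(["a"] * 4, [(0, 1, "p"), (1, 2, "p"), (2, 3, "p")])
# Cycle of four vertices with the same labels.
cycle = equivalence_classes(
    ["a"] * 4, [(0, 1, "p"), (1, 2, "p"), (2, 3, "p"), (3, 0, "p")])

print(path)
print(cycle)
```

With 0-based indices, the path groups its two end vertices together and its two inner vertices together, while every vertex of the cycle falls into a single class.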

  • v1, v4 are topologically equivalent; v2, v3 are
    topologically equivalent
  • No topologically equivalent vertices
  • v1, v2, v3, v4 are topologically equivalent
13
Multiplicity of Candidates
Case 1a v ? v , v1?v2 (Topologically in the
(k-2)-graphs)
Core: the (k-2)-edge subgraph that is common
between the joined graphs.
We try to map the cores.
14
Multiplicity of Candidates
Case 1b v ? v , v1v2 (Topologically in the
(k-2)-graphs)
15
Multiplicity of Candidates
Case 2a v ? v , v1?v2 (Topologically in the
(k-2)-graphs)
16
Multiplicity of Candidates
Case 2b v ? v , v1v2 (Topologically in the
(k-2)-graphs)
17
Multiplicity of Candidates
Case 2c v ? v (Topologically in the
(k-2)-graphs)
We try to map the cores, and there are two ways to do
this.
18
Multiplicity of Candidates
Case 2d v ? v (Topologically in the
(k-2)-graphs)
We try to map the cores, and there are two ways to do
this.
19
Multiplicity of Candidates
  • More than two topologically equivalent vertices

[Figure: example graphs with vertex labels a, b, c]
Core: the (k-2) subgraph that is common
between the joined graphs
20
Adjacency Matrix Representation
  • The same graph can be represented in many ways
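
A small sketch of why one graph has many adjacency matrices: relabeling the vertices permutes rows and columns together. The helper name permute and the 3-vertex path are illustrative assumptions; the matrices are unlabeled 0/1 matrices.

```python
from itertools import permutations

def permute(matrix, order):
    """Reorder vertices so that new vertex i is old vertex order[i]."""
    return [[matrix[a][b] for b in order] for a in order]

# Path on three vertices with the middle vertex at index 1.
m = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

swapped = permute(m, [1, 0, 2])   # middle vertex moved to index 0
distinct = {tuple(map(tuple, permute(m, order)))
            for order in permutations(range(3))}

print(swapped)
print(len(distinct))
```

The six vertex orderings of this path give three different matrices, one for each position the middle vertex can take.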

21
Graph Isomorphism
  • A graph G1 is isomorphic to another graph G2 if
    G1 is topologically equivalent to G2, i.e., there is a
    one-to-one mapping between their vertices that
    preserves edges and labels
  • A test for graph isomorphism is needed:
  • During candidate generation, to determine whether
    a candidate can be generated
  • During candidate pruning, to check whether its
    (k-1)-subgraphs are frequent
  • During candidate counting, to check whether a
    candidate is contained within another graph. Here we
    should use more specialized algorithms (possibly
    using indexes with each frequent (k-1)-subgraph)

22
Codes
Code: 1 10 011 1000 01001 001010 0001011
Code: 1011010010100000100110001110
23
Graph Isomorphism
  • Use canonical labeling to handle isomorphism
  • Map each graph into an ordered string
    representation (known as its code) such that two
    isomorphic graphs will be mapped to the same
    canonical encoding
  • Example: choose the string representation with the
    lowest lexicographical value
  • Then, the graph isomorphism problem can be solved
    by string matching.
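
Canonical labeling as described above can be sketched by brute force: build the code of every vertex ordering and keep the lexicographically smallest. The sketch assumes unlabeled 0/1 adjacency matrices and a code formed by reading the upper triangle column by column (which matches the 1, 2, 3, ... digit grouping on the Codes slide, though that reading is an assumption). The function names are hypothetical, and trying all orderings is only practical for very small graphs.

```python
from itertools import permutations

def code(matrix):
    """Concatenate the upper-triangle entries, column by column."""
    n = len(matrix)
    return "".join(str(matrix[i][j]) for j in range(1, n) for i in range(j))

def canonical_code(matrix):
    """Smallest code over all vertex orderings (brute force, n! orderings)."""
    n = len(matrix)
    return min(code([[matrix[a][b] for b in order] for a in order])
               for order in permutations(range(n)))

def isomorphic(m1, m2):
    """Isomorphism test reduced to comparing canonical strings."""
    return len(m1) == len(m2) and canonical_code(m1) == canonical_code(m2)

path_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]    # middle vertex at index 1
path_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]    # same path, middle at index 0
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(code(path_a), code(path_b))
print(canonical_code(path_a), canonical_code(path_b))
print(isomorphic(path_a, path_b), isomorphic(path_a, triangle))
```

The two path matrices have different raw codes but the same canonical code, so comparing canonical strings identifies them as the same graph.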