Title: Mining Graphs
1Mining Graphs
2Frequent Subgraph Mining
- Extend association rule mining to finding frequent subgraphs
- Useful for computational chemistry, Web mining, spatial data sets, etc.
3Example
- In drug discovery, the goal is to identify common parts in molecules sharing similar chemical properties.
- Use the two-dimensional atom-bond structure of molecules.
- The database is searched for subgraphs that appear in at least a certain number of molecules.
- A famous example of a frequent molecular fragment is the so-called AZT, which is a well-known HIV-1 inhibitor (see Figure on the right).
4Graph Definitions
5Mining Subgraphs
6Apriori-Like Approach
- Support
- Number of graphs that contain a particular subgraph
- Apriori principle still holds
- Level-wise (Apriori-like) approach
- Vertex growing
- k is the number of vertices
- Edge growing
- k is the number of edges
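The support definition above can be illustrated with a brute-force containment test. This is only a minimal sketch: the graph representation (a vertex-label dict plus an edge-label dict keyed by vertex pairs) and the names `make_graph`, `contains`, and `support` are assumptions made for illustration, and trying every vertex assignment is exponential, so it only suits very small graphs.

```python
from itertools import permutations

def make_graph(labels, edges):
    """labels: {vertex: label}; edges: iterable of (u, v, edge_label), no self-loops."""
    return labels, {frozenset((u, v)): lab for u, v, lab in edges}

def contains(graph, pattern):
    """True if `pattern` occurs in `graph` as a (not necessarily induced) subgraph."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    # Try every injective assignment of pattern vertices to graph vertices.
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[mapping[v]] for v in p_nodes):
            continue  # vertex labels must match
        if all(g_edges.get(frozenset(mapping[x] for x in e)) == lab
               for e, lab in p_edges.items()):
            return True  # every pattern edge is present with the same label
    return False

def support(db, pattern):
    """Number of graphs in the database that contain the pattern."""
    return sum(contains(g, pattern) for g in db)

# Toy database (hypothetical fragments, 's' = single bond).
db = [
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1, "s"), (1, 2, "s")]),
    make_graph({0: "C", 1: "O"}, [(0, 1, "s")]),
    make_graph({0: "C", 1: "C", 2: "N"}, [(0, 1, "s"), (1, 2, "s")]),
]
c_o = make_graph({"a": "C", "b": "O"}, [("a", "b", "s")])
print(support(db, c_o))  # 2
```

Note that a graph counts once towards the support no matter how many times the pattern occurs inside it.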
7Apriori-Like Algorithm
- Generate candidates
- Merge pairs of frequent (k - 1)-subgraphs to obtain a candidate k-subgraph.
- Prune candidates
- Discard all candidate k-subgraphs that contain infrequent (k - 1)-subgraphs.
- Count support
- Count the number of graphs in the database that contain each candidate.
- Discard all candidate subgraphs whose support counts are less than minsup.
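The generate, prune, and count steps can be sketched in a deliberately simplified setting. The sketch below assumes that every vertex label is unique within a graph, so a subgraph is fully identified by its set of labeled edges and containment reduces to a subset test; real molecule data does not satisfy this and needs the isomorphism handling discussed later. The names `edge`, `is_connected`, and `mine` are illustrative, and k counts edges (edge growing).

```python
from itertools import combinations

def edge(u, v, bond):
    """Undirected labeled edge with its endpoints stored in sorted order."""
    return (*sorted((u, v)), bond)

def is_connected(edges):
    """True if the (non-empty) edge set forms a single connected component."""
    edges = list(edges)
    seen = set(edges[0][:2])
    changed = True
    while changed:
        changed = False
        for u, v, _ in edges:
            if (u in seen) != (v in seen):
                seen.update((u, v))
                changed = True
    return all(u in seen for u, _, _ in edges)

def mine(db, minsup):
    """db: list of edge sets. Returns {frozenset(edges): support}."""
    def support(cand):
        return sum(cand <= g for g in db)  # subset test (unique-label assumption)

    # k = 1: frequent single edges.
    level = {frozenset([e]) for g in db for e in g}
    level = {c for c in level if support(c) >= minsup}
    frequent = {}
    while level:
        frequent.update({c: support(c) for c in level})
        # Generate: merge pairs of frequent (k-1)-subgraphs sharing a (k-2)-edge core.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1 and is_connected(a | b)}
        # Prune: every connected (k-1)-subgraph of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(c - {e} in level for e in c if is_connected(c - {e}))}
        # Count: keep candidates whose support reaches minsup.
        level = {c for c in candidates if support(c) >= minsup}
    return frequent

db = [
    {edge("C", "O", "d"), edge("C", "N", "s"), edge("N", "H", "s")},
    {edge("C", "O", "d"), edge("C", "N", "s")},
    {edge("C", "O", "d"), edge("C", "S", "s")},
]
result = mine(db, minsup=2)
```

With `minsup=2`, the toy database yields three frequent subgraphs: the C=O edge, the C-N edge, and the two-edge fragment combining them.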
8Vertex Growing
9Edge Growing
Edge growing inserts a new edge into an existing
frequent subgraph during candidate
generation. It does not necessarily increase the
number of vertices in the original graph.
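A small illustration of why the vertex count may stay the same: the new edge can connect two vertices that already exist, or it can attach a new vertex. The helper `add_edge` and the graph representation are assumptions made for this sketch only.

```python
def add_edge(labels, edges, u, v, bond, new_label=None):
    """Return a copy of the graph with one extra edge u-v.

    labels: {vertex: label}; edges: {frozenset({u, v}): bond}.
    If v is not yet a vertex, it is created with `new_label`.
    """
    labels = dict(labels)
    edges = dict(edges)
    if v not in labels:
        labels[v] = new_label
    edges[frozenset((u, v))] = bond
    return labels, edges

# A path 0-1-2: three vertices, two edges.
labels = {0: "C", 1: "C", 2: "C"}
edges = {frozenset((0, 1)): "s", frozenset((1, 2)): "s"}

# Closing the ring adds an edge but no vertex.
ring_labels, ring_edges = add_edge(labels, edges, 0, 2, "s")
# Attaching a new atom adds an edge and a vertex.
chain_labels, chain_edges = add_edge(labels, edges, 2, 3, "s", new_label="O")

print(len(ring_labels), len(ring_edges))    # 3 3
print(len(chain_labels), len(chain_edges))  # 4 3
```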
10Topological equivalence
- Two vertices are topologically equivalent if they have
- The same label and
- The same number and label of edges incident to them.
v1, v4 are topologically equivalent; v2, v3 are topologically equivalent
No topologically equivalent vertices
v1, v2, v3, v4 are topologically equivalent
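The definition on this slide can be checked mechanically by grouping vertices on a signature made of the vertex label and the sorted labels of the incident edges. The sketch below follows that definition literally. The function name and the two example graphs (a four-vertex path and a four-vertex cycle, all with identical labels) are assumptions chosen so that the results resemble the first and last captions above; they are not taken from the original figures.

```python
from collections import defaultdict

def equivalence_classes(labels, edges):
    """Group vertices with the same label and the same multiset of incident edge labels.

    labels: {vertex: label}; edges: iterable of (u, v, edge_label).
    """
    incident = defaultdict(list)
    for u, v, lab in edges:
        incident[u].append(lab)
        incident[v].append(lab)
    classes = defaultdict(list)
    for v, lab in labels.items():
        classes[(lab, tuple(sorted(incident[v])))].append(v)
    return sorted(classes.values())

labels = {"v1": "a", "v2": "a", "v3": "a", "v4": "a"}
path = [("v1", "v2", "p"), ("v2", "v3", "p"), ("v3", "v4", "p")]
cycle = path + [("v4", "v1", "p")]

print(equivalence_classes(labels, path))   # [['v1', 'v4'], ['v2', 'v3']]
print(equivalence_classes(labels, cycle))  # [['v1', 'v2', 'v3', 'v4']]
```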
11Multiplicity of Candidates
Case 1a v ? v , v1?v2 (Topologically in the
(k-2)-graphs)
Core: the (k-2)-edge subgraph that is common
between the joined graphs.
We try to map the cores.
12Multiplicity of Candidates
Case 1b v ? v , v1v2 (Topologically in the
(k-2)-graphs)
13Multiplicity of Candidates
Case 2a v ? v , v1?v2 (Topologically in the
(k-2)-graphs)
14Multiplicity of Candidates
Case 2b v ? v , v1v2 (Topologically in the
(k-2)-graphs)
15Multiplicity of Candidates
Case 2c v ? v (Topologically in the
(k-2)-graphs)
We try to map the cores, and there are two ways to do
this.
16Multiplicity of Candidates
Case 2d v ? v (Topologically in the
(k-2)-graphs)
We try to map the cores, and there are two ways to do
this.
17Multiplicity of Candidates
- More than two topologically equivalent vertices
Core: the (k-2)-subgraph that is common
between the joined graphs
18Adjacency Matrix Representation
- The same graph can be represented in many ways
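To make this concrete, the sketch below builds the adjacency matrix of one small unlabeled graph under two different vertex orderings. The function name `adjacency_matrix` and the example path are illustrative assumptions.

```python
def adjacency_matrix(edges, order):
    """0/1 adjacency matrix of an undirected graph for a given vertex ordering."""
    index = {v: i for i, v in enumerate(order)}
    m = [[0] * len(order) for _ in order]
    for u, v in edges:
        m[index[u]][index[v]] = m[index[v]][index[u]] = 1
    return m

edges = [("a", "b"), ("b", "c")]  # the path a-b-c

print(adjacency_matrix(edges, ["a", "b", "c"]))  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(adjacency_matrix(edges, ["b", "a", "c"]))  # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```

The two matrices differ although they describe the same graph, which is why comparing matrices directly cannot decide whether two graphs are the same.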
19Graph Isomorphism
- A graph G1 is isomorphic to another graph G2 if G1 is topologically equivalent to G2
- A test for graph isomorphism is needed
- During candidate generation, to determine whether a candidate can be generated
- During candidate pruning, to check whether its (k-1)-subgraphs are frequent
- During candidate counting, to check whether a candidate is contained within another graph. Here we should use more specialized algorithms (possibly using indexes with each frequent (k-1)-subgraph)
20Codes
Code: 1 10 011 1000 01001 001010 0001011
Code: 1011010010100000100110001110
21Graph Isomorphism
- Use canonical labeling to handle isomorphism
- Map each graph into an ordered string representation (known as its code) such that two isomorphic graphs will be mapped to the same canonical encoding
- Example
- Choose the string representation with the lowest lexicographical value
- Then, the graph isomorphism problem can be solved by string matching.
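A brute-force sketch of canonical labeling for small unlabeled graphs follows. It assumes the code of a graph under a given vertex ordering is obtained by concatenating the columns of the upper triangle of the adjacency matrix, which appears consistent with the grouping 1, 10, 011, ... on the Codes slide, and it picks the lexicographically lowest code over all orderings. The names `code` and `canonical_code` are illustrative, and trying all n! orderings is only feasible for very small graphs.

```python
from itertools import permutations

def code(edges, order):
    """Concatenate the upper-triangle columns of the adjacency matrix for `order`."""
    index = {v: i for i, v in enumerate(order)}
    adj = {frozenset((index[u], index[v])) for u, v in edges}
    return "".join("1" if frozenset((i, j)) in adj else "0"
                   for j in range(1, len(order)) for i in range(j))

def canonical_code(vertices, edges):
    """Lowest code over all vertex orderings (factorial cost)."""
    return min(code(edges, order) for order in permutations(vertices))

# Two drawings of the same path, with different vertex names and centers.
path_1 = canonical_code("abc", [("a", "b"), ("b", "c")])
path_2 = canonical_code("xyz", [("x", "y"), ("x", "z")])
triangle = canonical_code("abc", [("a", "b"), ("b", "c"), ("a", "c")])

print(path_1 == path_2)    # True: isomorphic graphs share one canonical code
print(path_1 == triangle)  # False
```

Because all codes of a graph have the same length, Python's string `min` coincides with the lowest lexicographical value, and isomorphism testing reduces to comparing two strings.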