Title: COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence
- Ning Jin, Calvin Young, Wei Wang
- University of North Carolina at Chapel Hill
- 11/04/2009
What Are Graphs?
- Graph: a set of nodes connected by a set of edges
- Nodes and edges can have labels
- Edges can have directions
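The three properties above (labeled nodes, labeled edges, optional direction) fit in a small container type. The sketch below is an illustration only. The class name `LabeledGraph`, integer node ids, and string labels are assumptions, not the authors' data structure.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """A graph whose nodes and edges can carry labels (hypothetical sketch).

    Assumptions: nodes are integer ids, labels are strings, and `directed`
    decides whether (u, v) and (v, u) name the same edge.
    """

    directed: bool = False
    node_labels: dict[int, str] = field(default_factory=dict)
    edge_labels: dict[tuple[int, int], str] = field(default_factory=dict)

    def add_node(self, node: int, label: str) -> None:
        self.node_labels[node] = label

    def _key(self, u: int, v: int) -> tuple[int, int]:
        # Undirected edges are stored with their endpoints in sorted order.
        return (u, v) if self.directed else (min(u, v), max(u, v))

    def add_edge(self, u: int, v: int, label: str = "") -> None:
        if u not in self.node_labels or v not in self.node_labels:
            raise KeyError("both endpoints must be added as nodes first")
        self.edge_labels[self._key(u, v)] = label

    def has_edge(self, u: int, v: int) -> bool:
        return self._key(u, v) in self.edge_labels
```

In an undirected graph, `add_edge(2, 1, "x")` and `has_edge(1, 2)` refer to the same stored edge; in a directed graph they do not.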
Graph Classification Example
[Figure: example graphs in a negative set and a positive set]
Graph Representation
[Figure: two kinds of data objects, each represented by graphs]
Interesting Properties in Data
[Figure: properties of interest determined by structure; panels annotated "most" and "some"]
Graph Classification
- Function is determined by structure
- So classifying objects as positive or negative becomes classifying their graphs
[Figure: example objects labeled positive and negative]
Graph Classification Using Frequent Subgraph Patterns
- The positive graphs should have some common subgraph patterns that negative graphs don't have
- Generating classifiers:
  1. Frequent subgraph mining in the positive set (frequency > threshold)
  2. Feature selection
  3. Classification of high-dimensional data points
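Steps 1 and 3 of this pipeline can be sketched as follows. To stay runnable without a subgraph-isomorphism library, the sketch assumes each graph and each candidate pattern is a frozenset of labeled edges (e.g. "A-B") and that a pattern occurs in a graph when it is a subset of it; this is a stand-in, not real subgraph matching. The function names are hypothetical, and feature selection (step 2) is left out.

```python
def frequent_patterns(positive_graphs, candidate_patterns, threshold):
    """Keep candidates whose frequency in the positive set exceeds threshold.

    Assumption: graphs and patterns are frozensets of labeled edges, and
    "pattern occurs in graph" is approximated by the subset test.
    """
    if not positive_graphs:
        return []
    n = len(positive_graphs)
    kept = []
    for pattern in candidate_patterns:
        support = sum(1 for g in positive_graphs if pattern <= g)
        if support / n > threshold:
            kept.append(pattern)
    return kept


def feature_vectors(graphs, patterns):
    """Map each graph to a 0/1 vector: entry i is 1 if it contains pattern i."""
    return [[1 if p <= g else 0 for p in patterns] for g in graphs]
```

Each graph then becomes one high-dimensional 0/1 point, which any standard vector classifier can consume.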
Graph Classification Using Discriminative Subgraph Patterns
- Frequent subgraph mining in the positive set and feature selection are merged into one step: mining discriminative/significant subgraph patterns
- Requires a scoring function
- Must handle pattern redundancy. Example: pattern 1 is found in positive graphs P1, P2 and in negative graphs N1, N2; pattern 2 is found in positive graphs P1, P2, P3 and in negative graph N1. Pattern 1 is redundant given pattern 2.
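The redundancy example can be checked mechanically. The slides do not state a formal criterion, so the sketch below assumes one that matches the example: pattern 1 is redundant given pattern 2 when pattern 2 covers every positive graph pattern 1 covers and covers no negative graph that pattern 1 does not also cover. The function name `is_redundant` is hypothetical.

```python
def is_redundant(pos1, neg1, pos2, neg2):
    """Return True if pattern 1 is redundant given pattern 2.

    Assumed criterion (consistent with the slide's example, not quoted
    from it): pattern 2 covers at least the positives of pattern 1 and
    at most the negatives of pattern 1.
    """
    return pos1 <= pos2 and neg2 <= neg1


# The slide's example: supports given as sets of graph ids.
p1_pos, p1_neg = {"P1", "P2"}, {"N1", "N2"}
p2_pos, p2_neg = {"P1", "P2", "P3"}, {"N1"}
```

Here `is_redundant(p1_pos, p1_neg, p2_pos, p2_neg)` is true, while the reverse call is false, since pattern 1 misses P3 and adds N2.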
Previous Discriminative Pattern Mining Methods
- Each tree node represents a subgraph pattern
- Each node is a supergraph of its parent node, with one more edge
- One subgraph pattern corresponds to only one node
- Pattern redundancy: pattern 1 is found in positive graphs G1, G2 and in negative graphs G4, G5; pattern 2 is found in positive graphs G1, G2, G3 and in negative graph G4. Pattern 1 is redundant given pattern 2.
[Figure: pattern tree evaluated with a scoring function]
1. Heuristic Exploration Order
[Figure: Pattern 1 and Pattern 2 in the pattern tree]
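Heuristic Exploration Order: Delta Score
[Figure: a pattern p whose score has a large absolute value vs. a pattern p whose score has a large derivative]
- Delta score of p = score of p − score of p′, where p′ is the parent of p (p with one fewer edge)
- It's like looking for the maximum of a function: follow the patterns whose score is rising fastest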
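The difference between ranking by absolute score and ranking by delta score shows up with two hypothetical patterns. The scores and the parent map below are invented for illustration, since the slides do not define the scoring function; `delta_score` and `pick_next` are hypothetical names.

```python
def delta_score(pattern, score, parent_of):
    """Delta score of p = score of p - score of p', with p' = parent_of[p]."""
    return score(pattern) - score(parent_of[pattern])


def pick_next(frontier, score, parent_of):
    """Pick the frontier pattern with the largest delta score."""
    return max(frontier, key=lambda p: delta_score(p, score, parent_of))


# Hypothetical scores: p1 has the larger score, p2 the larger rise over its parent.
scores = {"a": 0.5, "p1": 0.6, "b": 0.1, "p2": 0.4}
parents = {"p1": "a", "p2": "b"}
```

Ranking by absolute score would choose p1 (0.6 > 0.4), but `pick_next(["p1", "p2"], scores.__getitem__, parents)` chooses p2, whose score rose by about 0.3 rather than 0.1.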
Workflow of Pattern Exploration
1. Collect frequent edges in the positive set and insert them into a heap H (a frequency threshold tp is needed)
2. If H is empty, terminate
3. Otherwise, pop from H the pattern p with the highest delta score
4. Extend pattern p, insert the new non-redundant patterns into H, and return to step 2
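The four steps above amount to a best-first search driven by a max-heap keyed on delta score. The sketch below makes several assumptions that are not from the slides: graphs and patterns are frozensets of labeled edges with the subset test standing in for subgraph matching, a pattern is extended by adding one frequent edge, score = positive frequency minus negative frequency, "non-redundant" is simplified to "not generated before", and extensions must stay frequent (> tp). Both graph sets must be non-empty.

```python
import heapq
from itertools import count


def explore(pos_graphs, neg_graphs, tp):
    """Best-first pattern exploration ordered by delta score (hypothetical sketch)."""

    def freq(pattern, graphs):
        return sum(pattern <= g for g in graphs) / len(graphs)

    def score(pattern):
        # Hypothetical scoring function; the slides leave it unspecified.
        return freq(pattern, pos_graphs) - freq(pattern, neg_graphs)

    # Step 1: frequent single edges in the positive set seed the heap.
    edges = {e for g in pos_graphs for e in g}
    frequent_edges = sorted(e for e in edges if freq(frozenset([e]), pos_graphs) > tp)
    heap, tie, seen, explored = [], count(), set(), []
    for e in frequent_edges:
        p = frozenset([e])
        seen.add(p)
        # A single edge's parent is the empty pattern, whose score is 0,
        # so its delta score equals its score. Scores are negated because
        # heapq is a min-heap; `tie` avoids comparing frozensets.
        heapq.heappush(heap, (-score(p), next(tie), p))
    # Steps 2-4: pop the highest delta score until the heap is empty.
    while heap:
        _, _, p = heapq.heappop(heap)
        explored.append(p)
        for e in frequent_edges:
            child = p | {e}
            if child == p or child in seen or freq(child, pos_graphs) <= tp:
                continue
            seen.add(child)
            heapq.heappush(heap, (-(score(child) - score(p)), next(tie), child))
    return explored
```

A counter breaks ties so the heap never compares two frozensets directly, which would silently order them by the subset relation.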
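2. Use Co-occurrences of Patterns
- A subgraph pattern in graph G can be approximated by a co-occurrence of smaller patterns in G
[Figure: a pattern over nodes labeled A, B, C, D in graph G, and the co-occurrence of smaller patterns in G that approximates it]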
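When Co-occurrence Is Superior
- Separately: pattern A-B is found in N1, N2, P1, P2, P3, P4; pattern B-C is found in N3, N4, P1, P2, P3, P4
- Co-occurrence of A-B and B-C: found in P1, P2, P3, P4 and in no negative graphs
- Neither pattern alone separates the two sets, but their co-occurrence does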
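The example follows from one set operation: a co-occurrence occurs in exactly the graphs that contain every one of its patterns, so its support is the intersection of the individual supports. The function name below is hypothetical; the supports are the slide's.

```python
def cooccurrence_support(supports):
    """Graphs containing every pattern of a co-occurrence.

    `supports` is a non-empty list of sets, one per pattern, each holding
    the ids of the graphs that contain that pattern.
    """
    return set.intersection(*map(set, supports))


# Supports from the slide's example.
ab = {"N1", "N2", "P1", "P2", "P3", "P4"}
bc = {"N3", "N4", "P1", "P2", "P3", "P4"}
```

`cooccurrence_support([ab, bc])` leaves only P1 to P4: each negative graph contains at most one of the two patterns.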
Co-occurrence Generation
- A co-occurrence is a set of subgraph patterns {p1, p2, …, pm}
- For each new pattern p:
  - Among candidate co-occurrences 1 … n, find the candidate k such that merging candidate k and pattern p improves the score of p most significantly
  - Insert the union of pattern p and candidate co-occurrence k into the candidate list
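One way to read the generation step is sketched below. Each candidate co-occurrence is stored with the positive and negative graph ids it occurs in, so merging with pattern p is a set intersection. Several parts are assumptions rather than details from the slides: the scoring function (positive frequency minus negative frequency), merging only when the gain is positive, and also keeping p alone as a candidate. The name `add_pattern` is hypothetical.

```python
def add_pattern(candidates, name, pos, neg, n_pos, n_neg):
    """Insert pattern `name` into the candidate co-occurrence list (sketch).

    candidates: list of (patterns, pos_ids, neg_ids) triples, where
    `patterns` is a frozenset of pattern names.
    """

    def score(p_ids, n_ids):
        # Hypothetical score: positive frequency minus negative frequency.
        return len(p_ids) / n_pos - len(n_ids) / n_neg

    base = score(pos, neg)
    best, best_gain = None, 0.0
    for patterns, c_pos, c_neg in candidates:
        # Graphs containing the union = intersection of the supports.
        gain = score(pos & c_pos, neg & c_neg) - base
        if gain > best_gain:
            best, best_gain = (patterns, c_pos, c_neg), gain
    # Assumption: p on its own is also kept as a candidate.
    candidates.append((frozenset([name]), pos, neg))
    if best is not None:
        patterns, c_pos, c_neg = best
        candidates.append((patterns | {name}, pos & c_pos, neg & c_neg))
    return candidates
```

Feeding in A-B and then B-C from the previous slide produces the candidate {A-B, B-C}, covering P1 to P4 and no negative graph.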
3. Use Association Rules to Classify
- Association rule: {p1, p2, p3, …, pn} → positive
- Input of COM (Co-Occurrence rule Miner):
  - Positive graph set and negative graph set
  - Frequency threshold tp of a classification rule in the positive set, and frequency threshold tn in the negative set
- Output of COM: a set of association rules
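Once rules are mined, applying them is straightforward. The sketch assumes that a graph is described by the set of pattern names it contains, and that it is labeled positive when at least one rule fires (all of that rule's patterns are present) and negative otherwise. The slides do not spell out this decision rule; it and the name `classify` are assumptions.

```python
def classify(graph_patterns, rules):
    """Return "positive" if some rule {p1, ..., pn} -> positive fires.

    Assumptions: `graph_patterns` is the set of pattern names found in the
    graph, each rule is a frozenset of pattern names, and a graph that
    fires no rule is labeled "negative".
    """
    if any(rule <= graph_patterns for rule in rules):
        return "positive"
    return "negative"


rules = [frozenset({"p1", "p2"}), frozenset({"p3"})]
```

For example, a graph containing p1, p2, and p4 fires the first rule, whereas one containing only p1 fires neither.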
Association Rule Generation
- Each candidate co-occurrence corresponds to a candidate association rule
- If a rule's frequency is > tp in the positive set and < tn in the negative set, it is a resulting rule
- Terminate when each positive graph is covered
- Remove redundant rules
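The generation steps can be sketched as one pass over the candidates followed by a redundancy filter. The comparisons against tp and tn follow the slide. The redundancy criterion is an assumption: it reuses the pattern-redundancy idea from earlier, so a rule is dropped when another rule covers a strict superset of its positives without covering more negatives. The name `generate_rules` and the (patterns, pos_ids, neg_ids) layout are hypothetical.

```python
def generate_rules(candidates, positives, n_pos, n_neg, tp, tn):
    """Turn candidate co-occurrences into classification rules (sketch).

    candidates: iterable of (patterns, pos_ids, neg_ids), in generation order.
    positives: set of all positive graph ids.
    """
    rules, covered = [], set()
    for patterns, pos_ids, neg_ids in candidates:
        if covered >= positives:
            break  # every positive graph is covered
        if len(pos_ids) / n_pos > tp and len(neg_ids) / n_neg < tn:
            rules.append((patterns, pos_ids, neg_ids))
            covered |= pos_ids
    # Assumed redundancy test: another rule covers at least these positives,
    # at most these negatives, and is not an identical coverage.
    kept = []
    for i, (pats, pos_i, neg_i) in enumerate(rules):
        dominated = any(
            j != i
            and pos_i <= pos_j
            and neg_j <= neg_i
            and (pos_i, neg_i) != (pos_j, neg_j)
            for j, (_, pos_j, neg_j) in enumerate(rules)
        )
        if not dominated:
            kept.append(pats)
    return kept


P = {"P1", "P2", "P3", "P4"}
cands = [
    (frozenset({"A-B"}), P, {"N1", "N2"}),      # too many negatives
    (frozenset({"D-E"}), {"P1", "P2"}, set()),  # accepted, later redundant
    (frozenset({"A-B", "B-C"}), P, set()),      # accepted, covers all positives
    (frozenset({"X"}), {"P3"}, set()),          # never reached
]
```

With tp = 0.25 and tn = 0.3, the pass stops once {A-B, B-C} covers every positive graph, and the filter then drops {D-E} because {A-B, B-C} covers strictly more positives with no more negatives.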
Experiments: Datasets
- Protein datasets: six SCOP families
- Chemical datasets: six PubChem bioassays
Experiments: Parameters and Evaluation
- Protein datasets: tp = 30, tn = 0
- Chemical datasets: tp = 1, tn = 0.4
Experimental Results: Protein Datasets
Experimental Results: Chemical Datasets
Conclusions
- Using a heuristic pattern exploration order and co-occurrences can improve the runtime efficiency of mining discriminative patterns
- Using association rules can achieve competitive classification accuracy
Questions and Suggestions