Title: Direct Discriminative Pattern Mining for Effective Classification
1. Direct Discriminative Pattern Mining for Effective Classification
- Hong Cheng, Jiawei Han (UIUC)
- Xifeng Yan, Philip S. Yu (IBM T. J. Watson)
- ICDE 2008
2. Presentation Overview
- Introduction
- Frequent Pattern-based Classification
- Direct Discriminative Pattern Mining
- Experimental Results
- Conclusion
3. Introduction
- A frequent itemset (pattern) is a set of items that occurs in a dataset with frequency at least min_sup
- Frequent patterns have been explored widely in classification tasks
- Studies achieve very good classification accuracy
- Associative classification was found to be competitive with traditional classification methods such as SVM and C4.5
- Frequent patterns are also promising for classifying complex structures, such as strings and graphs, with high accuracy
4. Introduction (continued)
- However, most of these studies use a two-step process
- 1. Mine all frequent patterns or association rules that satisfy min_sup
- 2. Perform a feature selection or rule ranking procedure
5. Introduction (continued)
- Although the two-step process is straightforward and achieves high accuracy, it could incur high computational cost
- Mining could take a long time
- due to the exponential number of combinations among items
- which is common for dense datasets or high-dimensional data
- It could also take a very long time if min_sup is low
6. Introduction (continued)
- Classification tasks depend on the frequent patterns that are highly discriminative w.r.t. the class label
- But frequent pattern mining is not guided by discriminative power
- So, a large number of indiscriminative itemsets can be generated during the mining step
- Much time is wasted on generating patterns that are not useful
- Feature selection on this large set of patterns (e.g., millions) will also be too costly
7. Introduction (continued)
- Solution?
- Instead of generating the complete set of frequent patterns, directly mine highly discriminative patterns for classification
- This leads to the Direct Discriminative Pattern Mining approach, or DDPMine
- It integrates the feature selection mechanism into the mining framework
8. Introduction (continued)
- DDPMine mining methodology
- Transform the data into a compact FP-tree
- Search for discriminative patterns directly
- Outperforms the two-step method with significant speedup
9. Frequent Pattern-based Classification
- Learning a classification model in the feature space of single features and frequent patterns
- Consists of three steps
- 1. frequent itemset mining
- 2. feature selection
- 3. model learning
- Both steps 1 and 2 can be a bottleneck
- due to the exponential number of combinations among items
10. Frequent Pattern-based Classification (continued)
- Computational bottleneck
- It could take a very long time to generate the complete set of frequent patterns
- It is inefficient to wait for the mining algorithm to finish and then apply feature selection
- because most of the patterns are not useful for classification (i.e., indiscriminative)
- Feature selection can be a bottleneck because of the explosive number (millions) of frequent patterns
- Even if a linear algorithm (usually it is polynomial) is employed for feature selection, it could still run slowly
11. Frequent Pattern-based Classification (continued)
- Feature selection by sequential coverage
12. Frequent Pattern-based Classification (continued)
- Feature selection by sequential coverage (cont'd)
- Input
- D: a set of training instances
- F: a set of features
- Iteratively applies feature selection (see the sketch after this list)
- At each step
- it selects the feature with the highest discriminative measure (e.g., information gain)
- then all the training instances containing this feature are eliminated from D
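A minimal, runnable sketch of this sequential-coverage selection, assuming transactions are item sets, labels are binary, and candidate features are itemsets scored by information gain; the data layout and names are illustrative, not the paper's code.

```python
import math

def entropy(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(instances, labels, feature):
    """IG of a pattern feature (an itemset) w.r.t. the binary class label."""
    n = len(instances)
    covered = [labels[i] for i in range(n) if feature <= instances[i]]
    rest = [labels[i] for i in range(n) if not feature <= instances[i]]
    h_cond = sum(len(g) / n * entropy(sum(g) / len(g)) for g in (covered, rest) if g)
    return entropy(sum(labels) / n) - h_cond

def sequential_coverage(instances, labels, features):
    """Pick the highest-IG feature, drop the instances it covers, repeat."""
    selected = []
    remaining = list(range(len(instances)))
    features = list(features)
    while remaining and features:
        data = [instances[i] for i in remaining]
        lab = [labels[i] for i in remaining]
        best = max(features, key=lambda f: information_gain(data, lab, f))
        selected.append(best)
        features.remove(best)
        remaining = [i for i in remaining if not best <= instances[i]]
    return selected

# Example: D = 4 instances, F = three single-item features
D = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"c", "d"}]
y = [1, 1, 0, 0]
F = [frozenset("a"), frozenset("b"), frozenset("c")]
print(sequential_coverage(D, y, F))   # 'a' is selected first (it perfectly separates the classes)
```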
13. Direct Discriminative Pattern Mining
- Objectives
- Directly mine a set of highly discriminative patterns (for efficiency)
- Impose a feature coverage constraint (for accuracy)
- every training instance has to be covered by one or multiple features
- Modules to meet these objectives
- A branch-and-bound search method
- to identify the most discriminative pattern in the data set
- An instance elimination process
- to remove the training instances that are covered by the patterns selected so far
14. Branch-and-Bound Search
- Upper_bound(info-gain) = f(pattern frequency)
- Upper_bound(info-gain) monotonically increases with pattern frequency
- So, the discriminative power of low-frequency patterns is upper bounded by a small value
- Basic mining method
- FP-growth, as described in
- J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD 2000
15. Branch-and-Bound Search
- Main idea
- Record the information gain of the most discriminative itemset discovered so far in a global variable (say, maxIG)
- Before constructing a conditional FP-tree
- estimate Upper_bound(info-gain), given the size of the conditional database
- Support of any itemset ≤ the size of the conditional database
- So, info-gain of any itemset ≤ Upper_bound(info-gain) of the conditional database
- So, if Upper_bound(info-gain) ≤ maxIG, then do not generate the conditional FP-tree
16. Branch-and-Bound Search (cont'd)
17. Branch-and-Bound Search (cont'd)
- Computing IG_ub (the information gain definition used is recalled below)
- Assume we have an itemset α
- whose absence or presence is represented by a random variable X, X ∈ {0, 1}
- Assume C ∈ {0, 1}
- Let P(x = 1) = θ (θ is the frequency of the itemset α), P(c = 1) = p and P(c = 1 | x = 1) = q
- The upper bound function shown below assumes θ ≤ p, since θ > p is a symmetric case
- Then, when q = 0 or q = 1, IG(C|X) reaches the upper bound
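For reference, IG(C|X) here is the standard information gain; a reconstruction of the definitions assumed by the slides, in LaTeX:

```latex
% Standard information-gain definitions assumed by these slides
IG(C \mid X) = H(C) - H(C \mid X), \qquad
H(C) = -\sum_{c \in \{0,1\}} P(c)\,\log_2 P(c), \qquad
H(C \mid X) = \sum_{x \in \{0,1\}} P(x)\, H(C \mid X = x)
```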
18. Branch-and-Bound Search (cont'd)
- When q = 1, the upper bound is IG_ub(θ) = H(p) − (1 − θ) · H((p − θ) / (1 − θ))
- When q = 0, the upper bound is IG_ub(θ) = H(p) − (1 − θ) · H(p / (1 − θ))
- where H(·) is the binary entropy function
19. Branch-and-Bound Search (cont'd)
- Illustration
- Table 1 shows a training database which contains 8 instances and 2 classes
- Let min_sup = 2
20. Branch-and-Bound Search (cont'd)
- Illustration (cont'd)
- The global FP-tree is illustrated in Figure 2
21. Branch-and-Bound Search (cont'd)
- Illustration (cont'd)
- The first frequent itemset is d
- Info-gain(d) = 0.016
- maxIG = 0.016, size of the conditional database = 3
- IG_ub(3) = 0.467 > maxIG, so we cannot prune
22. Branch-and-Bound Search (cont'd)
- Illustration (cont'd)
- Therefore, perform recursive mining on the conditional FP-tree
- Get ad, bd, cd, and abd
- IG(ad) = 0.123, IG(bd) = 0.123, IG(cd) = 0.074 and IG(abd) = 0.123
23. Branch-and-Bound Search (cont'd)
- Illustration (cont'd)
- Mining proceeds to frequent itemset a
- Info-gain(a) = 0.811
- maxIG = 0.811, size of the conditional database = 6
- IG_ub(6) = 0.811 ≤ maxIG, so prune (see the sketch below)
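A minimal, runnable sketch of the IG_ub computation behind these numbers. It evaluates the information gain at the extreme feasible values of q (the q = 0 / q = 1 cases of slide 18, clipped to what the class prior allows). The 6/2 class split (p = 0.25) for Table 1 is an assumption inferred from the reported values, since the table itself is not reproduced here.

```python
import math

def H(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def info_gain(theta, p, q):
    """IG(C|X) given P(x=1) = theta, P(c=1) = p, P(c=1|x=1) = q."""
    q0 = (p - theta * q) / (1.0 - theta)              # P(c=1 | x=0)
    return H(p) - (theta * H(q) + (1.0 - theta) * H(q0))

def ig_upper_bound(theta, p):
    """Largest IG any pattern with frequency <= theta can achieve."""
    q_low = max(0.0, (p - (1.0 - theta)) / theta)     # fewest positives inside the pattern
    q_high = min(1.0, p / theta)                      # most positives inside the pattern
    return max(info_gain(theta, p, q_low), info_gain(theta, p, q_high))

def can_prune(cond_db_size, n, p, max_ig):
    """Branch-and-bound test: skip the conditional FP-tree if it cannot beat maxIG."""
    return ig_upper_bound(cond_db_size / n, p) <= max_ig

# Reproducing the illustration (8 instances, assumed 6/2 class split, p = 0.25):
print(round(ig_upper_bound(3 / 8, 0.25), 3))   # 0.467 -> cannot prune while maxIG = 0.016
print(round(ig_upper_bound(6 / 8, 0.25), 3))   # 0.811 -> prune, since maxIG = 0.811
```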
24. Training Instance Elimination
- Every training instance must be covered by one or multiple features
- Two approaches
- Transaction-centered approach
- Feature-centered approach
25. Transaction-Centered Approach
- Mine a set of discriminative features to satisfy the feature coverage constraint
- Can be built on FP-growth mining
- Keep for each transaction the best feature (see the sketch below)
- When a frequent pattern α is generated, the transaction id list T(α) is computed
- For every instance t ∈ T(α), we check whether α is the best feature for t
- Total cost = T_mining + m · T_check_db
- May be too high if m (the total number of frequent itemsets) is large (e.g., millions)
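A minimal sketch of this per-transaction bookkeeping, assuming the miner yields (pattern, transaction id list, information gain) triples; the miner itself and the names used are illustrative, not from the paper.

```python
def transaction_centered(mined_patterns, n_transactions):
    """Keep, for every transaction, the best (highest-IG) pattern seen so far."""
    best_score = [float("-inf")] * n_transactions
    best_feature = [None] * n_transactions
    for pattern, tid_list, ig in mined_patterns:       # one check_db pass per generated pattern
        for t in tid_list:
            if ig > best_score[t]:
                best_score[t] = ig
                best_feature[t] = pattern
    # The selected features are the per-transaction winners
    return {f for f in best_feature if f is not None}

# Hypothetical mining output: (pattern, transaction ids containing it, info gain)
mined = [(frozenset("a"), [0, 1, 2], 0.81), (frozenset("ad"), [2, 3], 0.12)]
print(transaction_centered(mined, 4))   # {frozenset({'a'}), frozenset({'a', 'd'})}
```

The inner loop runs once for every generated pattern, which is where the m · T_check_db term in the cost comes from.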
26. Transaction-Centered Approach (continued)
27. Feature-Centered Approach
- Basic idea
- Branch-and-bound produces the most discriminative itemset α
- But the mining process does not compute any transaction ids
- After mining, the transaction id list T(α) is computed
- Then the transactions in T(α) are eliminated from the FP-tree
- Repeat the branch-and-bound search on the modified tree
- check_db is performed only on the best feature produced by the mining process (not on all frequent patterns)
28. Feature-Centered Approach (continued)
29. Feature-Centered Approach (continued)
- DDPMine algorithm (continued)
- Inputs
- min_sup
- FP-tree P
- Operations
- Branch-and-bound search finds the most discriminative feature α
- The transaction set T(α) is computed and removed from P
- The resulting FP-tree is P'
- Invoke DDPMine recursively on P' until the FP-tree becomes empty (see the sketch after this list)
- Cost = n · (T_mining + T_check_db + T_update)
- n is the number of iterations (usually very small)
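A runnable but heavily simplified sketch of this loop, assuming transactions are item sets with binary labels. For readability, the "most discriminative feature" step is a naive scan over single items; in DDPMine proper it is the FP-tree branch-and-bound search of the earlier slides, and the elimination step shown corresponds to a coverage threshold of 1.

```python
import math

def H(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain(transactions, labels, item):
    n = len(transactions)
    inside = [labels[i] for i in range(n) if item in transactions[i]]
    outside = [labels[i] for i in range(n) if item not in transactions[i]]
    h_cond = sum(len(g) / n * H(sum(g) / len(g)) for g in (inside, outside) if g)
    return H(sum(labels) / n) - h_cond

def ddpmine_simplified(transactions, labels, min_sup):
    """Mine the best feature, drop the transactions it covers, repeat until empty."""
    selected = []
    alive = list(range(len(transactions)))
    while alive:
        data = [transactions[i] for i in alive]
        lab = [labels[i] for i in alive]
        items = {it for t in data for it in t}
        frequent = [it for it in items if sum(it in t for t in data) >= min_sup]
        if not frequent:
            break
        # Stand-in for the branch-and-bound search over (conditional) FP-trees
        best = max(frequent, key=lambda it: info_gain(data, lab, it))
        selected.append(best)
        # Remove T(best): every remaining transaction containing the feature
        alive = [i for i in alive if best not in transactions[i]]
    return selected

# Hypothetical data: 4 transactions over items a-d, binary class labels
X = [{"a", "b", "d"}, {"a", "c"}, {"b", "c", "d"}, {"c", "d"}]
y = [1, 1, 0, 0]
print(ddpmine_simplified(X, y, min_sup=2))   # 'a' is selected first (perfectly discriminative)
```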
30. Shrinking the FP-tree
- When inserting a training instance into the FP-tree
- we register the transaction id of this instance
- at the node which corresponds to the last item in the instance
- So the global FP-tree carries training instance id-lists
- When an itemset α is generated, the training instances in T(α) have to be removed
- Perform a traversal of the FP-tree and examine the id-lists associated with the tree nodes
- When an id in a node appears in T(α), it is removed
- The count on the node is reduced by 1, as well as the counts on all the ancestor nodes up to the root
- When a count reaches 0, delete the node (see the sketch below)
31. Shrinking the FP-tree (continued)
- Example
- After itemset a is discovered, with T(a) = {100, 200, 300, 400, 700, 800}
- all the shaded nodes will be deleted
32. Feature Coverage
- In the DDPMine algorithm
- when a feature is generated,
- the transactions containing this feature are removed
- But in a real classification task
- we may want to generate multiple features to represent a transaction
- So, a feature coverage parameter δ is introduced
- A transaction is eliminated only when it is covered by at least δ features
- Keep a counter for each transaction
- When a feature is generated, the counter for each transaction containing the feature is incremented
33. Feature Coverage (continued)
- When a counter reaches δ
- the corresponding transaction is deleted
- Two tables are introduced
- CTable: where the counters are stored
- HTable: keeps track of features already generated
- Example: assume δ = 2 (a code sketch follows below)
34. Efficiency Analysis
- θ0 = min_sup (the initial support threshold)
- D0 = the initial dataset (at iteration 0)
- T_mining = time for FP-growth mining
- T_check_db = time to compute T(α) for a frequent itemset α
itemset ? - Tupdate O(V D)
- Where V is the of nodes in the FP-tree
- D is the of training instances in the dataset
35. Experimental Results
- Datasets: UCI Machine Learning Repository
- Adult
- Chess
- Hypo
- Sick
- Efficiency comparison
- HARMONY: an instance-centric rule-based classifier
- PatClass: a frequent pattern-based classifier
- Branch-and-bound search
- Effectiveness of pruning
36. Efficiency Tests
37. Branch-and-Bound Search
38. Problem Size Reduction
39. Efficiency and Accuracy
40. Conclusion
- Frequent pattern-based classification is very effective
- But the main bottleneck is mining + feature selection
- DDPMine is an approach that
- directly mines discriminative patterns
- integrates feature selection into the mining framework
- A branch-and-bound search is imposed
- on the FP-growth mining process
- pruning the search space significantly
- DDPMine achieves significant speedup over the two-step methods