Title: Direct Discriminative Pattern Mining for Effective Classification
1. Direct Discriminative Pattern Mining for Effective Classification
- Hong Cheng, Jiawei Han (UIUC)
- Xifeng Yan, Philip S. Yu (IBM T. J. Watson)
- ICDE 2008
2. Presentation Overview
- Introduction
- Frequent Pattern-based Classification
- Direct Discriminative Pattern Mining
- Experimental Results
- Conclusion
3. Introduction
- A frequent itemset (pattern) is a set of items that occurs in a dataset with frequency at least min_sup
- Frequent patterns have been explored widely in classification tasks
- Studies achieve very good classification accuracy
- Associative classification was found to be competitive with traditional classification methods such as SVM and C4.5
- Frequent patterns are also promising for classifying complex structures, such as strings and graphs, with high accuracy
4. Introduction (continued)
- However, most of these studies use a two-step process
- 1. Mine all frequent patterns or association rules that satisfy min_sup
- 2. Perform a feature selection or rule ranking procedure
5. Introduction (continued)
- Although the two-step process is straightforward and achieves high accuracy, it could incur high computational cost
- Mining could take a long time
- due to the exponential number of combinations among items
- which is common for dense datasets or high-dimensional data
- It could also take a very long time if min_sup is low
6. Introduction (continued)
- Classification tasks depend on the frequent patterns that are highly discriminative w.r.t. the class label
- But frequent pattern mining is not guided by discriminative power
- So, a large number of indiscriminative itemsets can be generated during the mining step
- Much time is wasted on generating patterns that are not useful
- Feature selection on this large set of patterns (e.g., millions) will also be too costly
7. Introduction (continued)
- Solution?
- Instead of generating the complete set of frequent patterns, directly mine highly discriminative patterns for classification
- This leads to the Direct Discriminative Pattern Mining approach, or DDPMine
- It integrates the feature selection mechanism into the mining framework
8. Introduction (continued)
- DDPMine mining methodology
- Transform the data into a compact FP-tree
- Search for discriminative patterns directly
- Outperforms the two-step method with significant speedup
9. Frequent Pattern-based Classification
- Learning a classification model in the feature space of single features and frequent patterns
- Consists of three steps
- 1. frequent itemset mining
- 2. feature selection
- 3. model learning
- Both steps 1 and 2 can be a bottleneck
- due to the exponential number of combinations among items
10. Frequent Pattern-based Classification (continued)
- Computational bottleneck
- It could take a very long time to generate the complete set of frequent patterns
- It is inefficient to wait for the mining algorithm to finish and then apply feature selection
- because most of the patterns are not useful for classification (i.e., indiscriminative)
- Feature selection can be a bottleneck because of the explosive number (millions) of frequent patterns
- Even if a linear algorithm (usually it is polynomial) is employed for feature selection, it could still run slowly
11. Frequent Pattern-based Classification (continued)
- Feature selection by sequential coverage
12. Frequent Pattern-based Classification (continued)
- Feature selection by sequential coverage (cont'd)
- Input
- D: a set of training instances
- F: a set of features
- Iteratively applies feature selection (see the sketch after this list)
- At each step
- it selects the feature with the highest discriminative measure (e.g., information gain)
- then all the training instances containing this feature are eliminated from D
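A minimal, runnable sketch of this sequential-coverage selection, assuming transactions are item sets, labels are binary, and candidate features are itemsets scored by information gain; the data layout and names are illustrative, not the paper's code.

```python
import math

def entropy(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(instances, labels, feature):
    """IG of a pattern feature (an itemset) w.r.t. the binary class label."""
    n = len(instances)
    covered = [labels[i] for i in range(n) if feature <= instances[i]]
    rest = [labels[i] for i in range(n) if not feature <= instances[i]]
    h_cond = sum(len(g) / n * entropy(sum(g) / len(g)) for g in (covered, rest) if g)
    return entropy(sum(labels) / n) - h_cond

def sequential_coverage(instances, labels, features):
    """Pick the highest-IG feature, drop the instances it covers, repeat."""
    selected = []
    remaining = list(range(len(instances)))
    features = list(features)
    while remaining and features:
        data = [instances[i] for i in remaining]
        lab = [labels[i] for i in remaining]
        best = max(features, key=lambda f: information_gain(data, lab, f))
        selected.append(best)
        features.remove(best)
        remaining = [i for i in remaining if not best <= instances[i]]
    return selected

# Example: D = 4 instances, F = three single-item features
D = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"c", "d"}]
y = [1, 1, 0, 0]
F = [frozenset("a"), frozenset("b"), frozenset("c")]
print(sequential_coverage(D, y, F))   # 'a' is selected first (it perfectly separates the classes)
```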
13. Direct Discriminative Pattern Mining
- Objectives
- Directly mine a set of highly discriminative patterns (for efficiency)
- Impose a feature coverage constraint (for accuracy)
- every training instance has to be covered by one or multiple features
- Modules to meet these objectives
- A branch-and-bound search method
- to identify the most discriminative pattern in the data set
- An instance elimination process
- to remove the training instances that are covered by the patterns selected so far
14. Branch-and-Bound Search
- Upper_bound(info-gain) = f(pattern frequency)
- Upper_bound(info-gain) monotonically increases with pattern frequency
- So, the discriminative power of low-frequency patterns is upper bounded by a small value
- Basic mining method
- FP-growth, as described in
- J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD 2000
15. Branch-and-Bound Search
- Main idea
- Record the information gain of the most discriminative itemset discovered so far in a global variable (say, maxIG)
- Before constructing a conditional FP-tree
- estimate Upper_bound(info-gain), given the size of the conditional database
- Support of any itemset ≤ the size of the conditional database
- So, info-gain of any itemset ≤ Upper_bound(info-gain) of the conditional database
- So, if Upper_bound(info-gain) ≤ maxIG, then do not generate the conditional FP-tree
16. Branch-and-Bound Search (cont'd)
17. Branch-and-Bound Search (cont'd)
- Computing IG_ub (the information gain definition used is recalled below)
- Assume we have an itemset α
- whose absence or presence is represented by a random variable X, X ∈ {0, 1}
- Assume C ∈ {0, 1}
- Let P(x = 1) = θ (θ is the frequency of the itemset α), P(c = 1) = p and P(c = 1 | x = 1) = q
- The upper bound function shown below assumes θ ≤ p, since θ > p is a symmetric case
- Then, when q = 0 or q = 1, IG(C|X) reaches the upper bound
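For reference, IG(C|X) here is the standard information gain; a reconstruction of the definitions assumed by the slides, in LaTeX:

```latex
% Standard information-gain definitions assumed by these slides
IG(C \mid X) = H(C) - H(C \mid X), \qquad
H(C) = -\sum_{c \in \{0,1\}} P(c)\,\log_2 P(c), \qquad
H(C \mid X) = \sum_{x \in \{0,1\}} P(x)\, H(C \mid X = x)
```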
18. Branch-and-Bound Search (cont'd)
- When q = 1, the upper bound is IG_ub(θ) = H(p) − (1 − θ) · H((p − θ) / (1 − θ))
- When q = 0, the upper bound is IG_ub(θ) = H(p) − (1 − θ) · H(p / (1 − θ))
- where H(·) is the binary entropy function
19. Branch-and-Bound Search (cont'd)
- Illustration
- Table 1 shows a training database which contains 8 instances and 2 classes
- Let min_sup = 2
20. Branch-and-Bound Search (cont'd)
- Illustration (cont'd)
- The global FP-tree is illustrated in Figure 2
21. Branch-and-Bound Search (cont'd)
- Illustration (cont'd)
- The first frequent itemset is d
- Info-gain(d) = 0.016
- maxIG = 0.016, size of the conditional database = 3
- IG_ub(3) = 0.467 > maxIG, so we cannot prune
22. Branch-and-Bound Search (cont'd)
- Illustration (cont'd)
- Therefore, perform recursive mining on the conditional FP-tree
- Get ad, bd, cd, and abd
- IG(ad) = 0.123, IG(bd) = 0.123, IG(cd) = 0.074 and IG(abd) = 0.123
23. Branch-and-Bound Search (cont'd)
- Illustration (cont'd)
- Mining proceeds to frequent itemset a
- Info-gain(a) = 0.811
- maxIG = 0.811, size of the conditional database = 6
- IG_ub(6) = 0.811 ≤ maxIG, so prune (see the sketch below)
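A minimal, runnable sketch of the IG_ub computation behind these numbers. It evaluates the information gain at the extreme feasible values of q (the q = 0 / q = 1 cases of slide 18, clipped to what the class prior allows). The 6/2 class split (p = 0.25) for Table 1 is an assumption inferred from the reported values, since the table itself is not reproduced here.

```python
import math

def H(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def info_gain(theta, p, q):
    """IG(C|X) given P(x=1) = theta, P(c=1) = p, P(c=1|x=1) = q."""
    q0 = (p - theta * q) / (1.0 - theta)              # P(c=1 | x=0)
    return H(p) - (theta * H(q) + (1.0 - theta) * H(q0))

def ig_upper_bound(theta, p):
    """Largest IG any pattern with frequency <= theta can achieve."""
    q_low = max(0.0, (p - (1.0 - theta)) / theta)     # fewest positives inside the pattern
    q_high = min(1.0, p / theta)                      # most positives inside the pattern
    return max(info_gain(theta, p, q_low), info_gain(theta, p, q_high))

def can_prune(cond_db_size, n, p, max_ig):
    """Branch-and-bound test: skip the conditional FP-tree if it cannot beat maxIG."""
    return ig_upper_bound(cond_db_size / n, p) <= max_ig

# Reproducing the illustration (8 instances, assumed 6/2 class split, p = 0.25):
print(round(ig_upper_bound(3 / 8, 0.25), 3))   # 0.467 -> cannot prune while maxIG = 0.016
print(round(ig_upper_bound(6 / 8, 0.25), 3))   # 0.811 -> prune, since maxIG = 0.811
```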
24. Training Instance Elimination
- Every training instance must be covered by one or multiple features
- Two approaches
- Transaction-centered approach
- Feature-centered approach
25. Transaction-Centered Approach
- Mine a set of discriminative features to satisfy the feature coverage constraint
- Can be built on FP-growth mining
- Keep for each transaction the best feature (see the sketch below)
- When a frequent pattern α is generated, the transaction id list T(α) is computed
- For every instance t ∈ T(α), we check whether α is the best feature for t
- Total cost = T_mining + m · T_check_db
- May be too high if m (the total number of frequent itemsets) is large (e.g., millions)
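A minimal sketch of this per-transaction bookkeeping, assuming the miner yields (pattern, transaction id list, information gain) triples; the miner itself and the names used are illustrative, not from the paper.

```python
def transaction_centered(mined_patterns, n_transactions):
    """Keep, for every transaction, the best (highest-IG) pattern seen so far."""
    best_score = [float("-inf")] * n_transactions
    best_feature = [None] * n_transactions
    for pattern, tid_list, ig in mined_patterns:       # one check_db pass per generated pattern
        for t in tid_list:
            if ig > best_score[t]:
                best_score[t] = ig
                best_feature[t] = pattern
    # The selected features are the per-transaction winners
    return {f for f in best_feature if f is not None}

# Hypothetical mining output: (pattern, transaction ids containing it, info gain)
mined = [(frozenset("a"), [0, 1, 2], 0.81), (frozenset("ad"), [2, 3], 0.12)]
print(transaction_centered(mined, 4))   # {frozenset({'a'}), frozenset({'a', 'd'})}
```

The inner loop runs once for every generated pattern, which is where the m · T_check_db term in the cost comes from.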
26. Transaction-Centered Approach (continued)
27. Feature-Centered Approach
- Basic idea
- Branch-and-bound produces the most discriminative itemset α
- But the mining process does not compute any transaction ids
- After mining, the transaction id list T(α) is computed
- Then the transactions in T(α) are eliminated from the FP-tree
- Repeat the branch-and-bound search on the modified tree
- check_db is performed only on the best feature produced by the mining process (not on all frequent patterns)
28. Feature-Centered Approach (continued)
29. Feature-Centered Approach (continued)
- DDPMine algorithm (continued)
- Inputs
- min_sup
- FP-tree P
- Operations
- Branch-and-bound search finds the most discriminative feature α
- The transaction set T(α) is computed and removed from P
- The resulting FP-tree is P'
- Invoke DDPMine recursively on P' until the FP-tree becomes empty (see the sketch after this list)
- Cost = n · (T_mining + T_check_db + T_update)
- n is the number of iterations (usually very small)
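A runnable but heavily simplified sketch of this loop, assuming transactions are item sets with binary labels. For readability, the "most discriminative feature" step is a naive scan over single items; in DDPMine proper it is the FP-tree branch-and-bound search of the earlier slides, and the elimination step shown corresponds to a coverage threshold of 1.

```python
import math

def H(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain(transactions, labels, item):
    n = len(transactions)
    inside = [labels[i] for i in range(n) if item in transactions[i]]
    outside = [labels[i] for i in range(n) if item not in transactions[i]]
    h_cond = sum(len(g) / n * H(sum(g) / len(g)) for g in (inside, outside) if g)
    return H(sum(labels) / n) - h_cond

def ddpmine_simplified(transactions, labels, min_sup):
    """Mine the best feature, drop the transactions it covers, repeat until empty."""
    selected = []
    alive = list(range(len(transactions)))
    while alive:
        data = [transactions[i] for i in alive]
        lab = [labels[i] for i in alive]
        items = {it for t in data for it in t}
        frequent = [it for it in items if sum(it in t for t in data) >= min_sup]
        if not frequent:
            break
        # Stand-in for the branch-and-bound search over (conditional) FP-trees
        best = max(frequent, key=lambda it: info_gain(data, lab, it))
        selected.append(best)
        # Remove T(best): every remaining transaction containing the feature
        alive = [i for i in alive if best not in transactions[i]]
    return selected

# Hypothetical data: 4 transactions over items a-d, binary class labels
X = [{"a", "b", "d"}, {"a", "c"}, {"b", "c", "d"}, {"c", "d"}]
y = [1, 1, 0, 0]
print(ddpmine_simplified(X, y, min_sup=2))   # 'a' is selected first (perfectly discriminative)
```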
30. Shrinking the FP-tree
- When inserting a training instance into the FP-tree
- we register the transaction id of this instance
- at the node which corresponds to the last item in the instance
- So the global FP-tree carries training instance id-lists
- When an itemset α is generated, the training instances in T(α) have to be removed
- Perform a traversal of the FP-tree and examine the id-lists associated with the tree nodes
- When an id in a node appears in T(α), it is removed
- The count on the node is reduced by 1, as well as the counts on all the ancestor nodes up to the root
- When a count reaches 0, delete the node (see the sketch below)
31. Shrinking the FP-tree (continued)
- Example
- After itemset a is discovered, with T(a) = {100, 200, 300, 400, 700, 800}
- all the shaded nodes will be deleted
32. Feature Coverage
- In the DDPMine algorithm
- when a feature is generated,
- the transactions containing this feature are removed
- But in a real classification task
- we may want to generate multiple features to represent a transaction
- So, a feature coverage parameter δ is introduced
- A transaction is eliminated only when it is covered by at least δ features
- Keep a counter for each transaction
- When a feature is generated, the counter for each transaction containing the feature is incremented
33. Feature Coverage (continued)
- When a counter reaches δ
- the corresponding transaction is deleted
- Two tables are introduced
- CTable: where the counters are stored
- HTable: keeps track of features already generated
- Example: assume δ = 2 (a code sketch follows below)
34. Efficiency Analysis
- θ0 = min_sup (the initial support threshold)
- D0 = the initial dataset (at iteration 0)
- T_mining = time for FP-growth mining
- T_check_db = time to compute T(α) for a frequent itemset α
itemset ? - Tupdate O(V D)
- Where V is the of nodes in the FP-tree
- D is the of training instances in the dataset
35. Experimental Results
- Datasets: UCI Machine Learning Repository
- Adult
- Chess
- Hypo
- Sick
- Efficiency comparison
- HARMONY: an instance-centric rule-based classifier
- PatClass: a frequent pattern-based classifier
- Branch-and-bound search
- Effectiveness of pruning
36. Efficiency Tests
37. Branch-and-Bound Search
38. Problem Size Reduction
39. Efficiency and Accuracy
40. Conclusion
- Frequent pattern-based classification is very effective
- But the main bottleneck is mining + feature selection
- DDPMine is an approach that
- directly mines discriminative patterns
- integrates feature selection into the mining framework
- A branch-and-bound search is imposed
- on the FP-growth mining process
- pruning the search space significantly
- DDPMine achieves significant speedup over the two-step methods