Direct Discriminative Pattern Mining for Effective Classification

1
Direct Discriminative Pattern Mining for
Effective Classification
  • Hong Cheng, Xifeng Yan, Jiawei Han, Philip S. Yu
  • ICDE 2008

2
Outline
  • Introduction
  • DDPMine approach
  • Experimental results
  • Conclusion

3
Introduction
  • Frequent Pattern-Based Classification
  • Frequent itemset mining
  • Feature selection
  • Model learning
  • This two-step pipeline is expensive (a sketch
    follows below)
  • It generates the complete set of frequent patterns
  • Feature selection is deferred to post-processing
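A minimal Python sketch of this two-step baseline on a toy data set; mine_frequent_itemsets, info_gain, min_sup, and the top-k cutoff are illustrative choices, not the implementation evaluated in the paper:

    from math import log2

    def mine_frequent_itemsets(tsets, min_sup):
        """Level-wise (Apriori-style) enumeration; fine for toy data."""
        items = {i for t in tsets for i in t}
        level = [frozenset([i]) for i in items]
        frequent = {}
        while level:
            counts = {c: sum(1 for t in tsets if c <= t) for c in level}
            survivors = [c for c, n in counts.items() if n >= min_sup]
            frequent.update((c, counts[c]) for c in survivors)
            level = list({a | b for a in survivors for b in survivors
                          if len(a | b) == len(a) + 1})
        return frequent

    def entropy(counts):
        total = sum(counts)
        return -sum(n / total * log2(n / total) for n in counts if n)

    def info_gain(pattern, tsets, labels):
        """IG(C | pattern): split instances on pattern containment."""
        inside = [l for t, l in zip(tsets, labels) if pattern <= t]
        outside = [l for t, l in zip(tsets, labels) if not pattern <= t]
        ig = entropy([labels.count(0), labels.count(1)])
        for part in (inside, outside):
            if part:
                ig -= len(part) / len(labels) * entropy([part.count(0),
                                                         part.count(1)])
        return ig

    # Toy data: four transactions over items a-d, two classes.
    tsets = [frozenset("abd"), frozenset("acd"), frozenset("bc"),
             frozenset("bd")]
    labels = [1, 1, 0, 0]

    # Step 1: mine ALL frequent itemsets (the expensive part).
    patterns = mine_frequent_itemsets(tsets, min_sup=2)
    # Step 2: post-processing selection, e.g. top-k by information gain.
    top = sorted(patterns, key=lambda p: info_gain(p, tsets, labels),
                 reverse=True)[:3]
    # The selected patterns become binary features for any standard classifier.
    features = [[int(p <= t) for p in top] for t in tsets]
    print(top, features)

DDPMine's observation is that step 1 enumerates far more patterns than step 2 ever keeps.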

4
(Cont.)
  • To avoid generating the complete set of frequent
    patterns
  • To directly mine highly discriminative patterns
    for classification

5
DDPMine
  • For efficiency
  • To directly mine a set of highly discriminative
    patterns
  • Branch-and-bound search
  • For accuracy
  • To impose a feature coverage constraint
  • An instance elimination process

6
Branch-and-Bound Search
  • Derive an upper bound of the information gain (the
    discriminative measure) as a function of pattern
    frequency
  • Adopt FP-growth with this bound to facilitate
    branch-and-bound pruning (sketched below)
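A minimal sketch of the pruning logic, using plain depth-first itemset enumeration in place of FP-growth for brevity; ig_upper_bound is the two-class bound sketched on slide 9, and the data set, min_sup, and function names are illustrative:

    from math import log2

    def H(p):
        """Binary entropy."""
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    def info_gain(inside, labels):
        """IG of splitting the data on pattern containment."""
        n, n_in = len(labels), len(inside)
        if n_in in (0, n):
            return 0.0
        out = [i for i in range(n) if i not in inside]
        p = sum(labels) / n
        p_in = sum(labels[i] for i in inside) / n_in
        p_out = sum(labels[i] for i in out) / len(out)
        return H(p) - n_in / n * H(p_in) - len(out) / n * H(p_out)

    def ig_upper_bound(sup, labels):
        """Best-case IG of any pattern with support sup: the pattern lies
        entirely inside one class (two-class case)."""
        n = len(labels)
        theta, p = sup / n, sum(labels) / n
        if theta == 1.0:
            return 0.0
        bounds = [H(p) - (1 - theta) * H((pc - theta) / (1 - theta))
                  for pc in (p, 1 - p) if theta <= pc]
        return max(bounds) if bounds else H(p)   # loose but safe fallback

    def mine_best_pattern(tsets, labels, min_sup):
        items = sorted({i for t in tsets for i in t})
        covers = {i: {t for t, ts in enumerate(tsets) if i in ts}
                  for i in items}
        best = [0.0, None]                       # [maxIG, best pattern]

        def dfs(prefix, start, inside):
            for k in range(start, len(items)):
                new = inside & covers[items[k]]
                if len(new) < min_sup:           # infrequent: standard prune
                    continue
                # branch-and-bound prune: support only shrinks deeper in the
                # subtree, so the bound at this support covers all extensions
                if ig_upper_bound(len(new), labels) <= best[0]:
                    continue
                ig = info_gain(new, labels)
                if ig > best[0]:
                    best[:] = [ig, prefix + [items[k]]]
                dfs(prefix + [items[k]], k + 1, new)

        dfs([], 0, set(range(len(tsets))))
        return best

    tsets = [frozenset("abd"), frozenset("acd"), frozenset("bc"),
             frozenset("bd")]
    labels = [1, 1, 0, 0]
    print(mine_best_pattern(tsets, labels, min_sup=2))   # -> [1.0, ['a']]

In this toy run, once IG(a) = 1.0 is recorded as maxIG, every remaining branch has IG_ub ≤ maxIG and is pruned, mirroring the situation on slide 8.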

7
(cont.)
  • Example
  • IG(d) = 0.016 → maxIG = 0.016
  • IG_ub(3) = 0.467 > maxIG → cannot prune the
    conditional FP-tree
  • Recursive mining on the conditional FP-tree
    yields ad, bd, cd, abd
  • IG(ad) = 0.123, IG(bd) = 0.123, IG(cd) = 0.074,
    IG(abd) = 0.123

8
(cont.)
  • IG(a) = 0.811 → maxIG = 0.811
  • IG_ub(6) = 0.811 ≤ maxIG → the conditional FP-tree
    can be pruned without any mining

9
(cont.)
  • Information gain upper bound
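The formula on this slide did not survive extraction; below is a reconstruction of the two-class bound following the authors' derivation (θ = relative support of the pattern, p = prior of the positive class), offered as a sketch of the idea rather than a verbatim copy of the slide:

    IG(C \mid X) = H(C) - H(C \mid X), \qquad
    H(C) = -\textstyle\sum_{c} P(c)\log P(c)

    IG_{ub}(\theta) = H(p) - (1-\theta)\,
    H\!\left(\frac{p-\theta}{1-\theta}\right), \qquad 0 \le \theta \le p

where H(q) = -q log q - (1-q) log(1-q) is the binary entropy; the best case for a pattern of frequency θ is to occur in only one class. Since the bound is non-decreasing in θ and support only shrinks as a pattern grows, IG_ub evaluated at a prefix's support bounds the information gain of every pattern extending it; whenever that value is at most maxIG, the prefix's conditional FP-tree can be pruned, as in the example above.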

10
Training Instance Elimination
  • Goal: ensure every training instance is covered by
    one or more features
  • Two approaches: transaction-centered and
    feature-centered
  • Transaction-centered: mine a set of discriminative
    features satisfying the feature coverage constraint
    for each training instance
  • Cost: T_mining + m · T_check_db, where m is the
    number of frequent itemsets generated
11
(Cont.)
  • Feature-Centered Approach
  • Goal: directly mine discriminative features while
    reducing the number of check_db operations
  • Performs one check_db pass per mined feature rather
    than one per frequent itemset

12
Example

[Figure: example FP-tree over items a, b, c, d with
node counts (c:7, b:6, a:5, ...), per-branch
transaction ID lists (100-800), and coverage counters]
13
Feature coverage
  • For accuracy, multiple features are generated to
    represent each transaction (see the sketch after
    slide 14)
  • δ: a transaction is eliminated from further
    consideration once it is covered by at least δ
    features
  • A counter is kept for each transaction; when a
    counter reaches δ, the corresponding transaction
    is removed from the tree
  • Ctable: the counters, stored in an array of
    integers
  • Htable: keeps track of the features already
    discovered

14
(Cont.)
  • DDPMine works iteratively and terminates when the
    training database becomes empty (sketched below)
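A minimal sketch of this feature-centered loop with the δ-coverage counters from slide 13, reusing the toy data above; the single-item scorer stands in for the branch-and-bound FP-growth miner, and delta, ctable, and htable follow the slides' naming:

    from math import log2

    def H(p):
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    def info_gain(inside, labels, active):
        """IG of a pattern over the still-active transactions."""
        ins = inside & active
        n = len(active)
        if len(ins) in (0, n):
            return 0.0
        out = active - ins
        p = sum(labels[i] for i in active) / n
        p_in = sum(labels[i] for i in ins) / len(ins)
        p_out = sum(labels[i] for i in out) / len(out)
        return H(p) - len(ins) / n * H(p_in) - len(out) / n * H(p_out)

    def ddpmine_loop(tsets, labels, delta):
        items = sorted({i for t in tsets for i in t})
        covers = {i: {t for t, ts in enumerate(tsets) if i in ts}
                  for i in items}
        active = set(range(len(tsets)))      # transactions still in the DB
        ctable = [0] * len(tsets)            # per-transaction coverage counters
        htable = []                          # features discovered so far
        while active:
            # Stand-in miner: best single item on the current DB; DDPMine
            # runs branch-and-bound FP-growth here instead.
            feat = max(items,
                       key=lambda i: info_gain(covers[i], labels, active))
            if info_gain(covers[feat], labels, active) == 0.0:
                break                        # toy guard: nothing left to gain
            htable.append(feat)
            for t in covers[feat] & active:  # one check_db pass per feature
                ctable[t] += 1
                if ctable[t] >= delta:       # covered delta times: eliminate
                    active.discard(t)
        return htable

    tsets = [frozenset("abd"), frozenset("acd"), frozenset("bc"),
             frozenset("bd")]
    labels = [1, 1, 0, 0]
    print(ddpmine_loop(tsets, labels, delta=1))   # -> ['a']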

15
Experimental Results
  • Compared against two other methods
  • HARMONY
  • An instance-centric, rule-based classifier with
    pruning methods
  • PatClass
  • A frequent pattern-based classifier using the
    two-step procedure

16
(Cont.)
  • Efficiency Comparison

17
(Cont.)
  • Accuracy Comparison

18
(Cont.)
  • Problem Size Reduction

19
Conclusion
  • Effective for classifying categorical or
    high-dimensional sparse data sets
  • Reduces the search space without degrading
    accuracy or efficiency