Direct Discriminative Pattern Mining for Effective Classification

1
Direct Discriminative Pattern Mining for
Effective Classification
  • Hong Cheng, Jiawei Han - UIUC
  • Xifeng Yan, Philip S. Yu - IBM T. J. Watson
  • ICDE 2008

2
Presentation Overview
  • Introduction
  • Frequent Pattern-based Classification
  • Direct Discriminative Pattern Mining
  • Experimental Results
  • Conclusion

3
Introduction
  • A frequent itemset (pattern) is a set of items
    that appears in a dataset with frequency at least
    min_sup
  • Frequent patterns have been explored widely in
    classification tasks
  • Studies achieve very good classification accuracy
  • Associative classification was found to be
    competitive with traditional classification
    methods such as SVM and C4.5
  • Frequent patterns are also promising for
    classifying complex structures, such as strings
    and graphs, with high accuracy

4
Introduction (continued)
  • However, most of these studies use a two-step
    process
  • 1. Mine all frequent patterns or association
    rules that satisfy min_sup
  • 2. Perform a feature selection or rule ranking
    procedure

5
Introduction (continued)
  • Although the two-step process is straightforward
    and achieves high accuracy
  • it can incur high computational cost
  • Mining can take a long time due to the exponential
    number of item combinations, which is common for
    dense or high-dimensional datasets
  • It can also take a very long time if min_sup is
    low

6
Introduction (continued)
  • Classification tasks depend on the frequent
    patterns that are highly discriminative w.r.t. the
    class label
  • But frequent pattern mining is not driven by
    discriminative power
  • So a large number of non-discriminative itemsets
    can be generated during the mining step
  • Much time is wasted generating patterns that are
    not useful
  • Feature selection over this large set of patterns
    (e.g. millions) is also very costly

7
Introduction (continued)
  • Solution?
  • Instead of generating the complete set of frequent
    patterns, directly mine highly discriminative
    patterns for classification
  • This leads to the Direct Discriminative Pattern
    Mining (DDPMine) approach
  • It integrates the feature selection mechanism
    into the mining framework

8
Introduction (continued)
  • DDPMine mining methodology
  • Transform the data into a compact FP-tree (a
    sketch of this structure is given below)
  • Search for discriminative patterns directly
  • Outperforms the two-step method with a significant
    speedup
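The following is a minimal sketch of how a compact FP-tree can be built from transactions, assuming the standard FP-growth layout (a prefix tree plus a header table of node links). The names FPNode and build_fp_tree are illustrative, not the paper's code.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item = item          # item stored at this node (None for the root)
        self.count = 0            # number of transactions sharing this prefix
        self.parent = parent
        self.children = {}        # item -> child FPNode

def build_fp_tree(transactions, min_sup):
    # 1. Count item frequencies and keep only the frequent items.
    freq = defaultdict(int)
    for t in transactions:
        for item in set(t):
            freq[item] += 1
    frequent = {i: c for i, c in freq.items() if c >= min_sup}

    # 2. Insert each transaction with its items sorted by descending
    #    frequency, so common prefixes are shared and the tree stays compact.
    root = FPNode(None, None)
    header = defaultdict(list)    # item -> list of nodes holding that item
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

# Toy example: 4 transactions, min_sup = 2
tree, header = build_fp_tree([["a", "b"], ["a", "b", "c"], ["a", "d"], ["b", "d"]], 2)
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})
```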

9
Frequent Pattern-based Classification
  • Learn a classification model in the feature space
    of single features and frequent patterns
  • Consists of three steps
  • 1. Frequent itemset mining
  • 2. Feature selection
  • 3. Model learning
  • Both steps 1 and 2 can be a bottleneck due to the
    exponential number of item combinations

10
Frequent Pattern-based Classification (continued)
  • Computational bottleneck
  • It could take a very long time to generate the
    complete set of frequent patterns
  • It is inefficient to wait for the mining algorithm
    to finish and then apply feature selection
  • because most of the patterns are not useful for
    classification (i.e., non-discriminative)
  • Feature selection can be a bottleneck because of
    the explosive number (millions) of frequent
    patterns
  • Even if a linear algorithm (usually it is
    polynomial) is employed for feature selection, it
    could run slowly

11
Frequent Pattern-based Classification (continued)
  • Feature selection by sequential coverage

12
Frequent Pattern-based Classification (continued)
  • Feature selection by sequential coverage
    (continued)
  • Input
  • D: a set of training instances
  • F: a set of features
  • Feature selection is applied iteratively
  • At each step
  • the feature with the highest discriminative
    measure (e.g. information gain) is selected
  • then all the training instances containing this
    feature are eliminated from D (a sketch follows
    below)
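The following is a minimal sketch of the sequential-coverage selection just described, assuming a precomputed discriminative score (e.g. information gain) is available for each candidate feature. The names sequential_coverage and score are illustrative, not from the paper.

```python
def sequential_coverage(instances, features, score):
    """instances: list of (item_set, label); features: candidate itemsets;
    score(feature, instances): discriminative measure on the remaining data."""
    selected = []
    remaining = list(instances)
    candidates = list(features)
    while remaining and candidates:
        # Pick the feature with the highest score on the remaining instances.
        best = max(candidates, key=lambda f: score(f, remaining))
        selected.append(best)
        candidates.remove(best)
        # Eliminate every training instance that contains the chosen feature.
        remaining = [(items, y) for items, y in remaining
                     if not best <= items]
    return selected

# Toy example with a trivial score: co-occurrence count with label 1.
data = [({"a", "b"}, 1), ({"a"}, 1), ({"b", "c"}, 0), ({"c"}, 0)]
feats = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"})]
toy_score = lambda f, D: sum(1 for items, y in D if f <= items and y == 1)
print(sequential_coverage(data, feats, toy_score))
```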

13
Direct Discriminative Pattern Mining
  • Objectives
  • Directly mine a set of highly discriminative
    patterns (for efficiency)
  • Impose a feature coverage constraint (for
    accuracy)
    every training instance has to be covered by one
    or more features
  • Modules to meet these objectives
  • A branch-and-bound search method
  • to identify the most discriminative pattern in
    the data set
  • An instance elimination process
  • to remove the training instances that are covered
    by the patterns selected so far

14
Branch-and-Bound Search
  • Upper_bound(info-gain) = f(pattern frequency)
  • Upper_bound(info-gain) monotonically increases
    with pattern frequency
  • So, the discriminative power of low-frequency
    patterns is upper bounded by a small value
  • Basic mining method
  • FP-growth, as described in
  • J. Han, J. Pei, and Y. Yin, "Mining frequent
    patterns without candidate generation", SIGMOD 2000

15
Branch-and-Bound Search
  • Main idea
  • Record the information gain of the most
    discriminative itemset discovered so far in a
    global variable (say, maxIG)
  • Before constructing a conditional FP-tree
  • Estimate Upper_bound(info-gain), given the size of
    the conditional database
  • Support of any itemset ≤ the size of the
    conditional database
  • So, info-gain of any itemset ≤
    Upper_bound(info-gain) of the conditional database
  • So, if Upper_bound(info-gain) ≤ maxIG, then do not
    generate the conditional FP-tree (a sketch of this
    pruning check follows below)
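The following is a minimal sketch of this pruning check wrapped around a placeholder conditional-mining step. The helpers ig_upper_bound and mine_conditional_tree are assumed stand-ins, not the paper's code.

```python
def mine_with_pruning(conditional_db_size, max_ig, ig_upper_bound,
                      mine_conditional_tree):
    """Skip an entire conditional FP-tree if even its best possible itemset
    cannot beat the best information gain (max_ig) found so far."""
    # Any itemset mined in this branch has support <= conditional_db_size,
    # so its information gain is <= ig_upper_bound(conditional_db_size).
    if ig_upper_bound(conditional_db_size) <= max_ig:
        return max_ig                      # prune: do not build the subtree
    branch_best = mine_conditional_tree()  # otherwise recurse into the branch
    return max(max_ig, branch_best)
```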

16
Branch-and-Bound Search (continued)
17
Branch-and-Bound Search (continued)
  • Computing IG_ub
  • Assume we have an itemset α
  • whose absence or presence is represented by a
    random variable X, X ∈ {0, 1}
  • Assume C ∈ {0, 1}
  • Let P(x = 1) = θ (θ is the frequency of the
    itemset α), P(c = 1) = p, and P(c = 1 | x = 1) = q
  • The upper bound function shown below assumes θ < p
  • since θ > p is a symmetric case
  • Then, when q = 0 or q = 1, IG(C|X) reaches the
    upper bound

18
Branch-and-Bound Search (continued)
  • When q = 1, the upper bound is
  • IG_ub(θ) = H(p) - (1 - θ) * H((p - θ) / (1 - θ))
  • When q = 0, the upper bound is
  • IG_ub(θ) = H(p) - (1 - θ) * H(p / (1 - θ))
  • where H(x) = -x log(x) - (1 - x) log(1 - x) is the
    binary entropy (a numeric check follows below)
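The following is a minimal numeric sketch of the claim above: for fixed θ and p, the information gain IG(C|X) is largest at the extreme values of q, where it matches the closed-form bound. The function names are illustrative, not the paper's code.

```python
from math import log2

def H(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def info_gain(theta, p, q):
    """IG(C|X) with P(x=1) = theta, P(c=1) = p, P(c=1|x=1) = q."""
    q0 = (p - theta * q) / (1 - theta)   # P(c=1 | x=0)
    return H(p) - theta * H(q) - (1 - theta) * H(q0)

def ig_upper_bound(theta, p):
    """Upper bound on IG for any itemset of frequency theta (theta <= p)."""
    return H(p) - (1 - theta) * H((p - theta) / (1 - theta))  # the q = 1 case

theta, p = 0.3, 0.5
# Scan q over [0, 1], keeping only values with a valid P(c=1|x=0) in [0, 1].
best = max(info_gain(theta, p, q / 100) for q in range(101)
           if 0.0 <= (p - theta * q / 100) / (1 - theta) <= 1.0)
print(round(best, 4), round(ig_upper_bound(theta, p), 4))  # bound reached at q = 1
```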

19
Branch-and-Bound Search (continued)
  • Illustration
  • Table 1 shows a training database which contains
    8 instances and 2 classes
  • Let min_sup = 2

20
Branch-and-Bound Search (continued)
  • Illustration (continued)
  • The global FP-tree is illustrated in Figure 2

21
Branch-and-Bound Search (continued)
  • Illustration (continued)
  • The first frequent itemset is d
  • Info-gain(d) = 0.016
  • maxIG = 0.016, size of the conditional database = 3
  • IG_ub(3) = 0.467 > maxIG, so cannot prune

22
Branch-and-Bound Search (continued)
  • Illustration (continued)
  • Therefore, perform recursive mining on the
    conditional FP-tree
  • Get ad, bd, cd, and abd
  • IG(ad) = 0.123, IG(bd) = 0.123, IG(cd) = 0.074,
    and IG(abd) = 0.123

23
Branch-and-Bound Search (continued)
  • Illustration (continued)
  • Mining proceeds to the frequent itemset a
  • Info-gain(a) = 0.811
  • maxIG = 0.811, size of the conditional database = 6
  • IG_ub(6) = 0.811 ≤ maxIG, so prune

24
Training Instance Elimination
  • Every training instance must be covered by one or
    more features
  • Two approaches
  • Transaction-centered approach
  • Feature-centered approach

25
Transaction-Centered Approach
  • Mine a set of discriminative features to satisfy
    the feature coverage constraint
  • Can be built on FP-growth mining
  • Keep, for each transaction, the best feature seen
    so far (a sketch follows below)
  • When a frequent pattern α is generated, the
    transaction id list T(α) is computed
  • For every instance t ∈ T(α), we check whether α
    is the best feature for t
  • Total cost = T_mining + m · T_check_db
  • May be too high if m (the total number of frequent
    itemsets) is large (e.g. millions)
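The following is a minimal sketch of the transaction-centered bookkeeping: as each frequent pattern comes out of the miner, update the best (highest information gain) feature recorded for every transaction it covers. The pattern stream and the ig() score are assumed inputs; the names are illustrative, not the paper's code.

```python
def track_best_features(transactions, pattern_stream, ig):
    """transactions: list of item sets; pattern_stream: iterable of frequent
    itemsets; ig(pattern): information gain of the pattern."""
    best = {tid: (None, float("-inf")) for tid in range(len(transactions))}
    for pattern in pattern_stream:           # every frequent pattern generated
        gain = ig(pattern)
        tid_list = [tid for tid, items in enumerate(transactions)
                    if pattern <= items]     # T(pattern): covered transactions
        for tid in tid_list:                 # the check_db step for this pattern
            if gain > best[tid][1]:
                best[tid] = (pattern, gain)
    return best                              # best feature kept per transaction
```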

26
Transaction-Centered Approach (continued)
27
Feature-Centered Approach
  • Basic idea
  • Branch-and-bound search produces the most
    discriminative itemset α
  • But the mining process does not compute any
    transaction ids
  • After mining, the transaction id list T(α) is
    computed
  • Then the transactions in T(α) are eliminated from
    the FP-tree
  • Repeat the branch-and-bound search on the modified
    tree
  • check_db is performed only on the best feature
    produced by the mining process (not on all frequent
    patterns)

28
Feature-Centered Approach (continued)
  • DDPMine algorithm

29
Feature-Centered Approach (continued)
  • DDPMine algorithm (continued)
  • Inputs
  • min_sup
  • FP-tree P
  • Operations (a sketch of the overall loop follows
    below)
  • Branch-and-bound search finds the most
    discriminative feature α
  • The transaction set T(α) is computed and removed
    from P
  • The resulting FP-tree is P'
  • Invoke DDPMine recursively until the FP-tree
    becomes empty
  • Cost = n · (T_mining + T_check_db + T_update)
  • n is the number of iterations (usually very
    small)
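The following is a minimal sketch of the DDPMine outer loop, written iteratively rather than in the recursive formulation above. The helpers branch_and_bound_best and ig are assumed stand-ins for the FP-tree mining step, not the paper's code.

```python
def ddpmine(transactions, min_sup, branch_and_bound_best, ig):
    """Collect discriminative features until every transaction is covered."""
    features = []
    remaining = list(transactions)
    while remaining:
        # 1. Branch-and-bound search over the remaining data returns the
        #    single most discriminative frequent itemset (or None).
        alpha = branch_and_bound_best(remaining, min_sup, ig)
        if alpha is None:                    # no frequent itemset is left
            break
        features.append(alpha)
        # 2. Compute T(alpha) and remove those transactions before iterating.
        remaining = [t for t in remaining if not alpha <= t]
    return features
```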

30
Shrinking the FP-tree
  • When inserting a training instance into an FP-tree
  • we register the transaction id of this instance
  • at the node which corresponds to the last item in
    the instance
  • The global FP-tree thus carries training instance
    id-lists
  • When an itemset α is generated, the training
    instances in T(α) have to be removed
  • Perform a traversal of the FP-tree and examine
    the id-lists associated with the tree nodes
  • When an id in a node appears in T(α), it is
    removed
  • The count on the node is reduced by 1, as is the
    count on all the ancestor nodes up to the root
  • When a count reaches 0, delete the node (a sketch
    follows below)
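The following is a minimal sketch of this tree-shrinking step, assuming each FP-tree node keeps a count, a parent pointer, a child map, and the id-list of transactions that end at it. The FPNode layout and remove_transactions name are illustrative, not the paper's code.

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}     # item -> child FPNode
        self.tids = set()      # ids of transactions ending at this node

def remove_transactions(root, tids_to_remove):
    """Remove the given transaction ids (a set) from the tree, decrementing
    counts along each affected path and deleting nodes whose count hits 0."""
    def visit(node):
        for child in list(node.children.values()):  # copy: children may vanish
            visit(child)
        removed = node.tids & tids_to_remove
        node.tids -= removed
        for _ in removed:
            # Decrement this node and every ancestor up to (but not) the root.
            cur = node
            while cur is not None and cur.item is not None:
                cur.count -= 1
                cur = cur.parent
        # Delete the node once no transaction passes through it any more.
        if node.item is not None and node.count == 0 and not node.children:
            del node.parent.children[node.item]
    visit(root)
```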

31
Shrinking the FP-tree (continued)
  • Example
  • After itemset a is discovered with T(a) =
    {100, 200, 300, 400, 700, 800}
  • All the shaded nodes will be deleted

32
Feature Coverage
  • In the DDPMine algorithm
  • when a feature is generated,
  • the transactions containing this feature are
    removed
  • But in a real classification task
  • we may want multiple features to represent a
    transaction
  • So a feature coverage parameter δ is introduced
  • A transaction is eliminated only when it is
    covered by at least δ features
  • Keep a counter for each transaction
  • When a feature is generated, the counter of each
    transaction containing the feature is incremented

33
Feature Coverage (Continued)
  • When a counter reaches δ
  • the corresponding transaction is deleted
  • Two tables are introduced
  • CTable: where the counters are stored
  • HTable: keeps track of features already generated
  • Example: assume δ = 2 (a sketch of this
    bookkeeping follows below)
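The following is a minimal sketch of the δ-coverage bookkeeping: a counter table (CTable) per transaction and a history table (HTable) of features already generated. The names and structure are illustrative, not the paper's code.

```python
def cover_with_features(transactions, feature_stream, delta):
    """Consume features until every transaction is covered >= delta times."""
    ctable = {tid: 0 for tid in range(len(transactions))}   # coverage counters
    htable = []                                              # features kept
    active = set(ctable)                                     # not yet eliminated
    for feature in feature_stream:
        htable.append(feature)
        for tid in list(active):
            if feature <= transactions[tid]:                 # feature covers tid
                ctable[tid] += 1
                if ctable[tid] >= delta:                     # covered delta times
                    active.remove(tid)                       # eliminate transaction
        if not active:                                       # everything covered
            break
    return htable, ctable

# Toy example with delta = 2
txns = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}]
feats = [frozenset({"a"}), frozenset({"c"}), frozenset({"b"}), frozenset({"d"})]
selected, counters = cover_with_features(txns, feats, delta=2)
print(selected, counters)
```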

34
Efficiency Analysis
  • θ_0 = min_sup
  • D_0 = the initial dataset (at iteration 0)
  • T_mining = time for FP-growth mining
  • T_check_db = time to compute T(α) for a frequent
    itemset α
  • T_update = O(|V| + |D|)
  • where |V| is the number of nodes in the FP-tree
  • and |D| is the number of training instances in the
    dataset

35
Experimental Results
  • Datasets: UCI Machine Learning Repository
  • Adult
  • Chess
  • Hypo
  • Sick
  • Efficiency comparison
  • HARMONY: an instance-centric, rule-based classifier
  • PatClass: a frequent pattern-based classifier
  • Branch-and-bound search
  • Effectiveness of pruning

36
Efficiency Tests
37
Branch and Bound Search
38
Problem Size Reduction
39
Efficiency and Accuracy
40
Conclusion
  • Frequent pattern-based classification is very
    effective
  • But the main bottleneck is mining and feature
    selection
  • DDPMine is an approach that
  • directly mines discriminative patterns
  • integrates feature selection into the mining
    framework
  • A branch-and-bound search is imposed on the
    FP-growth mining process
  • pruning the search space significantly
  • DDPMine achieves significant speedup over the
    two-step methods