Mining High-Utility Itemsets from Databases (transcript of a 32-slide PowerPoint presentation)
1
Mining High-Utility Itemsets from Databases
  • Raymond Chan,
  • Qiang Yang
  • Hong Kong University of Science & Technology
  • and
  • Yi-Dong Shen
  • Institute of Software, Chinese Academy of Sciences

2
Background: Frequent Patterns and Association Rules
  • Itemset X = {x1, …, xk}
  • Find all the rules X → Y with minimum confidence and
    support
  • support, s: probability that a transaction
    contains X ∪ Y
  • confidence, c: conditional probability that a
    transaction containing X also contains Y

Let min_support = 50%, min_conf = 50%:
A → C (50%, 66.7%), C → A (50%, 100%)
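These definitions can be made concrete with a minimal sketch. The transaction database below is hypothetical (the slide's own transactions are not in the transcript) and is chosen so that the example numbers above come out:

```python
# Hypothetical 4-transaction database arranged so that
# A -> C has (support 50%, confidence 66.7%) and C -> A has (50%, 100%).
transactions = [{"A", "B"}, {"A", "C"}, {"A", "C"}, {"B", "D"}]

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Conditional probability: sup(X u Y) / sup(X)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"A", "C"}))       # 0.5
print(confidence({"A"}, {"C"}))  # 0.666...
print(confidence({"C"}, {"A"}))  # 1.0
```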
3
Background
  • Existing approaches to association mining are
    itemset-correlation-oriented
  • I1, …, Im → Im+1 (s, c)
  • But there are too many such rules!
  • People are more interested in finding out how useful
    the association rules are
  • We introduced the concept of objective and utility
    [Shen et al., ICDM 2002]
  • I1, …, Im → Obj (s, c, u)

4
Utility as a function
  • Support
  • Confidence
  • Utility of objective item
  • u : (A = v) ↦ a real-valued utility

5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
Definitions (cont'd)
  • Utility of record r in DB
  • Total utility of I over DB
  • Expected utility of OOA rule I1, …, Im → Obj (s,
    c, u)

9
(No Transcript)
10
Han et al.
11
(No Transcript)
12
Mining OOA Rules
  • A Pruning Strategy by estimating upper bounds

(figure: records satisfying I, split into positive and negative item utilities)
  • Summing all positive items alone is not sufficient to
    test against mu; thus the negative items must also be included
  • Utility upper bound for u(I): sum of all
    positive items
  • A better upper bound for u(I): sum of all
    positive items + lower bound of some negative items
  • Lower bound for some negative items:
    height(negative) × lower bound of all single
    negative items
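A hedged sketch of the two bounds just described; the per-item utilities, the `height` parameter, and the exact way the negative lower bound is combined are illustrative assumptions, not taken from the paper:

```python
# Illustrative sketch, not the paper's exact formulation.

def loose_upper_bound(item_utilities):
    """First bound: sum only the positive item utilities."""
    return sum(u for u in item_utilities if u > 0)

def tighter_upper_bound(item_utilities, height):
    """Better bound: positives plus the best case for the `height`
    negative items that records satisfying I must still include
    (the negative utilities closest to zero)."""
    best_negatives = sorted((u for u in item_utilities if u < 0),
                            reverse=True)[:height]
    return loose_upper_bound(item_utilities) + sum(best_negatives)

utilities = [5.0, 3.0, -1.0, -4.0]
print(loose_upper_bound(utilities))       # 8.0
print(tighter_upper_bound(utilities, 1))  # 7.0
```

The tighter bound is never larger than the loose one, so any pruning decision it licenses is at least as aggressive.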
13
(No Transcript)
14
Problems with OOA Mining
  • Two problems
  • The minimum utility mu is not easy to set
  • Solution is to mine the top-K utility patterns
  • Still too many patterns due to the downward
    closure property
  • Solution mine frequent closed patterns
  • Removes redundancy (see later)
  • Let's look at them one at a time

15
Related Work Top-K Patterns
  • [Liu et al. 2000] and [Silberschatz et al. 1996]
    focused on finding interesting patterns by
    matching them against a given set of user beliefs
  • [Sese et al. 2002] studied mining the N most
    correlated association rules
  • [Han et al. 2002] proposed a new mining task: to
    mine the top-k frequent closed patterns of length no
    less than min_l

16
Top-K with Upper and Lower Bound Utilities
  • Utility can be a positive or negative value
  • Utility constraint becomes neither monotone nor
    anti-monotone
  • Strategy
  • Look for upper bound and lower bound of utility
    to satisfy the anti-monotone restriction
  • Pruning Opportunities

(figure: two utility values, value1 and value2, with bounds on the utility axis such that upper1 < lower2)
17
Top-n Utility and Bottom-n Utility
  • Top-n utility
  • Bottom-n utility
  • where n = 1, …, N
  • and u(·) is sorted: u(r1) ≥ u(r2) ≥ … ≥ u(rN)
  • Choosing n = ms% × |DB| makes them the tightest
    upper bound and lower bound
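The construction above can be sketched in a few lines, assuming record utilities are given as plain numbers (function and variable names are illustrative):

```python
# Top-n and bottom-n utility over sorted record utilities;
# n = ms% x |DB| per the slide. Illustrative sketch.

def top_bottom_n(record_utilities, ms):
    """Return (top-n utility, bottom-n utility) for n = ms * |DB|."""
    n = max(1, int(ms * len(record_utilities)))
    s = sorted(record_utilities, reverse=True)  # u(r1) >= ... >= u(rN)
    return sum(s[:n]), sum(s[-n:])

top, bottom = top_bottom_n([7.0, 3.0, -2.0, 5.0], ms=0.5)
print(top, bottom)  # 12.0 1.0
```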

18
Mining Top-K Utility Frequent Closed Patterns
  • L: the set of top-K utilities in {u | I →
    Obj (s, c, u) is an OOA rule}
  • δ: a variable storing the minimum (currently
    Kth-highest) utility
  • Apriori-based, with enhancements to handle top-K
    objective utility and closed patterns (see next
    page)

19
Closed Patterns
  • A closed itemset X has no superset X′ such that
    every transaction containing X also contains X′
  • {a, b}, {a, b, d}, {a, b, c} are frequent closed
    patterns
  • A closed pattern may not be a max pattern (e.g.
    {a, b} is not a max pattern)
  • Concise representation of frequent patterns
  • Reduces the number of patterns and rules
  • [N. Pasquier et al., ICDT '99]

min_sup = 2
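The closedness test can be checked mechanically. The toy database below is my own choice (the slide's transactions are not in the transcript), arranged so that with min_sup = 2 the frequent closed patterns are exactly {a, b}, {a, b, c}, and {a, b, d}:

```python
# Toy database chosen to reproduce the slide's example closed patterns.
transactions = [{"a", "b", "c"}, {"a", "b", "c"},
                {"a", "b", "d"}, {"a", "b", "d"}]

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

def is_closed(itemset):
    """Closed: no proper superset has the same support."""
    other_items = set().union(*transactions) - itemset
    return all(support(itemset | {x}) < support(itemset)
               for x in other_items)

print(is_closed({"a", "b"}))  # True: sup 4, every extension drops it
print(is_closed({"a"}))       # False: adding b keeps support at 4
```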
20
4 Pruning Strategies I
  • Strategy 1: Similar to the Apriori algorithm's
    pruning strategy,
  • for every (i + 1)-generator I in Gi+1,
  • if there exists an i-sub-itemset J ⊂ I such that
    J ∉ Gi,
  • I is pruned from Gi+1.
  • This strategy prunes all supersets of infrequent
    generators,
  • because each Gi contains only frequent generators.
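A minimal sketch of this subset check (names are illustrative, not from the paper's pseudocode):

```python
from itertools import combinations

def prune_strategy_1(candidates, prev_generators):
    """Keep only (i+1)-itemsets whose i-sub-itemsets all appear
    among the surviving i-generators (Apriori-style pruning)."""
    prev = {frozenset(g) for g in prev_generators}
    return [c for c in candidates
            if all(frozenset(sub) in prev
                   for sub in combinations(sorted(c), len(c) - 1))]

g2 = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"b", "d"}]
c3 = [{"a", "b", "c"}, {"a", "b", "d"}]
# {a,b,d} is pruned because its sub-itemset {a,d} is not in g2.
print(prune_strategy_1(c3, g2))
```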

21
Pruning Strategies II
  • Strategy 2
  • For all the remaining (i  1)-generators in Gi1,
  • we prune those generators that do not satisfy the
    user specified minimum support constraint.
  • This strategy removes all infrequent generators.
  • These infrequent generators cannot produce
    frequent closed itemsets because of the
    anti-monotone property of the support constraint.

22
Pruning Strategies Using Upper and Lower Bounds
  • Strategy 3
  • For the rest of the frequent (i + 1)-generators
    in Gi+1,
  • we prune those generators that have a top-K
    utility (utility upper bound) less than δ (the
    temporary Kth-highest utility value during
    computation).
  • This strategy applies the anti-monotone property
    of the utility upper-bound constraint and
  • removes all generators that cannot produce closed
    itemsets with a utility high enough to be in the
    top-K list.

23
Pruning Strategy 4 closed itemsets
  • Strategy 4: For every remaining
    (i + 1)-generator I in Gi+1,
  • if there exists an i-sub-itemset J ⊂ I such that I
    and J have the same support,
  • I is pruned from Gi+1.
  • This strategy removes redundant generators, since
    the closed itemset from I has already been
    generated from J.
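A hedged sketch of this same-support test on a toy database (the database and names are my own, for illustration):

```python
from itertools import combinations

# Toy database: every record containing {a, b} is exactly the
# records containing {a}, so {a, b} is a redundant generator.
transactions = [{"a", "b"}, {"a", "b"}, {"c"}]

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

def is_redundant(generator):
    """True if some (|I|-1)-sub-itemset J has the same support as I,
    i.e. J would already yield the same closed itemset."""
    return any(support(sub) == support(generator)
               for sub in combinations(sorted(generator),
                                       len(generator) - 1))

print(is_redundant({"a", "b"}))  # True: {a} also has support 2
print(is_redundant({"a"}))       # False: the empty set has support 3
```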

24
Generate Closed Patterns
  • One database scan of DB is required to generate
    the closed itemsets from the generators.
  • For each database record r of DB and for each
    generator I in Gk+1,
  • the corresponding closed itemset I.closure is
    updated.
  • If r is the first record that contains I,
    I.closure is empty, so we put all non-objective
    items of r into I.closure.
  • If r is not the first record that contains I,
  • I.closure is non-empty, so we perform an
    intersection between I.closure and r (i.e.
    I.closure ∩ r) and put the resulting itemset back
    into I.closure.
  • At the end of the database scan, I.closure
    contains a closed itemset generated from generator
    I.
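The single-scan closure construction described above can be sketched as follows (omitting the objective-item filtering for brevity; names are illustrative):

```python
def build_closures(transactions, generators):
    """One pass over the database: the first record containing
    generator I seeds I.closure; every later matching record is
    intersected into it."""
    closures = {}
    for r in transactions:
        for g in generators:
            if g <= r:                    # record r contains generator I
                key = frozenset(g)
                if key not in closures:   # first matching record seeds it
                    closures[key] = set(r)
                else:                     # later records are intersected in
                    closures[key] &= r
    return closures

db = [{"a", "b", "c"}, {"a", "b", "d"}, {"b", "c"}]
print(build_closures(db, [{"a"}, {"b"}]))
# closure of {a} is {a, b}; closure of {b} is {b}
```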

25
Apriori Closed Algorithm
26
Experimental Evaluation
  • We expect more useful and fewer patterns than
    standard Apriori
  • Real datasets from the UCI Machine Learning Archive
  • German Credit dataset (ftp://ftp.ics.uci.edu/pub/machine-learning-databases/statlog/german/)
  • 1000 customer records, 21 attributes
  • Heart Disease dataset (ftp://ftp.ics.uci.edu/pub/machine-learning-databases/statlog/heart)
  • 270 patient records, 14 attributes

27
Performances by Varying ms%
(a) German credit dataset.
(b) Heart disease dataset.
28
Performances by Varying K
(a) German credit dataset.
(b) Heart disease dataset.
29
Effect of Pruning Strategies 1 to 4
(a) German credit dataset.
(b) Heart disease dataset.
30
Frequent Itemsets (German)
31
Frequent Itemsets (heart)
32
Conclusions and Future Work
  • We developed a new approach to modeling
    association mining, OOA mining, which is
    objective-oriented
  • Developed an algorithm to mine the OOA frequent
    closed patterns and the top-K utility OOA rules
  • Found a weaker but anti-monotonic condition based
    on utility that helped us to prune the search
    space
  • Our algorithm produces the desired results
    without much overhead, with the added
    advantage of letting the user specify the number of rules
  • Future work:
  • More pruning strategies
  • More sophisticated knowledge than rules