Mining Multiple-level Association Rules in Large Databases - PowerPoint PPT Presentation

1
Mining Multiple-level Association Rules in Large Databases
IEEE Transactions on Knowledge and Data Engineering, 1999
Authors: Jiawei Han (Simon Fraser University, British Columbia) and Yongjian Fu (University of Missouri-Rolla, Missouri)
Presenter: Zhenyu Lu (based on Mohammed's previous slides, with some changes)
2
Outline
  • Introduction
  • Algorithm
  • Performance studies
  • Cross-level association
  • Filtering of uninteresting association rules
  • Conclusions

3
Introduction: Why Multiple-Level Association Rules?
TID  Items
T1   m1, b2
T2   m2, b1
T3   b2
Frequent itemset: {b2}
Association rules: none
Is this database useless?
4
Introduction: Why Multiple-Level Association Rules?
What if we have this abstraction tree?
[Hierarchy: food -> milk, bread; milk -> m1, m2; bread -> b1, b2]
TID  Items
T1   milk, bread
T2   milk, bread
T3   bread
minsup = 50%, minconf = 50%
Frequent itemset: {milk, bread}
Association rule: milk <=> bread
5
Introduction: Why Multiple-Level Association Rules?
  • Sometimes data at the primitive level does not show any significant pattern, yet useful information is hiding behind it.
  • The goal of multiple-level association analysis is to find the hidden information in or between levels of abstraction.
6
Introduction: Requirements in Multiple-Level Association Analysis
Two general requirements for multiple-level association rule mining:
1) Provide data at multiple levels of abstraction (a common practice now).
2) Find efficient methods for multiple-level rule mining (our focus).
7
Outline
  • Introduction
  • Algorithm
  • Performance studies
  • Cross-level association
  • Filtering of uninteresting association rules
  • Conclusions

8
Algorithm: observation
Level 1 [food -> milk, bread]:
TID  Items
T1   milk, bread
T2   milk, bread
T3   bread
T4   milk, bread
T5   milk
Frequent itemset: {milk, bread}; association rule: milk <=> bread
Level 2 [milk -> m1, m2; bread -> b1, b2]:
TID  Items
T1   m1, b2
T2   m2, b1
T3   b2
T4   m2, b1
T5   m2
With minsup = 50%, minconf = 50%: frequent itemset: {m2}; association rules: none
What about {m2, b1}? One minsup for all levels?
9
Algorithm: observation
Level 1 (minsup = 50%): frequent itemset: {milk, bread}; association rule: milk <=> bread
Level 2 (minsup = 40%, minconf = 50%): frequent itemsets: {m2}, {b1}, {b2}, {m2, b1}; association rule: m2 <=> b1
Makes more sense now.
10
Algorithm: observation
  • Drawbacks of using only one minsup:
  • If the minsup is too high, we lose information from the lower levels.
  • If the minsup is too low, we get too many rules from the higher levels, many of them useless.

Approach: a separate minsup per level, ascending toward the higher levels of the hierarchy (food -> milk, bread -> m1, m2, b1, b2).
11
Algorithm: An Example
An entry of the sales_transaction table:
Transaction_id  Bar_code_set
351428          {17325, 92108, 55349, 88157, ...}
A sales_item description relation:
Bar_code  Category  Brand     Content  Size   Storage_pd  Price
17325     Milk      Foremost  2%       1 ga.  14 (days)   3.89
12
Algorithm: An Example
Encode the database with hierarchy (layer) information:
GID  bar_code  category  content  brand
112  17325     Milk      2%       Foremost
[Hierarchy: food -> milk (chocolate, 2%), bread (white, wheat); 2% milk -> Dairyland, Foremost]
The first digit 1 implies milk, the second digit 1 implies 2% content, and the third digit 2 implies the Foremost brand.
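The encoding scheme above can be sketched in a few lines (an illustrative sketch, not code from the paper; the lookup tables and function names are hypothetical):

```python
# Hypothetical lookup tables for the hierarchy on this slide: each level of
# the concept hierarchy contributes one digit to the encoded GID.
CATEGORY = {"milk": "1", "bread": "2"}
CONTENT = {"2%": "1", "chocolate": "2"}    # content level under milk
BRAND = {"Dairyland": "1", "Foremost": "2"}

def encode(category, content, brand):
    """Build a hierarchy-encoded GID, e.g. (milk, 2%, Foremost) -> '112'."""
    return CATEGORY[category] + CONTENT[content] + BRAND[brand]

def generalize(gid, level):
    """Generalize a GID to a given level by keeping its first `level` digits."""
    return gid[:level]

print(encode("milk", "2%", "Foremost"))  # -> 112
print(generalize("112", 1))              # -> 1 (i.e., milk)
```

Generalizing a GID to level l is just prefix truncation, which is what makes the level-wise filtering in the following slides cheap.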
13
Algorithm: An Example
Encoded transaction table T1:
TID  Items
T1   111, 121, 211, 221
T2   111, 211, 222, 323
T3   112, 122, 221, 411
T4   111, 121
T5   111, 122, 211, 221, 413
T6   211, 323, 524
T7   323, 411, 524, 713
14
Algorithm: An Example
The frequent 1-itemsets at level 1 (level-1 minsup = 4):
L[1,1]:
Itemset  Support
{1}      5
{2}      5
L[1,2]:
Itemset  Support
{1, 2}   4
Filtered transaction table T2 (only keep items from T1 whose level-1 ancestor is in L[1,1]):
TID  Items
T1   111, 121, 211, 221
T2   111, 211, 222
T3   112, 122, 221
T4   111, 121
T5   111, 122, 211, 221
T6   211
Use Apriori on each level.
15
Algorithm: An Example
Level-2 minsup = 3:
L[2,1]:
Itemset  Support
{11}     5
{12}     4
{21}     4
{22}     4
L[2,2]:
Itemset   Support
{11, 12}  4
{11, 21}  3
{11, 22}  4
{12, 22}  3
{21, 22}  3
L[2,3]:
Itemset       Support
{11, 12, 22}  3
{11, 21, 22}  3
16
Frequent Itemsets at Level 3
Level-3 minsup = 3:
L[3,1]:
Itemset  Support
{111}    4
{211}    4
{221}    3
L[3,2]:
Itemset     Support
{111, 211}  3
Only two encoded tables, T1 and T2, are generated; all frequent itemsets beyond level 1 are computed from T2.
E.g. Level 1: 80% of customers that purchase milk also purchase bread (milk => bread with confidence 80%). Level 2: 75% of people who buy 2% milk also buy wheat bread (2% milk => wheat bread with confidence 75%).
17
Algorithm ML_T2L1
  • Purpose: find multiple-level frequent itemsets for mining strong association rules in a transaction database.
  • Input:
  • T1: a hierarchy-information-encoded transaction table of the form <TID, Itemset>
  • minsup: a threshold for each level l, in the form minsup[l]
  • Output: multiple-level frequent itemsets

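The ML_T2L1 flow can be sketched compactly (a simplified reimplementation under the digit encoding from the example, not the paper's pseudocode; the Apriori step here is a plain level-wise count without subset pruning):

```python
from itertools import combinations

def apriori(transactions, minsup, max_k=3):
    """Level-wise Apriori over a list of item sets; returns {itemset: support}."""
    frequent = {}
    candidates = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while candidates and k <= max_k:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        survivors = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(survivors)
        # Join k-itemsets that differ in one item to form (k+1)-candidates.
        candidates = {a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == k + 1}
        k += 1
    return frequent

def ml_t2l1(t1, minsup):
    """ML_T2L1 sketch: mine level 1 on T1, derive the filtered table T2,
    then mine every deeper level on T2. minsup[l] is the level-l threshold."""
    levels = {}
    # Level 1: generalize each item to its first digit and run Apriori.
    levels[1] = apriori([{g[:1] for g in t} for t in t1], minsup[1])
    keep = {next(iter(s)) for s in levels[1] if len(s) == 1}
    # T2: keep only items descending from a frequent level-1 item.
    filtered = ({g for g in t if g[:1] in keep} for t in t1)
    t2 = [t for t in filtered if t]
    max_level = max(len(g) for t in t1 for g in t)
    for level in range(2, max_level + 1):
        levels[level] = apriori([{g[:level] for g in t} for t in t2],
                                minsup[level])
    return levels
```

On the encoded table T1 from the example with minsup = {1: 4, 2: 3, 3: 3}, this reproduces the L[1,1] through L[3,2] sets shown on the previous slides.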
18
Algorithm variations
  • Algorithm ML_T1LA
  • Uses only the first encoded transaction table, T1.
  • Supports for the candidate sets at all levels are computed at the same time.
  • Pros: only one table and at most k scans.
  • Cons: T1 may contain infrequent items and may require a large amount of space.
  • Algorithm ML_TML1
  • Generates multiple encoded transaction tables T1, T2, ...
  • Pros: may save a substantial amount of processing.
  • Cons: can be inefficient if only a few items are filtered out at each level processed.
  • Algorithm ML_T2LA
  • Uses two encoded transaction tables, as in the ML_T2L1 algorithm.
  • Supports for the candidate sets at all levels are computed at the same time.
  • Pros: potentially efficient if T2 contains far fewer items than T1.
  • Cons: ?

19
Outline
  • Introduction
  • Algorithm
  • Performance studies
  • Cross-level association
  • Filtering of uninteresting association rules
  • Conclusions

20
Performance Study
  • Assumptions:
  • The maximal level in the concept hierarchy is 3.
  • Two data sets are used: DB1 (average frequent itemset length 4, average transaction size 10) and DB2 (average frequent itemset length 6, average transaction size 20).
  • Conclusions:
  • The relative performance of the four algorithms is highly dependent on the threshold settings (i.e., the power of the filter at each level).
  • Parallel derivation of L[l,k] is useful, and deriving a transaction table T2 is usually beneficial.
  • ML_T1LA is found to be the best or second-best algorithm.

21
Performance Study
[Performance graphs] DB1 (average frequent itemset length 4, average transaction size 10): minsup2 = 3%, minsup3 = 1%. DB2 (average frequent itemset length 6, average transaction size 20): minsup2 = 2%, minsup3 = 0.75%.
22
Performance Study
[Performance graphs] DB1 (average frequent itemset length 4, average transaction size 10): minsup1 = 55%, minsup3 = 1%. DB2 (average frequent itemset length 6, average transaction size 20): minsup1 = 60%, minsup3 = 0.75%.
23
Performance Study
[Performance graphs] minsup1 = 60%, minsup2 = 2% and minsup1 = 55%, minsup2 = 3%.
24
Performance Study
  • Two interesting performance features:
  • The performance of the algorithms is highly sensitive to the minsup thresholds, especially minsup1 and minsup2.
  • Deriving T2 is beneficial.
25
Outline
  • Introduction
  • Algorithm
  • Performance studies
  • Cross-level association
  • Filtering of uninteresting association rules
  • Conclusions

26
Cross-level association
[Before expansion: each level of the hierarchy (food -> milk, bread; milk -> m1, m2; bread -> b1, b2) is mined separately, yielding rules like milk => bread and m2 => b1.]
[After expansion: items from all levels are considered together, so rules across levels, like milk => b1, can be mined.]
27
Cross-level association
  • Two adjustments:
  • A single minsup is used at all levels.
  • When the frequent k-itemsets are generated, items at all levels are considered; itemsets that contain both an item and its ancestor are excluded.

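The ancestor-exclusion adjustment can be sketched with the digit encoding, under which an ancestor is exactly a proper prefix (the helper names are mine, not from the paper):

```python
from itertools import combinations

def is_ancestor(a, b):
    """With the digit encoding, a is an ancestor of b iff a is a proper prefix."""
    return len(a) < len(b) and b.startswith(a)

def valid_candidate(itemset):
    """Reject candidate itemsets that pair an item with its own ancestor."""
    return not any(is_ancestor(a, b) or is_ancestor(b, a)
                   for a, b in combinations(itemset, 2))

print(valid_candidate({"1", "21"}))  # milk with b1: allowed -> True
print(valid_candidate({"1", "11"}))  # milk with its own descendant: -> False
```

An itemset like {milk, m1} is excluded because it is trivially frequent whenever {m1} is, so any rule built from it carries no information.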
28
Outline
  • Introduction
  • Algorithm
  • Performance studies
  • Cross-level association
  • Filtering of uninteresting association rules
  • Conclusions

29
Filtering of uninteresting association rules
  • Removal of redundant rules
  • When a rule R passes the minimum confidence test, it is checked against every strong rule R' of which R is a descendant. If the confidence of R, φ(R), falls within the expected confidence range (given a variation σ), R is removed.
  • Example:
  • milk => bread (12% support, 85% confidence)
  • chocolate milk => bread (1% support, 84% confidence)
  • The second rule is not interesting if 8% of milk is chocolate milk.
  • Can reduce the number of rules by 30% to 60%.

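The redundancy test above can be sketched numerically (simplified: the paper's expected-confidence range with variation σ is approximated here by a fixed tolerance eps, which is my assumption):

```python
def is_redundant(sup_r, conf_r, sup_anc, conf_anc, share, eps=0.05):
    """R is redundant w.r.t. its ancestor rule R' if R's confidence matches
    the ancestor's and R's support matches sup(R') scaled by the descendant's
    share of the ancestor item (e.g., chocolate milk is 8% of milk)."""
    return (abs(conf_r - conf_anc) <= eps and
            abs(sup_r - sup_anc * share) <= eps)

# milk => bread: 12% sup, 85% conf; chocolate milk => bread: 1% sup, 84% conf.
# Expected support of the descendant rule: 12% * 8% ~= 1%.
print(is_redundant(0.01, 0.84, 0.12, 0.85, 0.08))  # -> True (not interesting)
```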
30
Filtering of uninteresting association rules (continued)
  • Removal of unnecessary rules
  • For each strong rule R: A => B, we test every rule R': A − C => B, where C is a subset of A. If the confidence of R, φ(R), is not significantly different from that of R', φ(R'), then R is removed.
  • Example:
  • 80% of customers who buy milk also buy bread.
  • 80% of customers who buy milk and butter also buy bread.
  • Reduces the number of rules by 50% to 80%.

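The unnecessary-rule test reduces to comparing two confidences (eps is an assumed significance tolerance, not a value from the paper):

```python
def is_unnecessary(conf_r, conf_without_c, eps=0.05):
    """R: (A and C) => B is unnecessary if dropping C from the antecedent
    leaves the confidence essentially unchanged."""
    return abs(conf_r - conf_without_c) <= eps

# conf(milk & butter => bread) = 80% vs conf(milk => bread) = 80%:
print(is_unnecessary(0.80, 0.80))  # -> True: butter adds no information
```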
31
Conclusions
  • Extended association rules from single-level to multiple-level.
  • A top-down progressive deepening technique was developed for mining multiple-level association rules.
  • Proposed techniques for filtering out uninteresting association rules.

32
Exam Questions
  • Q1: Give an example of multilevel association rules.
  • A: Besides finding that 80% of customers who purchase milk may also purchase bread, it is interesting to allow users to drill down and show that 75% of people buy wheat bread if they buy 2% milk.

33
Exam Questions
  • Q2: What are the problems in using normal Apriori methods?
  • A: One may apply the Apriori algorithm to examine data items at multiple levels of abstraction under the same minimum support and minimum confidence thresholds. This approach is simple, but it may lead to some undesirable results.
  • First, large support is more likely to exist at high levels of abstraction. If one wants to find strong associations at relatively low levels of abstraction, the minimum support threshold must be reduced substantially; this may lead to the generation of many uninteresting associations at high or intermediate levels.
  • Second, since it is unlikely to find many strong association rules at a primitive concept level, mining strong associations should be performed at a rather high concept level, which is actually the case in many studies. However, mining association rules at high concept levels may often lead to rules corresponding to prior knowledge and expectations, such as milk => bread (which could be common sense), or to some uninteresting attribute combinations if the minimum support is allowed to be rather small, such as toy => milk (which may just happen together by chance).

34
Exam Questions
  • Q3: What are the two general steps in multiple-level association rule mining?
  • A: To explore multiple-level association rule mining, one needs to provide:
  • 1) Data at multiple levels of abstraction, and
  • 2) Efficient methods for multiple-level rule mining.