Mining Multiple-level Association Rules in Large Databases - PowerPoint PPT Presentation


About This Presentation

Slides: 35

Provided by: yli

Learn more at: http://www.cs.uvm.edu

Transcript and Presenter's Notes

1
Mining Multiple-level Association Rules in Large Databases
IEEE Transactions on Knowledge and Data Engineering, 1999
Authors: Jiawei Han, Simon Fraser University, British Columbia; Yongjian Fu, University of Missouri-Rolla, Missouri
Presenter: Zhenyu Lu (based on Mohammed's previous slides, with some changes)
2
Outline

Introduction
Algorithm
Performance studies
Cross-level association
Filtering of uninteresting association rules
Conclusions

3
Introduction: Why Multiple-Level Association Rules?
TID  Items
T1   m1, b2
T2   m2, b1
T3   b2
Frequent itemset: {b2}
Association rules: none
Is this database useless?
4
Introduction: Why Multiple-Level Association Rules?
What if we have this abstraction tree?
food
  milk: m1, m2
  bread: b1, b2
Generalizing each item to its parent (m1, m2 to milk; b1, b2 to bread) gives:
TID  Items
T1   milk, bread
T2   milk, bread
T3   bread
minsup = 50%, minconf = 50%
Frequent itemset: {milk, bread}
Association rules: milk <=> bread
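The generalization step can be made concrete with a small brute-force sketch. It assumes the tree is given as a plain child-to-parent dictionary and that support is a fraction of transactions; the names `PARENT`, `lift`, `support`, `confidence`, and `frequent` are illustrative, not taken from the paper.

```python
from itertools import combinations

# Assumed encoding of the abstraction tree: primitive item -> level-1 concept.
PARENT = {"m1": "milk", "m2": "milk", "b1": "bread", "b2": "bread"}

# Primitive-level transactions from the previous slide.
PRIMITIVE = {"T1": {"m1", "b2"}, "T2": {"m2", "b1"}, "T3": {"b2"}}


def lift(transactions, parent):
    """Replace each item by its parent concept; duplicates collapse in the set."""
    return {tid: {parent[i] for i in items} for tid, items in transactions.items()}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of lhs => rhs, i.e. support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


def frequent(transactions, minsup, max_k=2):
    """Brute-force frequent itemsets of size 1..max_k (fine for toy data only)."""
    items = sorted(set().union(*transactions.values()))
    return [frozenset(c) for k in range(1, max_k + 1)
            for c in combinations(items, k)
            if support(set(c), transactions) >= minsup]


level1 = lift(PRIMITIVE, PARENT)
print(frequent(PRIMITIVE, 0.5))  # only {b2} at the primitive level
print(frequent(level1, 0.5))     # {bread}, {milk} and {bread, milk}
print(confidence({"milk"}, {"bread"}, level1))  # 1.0
print(confidence({"bread"}, {"milk"}, level1))  # about 0.67
```

No 2-itemset reaches 50% among the primitive items, but once m1, m2 are merged into milk and b1, b2 into bread, {milk, bread} appears in two of the three transactions, and both rule directions clear minconf = 50%.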
5
Introduction: Why Multiple-Level Association Rules?

Sometimes data at the primitive level shows no significant pattern, yet useful information is hidden behind it.

The goal of multiple-level association analysis is to find this hidden information within or across levels of abstraction.

6
Introduction: Requirements in Multiple-Level Association Analysis
Two general requirements for multiple-level association rule mining:
1) Provide data at multiple levels of abstraction (a common practice now).
2) Find efficient methods for multiple-level rule mining (our focus).
7
Outline

Introduction
Algorithm
Performance studies
Cross-level association
Filtering of uninteresting association rules
Conclusions

8
Algorithm: Observation
Level 1 (food: milk, bread)
TID  Items
T1   milk, bread
T2   milk, bread
T3   bread
T4   milk, bread
T5   milk
Frequent itemset: {milk, bread}; association rule: milk <=> bread
Level 2 (milk: m1, m2; bread: b1, b2)
TID  Items
T1   m1, b2
T2   m2, b1
T3   b2
T4   m2, b1
T5   m2
Frequent itemset: {m2}; association rules: none
minsup = 50%, minconf = 50% at both levels
What about m2 and b1? Should one minsup be used for all levels?
9
Algorithm: Observation
Level 1, minsup = 50%: frequent itemset {milk, bread}; association rule: milk <=> bread
Level 2, minsup = 40%: frequent itemsets {m2}, {b1}, {b2}, {m2, b1}; association rule: m2 <=> b1
minconf = 50%
This makes more sense now.
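The effect of a level-specific minsup can be checked on the level-2 table from the previous slide. This brute-force sketch treats minsup as a fraction of the five transactions; `frequent_itemsets` and `confidence` are hypothetical helpers, not the paper's procedure.

```python
from itertools import combinations

# Level-2 transactions T1..T5 from the previous slide.
LEVEL2 = [{"m1", "b2"}, {"m2", "b1"}, {"b2"}, {"m2", "b1"}, {"m2"}]


def frequent_itemsets(transactions, minsup, max_k=2):
    """Map each frequent itemset (size 1..max_k) to its support fraction."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, max_k + 1):
        for combo in combinations(items, k):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count / n >= minsup:
                result[frozenset(combo)] = count / n
    return result


def confidence(lhs, rhs, transactions):
    """Confidence of lhs => rhs as a fraction."""
    both = sum(1 for t in transactions if lhs | rhs <= t)
    return both / sum(1 for t in transactions if lhs <= t)


# The level-1 minsup (50%) reused at level 2: only {m2} survives.
print(sorted(map(sorted, frequent_itemsets(LEVEL2, 0.5))))  # [['m2']]
# A lower level-2 minsup (40%) recovers b1, b2, m2 and the pair {b1, m2}.
print(sorted(map(sorted, frequent_itemsets(LEVEL2, 0.4))))
# [['b1'], ['b1', 'm2'], ['b2'], ['m2']]
# Both directions of m2 <=> b1 clear minconf = 50%.
print(confidence({"m2"}, {"b1"}, LEVEL2), confidence({"b1"}, {"m2"}, LEVEL2))
```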
10
Algorithm: Observation

Drawbacks of using a single minsup:
If minsup is too high, we lose information from the lower levels.
If minsup is too low, we get too many rules from the higher levels, many of them useless.

Approach: use a separate minsup at each level, decreasing from the top of the hierarchy (food; milk, bread) to the bottom (m1, m2, b1, b2).
11
Algorithm: An Example
An entry of the sales_transaction table
Transaction_id  Bar_code_set
351428          {17325, 92108, 55349, 88157, ...}
A sales_item description relation
Bar_code  Category  Brand     Content  Size   Storage_pd  Price
17325     Milk      Foremost  2%       1 ga.  14 (days)   3.89
12
Algorithm: An Example
Encode the database with layer information
GID  bar_code  category  content  brand
112  17325     Milk      2%       Foremost
Concept hierarchy (partial):
food
  milk
    2%: Dairyland, Foremost
    chocolate
  bread: white, wheat
In GID 112, the first 1 means milk, the second 1 means 2% content, and the 2 means the Foremost brand.
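Because each level contributes one digit, ancestors are simply prefixes of a GID. The sketch below assumes single-digit positions (at most nine children per node), as in the slide's example; `encode`, `ancestor`, and `is_ancestor` are hypothetical helper names.

```python
def encode(path):
    """Build a GID from per-level positions, e.g. (1, 1, 2) -> "112".

    Assumes every position is a single digit 1-9, as in the slide's example.
    """
    if not all(1 <= p <= 9 for p in path):
        raise ValueError("positions must be single digits 1-9")
    return "".join(str(p) for p in path)


def ancestor(gid, level):
    """Code of the level-`level` ancestor: the first `level` digits of the GID."""
    return gid[:level]


def is_ancestor(a, b):
    """True if code `a` is a proper ancestor of code `b` in the hierarchy."""
    return len(a) < len(b) and b.startswith(a)


gid = encode((1, 1, 2))  # milk -> 2% -> Foremost
print(gid, ancestor(gid, 1), ancestor(gid, 2))  # 112 1 11
print(is_ancestor("11", gid), is_ancestor("12", gid))  # True False
```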
13
Algorithm: An Example
Encoded transaction table T[1]
TID  Items
T1   {111, 121, 211, 221}
T2   {111, 211, 222, 323}
T3   {112, 122, 221, 411}
T4   {111, 121}
T5   {111, 122, 211, 221, 413}
T6   {211, 323, 524}
T7   {323, 411, 524, 713}
14
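Algorithm: An Example
The frequent 1-itemsets on level 1 (level-1 minsup = 4 transactions)
L[1,1]
Itemset  Support
{1}      5
{2}      5
Filtered transaction table T[2]: only the items of T[1] whose level-1 ancestor is in L[1,1] are kept, and T7, left empty, is dropped
TID  Items
T1   {111, 121, 211, 221}
T2   {111, 211, 222}
T3   {112, 122, 221}
T4   {111, 121}
T5   {111, 122, 211, 221}
T6   {211}
L[1,2]
Itemset  Support
{1, 2}   4
Use Apriori on each level.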
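The first pass can be sketched in a few lines, assuming the one-digit-per-level encoding and an absolute minsup count; `large_1_itemsets`, `filter_table`, and `pair_support` are illustrative names loosely modeled on the steps above, not the paper's code.

```python
from collections import Counter

# Encoded table T[1] from the previous slide.
T1 = {
    "T1": {"111", "121", "211", "221"},
    "T2": {"111", "211", "222", "323"},
    "T3": {"112", "122", "221", "411"},
    "T4": {"111", "121"},
    "T5": {"111", "122", "211", "221", "413"},
    "T6": {"211", "323", "524"},
    "T7": {"323", "411", "524", "713"},
}


def large_1_itemsets(table, level, minsup):
    """Count each level-`level` ancestor once per transaction; keep counts >= minsup."""
    counts = Counter()
    for items in table.values():
        counts.update({gid[:level] for gid in items})
    return {code: n for code, n in counts.items() if n >= minsup}


def filter_table(table, large1):
    """Build T[2]: keep items whose level-1 ancestor is frequent, drop empty rows."""
    kept = {tid: {g for g in items if g[:1] in large1} for tid, items in table.items()}
    return {tid: items for tid, items in kept.items() if items}


def pair_support(table, a, b, level):
    """Number of transactions containing both level-`level` codes a and b."""
    return sum(1 for items in table.values()
               if {a, b} <= {g[:level] for g in items})


L11 = large_1_itemsets(T1, level=1, minsup=4)
T2 = filter_table(T1, L11)
print(sorted(L11.items()))             # [('1', 5), ('2', 5)]
print(sorted(T2))                      # T7 has been dropped
print(pair_support(T2, "1", "2", 1))   # 4, so {1, 2} is in L[1,2]
```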
15
Algorithm: An Example
Level-2 minsup = 3
L[2,1]
Itemset  Support
{11}     5
{12}     4
{21}     4
{22}     4
L[2,2]
Itemset   Support
{11, 12}  4
{11, 21}  3
{11, 22}  4
{12, 22}  3
{21, 22}  3
L[2,3]
Itemset       Support
{11, 12, 22}  3
{11, 21, 22}  3
16
Frequent Itemsets at Level 3
Level-3 minsup = 3
L[3,1]
Itemset  Support
{111}    4
{211}    4
{221}    3
L[3,2]
Itemset     Support
{111, 211}  3
Only two tables, T[1] and T[2], are generated; all frequent itemsets after L[1,1] are derived from T[2].
E.g., level 1: 80% of customers who purchase milk also purchase bread, i.e., milk => bread with confidence 80%. Level 2: 75% of people who buy 2% milk also buy wheat bread, i.e., 2% milk => wheat bread with confidence 75%.
17
Algorithm ML_T2L1

Purpose: to find multiple-level frequent itemsets for mining strong association rules in a transaction database
Input:
T[1], a hierarchy-information-encoded transaction table of the form <TID, Itemset>
a minimum support threshold minsup[l] for each level l
Output: multiple-level frequent itemsets
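The following is a minimal sketch of the level-by-level loop described on these slides, reproducing the example tables. It assumes one digit per level, absolute minsup counts per level, and brute-force subset counting instead of the paper's data structures; `ml_t2l1`, `apriori_gen`, and `count` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

# Encoded table T[1] from the example (one digit per hierarchy level).
T1 = {
    "T1": {"111", "121", "211", "221"},
    "T2": {"111", "211", "222", "323"},
    "T3": {"112", "122", "221", "411"},
    "T4": {"111", "121"},
    "T5": {"111", "122", "211", "221", "413"},
    "T6": {"211", "323", "524"},
    "T7": {"323", "411", "524", "713"},
}


def count(table, candidates, level):
    """Support counts of candidate itemsets (sorted tuples of level-`level` codes)."""
    counts = Counter()
    for items in table.values():
        codes = {g[:level] for g in items}
        for cand in candidates:
            if set(cand) <= codes:
                counts[cand] += 1
    return counts


def apriori_gen(prev):
    """Apriori join on a shared (k-2)-prefix, then prune by the (k-1)-subsets."""
    prev = sorted(prev)
    if not prev:
        return []
    k = len(prev[0]) + 1
    prev_set = set(prev)
    out = []
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:
            cand = a + (b[-1],)
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                out.append(cand)
    return out


def ml_t2l1(t1, minsup, max_level):
    """Return {(level, k): {itemset: count}}; L[1,1] from T[1], the rest from T[2]."""
    large = {}
    table = t1
    for level in range(1, max_level + 1):
        ones = sorted({(g[:level],) for items in table.values() for g in items})
        if level > 1:  # keep only items whose parent is in L[level-1, 1]
            ones = [c for c in ones if (c[0][:-1],) in large[(level - 1, 1)]]
        lk = {c: n for c, n in count(table, ones, level).items() if n >= minsup[level]}
        if not lk:
            break
        large[(level, 1)] = lk
        if level == 1:  # derive the filtered table T[2] once and reuse it
            table = {tid: {g for g in items if (g[:1],) in lk} for tid, items in t1.items()}
            table = {tid: items for tid, items in table.items() if items}
        k = 2
        while True:
            cands = apriori_gen(large[(level, k - 1)])
            lk = {c: n for c, n in count(table, cands, level).items() if n >= minsup[level]}
            if not lk:
                break
            large[(level, k)] = lk
            k += 1
    return large


result = ml_t2l1(T1, minsup={1: 4, 2: 3, 3: 3}, max_level=3)
for (level, k), itemsets in sorted(result.items()):
    print(f"L[{level},{k}]", dict(sorted(itemsets.items())))
```

With the example thresholds the printed tables match the slides: for instance L[2,3] holds (11, 12, 22) and (11, 21, 22) with support 3 each, and L[3,2] holds only (111, 211) with support 3. Candidate (11, 12, 21) is never counted because its subset (12, 21) has support 2.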

18
Algorithm Variations

Algorithm ML_T1LA
Uses only the first encoded transaction table, T[1].
Support for the candidate sets at all levels is computed at the same time.
Pros: only one table, and at most k scans.
Cons: the table may contain infrequent items, and a large amount of space is required.
Algorithm ML_TML1
Generates multiple encoded transaction tables T[1], ..., T[max_l + 1].
Pros: may save a substantial amount of processing.
Cons: can be inefficient if only a few items are filtered out at each level processed.
Algorithm ML_T2LA
Uses two encoded transaction tables, as in the ML_T2L1 algorithm.
Support for the candidate sets at all levels is computed at the same time.
Pros: potentially efficient if T[2] contains far fewer items than T[1].
Cons: ?
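The "computed at the same time" idea of ML_T1LA can be illustrated for 1-itemsets only: a single scan of T[1] updates counters for every level. This is a partial sketch under the same encoding assumptions, with a hypothetical function name, not the full algorithm. Because it scans the unfiltered T[1], descendants of the infrequent level-1 items 3 and 4 (32, 41, 323) also reach the threshold, which is one way the "infrequent items, large space" drawback shows up.

```python
from collections import Counter

# Encoded table T[1] from the example, as a list of transactions.
T1 = [
    {"111", "121", "211", "221"},
    {"111", "211", "222", "323"},
    {"112", "122", "221", "411"},
    {"111", "121"},
    {"111", "122", "211", "221", "413"},
    {"211", "323", "524"},
    {"323", "411", "524", "713"},
]


def large_1_all_levels(table, minsup, max_level):
    """One pass over `table` counts the 1-itemsets of every level simultaneously."""
    counts = {level: Counter() for level in range(1, max_level + 1)}
    for items in table:
        for level, counter in counts.items():
            counter.update({g[:level] for g in items})  # once per transaction
    return {level: {code: n for code, n in counter.items() if n >= minsup[level]}
            for level, counter in counts.items()}


large = large_1_all_levels(T1, minsup={1: 4, 2: 3, 3: 3}, max_level=3)
for level, itemsets in large.items():
    print(level, sorted(itemsets))
```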

19
Outline

Introduction
Algorithm
Performance studies
Cross-level association
Filtering of uninteresting association rules
Conclusions

20
Performance Study

Assumptions
The maximal level in the concept hierarchy is 3.
Two data sets are used: DB1 (average frequent-itemset length 4, average transaction size 10) and DB2 (average frequent-itemset length 6, average transaction size 20).
Conclusions
The relative performance of the four algorithms depends strongly on the threshold settings (i.e., the filtering power at each level).
Parallel derivation of L[l,k] is useful, and deriving a transaction table T[2] is usually beneficial.
ML_T1LA is found to be the best or the second-best algorithm.

21
Performance Study
[Figures: DB1 (average frequent-itemset length 4, average transaction size 10) and DB2 (average frequent-itemset length 6, average transaction size 20); panel thresholds: minsup[2] = 3%, minsup[3] = 1%; minsup[2] = 2%, minsup[3] = 0.75%]
22
Performance Study
[Figures: DB1 (average frequent-itemset length 4, average transaction size 10) and DB2 (average frequent-itemset length 6, average transaction size 20); panel thresholds: minsup[1] = 55%, minsup[3] = 1%; minsup[1] = 60%, minsup[3] = 0.75%]
23
Performance Study
[Figures: panel thresholds: minsup[1] = 60%, minsup[2] = 2%; minsup[1] = 55%, minsup[2] = 3%]
24
Performance Study

Two interesting performance features:
The performance of the algorithms depends strongly on minsup, especially minsup[1] and minsup[2].
Deriving T[2] is beneficial.

25
Outline

Introduction
Algorithm
Performance studies
Cross-level association
Filtering of uninteresting association rules
Conclusions

26
Cross-level association
Concept hierarchy: food; milk (m1, m2); bread (b1, b2)
Mining within each level finds rules such as milk => bread and m2 => b1.
Expanding the search across levels also finds rules such as milk => b1.
27
Cross-level association

Two adjustments:
A single minsup is used at all levels.
When the frequent k-itemsets are generated, items at all levels are considered, but itemsets that contain both an item and its ancestor are excluded.
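A brute-force sketch of these two adjustments, limited to 2-itemsets: each transaction is extended with its items' ancestors so that one minsup applies to every level, and pairs made of an item and its ancestor are skipped. The name-based hierarchy, the reuse of the level-2 data from the observation slides, and the 40% threshold are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical parent links for the slide's hierarchy (food is the root).
PARENT = {"m1": "milk", "m2": "milk", "b1": "bread", "b2": "bread",
          "milk": "food", "bread": "food"}


def ancestors(item):
    """All proper ancestors of `item`, excluding the root 'food'."""
    out = set()
    while item in PARENT and PARENT[item] != "food":
        item = PARENT[item]
        out.add(item)
    return out


def extend(transaction):
    """Add every item's ancestors so support is counted at all levels at once."""
    return set(transaction).union(*(ancestors(i) for i in transaction))


def cross_level_frequent_pairs(transactions, minsup):
    """Single minsup for all levels; skip pairs made of an item and its ancestor."""
    extended = [extend(t) for t in transactions]
    n = len(extended)
    items = sorted(set().union(*extended))
    frequent_items = [i for i in items if sum(i in t for t in extended) / n >= minsup]
    result = {}
    for a, b in combinations(frequent_items, 2):
        if a in ancestors(b) or b in ancestors(a):
            continue  # e.g. {m2, milk} says nothing new
        sup = sum({a, b} <= t for t in extended) / n
        if sup >= minsup:
            result[(a, b)] = sup
    return result


TRANSACTIONS = [{"m1", "b2"}, {"m2", "b1"}, {"b2"}, {"m2", "b1"}, {"m2"}]
pairs = cross_level_frequent_pairs(TRANSACTIONS, minsup=0.4)
print(sorted(pairs))
```

The pair {b1, milk} is the cross-level itemset behind a rule such as milk => b1; on this data its confidence is 2/4 = 50%.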

28
Outline

Introduction
Algorithm
Performance studies
Cross-level association
Filtering of uninteresting association rules
Conclusions

29
Filtering of uninteresting association rules

Removal of redundant rules
To remove redundant rules, when a rule R passes the minimum confidence test, it is checked against every strong rule R' of which R is a descendant. If the confidence of R, φ(R), falls within the range of the expected confidence with a variation of δ, R is removed.
Example
milk => bread (12% support, 85% confidence)
chocolate milk => bread (1% support, 84% confidence)
The second rule is not interesting if about 8% of milk is chocolate milk: R' then predicts a support of about 12% × 8% ≈ 1% and a confidence of about 85% for it.
This can reduce the number of rules by 30% to 60%.
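The expected-value reasoning in this example can be written out as a small check. The confidence test follows the slide; the tolerance δ = 0.05 and the function names are assumptions, since the slide gives no value for δ.

```python
def expected_values(parent_support, parent_confidence, share):
    """Support and confidence that R' predicts for a descendant rule R.

    `share` is the fraction of the ancestor item covered by the descendant item,
    e.g. chocolate milk as a share of all milk.
    """
    return parent_support * share, parent_confidence


def is_redundant(confidence_r, expected_confidence, delta):
    """R is redundant if phi(R) lies within expected_confidence +/- delta."""
    return abs(confidence_r - expected_confidence) <= delta


exp_sup, exp_conf = expected_values(0.12, 0.85, 0.08)
print(round(exp_sup, 4), exp_conf)               # 0.0096 0.85
print(is_redundant(0.84, exp_conf, delta=0.05))  # True
```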

30
Filtering of uninteresting association rules
(continued)

Removal of unnecessary rules
To filter out unnecessary association rules, for each strong rule R: A => B, we test every rule R': A - C => B, where C is a non-empty proper subset of A. If the confidence of R, φ(R), is not significantly different from that of R', φ(R'), R is removed.
Example
80% of customers who buy milk also buy bread: milk => bread
80% of customers who buy milk and butter also buy bread: milk, butter => bread
The second rule adds nothing to the first, so it is removed.
This reduces the number of rules by 50% to 80%.
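A sketch of this test over a small rule table, assuming rules are stored as (left-hand itemset, right-hand item) pairs mapped to confidence, with a hypothetical tolerance δ = 0.05 standing in for "not significantly different".

```python
from itertools import combinations


def unnecessary_rules(rules, delta):
    """Flag each rule R: A => B for which some strong R': A - C => B exists,
    with C a non-empty proper subset of A, and |phi(R) - phi(R')| <= delta."""
    flagged = set()
    for (lhs, rhs), conf in rules.items():
        for size in range(1, len(lhs)):  # possible sizes of A - C
            for smaller in combinations(sorted(lhs), size):
                simpler = (frozenset(smaller), rhs)
                if simpler in rules and abs(rules[simpler] - conf) <= delta:
                    flagged.add((lhs, rhs))
    return flagged


RULES = {
    (frozenset({"milk"}), "bread"): 0.80,
    (frozenset({"milk", "butter"}), "bread"): 0.80,
}
print(unnecessary_rules(RULES, delta=0.05))  # flags only milk, butter => bread
```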

31
Conclusions

Extended association rules from single-level to multiple-level.
Developed a top-down, progressive-deepening technique for mining multiple-level association rules.
Introduced filtering of uninteresting association rules.

32
Exam Questions

Q1: Give an example of multiple-level association rules.
A: Besides finding that 80% of customers who purchase milk also purchase bread, it is interesting to let users drill down and show that 75% of people buy wheat bread if they buy 2% milk.

33
Exam Questions

Q2: What are the problems in using the standard Apriori method?
A: One may apply the Apriori algorithm to examine data items at multiple levels of abstraction under the same minimum support and minimum confidence thresholds. This approach is simple, but it may lead to some undesirable results.
First, large support is more likely to exist at high levels of abstraction. If one wants to find strong associations at relatively low levels of abstraction, the minimum support threshold must be reduced substantially; this may lead to the generation of many uninteresting associations at high or intermediate levels.
Second, since it is unlikely to find many strong association rules at a primitive concept level, mining strong associations should be performed at a rather high concept level, which is actually the case in many studies. However, mining association rules at high concept levels often leads to rules that correspond to prior knowledge and expectations, such as milk => bread (which could be common sense), or to uninteresting attribute combinations if the minimum support is allowed to be rather small, such as toy => milk (which may just happen together by chance).