Title: Daniel Delic, Hans-J. Lenz, Mattis Neiling
Mining Association Rules with Rough Sets and Large Itemsets - A Comparative Study
- Daniel Delic, Hans-J. Lenz, Mattis Neiling
- Free University of Berlin
- Institute of Applied Computer Science
- Garystr. 21, D-14195 Berlin, Germany
Two different methods for the extraction of association rules
- Large itemset method (e.g. Apriori)
- Rough set method
1 INTRODUCTION
- Introduction
- Large Itemset Method
- Rough Set Method
- Comparison of the Procedures
- Hybrid Procedure Apriori
- Summary
- Outlook
- References
LARGE ITEMSET METHOD
- Type of analyzable data
  - "Market basket data": attributes with boolean domains
  - Stored in a table: each row represents a market basket
- Large k-itemset generation with Apriori
  - Minimum support: 40%
Step 3
- Large 2-itemsets
  - {Spaghetti, Tomato Sauce}
  - {Spaghetti, Bread}
  - {Tomato Sauce, Bread}
- Candidate 3-itemsets
  - {Spaghetti, Tomato Sauce, Bread} → support 1 (20%), below the minimum support
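The level-wise generation of large itemsets can be sketched as follows. This is a minimal Python illustration, not the implementation used in the study. The slides do not show the basket table, so `BASKETS` is hypothetical and only chosen so that each pair occurs twice and the triple once, as in Step 3. The names `count` and `apriori` are likewise assumptions.

```python
from itertools import combinations

# Hypothetical baskets (not from the slides): each pair of the three
# main items occurs in 2 of 5 baskets, the triple in 1 of 5.
BASKETS = [
    {"Spaghetti", "Tomato Sauce", "Bread"},
    {"Spaghetti", "Tomato Sauce"},
    {"Spaghetti", "Bread"},
    {"Tomato Sauce", "Bread"},
    {"Butter"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item of the itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def apriori(baskets, min_support_pct):
    """Return {itemset: count} for all large itemsets."""
    n = len(baskets)

    def is_large(itemset):
        # Integer comparison avoids floating point rounding issues.
        return count(itemset, baskets) * 100 >= min_support_pct * n

    items = {item for basket in baskets for item in basket}
    level = {frozenset([i]) for i in items if is_large(frozenset([i]))}
    large = {}
    k = 1
    while level:
        for itemset in level:
            large[itemset] = count(itemset, baskets)
        k += 1
        # Join step: combine large (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if is_large(c)}
    return large


large_itemsets = apriori(BASKETS, min_support_pct=40)
triple = frozenset({"Spaghetti", "Tomato Sauce", "Bread"})
```

With these baskets the three pairs are large, while `triple` is generated as a candidate but rejected because it occurs in only one basket.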
ROUGH SET METHOD
- Type of analyzable data
  - Attributes can have more than two values
  - Predefined set of condition attributes and decision attribute(s)
  - Stored in a table: each row contains the values of the predefined attributes
Deriving association rules with rough sets
Step 1
- Creating partitions over U
- Partition: U divided into subsets (equivalence classes) induced by equivalence relations
Examples of equivalence relations
- $R_1(u, v)$: u and v have the same temperature
- $R_2(u, v)$: u and v have the same blood pressure
- $R_3(u, v)$: u and v have the same temperature and blood pressure
- $R_4(u, v)$: u and v have the same heart problem
- Partition of U by $R_3$
  - Induced by equivalence relation $R_3$ (based on condition attributes)
  - $R_3(u, v)$: u and v have the same temperature and blood pressure
  - $R_3$ induces the partition $\{X_1, X_2, X_3\}$ with
  - $X_1 = \{\text{Adams}, \text{Brown}\}$, $X_2 = \{\text{Ford}\}$, $X_3 = \{\text{Gill}, \text{Bellows}\}$
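- Partition of U by $R_4$
  - Induced by equivalence relation $R_4$ (based on decision attribute(s))
  - $R_4(u, v)$: u and v have the same heart problem
  - $R_4$ induces the partition $\{Y_1, Y_2\}$ with
  - $Y_1 = \{\text{Adams}, \text{Brown}, \text{Gill}\}$, $Y_2 = \{\text{Ford}, \text{Bellows}\}$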
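A partition induced by an equivalence relation of the form "same values on these attributes" can be computed by grouping rows on those attribute values. The sketch below is only an illustration: the slides do not reproduce the patient table, so the table is reconstructed from the rules and classes quoted in this section, and Ford's temperature and blood pressure as well as the value "yes" are hypothetical. The function name `partition` is also an assumption.

```python
from collections import defaultdict

# Reconstructed patient table. Ford's condition values and the
# decision value "yes" are hypothetical (not given in the slides).
PATIENTS = {
    "Adams":   {"temperature": "normal", "blood_pressure": "low",  "heart_problem": "no"},
    "Brown":   {"temperature": "normal", "blood_pressure": "low",  "heart_problem": "no"},
    "Ford":    {"temperature": "normal", "blood_pressure": "high", "heart_problem": "yes"},
    "Gill":    {"temperature": "high",   "blood_pressure": "high", "heart_problem": "no"},
    "Bellows": {"temperature": "high",   "blood_pressure": "high", "heart_problem": "yes"},
}


def partition(table, attributes):
    """Group objects that agree on all given attributes (equivalence classes)."""
    classes = defaultdict(set)
    for name, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(name)
    # Classes are returned in order of first appearance in the table.
    return list(classes.values())


# Partition by condition attributes (relation R3) and by decision attribute (R4).
X = partition(PATIENTS, ["temperature", "blood_pressure"])
Y = partition(PATIENTS, ["heart_problem"])
```

Under these assumptions `X` holds the classes X1, X2, X3 and `Y` holds Y1, Y2 in the order used on the slides.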
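Step 2
- Defining the approximation space
  - Overlapping the partitions created by the equivalence relations
  - Result: 3 distinct regions in the approximation space
  - Positive region: $\mathrm{POS}_S(Y_j) = \bigcup_{X_i \subseteq Y_j} X_i$, e.g. $\mathrm{POS}_S(Y_1) = X_1$
  - Boundary region: $\mathrm{BND}_S(Y_j) = \bigcup_{X_i \cap Y_j \neq \emptyset,\ X_i \not\subseteq Y_j} X_i$, e.g. $\mathrm{BND}_S(Y_1) = X_3$
  - Negative region: $\mathrm{NEG}_S(Y_j) = \bigcup_{X_i \cap Y_j = \emptyset} X_i$, e.g. $\mathrm{NEG}_S(Y_1) = X_2$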
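The three regions can be computed directly from the equivalence classes by set comparison. The following Python sketch uses the classes X1, X2, X3 and the decision class Y1 from the example. The function name `regions` is an assumption made for illustration.

```python
# Equivalence classes from the example (condition side) and one decision class.
X1 = {"Adams", "Brown"}
X2 = {"Ford"}
X3 = {"Gill", "Bellows"}
Y1 = {"Adams", "Brown", "Gill"}


def regions(condition_classes, decision_class):
    """Split the universe into positive, boundary and negative region."""
    positive, boundary, negative = set(), set(), set()
    for cls in condition_classes:
        if cls <= decision_class:
            positive |= cls   # class lies completely inside the decision class
        elif cls & decision_class:
            boundary |= cls   # class overlaps the decision class only partly
        else:
            negative |= cls   # class and decision class are disjoint
    return positive, boundary, negative


pos, bnd, neg = regions([X1, X2, X3], Y1)
```

For Y1 this yields X1 as the positive region, X3 as the boundary region and X2 as the negative region.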
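[Figure: X1 and Y1]
- Rules from the positive region ($\mathrm{POS}_S(Y_j) = \bigcup_{X_i \subseteq Y_j} X_i$)
  - Example for $\mathrm{POS}_S(Y_1)$
  - $X_1 = \{\text{Adams}, \text{Brown}\} \subseteq Y_1 = \{\text{Adams}, \text{Brown}, \text{Gill}\}$
  - → Clear rule (confidence 100%, support 40%)
  - If temperature = normal and blood pressure = low then heart problem = no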
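[Figure: Y1 and X3]
- Rules from the boundary region ($\mathrm{BND}_S(Y_j) = \bigcup_{X_i \cap Y_j \neq \emptyset,\ X_i \not\subseteq Y_j} X_i$)
  - Example for $\mathrm{BND}_S(Y_1)$
  - $X_3 = \{\text{Gill}, \text{Bellows}\}$, $Y_1 = \{\text{Adams}, \text{Brown}, \text{Gill}\}$, $X_3 \cap Y_1 \neq \emptyset$
  - → Possible rule (confidence ?, support 20%)
  - If temperature = high and blood pressure = high then heart problem = no
  - → Confidence $c = |X_i \cap Y_j| / |X_i|$, here $|X_3 \cap Y_1| / |X_3| = 1/2 = 0.5 = 50\%$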
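The confidence and support figures quoted for the positive and boundary regions can be checked with a few lines of Python. This is a sketch under the assumption that support is the share of all five patients that satisfy both sides of the rule, which reproduces the 40% and 20% stated on the slides. The function name `rule_metrics` is hypothetical.

```python
from fractions import Fraction

X1 = {"Adams", "Brown"}
X3 = {"Gill", "Bellows"}
Y1 = {"Adams", "Brown", "Gill"}
UNIVERSE_SIZE = 5  # Adams, Brown, Ford, Gill, Bellows


def rule_metrics(condition_class, decision_class, universe_size):
    """Return (confidence, support) of the rule condition -> decision."""
    overlap = len(condition_class & decision_class)
    confidence = Fraction(overlap, len(condition_class))
    support = Fraction(overlap, universe_size)
    return confidence, support


clear_rule = rule_metrics(X1, Y1, UNIVERSE_SIZE)      # positive region
possible_rule = rule_metrics(X3, Y1, UNIVERSE_SIZE)   # boundary region
```

Exact fractions are used so that 1/2 and 1/5 compare cleanly against the percentages on the slides.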
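[Figure: Y1 and X2]
- Negative region ($\mathrm{NEG}_S(Y_j) = \bigcup_{X_i \cap Y_j = \emptyset} X_i$)
  - Example for $\mathrm{NEG}_S(Y_1)$
  - $X_2 = \{\text{Ford}\}$, $Y_1 = \{\text{Adams}, \text{Brown}, \text{Gill}\}$
  - → Since $X_2 \cap Y_1 = \emptyset$, no rule is derivable from the negative region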
Reducts
- Simplification of rules by removal of unnecessary attributes
- Original rule: If temperature = normal and blood pressure = low then heart problem = no
- Simplified (more precise) rule: If blood pressure = low then heart problem = no
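One way to picture this simplification is to drop a condition attribute whenever the shortened rule still matches only objects with the stated decision. The sketch below is a simple greedy check, not necessarily the reduct computation used by the authors. The patient table is reconstructed from the rules in this section, with Ford's condition values and the decision value "yes" being hypothetical. The names `is_consistent` and `simplify` are assumptions.

```python
# Reconstructed patient table. Ford's condition values and the
# decision value "yes" are hypothetical (not given in the slides).
PATIENTS = {
    "Adams":   {"temperature": "normal", "blood_pressure": "low",  "heart_problem": "no"},
    "Brown":   {"temperature": "normal", "blood_pressure": "low",  "heart_problem": "no"},
    "Ford":    {"temperature": "normal", "blood_pressure": "high", "heart_problem": "yes"},
    "Gill":    {"temperature": "high",   "blood_pressure": "high", "heart_problem": "no"},
    "Bellows": {"temperature": "high",   "blood_pressure": "high", "heart_problem": "yes"},
}


def is_consistent(table, condition, decision):
    """True if at least one row matches the condition and all such rows match the decision."""
    matches = [
        row for row in table.values()
        if all(row[a] == v for a, v in condition.items())
    ]
    return bool(matches) and all(
        row[a] == v for row in matches for a, v in decision.items()
    )


def simplify(table, condition, decision):
    """Greedily drop condition attributes while the rule stays consistent."""
    reduced = dict(condition)
    for attribute in list(condition):
        trial = {a: v for a, v in reduced.items() if a != attribute}
        if trial and is_consistent(table, trial, decision):
            reduced = trial
    return reduced


original = {"temperature": "normal", "blood_pressure": "low"}
decision = {"heart_problem": "no"}
simplified = simplify(PATIENTS, original, decision)
```

With this table the temperature test can be removed, while removing the blood pressure test would let the rule match Ford, who has a heart problem.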
COMPARISON OF THE PROCEDURES
- Prerequisites for comparison of both methods
  - Modification of the rough set method (RS-Rules)
  - → No fixed decision attribute required (RS-Rules)
  - Compatible data structure: bitmaps
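A bitmap representation can be obtained by turning every attribute-value pair into its own boolean column, so that multi-valued tables look like market basket data. The sketch below assumes this one-column-per-value encoding, which the slides do not spell out. The function name `to_bitmaps` and the two sample rows are hypothetical.

```python
def to_bitmaps(rows):
    """Encode multi-valued rows as 0/1 vectors over 'attribute=value' columns."""
    columns = sorted({f"{a}={v}" for row in rows for a, v in row.items()})
    bitmaps = []
    for row in rows:
        present = {f"{a}={v}" for a, v in row.items()}
        bitmaps.append([1 if column in present else 0 for column in columns])
    return columns, bitmaps


# Two illustrative rows with the attributes used in the rough set example.
rows = [
    {"temperature": "normal", "blood_pressure": "low"},
    {"temperature": "high", "blood_pressure": "high"},
]
columns, bitmaps = to_bitmaps(rows)
```

Each row then has exactly one 1 per original attribute, and both the large itemset method and the rough set method can read the same table.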
- Benchmark data sets [1]
  - Car Evaluation Database: 1728 tuples, 25 bitmap attributes
  - Mushroom Database: 8416 tuples, 12 original attributes selected, 68 bitmap attributes
  - Adult: 32561 tuples, 12 original attributes selected, 61 bitmap attributes
- Results
  - Nearly identical results for all examined tables
  - Exception: reducts
  - → Quality of rough set rules better (more precise rules)
[1] UCI Repository of Machine Learning Databases and Domain Theories (URL: ftp.ics.uci.edu/pub/machine-learning-databases)
[2] Algorithms written in Visual Basic 6.0, executed on a Win98 PC with an AMD K6-2/400 processor
HYBRID PROCEDURE Apriori
- Hybrid method Apriori
  - Based on Apriori
  - Capable of extracting reducts
  - Capable of deriving rules based on a predefined decision attribute
- Comparison results (hybrid Apriori compared to RS-Rules)
  - Identical rules
SUMMARY
- Creation of a compatible data type for both methods
- Comparison of both methods
  - RS-Rules derived rules that were more precise (due to reducts) than those derived by the original Apriori
  - The hybrid Apriori procedure derived the same rules as RS-Rules
  - Computing times in favor of the large itemset methods
- Conclusion: a combination of both original methods is the best solution
OUTLOOK
- More interesting capabilities of rough sets
  - Analysing dependencies between rules
  - Analysing the impact of one special condition attribute on the decision attribute(s)
- Idea
  - Enhancing the data mining capabilities of Apriori by those further rough set features
  - → Result: a powerful and efficient data mining application (?)
References
- Agrawal, R. and Srikant, R. (1994). Fast Algorithms for Mining Association Rules in Large Databases. In VLDB'94, 487-499. Morgan Kaufmann.
- Düntsch, I. and Gediga, G. (1999). Rough set data analysis.
- Munakata, T. (1998). Rough Sets. In Fundamentals of the New Artificial Intelligence, 140-182. New York: Springer-Verlag.