Title: Using Classification to Evaluate the Output of Confidence-Based Association Rule Mining
1 Using Classification to Evaluate the Output of
Confidence-Based Association Rule Mining
2 Motivation
- Previous work
- Association rule mining
- Run time used to compare mining algorithms
- Lack of accuracy-based comparisons
- Associative classification
- Focus on accurate classifiers
Side effect: comparison of a standard
associative classifier to standard techniques
Idea
- Think backwards
- Use the resulting classifiers as a basis for
comparing confidence-based rule miners
3 Overview
- Motivation
- Basics
- Definitions
- Associative classification
- (Class) Association Rule Mining
- Apriori vs. predictive Apriori (by Scheffer)
- Pruning
- Classification
- Quality measures and Experiments
- Results
- Conclusions
Evaluate the sort order of rules using
properties of associative classifiers
4 Basics: Definitions
- A table over n attributes (item = attribute-value pair)
- Class association rule: an implication X → Y, where Y is an item
over the class attribute, X is the body of the rule, and Y is the
head of the rule
- Confidence of a (class) association rule: c(X → Y) = s(X ∪ Y) / s(X)
(support s(X): the number of database records that satisfy X)
- Confidence is the relative frequency of a correct prediction in
the (training) table of instances
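The two definitions above can be sketched directly. This is a minimal illustration, assuming instances are dicts of attribute → value and items are (attribute, value) pairs; the names and data are made up for the example.

```python
# Support and confidence over a table of instances.
# An instance is a dict of attribute -> value; an item is an
# (attribute, value) pair; a rule is (body_items, head_item).

def support(items, table):
    """s(items): number of records that satisfy every item."""
    return sum(all(rec.get(a) == v for a, v in items) for rec in table)

def confidence(body, head, table):
    """c(body -> head) = s(body u head) / s(body): the relative
    frequency of a correct prediction among records covered by body."""
    s_body = support(body, table)
    return support(body + [head], table) / s_body if s_body else 0.0

table = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
]
# 2 records are sunny, 1 of them plays -> confidence 0.5
print(confidence([("outlook", "sunny")], ("play", "yes"), table))
```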
5 Basics: Mining with Apriori
Generates all (class) association rules with support and confidence
larger than predefined values; rules are sorted according to confidence.
- Mines all item sets above minimum support (frequent item sets)
- Divides each frequent item set into rule body and head; checks
whether the confidence of the rule is above the minimum confidence
Adaptations to mine class association rules as
described by Liu et al. (CBA):
- Divide the training set into subsets, one for each class
- Mine frequent item sets separately in each subset
- Take each frequent item set as body and the class label as head
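The CBA-style adaptation above can be sketched as follows. This is an illustration of the idea only: the brute-force item-set enumeration stands in for Apriori's level-wise search, and all names and thresholds are assumptions, not CBA's actual implementation.

```python
# Split the table by class, mine frequent item sets per subset,
# and emit each frequent set as rule body with the class as head.
from itertools import combinations

def mine_class_rules(table, class_attr, min_support):
    rules = []
    for cls in sorted({rec[class_attr] for rec in table}):
        subset = [rec for rec in table if rec[class_attr] == cls]
        items = sorted({(a, v) for rec in subset
                        for a, v in rec.items() if a != class_attr})
        # Brute force stands in for the Apriori level-wise search.
        for size in range(1, len(items) + 1):
            for body in combinations(items, size):
                count = sum(all(rec.get(a) == v for a, v in body)
                            for rec in subset)
                if count >= min_support:
                    rules.append((list(body), (class_attr, cls)))
    return rules

table = [
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rainy", "play": "no"},
]
print(mine_class_rules(table, "play", min_support=2))
# -> [([('outlook', 'sunny')], ('play', 'yes'))]
```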
6 Basics: Mining with Predictive Apriori
- Predictive accuracy of a rule r
- A support-based correction of the confidence value
- Inherent pruning strategy
- Outputs the best n rules according to:
- Expected predictive accuracy among the n best
- Rule not subsumed by a rule with at least the same expected
predictive accuracy (prefers more general rules)
- Adaptations to mine class association rules:
- Generate frequent item sets from all data (class attribute
deleted) as rule bodies
- Generate a rule for each class label
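The effect of a support-based correction can be illustrated with an m-estimate-smoothed confidence, which pulls low-support rules toward a prior. Note this is only an illustration of the idea: Scheffer's predictive Apriori computes a Bayesian expected accuracy from the data's prior distribution, not this formula, and all parameter values here are made up.

```python
# m-estimate smoothing: low-support rules are pulled toward the prior,
# so a perfect-confidence rule with tiny support ranks below a
# slightly-imperfect rule with large support.

def smoothed_confidence(hits, cover, prior, m=2.0):
    """(hits + m * prior) / (cover + m).
    hits = records where body and head both hold; cover = s(body)."""
    return (hits + m * prior) / (cover + m)

# Confidence 1.0 at support 1 vs confidence 0.9 at support 100
# (assumed prior accuracy 0.5):
low  = smoothed_confidence(1, 1, 0.5)     # ~0.667
high = smoothed_confidence(90, 100, 0.5)  # ~0.892
print(low, high)
```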
7 Basics: Pruning
- Number of rules too big for direct use in a classifier
- Simple strategy:
- Bound the number of rules
- The sort order of the mining algorithm remains
- CBA, optional pruning step: pessimistic error-rate-based pruning
- A rule is pruned if removing a single item from the rule results
in a reduction of the pessimistic error rate
- CBA, obligatory pruning: database coverage method
- A rule that classifies at least one instance correctly (and is the
highest-ranked rule covering it) belongs to the intermediate classifier
- Delete all covered instances
- Take the intermediate classifier with the lowest number of errors
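The database coverage method above can be sketched as follows. This is an assumed, simplified rendering: it walks the sorted rule list, keeps a rule if it correctly classifies at least one still-uncovered instance, and deletes the instances it covers. The final CBA step of choosing the intermediate classifier with the lowest total error is omitted.

```python
# Database coverage pruning over a sorted list of class association
# rules. A rule is (body_items, (class_attr, label)); instances are
# dicts of attribute -> value. Representations are illustrative.

def covers(body, rec):
    return all(rec.get(a) == v for a, v in body)

def database_coverage(sorted_rules, table, class_attr):
    remaining = list(table)
    classifier = []
    for body, (attr, label) in sorted_rules:
        covered = [rec for rec in remaining if covers(body, rec)]
        # Keep the rule only if it predicts at least one covered
        # instance correctly, then delete everything it covers.
        if any(rec[class_attr] == label for rec in covered):
            classifier.append((body, (attr, label)))
            remaining = [rec for rec in remaining if rec not in covered]
        if not remaining:
            break
    return classifier
```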
8 Overview
- Motivation
- Basics
- Definitions
- Associative classification
- (Class) Association Rule Mining
- Apriori vs. predictive Apriori (by Scheffer)
- Pruning
- Classification
- Quality measures
- Results
- Conclusions
Think backwards: use the properties of different
classifiers to obtain accuracy-based measures for
a set of (class) association rules
9 Classification
- Input
- Pruned, sorted list of class association rules
- Two different approaches
- Weighted vote algorithm
- Majority vote
- Inversely weighted
- Decision list classifier, e.g. CBA
- Use the first rule that covers the test instance
for classification
Think backwards: a mining algorithm is preferable if the
resulting classifier is more accurate,
more compact, and built in an efficient way
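The two classification approaches above can be sketched side by side. Rule and instance representations are assumptions for the example; a rule here is (body_items, class_label) in sorted order.

```python
# Decision list (CBA-style): first covering rule decides.
# Inversely weighted vote: the rule at rank i contributes 1/(i+1),
# so top-ranked rules dominate the vote.

def covers(body, rec):
    return all(rec.get(a) == v for a, v in body)

def decision_list(rules, rec, default):
    for body, label in rules:
        if covers(body, rec):
            return label
    return default

def weighted_vote(rules, rec, default):
    votes = {}
    for i, (body, label) in enumerate(rules):
        if covers(body, rec):
            votes[label] = votes.get(label, 0.0) + 1.0 / (i + 1)
    return max(votes, key=votes.get) if votes else default
```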
10 Quality Measures and Experiments
Measures to evaluate confidence-based mining algorithms:
- Accuracy on a test set (2 slides)
- Average rank of the first rule that covers and correctly
predicts a test instance
- Number of mined rules and number of rules after pruning
- Time required for mining and for pruning
Comparative study for Apriori and predictive Apriori:
- 12 UCI datasets: balance, breast-w, ecoli, glass, heart-h, iris,
labor, led7, lenses, pima, tic-tac-toe, wine
- One 10-fold cross-validation
- Discretisation
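The second measure listed above can be sketched as follows. This is an assumed reading of the measure: the 1-based rank, in the sorted rule list, of the first rule that both covers and correctly predicts each test instance, averaged over the test set. How instances with no such rule are handled is not specified in this outline; they are skipped here.

```python
# Average rank of the first covering, correctly-predicting rule.
# A rule is (body_items, (class_attr, label)); instances are dicts.

def covers(body, rec):
    return all(rec.get(a) == v for a, v in body)

def average_first_correct_rank(rules, test_set, class_attr):
    ranks = []
    for rec in test_set:
        for rank, (body, (attr, label)) in enumerate(rules, start=1):
            if covers(body, rec) and rec[class_attr] == label:
                ranks.append(rank)
                break  # only the first correct covering rule counts
    return sum(ranks) / len(ranks) if ranks else float("nan")
```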
11 1a. Accuracy and Ranking
- Inversely weighted vote
- Emphasises top-ranked rules
- Shows the importance of a good rule ranking
A mining algorithm is preferable if the resulting
classifier is more accurate.
12 1b. How many rules are necessary to be accurate?
- Majority vote classifier
A mining algorithm is preferable if the resulting
classifier is more compact.
Similar results for CBA.
13 Comparison of CBA to standard techniques
14 Conclusions
- Use classification to evaluate the quality of
confidence-based association rule miners
- Test evaluation:
- Predictive Apriori mines a higher-quality set of rules
- Predictive Apriori needs fewer rules
- But predictive Apriori is slower than Apriori
- Comparison of a standard associative classifier (CBA)
to standard ML techniques:
- CBA has accuracy comparable to standard techniques
- CBA mines more rules and is slower
- All algorithms are implemented in WEKA or in an add-on to
WEKA, available from http://www.cs.waikato.ac.nz/ml
15 The End
Thank you for your attention. Questions?
Contact: stefan_mutter_at_directbox.com,
mhall_at_cs.waikato.ac.nz, eibe_at_cs.waikato.ac.nz