Title: Using Classification to Evaluate the Output of Confidence-Based Association Rule Mining
1 Using Classification to Evaluate the Output of
Confidence-Based Association Rule Mining
2 Motivation
- Previous work
- Association rule mining
- Run time used to compare mining algorithms
- Lack of accuracy-based comparisons
- Associative classification
- Focus on accurate classifiers
Side effect: comparison of a standard
associative classifier to standard techniques
Idea
- Think backwards
- Use the resulting classifiers as a basis for
comparing confidence-based rule miners
3 Overview
- Motivation
- Basics
- Definitions
- Associative classification
- (Class) Association Rule Mining
- Apriori vs. predictive Apriori (by Scheffer)
- Pruning
- Classification
- Quality measures and Experiments
- Results
- Conclusions
Evaluate the sort order of rules using
properties of associative classifiers
4 Basics: Definitions
- A table over n attributes (item = attribute-value pair)
- Class association rule: an implication X → Y, where Y is an item
over the class attribute, X is the body of the rule, and Y is the
head of the rule
- Confidence of a (class) association rule: c(X → Y) = s(X ∪ Y) / s(X)
(support s(X): the number of database records that satisfy X)
- Confidence is the relative frequency of a correct prediction in
the (training) table of instances
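The two definitions above can be sketched directly. This is a minimal illustration, assuming instances are dicts of attribute → value and items are (attribute, value) pairs; the names and data are made up for the example.

```python
# Support and confidence over a table of instances.
# An instance is a dict of attribute -> value; an item is an
# (attribute, value) pair; a rule is (body_items, head_item).

def support(items, table):
    """s(items): number of records that satisfy every item."""
    return sum(all(rec.get(a) == v for a, v in items) for rec in table)

def confidence(body, head, table):
    """c(body -> head) = s(body u head) / s(body): the relative
    frequency of a correct prediction among records covered by body."""
    s_body = support(body, table)
    return support(body + [head], table) / s_body if s_body else 0.0

table = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
]
# 2 records are sunny, 1 of them plays -> confidence 0.5
print(confidence([("outlook", "sunny")], ("play", "yes"), table))
```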
5 Basics: Mining with Apriori
Generates all (class) association rules with support and confidence
larger than predefined values; rules are sorted according to confidence.
- Mines all item sets above minimum support (frequent item sets)
- Divides each frequent item set into rule body and head; checks
whether the confidence of the rule is above the minimum confidence
Adaptations to mine class association rules as
described by Liu et al. (CBA):
- Divide the training set into subsets, one for each class
- Mine frequent item sets separately in each subset
- Take each frequent item set as body and the class label as head
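The CBA-style adaptation above can be sketched as follows. This is an illustration of the idea only: the brute-force item-set enumeration stands in for Apriori's level-wise search, and all names and thresholds are assumptions, not CBA's actual implementation.

```python
# Split the table by class, mine frequent item sets per subset,
# and emit each frequent set as rule body with the class as head.
from itertools import combinations

def mine_class_rules(table, class_attr, min_support):
    rules = []
    for cls in sorted({rec[class_attr] for rec in table}):
        subset = [rec for rec in table if rec[class_attr] == cls]
        items = sorted({(a, v) for rec in subset
                        for a, v in rec.items() if a != class_attr})
        # Brute force stands in for the Apriori level-wise search.
        for size in range(1, len(items) + 1):
            for body in combinations(items, size):
                count = sum(all(rec.get(a) == v for a, v in body)
                            for rec in subset)
                if count >= min_support:
                    rules.append((list(body), (class_attr, cls)))
    return rules

table = [
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rainy", "play": "no"},
]
print(mine_class_rules(table, "play", min_support=2))
# -> [([('outlook', 'sunny')], ('play', 'yes'))]
```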
6 Basics: Mining with Predictive Apriori
- Predictive accuracy of a rule r
- A support-based correction of the confidence value
- Inherent pruning strategy
- Outputs the best n rules according to:
- Expected predictive accuracy among the n best
- Rule not subsumed by a rule with at least the same expected
predictive accuracy (prefers more general rules)
- Adaptations to mine class association rules:
- Generate frequent item sets from all data (class attribute
deleted) as rule bodies
- Generate a rule for each class label
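The effect of a support-based correction can be illustrated with an m-estimate-smoothed confidence, which pulls low-support rules toward a prior. Note this is only an illustration of the idea: Scheffer's predictive Apriori computes a Bayesian expected accuracy from the data's prior distribution, not this formula, and all parameter values here are made up.

```python
# m-estimate smoothing: low-support rules are pulled toward the prior,
# so a perfect-confidence rule with tiny support ranks below a
# slightly-imperfect rule with large support.

def smoothed_confidence(hits, cover, prior, m=2.0):
    """(hits + m * prior) / (cover + m).
    hits = records where body and head both hold; cover = s(body)."""
    return (hits + m * prior) / (cover + m)

# Confidence 1.0 at support 1 vs confidence 0.9 at support 100
# (assumed prior accuracy 0.5):
low  = smoothed_confidence(1, 1, 0.5)     # ~0.667
high = smoothed_confidence(90, 100, 0.5)  # ~0.892
print(low, high)
```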
7 Basics: Pruning
- Number of rules too big for direct use in a classifier
- Simple strategy:
- Bound the number of rules
- The sort order of the mining algorithm remains
- CBA, optional pruning step: pessimistic error-rate-based pruning
- A rule is pruned if removing a single item from the rule results
in a reduction of the pessimistic error rate
- CBA, obligatory pruning: database coverage method
- A rule that classifies at least one instance correctly (and is the
highest-ranked rule covering it) belongs to the intermediate classifier
- Delete all covered instances
- Take the intermediate classifier with the lowest number of errors
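The database coverage method above can be sketched as follows. This is an assumed, simplified rendering: it walks the sorted rule list, keeps a rule if it correctly classifies at least one still-uncovered instance, and deletes the instances it covers. The final CBA step of choosing the intermediate classifier with the lowest total error is omitted.

```python
# Database coverage pruning over a sorted list of class association
# rules. A rule is (body_items, (class_attr, label)); instances are
# dicts of attribute -> value. Representations are illustrative.

def covers(body, rec):
    return all(rec.get(a) == v for a, v in body)

def database_coverage(sorted_rules, table, class_attr):
    remaining = list(table)
    classifier = []
    for body, (attr, label) in sorted_rules:
        covered = [rec for rec in remaining if covers(body, rec)]
        # Keep the rule only if it predicts at least one covered
        # instance correctly, then delete everything it covers.
        if any(rec[class_attr] == label for rec in covered):
            classifier.append((body, (attr, label)))
            remaining = [rec for rec in remaining if rec not in covered]
        if not remaining:
            break
    return classifier
```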
8 Overview
- Motivation
- Basics
- Definitions
- Associative classification
- (Class) Association Rule Mining
- Apriori vs. predictive Apriori (by Scheffer)
- Pruning
- Classification
- Quality measures
- Results
- Conclusions
Think backwards: use the properties of different
classifiers to obtain accuracy-based measures for
a set of (class) association rules
9 Classification
- Input
- Pruned, sorted list of class association rules
- Two different approaches
- Weighted vote algorithm
- Majority vote
- Inversely weighted
- Decision list classifier, e.g. CBA
- Use the first rule that covers the test instance
for classification
Think backwards: a mining algorithm is preferable if the
resulting classifier is more accurate,
more compact, and built in an efficient way
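The two classification approaches above can be sketched side by side. Rule and instance representations are assumptions for the example; a rule here is (body_items, class_label) in sorted order.

```python
# Decision list (CBA-style): first covering rule decides.
# Inversely weighted vote: the rule at rank i contributes 1/(i+1),
# so top-ranked rules dominate the vote.

def covers(body, rec):
    return all(rec.get(a) == v for a, v in body)

def decision_list(rules, rec, default):
    for body, label in rules:
        if covers(body, rec):
            return label
    return default

def weighted_vote(rules, rec, default):
    votes = {}
    for i, (body, label) in enumerate(rules):
        if covers(body, rec):
            votes[label] = votes.get(label, 0.0) + 1.0 / (i + 1)
    return max(votes, key=votes.get) if votes else default
```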
10 Quality Measures and Experiments
Measures to evaluate confidence-based mining algorithms:
- Accuracy on a test set (2 slides)
- Average rank of the first rule that covers and correctly
predicts a test instance
- Number of mined rules and number of rules after pruning
- Time required for mining and for pruning
Comparative study for Apriori and predictive Apriori:
- 12 UCI datasets: balance, breast-w, ecoli, glass, heart-h, iris,
labor, led7, lenses, pima, tic-tac-toe, wine
- One 10-fold cross-validation
- Discretisation
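The second measure listed above can be sketched as follows. This is an assumed reading of the measure: the 1-based rank, in the sorted rule list, of the first rule that both covers and correctly predicts each test instance, averaged over the test set. How instances with no such rule are handled is not specified in this outline; they are skipped here.

```python
# Average rank of the first covering, correctly-predicting rule.
# A rule is (body_items, (class_attr, label)); instances are dicts.

def covers(body, rec):
    return all(rec.get(a) == v for a, v in body)

def average_first_correct_rank(rules, test_set, class_attr):
    ranks = []
    for rec in test_set:
        for rank, (body, (attr, label)) in enumerate(rules, start=1):
            if covers(body, rec) and rec[class_attr] == label:
                ranks.append(rank)
                break  # only the first correct covering rule counts
    return sum(ranks) / len(ranks) if ranks else float("nan")
```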
11 1a. Accuracy and Ranking
- Inversely weighted vote
- Emphasises top-ranked rules
- Shows the importance of a good rule ranking
A mining algorithm is preferable if the resulting
classifier is more accurate.
12 1b. How many rules are necessary to be accurate?
- Majority vote classifier
A mining algorithm is preferable if the resulting
classifier is more compact.
Similar results for CBA.
13 Comparison of CBA to standard techniques
14 Conclusions
- Use classification to evaluate the quality of
confidence-based association rule miners
- Test evaluation:
- Predictive Apriori mines a higher-quality set of rules
- Predictive Apriori needs fewer rules
- But predictive Apriori is slower than Apriori
- Comparison of a standard associative classifier (CBA)
to standard ML techniques:
- CBA has accuracy comparable to standard techniques
- CBA mines more rules and is slower
- All algorithms are implemented in WEKA or in an add-on to
WEKA, available from http://www.cs.waikato.ac.nz/ml
15 The End
Thank you for your attention. Questions?
Contact: stefan_mutter_at_directbox.com,
mhall_at_cs.waikato.ac.nz, eibe_at_cs.waikato.ac.nz