Using Classification to Evaluate the Output of Confidence-Based Association Rule Mining
1
Using Classification to Evaluate the Output of
Confidence-Based Association Rule Mining
2
Motivation
  • Previous work
  • Association rule mining
  • Run time used to compare mining algorithms
  • Lack of accuracy-based comparisons
  • Associative classification
  • Focus on accurate classifiers

Side effect: comparison of a standard
associative classifier to standard techniques
Idea
  • Think backwards
  • Use the resulting classifiers as a basis for
    comparing confidence-based rule miners

3
Overview
  • Motivation
  • Basics
  • Definitions
  • Associative classification
  • (Class) Association Rule Mining
  • Apriori vs. predictive Apriori (by Scheffer)
  • Pruning
  • Classification
  • Quality measures and Experiments
  • Results
  • Conclusions

Evaluate the sort order of rules using
properties of associative classifiers
4
Basics: Definitions
  • A table over n attributes (item: attribute-value pair)
  • Class association rule: an implication X => Y, where Y is a
    value of the class attribute
  • X: body of the rule, Y: head of the rule
  • Confidence of a (class) association rule:
    conf(X => Y) = s(X ∪ Y) / s(X)
  • (support s(X): the number of database records
    that satisfy X)
  • Confidence is the relative frequency of a correct prediction
    in the (training) table of instances.
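The support and confidence definitions above can be sketched in code. This is an illustrative example, not code from the paper; the record and rule names are assumptions, with rules written as dictionaries of attribute-value pairs:

```python
# Illustrative computation of support and confidence for a class
# association rule X => Y over a table of attribute-value records.
# The toy data and names below are assumptions for this sketch.

records = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true",  "play": "no"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "sunny", "windy": "false", "play": "no"},
]

def support(itemset, records):
    """s(X): number of records satisfying every attribute-value pair in X."""
    return sum(all(r.get(a) == v for a, v in itemset.items()) for r in records)

def confidence(body, head, records):
    """conf(X => Y) = s(X u Y) / s(X): relative frequency of a
    correct prediction among the records that satisfy the body."""
    s_body = support(body, records)
    return support({**body, **head}, records) / s_body if s_body else 0.0

print(support({"outlook": "sunny"}, records))                    # 3
print(confidence({"outlook": "sunny"}, {"play": "no"}, records)) # 1.0
```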

5
Basics: Mining - Apriori
Generates all (class) association rules with support and
confidence larger than predefined values; rules are sorted
according to confidence.
  • Mines all item sets above minimum support
    (frequent item sets)
  • Divides frequent item sets into rule body and head;
    checks whether the confidence of the rule is above
    minimum confidence

Adaptations to mine class association rules, as
described by Liu et al. (CBA):
  • Divide the training set into subsets, one for each
    class
  • Mine frequent item sets separately in each subset
  • Take a frequent item set as body and the class label
    as head

6
Basics: Mining - Predictive Apriori
  • Predictive accuracy of a rule r: a support-based
    correction of the confidence value
  • Inherent pruning strategy
  • Output the best n rules according to expected
    predictive accuracy
  • Expected predictive accuracy among the n best
  • Rule not subsumed by a rule with at least the
    same expected predictive accuracy

Predictive Apriori prefers more general rules.
  • Adaptations to mine class association rules
  • Generate frequent item sets from all data (class
    attribute deleted) as rule bodies
  • Generate a rule for each class label
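Scheffer's exact predictive accuracy is a Bayesian expectation over a confidence prior estimated from the data; the formula is not reproduced here. As a simple stand-in that shows the intended effect of a support-based correction, a Laplace-corrected confidence ranks a high-support rule above a low-support rule of equal raw confidence:

```python
# NOT Scheffer's formula: a Laplace-corrected confidence used only to
# illustrate what "support-based correction of the confidence" achieves.
# It shrinks the confidence toward 1/num_classes for low-support rules.

def laplace_corrected_confidence(hits, body_support, num_classes=2):
    """(hits + 1) / (body_support + num_classes): a smoothed estimate
    of the probability that the rule predicts correctly."""
    return (hits + 1) / (body_support + num_classes)

# Two rules with identical raw confidence (1.0) but different support:
print(laplace_corrected_confidence(2, 2))    # 0.75
print(laplace_corrected_confidence(20, 20))  # ~0.95 -> ranked higher
```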

7
Basics: Pruning
  • Number of rules too big for direct use in a
    classifier
  • Simple strategy
  • Bound the number of rules
  • Sort order of the mining algorithm remains
  • CBA, optional pruning step: pessimistic
    error-rate-based pruning
  • A rule is pruned if removing a single item from the
    rule results in a reduction of the pessimistic
    error rate
  • CBA, obligatory pruning: database coverage
    method
  • A rule that classifies at least one instance
    correctly (and is the highest-ranked rule covering
    it) belongs to the intermediate classifier
  • Delete all covered instances
  • Take the intermediate classifier with the lowest
    number of errors
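The database coverage steps above can be sketched as follows. This is an illustrative simplification (the final selection of the intermediate classifier with the lowest error count is omitted), and the data layout is an assumption:

```python
# Sketch of the database coverage method: walk the rules in mining
# sort order, keep a rule if it covers at least one remaining training
# instance and classifies it correctly, then delete the covered
# instances. Selecting the intermediate classifier with the lowest
# error count is omitted for brevity.

def covers(body, record):
    """A body is a tuple of (attribute, value) pairs."""
    return all(record.get(a) == v for a, v in body)

def database_coverage(sorted_rules, records, class_attr="class"):
    remaining = list(records)
    classifier = []
    for body, head in sorted_rules:
        covered = [r for r in remaining if covers(body, r)]
        if any(r[class_attr] == head for r in covered):
            classifier.append((body, head))   # highest-ranked correct rule
            remaining = [r for r in remaining if r not in covered]
        if not remaining:
            break
    return classifier

rules = [((("a", "1"),), "yes"), ((("b", "2"),), "no")]
records = [{"a": "1", "b": "2", "class": "yes"},
           {"a": "0", "b": "2", "class": "no"}]
print(database_coverage(rules, records))  # both rules survive
```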

8
Overview
  • Motivation
  • Basics
  • Definitions
  • Associative classification
  • (Class) Association Rule Mining
  • Apriori vs. predictive Apriori (by Scheffer)
  • Pruning
  • Classification
  • Quality measures
  • Results
  • Conclusions

Think backwards: use the properties of different
classifiers to obtain accuracy-based measures for
a set of (class) association rules
9
Classification
  • Input: pruned, sorted list of class association rules
  • Two different approaches
  • Weighted vote algorithm
  • Majority vote
  • Inversely weighted vote
  • Decision list classifier, e.g. CBA
  • Use the first rule that covers the test instance for
    classification

Think backwards: a mining algorithm is preferable if
the resulting classifier is more accurate, more
compact, and built in an efficient way
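The two classification approaches can be sketched as follows. The exact weighting scheme used in the experiments is not given here, so the 1/(rank+1) weights are an assumption chosen to emphasise top-ranked rules:

```python
# Sketch of the two approaches: a decision list (CBA-style, the first
# covering rule decides) and an inversely weighted vote, where the rule
# at rank i contributes weight 1/(i+1). The 1/(i+1) scheme is an
# assumption for illustration, not necessarily the paper's exact weights.

def covers(body, instance):
    return all(instance.get(a) == v for a, v in body)

def decision_list(sorted_rules, instance, default="?"):
    for body, head in sorted_rules:
        if covers(body, instance):
            return head                      # first covering rule decides
    return default

def inversely_weighted_vote(sorted_rules, instance, default="?"):
    votes = {}
    for rank, (body, head) in enumerate(sorted_rules):
        if covers(body, instance):
            votes[head] = votes.get(head, 0.0) + 1.0 / (rank + 1)
    return max(votes, key=votes.get) if votes else default

rules = [((("a", "1"),), "yes"), ((("b", "2"),), "no")]
instance = {"a": "1", "b": "2"}
print(decision_list(rules, instance))           # yes
print(inversely_weighted_vote(rules, instance)) # yes (1.0 beats 0.5)
```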
10
Quality measures and Experiments
Measures to evaluate confidence-based mining
algorithms
  • Accuracy on a test set (2 slides)
  • Average rank of the first rule that covers and
    correctly predicts a test instance
  • Number of mined rules and number of rules after
    pruning
  • Time required for mining and for pruning

Comparative study for Apriori and predictive Apriori:
12 UCI datasets (balance, breast-w, ecoli, glass,
heart-h, iris, labor, led7, lenses, pima,
tic-tac-toe, wine); one 10-fold cross-validation;
discretisation
11
1a. Accuracy and Ranking
  • Inversely weighted
  • Emphasises top ranked rules
  • Shows importance of good rule ranking

A mining algorithm is preferable if the resulting
classifier is more accurate.
12
1b. How many rules are necessary to be accurate?
Majority vote classifier
A mining algorithm is preferable if the resulting
classifier is more compact.
Similar results for CBA
13
Comparison of CBA to standard techniques
14
Conclusions
  • Use classification to evaluate the quality of
    confidence-based association rule miners
  • Test evaluation
  • Predictive Apriori mines a higher-quality set of
    rules
  • Predictive Apriori needs fewer rules
  • But predictive Apriori is slower than Apriori
  • Comparison of a standard associative classifier
    (CBA) to standard ML techniques
  • CBA: comparable accuracy to standard techniques
  • CBA mines more rules and is slower
  • All algorithms are implemented in WEKA or in an
    add-on to WEKA, available from
    http://www.cs.waikato.ac.nz/ml

15
The End
Thank you for your attention. Questions...
Contact: stefan_mutter_at_directbox.com,
mhall_at_cs.waikato.ac.nz, eibe_at_cs.waikato.ac.nz