Title: Optimal rule discovery and applications
1. Optimal rule discovery and applications
- Dr Jiuyong (John) Li
- Dept of Mathematics and Computing
- The University of Southern Queensland
- Toowoomba, Australia
2. Outline
- Introduction
- Optimal rule discovery
- Robust rule-based classification
- Mining risk patterns in medical data
- Summary
3. Rules
- Strong implications
  - If outlook is sunny and humidity is normal, then play tennis.
- Advantages
  - Straightforward and expressive
  - Human understandable
  - Rule-based classification systems are competitive with many other systems, such as neural networks, nearest-neighbour classifiers, and Bayesian classifiers.
4. Rule types 1
- Traditional classification rules
  - Decision trees, e.g. C4.5rules (Quinlan 1993)
  - Covering algorithm based, e.g. AQ15 (Michalski, Mozetic, Hong and Lavrac 1986) and CN2 (Clark and Niblett 1989)
- Efficient
- Heuristic search may miss many quality rules
5. Data
6. A decision tree
7. Decision rules
- If outlook is sunny and humidity is high, then do not play tennis.
- If outlook is sunny and humidity is normal, then play tennis.
- If outlook is overcast, then play tennis.
- If outlook is rain and wind is strong, then do not play tennis.
- If outlook is rain and wind is weak, then play tennis.
8. Rule types 2
- Association rules
  - Complete search
  - Too many rules
  - Bottleneck problem (combinatorial explosion)
- Searching by some anti-monotone properties
  - Apriori (Agrawal and Srikant 1994) and FP-growth (Han, Pei and Yin 2000), based on the anti-monotone property of support
  - Many variants
  - Non-redundant association rules (Zaki 2004), based on the anti-monotone property of closure
9. Association rules 1
- Items: attribute-value pairs, e.g. (outlook, sunny), (humidity, normal)
- Patterns: sets of attribute-value pairs, e.g. {(outlook, sunny), (humidity, normal)}
- Implications: pattern -> class, e.g. {(outlook, sunny), (humidity, normal)} -> play
- Support: fraction of (pattern, class) occurrences in the data set
- Confidence: support of (pattern, class) / support of pattern
- For the example rule: support = 2/14 ≈ 0.14, confidence = 2/2 = 100%
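The support and confidence arithmetic above can be checked mechanically. The sketch below uses the standard 14-record play-tennis data this talk works with; the helper function is illustrative, not part of the talk's software.

```python
# Support and confidence of {(outlook, sunny), (humidity, normal)} -> play
# over the standard 14-record play-tennis data set.
DATA = [  # (outlook, temperature, humidity, wind, class)
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "strong", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
ATTRS = ("outlook", "temperature", "humidity", "wind")

def support(pattern, cls=None):
    """Fraction of records matching every attribute-value pair in the
    pattern (and, if given, the class)."""
    n = sum(1 for rec in DATA
            if all(rec[ATTRS.index(a)] == v for a, v in pattern)
            and (cls is None or rec[-1] == cls))
    return n / len(DATA)

pattern = [("outlook", "sunny"), ("humidity", "normal")]
supp = support(pattern, cls="yes")                 # 2/14
conf = support(pattern, "yes") / support(pattern)  # 2/2 = 1.0
```

Both records matching the pattern have class "play", so the confidence is 100%, as the slide states.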
10. Association rules 2
- Association rules: implications whose support and confidence are greater than the user-specified minimum support and confidence
- Frequent patterns (rules): support > minimum support
- Super (sub) patterns (rules): {(outlook, sunny), (humidity, normal)} is a super pattern of {(outlook, sunny)}
11. Association rules 3
- Anti-monotone property of support: if a pattern (rule) is infrequent, all of its super patterns (rules) are infrequent
- Complete search space: A1 × A2 × … × Am contains more than 2^m patterns
- Practically infeasible to search exhaustively
- The anti-monotone property of support makes association rule mining feasible
- The minimum support cannot be too small
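The anti-monotone pruning described above is what makes levelwise (Apriori-style) search feasible. A minimal sketch, not the algorithms cited on the slide (it also omits Apriori's full subset-based candidate pruning):

```python
from itertools import combinations

def frequent_patterns(transactions, min_support):
    """Levelwise search: only frequent k-patterns are extended, because
    every super pattern of an infrequent pattern is itself infrequent
    (the anti-monotone property of support)."""
    n = len(transactions)
    candidates = [frozenset([i]) for t in transactions for i in t]
    candidates = sorted(set(candidates), key=sorted)
    frequent = {}
    while candidates:
        counts = {p: sum(1 for t in transactions if p <= t)
                  for p in candidates}
        survivors = [p for p, c in counts.items() if c / n >= min_support]
        frequent.update((p, counts[p] / n) for p in survivors)
        # build (k+1)-candidates only from the frequent k-patterns
        candidates = list({a | b for a, b in combinations(survivors, 2)
                           if len(a | b) == len(a) + 1})
    return frequent

txns = [frozenset(s) for s in ("abc", "abd", "ab", "cd")]
freq = frequent_patterns(txns, min_support=0.5)
# {a}, {b}, {c}, {d} and {a, b} are frequent; {a, c} etc. are not
```

Because {a, c} is infrequent, no super pattern of it is ever counted; this is the pruning that keeps the complete search tractable.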
12. Why optimal rules
- Optimal rules
  - Complete
  - Defined by various interestingness criteria
- Reduce the number of rules
  - A new anti-monotone property supports the efficient search
  - Work well with low minimum support
- Wide applications
  - Robust classification
  - Medical data mining
- Related work
  - Constrained association rule mining (Bayardo, Agrawal and Gunopulos 2000)
  - Mining the most interesting rules (Bayardo and Agrawal 1999)
13. Various interestingness criteria
- Many interestingness criteria have been proposed as substitutes for confidence
- Such as lift (interest or strength), gain, added value, Klosgen, conviction, p-s, Laplace, cosine, certainty factor, Jaccard, and many others (Tan, Kumar and Srivastava 2004)
- Confidence (or an interestingness criterion) has no effect in pruning the search space
- Confidence is used in forming rules after the major computational task has finished
14. Uninteresting rules
- Some rules do not carry useful information
  - If outlook is overcast, then play tennis. (support 4/14, confidence 100%)
  - If outlook is overcast and temperature is hot, then play tennis. (support 2/14, confidence 100%)
  - The latter rule is redundant
- Redundant rules are not optimal. Some non-redundant rules are not optimal either.
15. Optimal rules 1
- General and specific relationships
  - Given two rules P -> c and Q -> c where P ⊂ Q, we say that the latter is more specific than the former and the former is more general than the latter.
- The optimal rule set
  - A rule set is optimal with respect to an interestingness metric if it contains all rules except those with no greater interestingness than one of their more general rules.
16. Optimal rules 2
- An association rule set
  - a -> z (conf 80%), ab -> z (conf 70%), abc -> z (conf 70%), b -> z (conf 60%)
- An optimal rule set
  - a -> z (conf 80%), b -> z (conf 60%)
- A non-redundant association rule set
  - a -> z (conf 80%), ab -> z (conf 70%), b -> z (conf 60%)
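The optimal rule set on this slide can be reproduced mechanically: a rule is dropped when a strictly more general rule with the same consequent is at least as interesting. The sketch below filters only among the rules listed, which is all this example needs.

```python
def optimal_subset(rules):
    """Keep a rule unless some strictly more general rule (proper-subset
    antecedent, same consequent) has at least its interestingness
    (confidence here)."""
    return {
        (p, c): conf for (p, c), conf in rules.items()
        if not any(q < p and c2 == c and conf2 >= conf
                   for (q, c2), conf2 in rules.items())
    }

assoc = {  # the association rule set from the slide
    (frozenset("a"), "z"): 0.80,
    (frozenset("ab"), "z"): 0.70,
    (frozenset("abc"), "z"): 0.70,
    (frozenset("b"), "z"): 0.60,
}
optimal = optimal_subset(assoc)
# remaining antecedents: {a} and {b}, matching the slide
```

ab -> z is removed because its more general rule a -> z has greater confidence (80% vs 70%); b -> z survives because it has no more general rule at all.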
17. Main results 1
- Anti-monotone property
  - If supp(PXc) = supp(Pc), then rule PX -> c and all its more specific rules will not occur in an optimal rule set defined by confidence, odds ratio, lift (interest or strength), gain, added value, Klosgen, conviction, p-s (or leverage), Laplace, cosine, certainty factor or Jaccard.
- Relationship with the non-redundant rule set
  - An optimal rule set is a subset of a non-redundant rule set.
18. An illustration
19. Main results 2
- Closure property
  - If supp(P) = supp(PX), then rule PX -> c for any c and all its more specific rules do not occur in an optimal rule set defined by confidence, odds ratio, lift (interest or strength), gain, added value, Klosgen, conviction, p-s (or leverage), Laplace, cosine, certainty factor or Jaccard.
- Termination property
  - If supp(P¬c) = 0, then all more specific rules of the rule P -> c do not occur in an optimal rule set defined by confidence, odds ratio, lift (interest or strength), gain, added value, Klosgen, conviction, p-s (or leverage), Laplace, cosine, certainty factor or Jaccard.
20. More illustrations
21. More illustrations
22. Data
23. Patterns searched by exhaustive search
- 1-patterns: 3 + 3 + 2 + 2 = 10
- 2-patterns: 3 × (3 + 2 + 2) + 3 × (2 + 2) + 2 × 2 = 37
- 3-patterns: 3 × 3 × 2 + 3 × 3 × 2 + 3 × 2 × 2 + 3 × 2 × 2 = 60
- 4-patterns: 3 × 3 × 2 × 2 = 36
- Total: 143
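The enumeration for attributes with domain sizes 3, 3, 2 and 2 can be checked mechanically: a k-pattern chooses k of the four attributes and one value for each. This helper is illustrative, not the talk's software.

```python
from itertools import combinations
from math import prod

def patterns_per_size(domain_sizes):
    """Number of k-patterns for attributes with the given domain sizes:
    choose k attributes, then one value for each chosen attribute."""
    m = len(domain_sizes)
    return {k: sum(prod(c) for c in combinations(domain_sizes, k))
            for k in range(1, m + 1)}

counts = patterns_per_size([3, 3, 2, 2])  # the four weather attributes
total = sum(counts.values())
# total equals prod(s + 1 for each size) - 1, i.e. 4 * 4 * 3 * 3 - 1 = 143
```

This gives 10, 37, 60 and 36 patterns of sizes 1 to 4, so the exhaustive search space holds 143 patterns in total.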
24. Patterns searched by association rule discovery (103)
25. Patterns searched by optimal rule discovery (42)
26. Experimental results 1
27. Experimental results 2
28. Experimental results 3
29. Experimental results 4
30. Conclusions
- Rules defined by various interestingness criteria can be discovered in the optimal rule discovery framework, i.e. they satisfy the same anti-monotone property.
- Optimal rule discovery is an efficient approach. It is significantly more efficient than association rule discovery and more efficient than non-redundant rule discovery.
31. More details
- J. Li, On optimal rule discovery, IEEE Transactions on Knowledge and Data Engineering, 18(4), 2006.
- J. Li, H. Shen and R. Topor, Mining the optimal class association rule set, Knowledge-Based Systems, 15(7), 2002, 399-405, Elsevier Science.
32. Data
33. Why robust 1
34. Why robust 2
- If outlook is sunny and humidity is high, then do not play tennis.
- If outlook is sunny and humidity is normal, then play tennis.
- If outlook is overcast, then play tennis.
- If outlook is rain and wind is strong, then do not play tennis.
- If outlook is rain and wind is weak, then play tennis.
35. Some additional rules are useful
- If humidity is normal and wind is weak, then play tennis.
- If temperature is cool and wind is weak, then play tennis.
- If temperature is mild and humidity is normal, then play tennis.
- If humidity is normal, then play tennis.
36. Motivations
- Those additional useful rules are not found by decision trees.
- An association rule set includes too many rules, and even an optimal rule set includes too many rules.
- For example, on the mushroom data set:
  - Association rules: 99,126
  - Optimal rules: 1,691
  - C4.5rules: 16
- How to choose a reasonable rule set for data with missing values?
37. Robust prediction problem 1
- Problem
  - Making predictions on test data that are less complete than the training data.
- Practical implication
  - Training data, typically some selective historical data, are more controllable.
  - Test data, future incoming data, are less controllable.
38. Robust prediction problem 2
- General methods for handling missing values pre-process the data by substituting missing values with estimates, e.g. nearest-neighbour substitution (Batista and Monard 2003).
  - "treatment"
- The proposed method does not estimate or substitute any missing values, but builds a model that tolerates a certain number of missing values in the test data.
  - "immunisation"
39. Definitions 1
- Ordered rule-based classifiers
  - Rules are organised as a sequence, usually in descending accuracy order, and only the first matching rule makes a prediction. For example, C4.5rules (Quinlan 1993) and CBA (Liu, Hsu and Ma 1998).
- Predictive rule
  - Let T be a record in data set D and R a rule set for D. A rule r in R is predictive for T w.r.t. R if r covers T. If two rules cover T, we choose the one with the greater accuracy. If two rules have the same accuracy, we choose the one with the higher support. If two rules have the same support, we choose the one with the shorter antecedent.
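The tie-breaking order in the predictive-rule definition maps directly onto a sort key. A minimal sketch; the rule representation (dicts with hypothetical `antecedent`, `accuracy` and `support` fields) is my own, not the talk's.

```python
def predictive_rule(rules, record):
    """Pick the predictive rule for a record: among the rules covering
    it, prefer higher accuracy, then higher support, then the shorter
    antecedent, per the definition above."""
    covering = [r for r in rules
                if all(record.get(a) == v
                       for a, v in r["antecedent"].items())]
    if not covering:
        return None
    return max(covering,
               key=lambda r: (r["accuracy"], r["support"],
                              -len(r["antecedent"])))

# two illustrative rules (accuracy and support values are made up)
RULES = [
    {"antecedent": {"outlook": "sunny", "humidity": "normal"},
     "accuracy": 1.0, "support": 2 / 14},
    {"antecedent": {"humidity": "normal"},
     "accuracy": 1.0, "support": 6 / 14},
]
```

With equal accuracy, the second rule wins on any record both cover, because it has the higher support; the negated antecedent length makes shorter antecedents sort higher only when accuracy and support tie.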
40. Definitions 2
- Robustness
  - Let D be a data set, and R1 and R2 two rule sets for D. R2 is at least as robust as R1 if, for all incomplete versions of the test data, predictions made by R2 are at least as accurate as those made by R1.
- k-incomplete data set
  - Let D be a data set with n attributes, and k > 0. The k-incomplete data set of D, denoted Dk, consists of the records of D with k attribute values missing.
- k-optimal rule set
  - A k-optimal rule set contains the set of all predictive rules on the k-incomplete data set.
41. Major results
- The optimal rule set is the most robust rule set with the smallest size.
- A (k+1)-optimal rule set is at least as robust as a k-optimal rule set.
- A (k+1)-optimal rule set is a super rule set of a k-optimal rule set.
42. An illustrative example
- When a is missing
  - The min-optimal rule set does not work
  - The 1-optimal rule set works
43. Experiment design
- Use 10-fold cross validation.
- Randomly add missing values to the test data, controlled by parameter l (on average each record has l missing values).
- Repeat 10 × 10 times for each data set.
- Experiment on 28 data sets from the UCI ML repository.
- Compare with benchmark classifiers: C4.5rules and CBA.
- Compare with missing-value handling methods: most common value substitution and k-nearest neighbour substitution.
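The test-set corruption step can be sketched as follows. For simplicity this blanks exactly l values per record rather than l on average, and the helper name is my own.

```python
import random

def make_incomplete(records, l, seed=0):
    """Return a copy of the test records with l attribute values per
    record blanked out (None), mimicking the robustness experiments'
    controlled corruption of the test data."""
    rng = random.Random(seed)
    out = []
    for rec in records:
        rec = dict(rec)  # leave the original record untouched
        for attr in rng.sample(sorted(rec), k=min(l, len(rec))):
            rec[attr] = None
        out.append(rec)
    return out
```

A fixed seed keeps each of the 10 × 10 repetitions reproducible while still varying which attributes go missing across records.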
44. Experimental results 1
45. Experimental results 2
46. Experimental results 3
47. Experimental results 4
48. Main conclusions
- Optimal classifiers are more robust than some benchmark rule-based classifiers, such as C4.5rules and CBA. They make more accurate predictions on test data with missing values than C4.5rules and CBA do.
- Building optimal classifiers is better than some missing-value handling methods, such as k-nearest neighbour substitution and most common value substitution.
49. More details
- J. Li, Robust rule-based prediction: a redundant rule approach, IEEE Transactions on Knowledge and Data Engineering, 18(8), 2006.
- H. Hu and J. Li, Using association rules to make rule-based classifiers robust, Proceedings of the Sixteenth Australasian Database Conference (ADC), 2005, 47-52, ACS Society.
- J. Li, R. Topor and H. Shen, Construct robust rule sets for classification, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2002, Edmonton, Canada, 564-569, ACM Press.
50. Risk patterns 1
- Out of 200 smokers, 3% suffer lung cancer.
- Out of 800 non-smokers, 0.5% suffer lung cancer.
- Smoking carries 6 times the risk of lung cancer compared with not smoking.
51. Risk patterns 2
- Relative risk: a concept that has been widely used in epidemiological research.
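Relative risk compares the outcome rate in the exposed group with the rate in the unexposed group. A minimal sketch, reading the smoking example's figures as percentages (3% of 200 smokers is 6 cases; 0.5% of 800 non-smokers is 4 cases):

```python
def relative_risk(exposed_cases, exposed_total,
                  unexposed_cases, unexposed_total):
    """RR = P(outcome | exposed) / P(outcome | unexposed)."""
    return ((exposed_cases / exposed_total)
            / (unexposed_cases / unexposed_total))

# Smokers: 6 cases out of 200 (3%); non-smokers: 4 out of 800 (0.5%)
rr = relative_risk(6, 200, 4, 800)  # approximately 6, matching the slide
```

An RR of 1 means the exposure makes no difference; values well above 1, like the 6 here, flag the exposed group as a risk pattern.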
52. Problems
- The relative risk metric is not consistent with accuracy, so a normal classification system does not work well.
- The data set is normally very skewed, so the global support of association rule mining is not suitable.
- Patterns may contain many conditions, and this causes combinatorial explosion.
53. A solution
- Replace the (global) support by local support.
- The task can then be characterised as an optimal rule discovery problem.
- Both local support and relative risk satisfy anti-monotone properties:
  - If a pattern is not frequent, neither are its super patterns.
  - If supp(Pxa) = supp(Pa), then pattern Px and all its super patterns do not occur in the optimal risk pattern set.
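Local support, as I read it here, is support measured within the abnormal (case) class only, so a pattern rare in the whole skewed data set can still be frequent among the cases. A minimal illustrative sketch under that assumption:

```python
def local_support(records, pattern, target="abnormal"):
    """Support of the pattern among records of the target class only,
    so that rare-but-risky patterns survive in a skewed data set."""
    in_class = [rec for rec, cls in records if cls == target]
    hits = sum(1 for rec in in_class
               if all(rec.get(a) == v for a, v in pattern.items()))
    return hits / len(in_class)

# toy data: 3 abnormal records, 1 normal record
data = [({"smoker": "yes"}, "abnormal"),
        ({"smoker": "yes"}, "abnormal"),
        ({"smoker": "no"}, "abnormal"),
        ({"smoker": "no"}, "normal")]
ls = local_support(data, {"smoker": "yes"})  # 2 of 3 abnormal records
```

The same pattern's global support would be 2/4; dividing by the case count instead keeps the threshold meaningful when cases are a tiny fraction of the data.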
54. A real world case study 1
- This method has been applied to a real-world project for detecting adverse drug reactions.
- The project was sponsored by the Australian Commonwealth Department of Health and Ageing.
- The data set used is a linked data set of hospital, pharmaceutical and medical service data.
- The goal was to determine how ACE inhibitor usage is associated with angioedema.
55. A real world case study 2
56. A real world case study 3
- Pattern 1: RR = 3.99
  - Gender: Female
  - Hospital circulatory flag: Yes
  - Usage of drugs in category Various: Yes
- Pattern 2: RR = 3.82
  - Age > 60
  - Usage of drugs in category Genito-urinary system and sex hormones: Yes
  - Usage of drugs in category Systemic hormonal preparations: Yes
- Pattern 3: RR = 3.41
  - Usage of drugs in category Genito-urinary system and sex hormones: Yes
  - Usage of drugs in category General anti-infectives for systemic use: Yes
  - Usage of drugs in category Nervous system: No
57. A real world case study 4
58. A real world case study 5
59. A real world case study 6
60. Conclusions
- Optimal rule discovery is an efficient approach to discovering risk patterns in large, skewed medical data sets.
- More details:
  - J. Li, A. Fu, H. He, J. Chen, H. Jin, D. McAullay, G. Williams, R. Sparks and C. Kelman, Mining risk patterns in medical data, Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD '05), 2005, 770-775, Chicago, ACM Press, New York.
61. Summary
- Optimal rule discovery is an efficient approach to discovering various optimal rules.
- Optimal classifiers are more robust than some benchmark rule-based classifiers, such as C4.5rules and CBA.
- Optimal rule discovery is efficient in discovering risk patterns in large, skewed medical data sets.
62. Acknowledgements
- Collaborators: Hong Shen, Rodney Topor, Hong Hu, Ada Fu, Hongxing He, Jie Chen, Huidong Jin, Graham Williams, et al.
- Internal reviewers: Tony Roberts, Ron House, and Xiaodi Huang
- Australian Research Council grant P0559090
- USQ Early Career Researcher Program grant 4710/1000479
63. Thank you
My papers and software tools are available from http://www.sci.usq.edu.au/staff/jiuyong