Title: Optimal rule discovery and applications
1. Optimal rule discovery and applications
- Dr Jiuyong (John) Li
- Dept of Mathematics and Computing
- The University of Southern Queensland
- Toowoomba, Australia
2. Outline
- Introduction
- Optimal rule discovery
- Robust rule-based classification
- Mining risk patterns in medical data
- Summary
3. Rules
- Strong implications
  - If outlook is sunny and humidity is normal, then play tennis.
- Advantages
  - Straightforward and expressive
  - Human understandable
  - Rule-based classification systems are competitive with many other systems, such as neural networks, nearest-neighbour classifiers, and Bayesian classifiers.
4. Rule types 1
- Traditional classification rules
  - Decision trees, e.g. C4.5rules (Quinlan 1993)
  - Covering algorithm based, e.g. AQ15 (Michalski, Mozetic, Hong and Lavrac 1986) and CN2 (Clark and Niblett 1989)
- Efficient
- Heuristic search may miss many quality rules
5. Data
6. A decision tree
7. Decision rules
- If outlook is sunny and humidity is high, then do not play tennis.
- If outlook is sunny and humidity is normal, then play tennis.
- If outlook is overcast, then play tennis.
- If outlook is rain and wind is strong, then do not play tennis.
- If outlook is rain and wind is weak, then play tennis.
8. Rule types 2
- Association rules
  - Complete search
  - Too many rules
  - Bottleneck problem (combinatorial explosion)
- Searching by some anti-monotone properties
  - Apriori (Agrawal and Srikant 1994) and FP-growth (Han, Pei and Yin 2000), based on the anti-monotone property of support
  - Many variants
  - Non-redundant association rules (Zaki 2004), based on the anti-monotone property of closure
9. Association rules 1
- Items: attribute-value pairs, e.g. (outlook, sunny), (humidity, normal)
- Patterns: sets of attribute-value pairs, e.g. {(outlook, sunny), (humidity, normal)}
- Implications: pattern -> class, e.g. {(outlook, sunny), (humidity, normal)} -> play
- Support: fraction of (pattern, class) occurrences in the data set
- Confidence: support of (pattern, class) / support of pattern
- For the example rule: support = 2/14 ≈ 0.14, confidence = 2/2 = 100%
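The support and confidence arithmetic above can be checked mechanically. The sketch below uses the standard 14-record play-tennis data this talk works with; the helper function is illustrative, not part of the talk's software.

```python
# Support and confidence of {(outlook, sunny), (humidity, normal)} -> play
# over the standard 14-record play-tennis data set.
DATA = [  # (outlook, temperature, humidity, wind, class)
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "strong", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
ATTRS = ("outlook", "temperature", "humidity", "wind")

def support(pattern, cls=None):
    """Fraction of records matching every attribute-value pair in the
    pattern (and, if given, the class)."""
    n = sum(1 for rec in DATA
            if all(rec[ATTRS.index(a)] == v for a, v in pattern)
            and (cls is None or rec[-1] == cls))
    return n / len(DATA)

pattern = [("outlook", "sunny"), ("humidity", "normal")]
supp = support(pattern, cls="yes")                 # 2/14
conf = support(pattern, "yes") / support(pattern)  # 2/2 = 1.0
```

Both records matching the pattern have class "play", so the confidence is 100%, as the slide states.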
10. Association rules 2
- Association rules: implications whose support and confidence are greater than the user-specified minimum support and confidence
- Frequent patterns (rules): support > minimum support
- Super (sub) patterns (rules): {(outlook, sunny), (humidity, normal)} is a super pattern of {(outlook, sunny)}
11. Association rules 3
- Anti-monotone property of support: if a pattern (rule) is infrequent, all of its super patterns (rules) are infrequent
- Complete search space: A1 × A2 × … × Am contains more than 2^m patterns
- Practically infeasible to search exhaustively
- The anti-monotone property of support makes association rule mining feasible
- The minimum support cannot be too small
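The anti-monotone pruning described above is what makes levelwise (Apriori-style) search feasible. A minimal sketch, not the algorithms cited on the slide (it also omits Apriori's full subset-based candidate pruning):

```python
from itertools import combinations

def frequent_patterns(transactions, min_support):
    """Levelwise search: only frequent k-patterns are extended, because
    every super pattern of an infrequent pattern is itself infrequent
    (the anti-monotone property of support)."""
    n = len(transactions)
    candidates = [frozenset([i]) for t in transactions for i in t]
    candidates = sorted(set(candidates), key=sorted)
    frequent = {}
    while candidates:
        counts = {p: sum(1 for t in transactions if p <= t)
                  for p in candidates}
        survivors = [p for p, c in counts.items() if c / n >= min_support]
        frequent.update((p, counts[p] / n) for p in survivors)
        # build (k+1)-candidates only from the frequent k-patterns
        candidates = list({a | b for a, b in combinations(survivors, 2)
                           if len(a | b) == len(a) + 1})
    return frequent

txns = [frozenset(s) for s in ("abc", "abd", "ab", "cd")]
freq = frequent_patterns(txns, min_support=0.5)
# {a}, {b}, {c}, {d} and {a, b} are frequent; {a, c} etc. are not
```

Because {a, c} is infrequent, no super pattern of it is ever counted; this is the pruning that keeps the complete search tractable.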
12. Why optimal rules
- Optimal rules
  - Complete
  - Defined by various interestingness criteria
- Reduce the number of rules
  - A new anti-monotone property supports the efficient search
  - Work well with low minimum support
- Wide applications
  - Robust classification
  - Medical data mining
- Related work
  - Constrained association rule mining (Bayardo, Agrawal and Gunopulos 2000)
  - Mining the most interesting rules (Bayardo and Agrawal 1999)
13. Various interestingness criteria
- Many interestingness criteria have been proposed as substitutes for confidence
- Such as lift (interest or strength), gain, added value, Klosgen, conviction, p-s, Laplace, cosine, certainty factor, Jaccard, and many others (Tan, Kumar and Srivastava 2004)
- Confidence (or an interestingness criterion) has no effect in pruning the search space
- Confidence is used in forming rules after the major computational task has finished
14. Uninteresting rules
- Some rules do not carry useful information
  - If outlook is overcast, then play tennis. (support 4/14, confidence 100%)
  - If outlook is overcast and temperature is hot, then play tennis. (support 2/14, confidence 100%)
  - The latter rule is redundant
- Redundant rules are not optimal. Some non-redundant rules are not optimal either.
15. Optimal rules 1
- General and specific relationships
  - Given two rules P -> c and Q -> c where P ⊂ Q, we say that the latter is more specific than the former and the former is more general than the latter.
- The optimal rule set
  - A rule set is optimal with respect to an interestingness metric if it contains all rules except those with no greater interestingness than one of their more general rules.
16. Optimal rules 2
- An association rule set
  - a -> z (conf 80%), ab -> z (conf 70%), abc -> z (conf 70%), b -> z (conf 60%)
- An optimal rule set
  - a -> z (conf 80%), b -> z (conf 60%)
- A non-redundant association rule set
  - a -> z (conf 80%), ab -> z (conf 70%), b -> z (conf 60%)
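The optimal rule set on this slide can be reproduced mechanically: a rule is dropped when a strictly more general rule with the same consequent is at least as interesting. The sketch below filters only among the rules listed, which is all this example needs.

```python
def optimal_subset(rules):
    """Keep a rule unless some strictly more general rule (proper-subset
    antecedent, same consequent) has at least its interestingness
    (confidence here)."""
    return {
        (p, c): conf for (p, c), conf in rules.items()
        if not any(q < p and c2 == c and conf2 >= conf
                   for (q, c2), conf2 in rules.items())
    }

assoc = {  # the association rule set from the slide
    (frozenset("a"), "z"): 0.80,
    (frozenset("ab"), "z"): 0.70,
    (frozenset("abc"), "z"): 0.70,
    (frozenset("b"), "z"): 0.60,
}
optimal = optimal_subset(assoc)
# remaining antecedents: {a} and {b}, matching the slide
```

ab -> z is removed because its more general rule a -> z has greater confidence (80% vs 70%); b -> z survives because it has no more general rule at all.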
17. Main results 1
- Anti-monotone property
  - If supp(PXc) = supp(Pc), then rule PX -> c and all its more specific rules will not occur in an optimal rule set defined by confidence, odds ratio, lift (interest or strength), gain, added value, Klosgen, conviction, p-s (or leverage), Laplace, cosine, certainty factor or Jaccard.
- Relationship with the non-redundant rule set
  - An optimal rule set is a subset of a non-redundant rule set.
18. An illustration
19. Main results 2
- Closure property
  - If supp(P) = supp(PX), then rule PX -> c for any c and all its more specific rules do not occur in an optimal rule set defined by confidence, odds ratio, lift (interest or strength), gain, added value, Klosgen, conviction, p-s (or leverage), Laplace, cosine, certainty factor or Jaccard.
- Termination property
  - If supp(P¬c) = 0, then all more specific rules of the rule P -> c do not occur in an optimal rule set defined by confidence, odds ratio, lift (interest or strength), gain, added value, Klosgen, conviction, p-s (or leverage), Laplace, cosine, certainty factor or Jaccard.
20. More illustrations
21. More illustrations
22. Data
23. Patterns searched by exhaustive search
- 1-patterns: 3 + 3 + 2 + 2 = 10
- 2-patterns: 3 × (3 + 2 + 2) + 3 × (2 + 2) + 2 × 2 = 37
- 3-patterns: 3 × 3 × 2 + 3 × 3 × 2 + 3 × 2 × 2 + 3 × 2 × 2 = 60
- 4-patterns: 3 × 3 × 2 × 2 = 36
- Total: 143
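The enumeration for attributes with domain sizes 3, 3, 2 and 2 can be checked mechanically: a k-pattern chooses k of the four attributes and one value for each. This helper is illustrative, not the talk's software.

```python
from itertools import combinations
from math import prod

def patterns_per_size(domain_sizes):
    """Number of k-patterns for attributes with the given domain sizes:
    choose k attributes, then one value for each chosen attribute."""
    m = len(domain_sizes)
    return {k: sum(prod(c) for c in combinations(domain_sizes, k))
            for k in range(1, m + 1)}

counts = patterns_per_size([3, 3, 2, 2])  # the four weather attributes
total = sum(counts.values())
# total equals prod(s + 1 for each size) - 1, i.e. 4 * 4 * 3 * 3 - 1 = 143
```

This gives 10, 37, 60 and 36 patterns of sizes 1 to 4, so the exhaustive search space holds 143 patterns in total.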
24. Patterns searched by association rule discovery (103)
25. Patterns searched by optimal rule discovery (42)
26. Experimental results 1
27. Experimental results 2
28. Experimental results 3
29. Experimental results 4
30. Conclusions
- Rules defined by various interestingness criteria can be discovered in the optimal rule discovery framework, i.e. they satisfy the same anti-monotone property.
- Optimal rule discovery is an efficient approach. It is significantly more efficient than association rule discovery and more efficient than non-redundant rule discovery.
31. More details
- J. Li, On optimal rule discovery, IEEE Transactions on Knowledge and Data Engineering, 18(4), 2006.
- J. Li, H. Shen and R. Topor, Mining the optimal class association rule set, Knowledge-Based Systems, 15(7), 2002, 399-405, Elsevier Science.
32. Data
33. Why robust 1
34. Why robust 2
- If outlook is sunny and humidity is high, then do not play tennis.
- If outlook is sunny and humidity is normal, then play tennis.
- If outlook is overcast, then play tennis.
- If outlook is rain and wind is strong, then do not play tennis.
- If outlook is rain and wind is weak, then play tennis.
35. Some additional rules are useful
- If humidity is normal and wind is weak, then play tennis.
- If temperature is cool and wind is weak, then play tennis.
- If temperature is mild and humidity is normal, then play tennis.
- If humidity is normal, then play tennis.
36. Motivations
- Those additional useful rules are not found by decision trees.
- An association rule set includes too many rules, and even an optimal rule set includes too many rules.
- For example, on the mushroom data set:
  - Association rules: 99,126
  - Optimal rules: 1,691
  - C4.5rules: 16
- How to choose a reasonable rule set for data with missing values?
37. Robust prediction problem 1
- Problem
  - Making predictions on test data that are less complete than the training data.
- Practical implication
  - Training data, typically some selective historical data, are more controllable.
  - Test data, future incoming data, are less controllable.
38. Robust prediction problem 2
- General methods for handling missing values pre-process the data by substituting missing values with estimates, e.g. nearest-neighbour substitution (Batista and Monard 2003).
  - "treatment"
- The proposed method does not estimate or substitute any missing values, but builds a model that tolerates a certain number of missing values in the test data.
  - "immunisation"
39. Definitions 1
- Ordered rule-based classifiers
  - Rules are organised as a sequence, usually in descending accuracy order, and only the first matching rule makes a prediction. For example, C4.5rules (Quinlan 1993) and CBA (Liu, Hsu and Ma 1998).
- Predictive rule
  - Let T be a record in data set D and R a rule set for D. A rule r in R is predictive for T w.r.t. R if r covers T. If two rules cover T, we choose the one with the greater accuracy. If two rules have the same accuracy, we choose the one with the higher support. If two rules have the same support, we choose the one with the shorter antecedent.
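The tie-breaking order in the predictive-rule definition maps directly onto a sort key. A minimal sketch; the rule representation (dicts with hypothetical `antecedent`, `accuracy` and `support` fields) is my own, not the talk's.

```python
def predictive_rule(rules, record):
    """Pick the predictive rule for a record: among the rules covering
    it, prefer higher accuracy, then higher support, then the shorter
    antecedent, per the definition above."""
    covering = [r for r in rules
                if all(record.get(a) == v
                       for a, v in r["antecedent"].items())]
    if not covering:
        return None
    return max(covering,
               key=lambda r: (r["accuracy"], r["support"],
                              -len(r["antecedent"])))

# two illustrative rules (accuracy and support values are made up)
RULES = [
    {"antecedent": {"outlook": "sunny", "humidity": "normal"},
     "accuracy": 1.0, "support": 2 / 14},
    {"antecedent": {"humidity": "normal"},
     "accuracy": 1.0, "support": 6 / 14},
]
```

With equal accuracy, the second rule wins on any record both cover, because it has the higher support; the negated antecedent length makes shorter antecedents sort higher only when accuracy and support tie.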
40. Definitions 2
- Robustness
  - Let D be a data set, and R1 and R2 two rule sets for D. R2 is at least as robust as R1 if, for all incomplete versions of the test data, predictions made by R2 are at least as accurate as those made by R1.
- k-incomplete data set
  - Let D be a data set with n attributes, and k > 0. The k-incomplete data set of D, denoted Dk, consists of the records of D with k attribute values missing.
- k-optimal rule set
  - A k-optimal rule set contains the set of all predictive rules on the k-incomplete data set.
41. Major results
- The optimal rule set is the most robust rule set with the smallest size.
- A (k+1)-optimal rule set is at least as robust as a k-optimal rule set.
- A (k+1)-optimal rule set is a super rule set of a k-optimal rule set.
42. An illustrative example
- When a is missing
  - The min-optimal rule set does not work
  - The 1-optimal rule set works
43. Experiment design
- Use 10-fold cross validation.
- Randomly add missing values to the test data, controlled by parameter l (on average each record has l missing values).
- Repeat 10 × 10 times for each data set.
- Experiment on 28 data sets from the UCI ML repository.
- Compare with benchmark classifiers: C4.5rules and CBA.
- Compare with missing-value handling methods: most common value substitution and k-nearest neighbour substitution.
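The test-set corruption step can be sketched as follows. For simplicity this blanks exactly l values per record rather than l on average, and the helper name is my own.

```python
import random

def make_incomplete(records, l, seed=0):
    """Return a copy of the test records with l attribute values per
    record blanked out (None), mimicking the robustness experiments'
    controlled corruption of the test data."""
    rng = random.Random(seed)
    out = []
    for rec in records:
        rec = dict(rec)  # leave the original record untouched
        for attr in rng.sample(sorted(rec), k=min(l, len(rec))):
            rec[attr] = None
        out.append(rec)
    return out
```

A fixed seed keeps each of the 10 × 10 repetitions reproducible while still varying which attributes go missing across records.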
44. Experimental results 1
45. Experimental results 2
46. Experimental results 3
47. Experimental results 4
48. Main conclusions
- Optimal classifiers are more robust than some benchmark rule-based classifiers, such as C4.5rules and CBA. They make more accurate predictions on test data with missing values than C4.5rules and CBA do.
- Building optimal classifiers is better than some missing-value handling methods, such as k-nearest neighbour substitution and most common value substitution.
49. More details
- J. Li, Robust rule-based prediction: a redundant rule approach, IEEE Transactions on Knowledge and Data Engineering, 18(8), 2006.
- H. Hu and J. Li, Using association rules to make rule-based classifiers robust, Proceedings of the Sixteenth Australasian Database Conference (ADC), 2005, 47-52, ACS Society.
- J. Li, R. Topor and H. Shen, Construct robust rule sets for classification, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2002, Edmonton, Canada, 564-569, ACM Press.
50. Risk patterns 1
- Out of 200 smokers, 3% suffer lung cancer.
- Out of 800 non-smokers, 0.5% suffer lung cancer.
- Smoking carries 6 times the risk of lung cancer compared with not smoking.
51. Risk patterns 2
- Relative risk: a concept that has been widely used in epidemiological research.
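Relative risk compares the outcome rate in the exposed group with the rate in the unexposed group. A minimal sketch, reading the smoking example's figures as percentages (3% of 200 smokers is 6 cases; 0.5% of 800 non-smokers is 4 cases):

```python
def relative_risk(exposed_cases, exposed_total,
                  unexposed_cases, unexposed_total):
    """RR = P(outcome | exposed) / P(outcome | unexposed)."""
    return ((exposed_cases / exposed_total)
            / (unexposed_cases / unexposed_total))

# Smokers: 6 cases out of 200 (3%); non-smokers: 4 out of 800 (0.5%)
rr = relative_risk(6, 200, 4, 800)  # approximately 6, matching the slide
```

An RR of 1 means the exposure makes no difference; values well above 1, like the 6 here, flag the exposed group as a risk pattern.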
52. Problems
- The relative risk metric is not consistent with accuracy, so a normal classification system does not work well.
- The data set is normally very skewed, so the global support of association rule mining is not suitable.
- Patterns may contain many conditions, and this causes combinatorial explosion.
53. A solution
- Replace the (global) support by local support.
- The task can then be characterised as an optimal rule discovery problem.
- Both local support and relative risk satisfy anti-monotone properties:
  - If a pattern is not frequent, neither are its super patterns.
  - If supp(Pxa) = supp(Pa), then pattern Px and all its super patterns do not occur in the optimal risk pattern set.
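Local support, as I read it here, is support measured within the abnormal (case) class only, so a pattern rare in the whole skewed data set can still be frequent among the cases. A minimal illustrative sketch under that assumption:

```python
def local_support(records, pattern, target="abnormal"):
    """Support of the pattern among records of the target class only,
    so that rare-but-risky patterns survive in a skewed data set."""
    in_class = [rec for rec, cls in records if cls == target]
    hits = sum(1 for rec in in_class
               if all(rec.get(a) == v for a, v in pattern.items()))
    return hits / len(in_class)

# toy data: 3 abnormal records, 1 normal record
data = [({"smoker": "yes"}, "abnormal"),
        ({"smoker": "yes"}, "abnormal"),
        ({"smoker": "no"}, "abnormal"),
        ({"smoker": "no"}, "normal")]
ls = local_support(data, {"smoker": "yes"})  # 2 of 3 abnormal records
```

The same pattern's global support would be 2/4; dividing by the case count instead keeps the threshold meaningful when cases are a tiny fraction of the data.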
54. A real world case study 1
- This method has been applied to a real-world project for detecting adverse drug reactions.
- The project was sponsored by the Australian Commonwealth Department of Health and Ageing.
- The data set used is a linked data set of hospital, pharmaceutical and medical service data.
- The goal was to determine how ACE inhibitor usage is associated with angioedema.
55. A real world case study 2
56. A real world case study 3
- Pattern 1: RR = 3.99
  - Gender: Female
  - Hospital circulatory flag: Yes
  - Usage of drugs in category Various: Yes
- Pattern 2: RR = 3.82
  - Age > 60
  - Usage of drugs in category Genito-urinary system and sex hormones: Yes
  - Usage of drugs in category Systemic hormonal preparations: Yes
- Pattern 3: RR = 3.41
  - Usage of drugs in category Genito-urinary system and sex hormones: Yes
  - Usage of drugs in category General anti-infectives for systemic use: Yes
  - Usage of drugs in category Nervous system: No
57. A real world case study 4
58. A real world case study 5
59. A real world case study 6
60. Conclusions
- Optimal rule discovery is an efficient approach to discovering risk patterns in large, skewed medical data sets.
- More details:
  - J. Li, A. Fu, H. He, J. Chen, H. Jin, D. McAullay, G. Williams, R. Sparks and C. Kelman, Mining risk patterns in medical data, Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD '05), 2005, 770-775, Chicago, ACM Press, New York.
61. Summary
- Optimal rule discovery is an efficient approach to discovering various optimal rules.
- Optimal classifiers are more robust than some benchmark rule-based classifiers, such as C4.5rules and CBA.
- Optimal rule discovery is efficient in discovering risk patterns in large, skewed medical data sets.
62. Acknowledgements
- Collaborators: Hong Shen, Rodney Topor, Hong Hu, Ada Fu, Hongxing He, Jie Chen, Huidong Jin, Graham Williams, et al.
- Internal reviewers: Tony Roberts, Ron House, and Xiaodi Huang
- Australian Research Council grant P0559090
- USQ Early Career Researcher Program grant 4710/1000479
63. Thank you
My papers and software tools are available from http://www.sci.usq.edu.au/staff/jiuyong