Informative rule set and unbalanced class distributions

Transcript and Presenter's Notes
1
Informative rule set and unbalanced class
distributions
  • J. Li, H. Shen, R. Topor
  • Mining Informative Rule Set for Prediction
  • Journal of Intelligent Information Systems, 22(2),
    155-174, 2004
  • L. Gu, J. Li, H. He, G. J. Williams, S. Hawkins,
    C. Kelman
  • Association Rule Discovery with Unbalanced Class
    Distributions
  • Australian Conference on Artificial Intelligence
    2003, 221-232
  • Presented by Jonas Maaskola

2
Outline
  • Introduction
  • Notation and rule measures
  • Informative rule set
  • Definition of the informative rule set (IR)
  • Lemmas and properties
  • Algorithm to mine the IR
  • Unbalanced Classes
  • Illustration of the problem
  • Different interestingness metrics
  • Application example and evaluation

3
Part 0: Introduction
4
Notation
  • I = {1,2,...,m}: a set of items.
  • A transaction T ⊆ I is a set of items.
  • A database D is a collection of transactions.
  • Given two non-overlapping itemsets X, Y, the
    association rule X -> Y is defined if
  • sup(X) > min_sup and conf(X -> Y) > min_conf.
  • (Please see the next slide for the definitions
    of sup(X) and conf(X -> Y).)

5
Rule measures
  • sup(X): Support of itemset X. Relative frequency
    of transactions containing X.
  • conf(X -> Y): Confidence of rule X -> Y.
    Conditional probability that a transaction
    contains Y given that it contains X.
  • (A small sketch of both measures follows below.)
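A minimal Python sketch of the two measures above, assuming transactions
are plain Python sets; the database is the one used in IR Example 1 later
in the talk:

```python
def sup(itemset, db):
    """Relative frequency of transactions containing every item in itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def conf(X, Y, db):
    """Conditional probability of Y given X: sup(X u Y) / sup(X)."""
    return sup(X | Y, db) / sup(X, db)

db = [{'a','b','c'}, {'a','b','c'}, {'a','b','c'},
      {'a','b','d'}, {'a','c','d'}, {'b','c','d'}]
print(sup({'a'}, db))          # 0.833...
print(conf({'a'}, {'b'}, db))  # 0.8
```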

6
Rule measures
  • If A, B are itemsets, denote A ∪ B as AB.
  • Given two rules A -> c and AB -> c, A -> c is
    more general than AB -> c, and AB -> c is more
    specific than A -> c.

7
Part 1: Informative rule set
8
Informative vs. association rule set
  • Association rule set
  • Includes all association rules that exceed a
    confidence threshold.
  • Informative rule set
  • Includes all rules satisfying a minimum support.
  • Excludes every more specific rule whose
    confidence is not greater than that of one of
    its more general rules. (A minimal membership
    test is sketched below.)
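A minimal sketch of the exclusion test just described, assuming rules are
stored as a dict mapping (antecedent frozenset, consequent item) to
confidence; all names are illustrative, not the paper's data structures:

```python
from itertools import combinations

def is_informative(X, y, rules):
    """Keep X -> y only if its confidence beats every more general A -> y."""
    return all(rules[(frozenset(g), y)] < rules[(X, y)]
               for k in range(1, len(X))
               for g in combinations(sorted(X), k)
               if (frozenset(g), y) in rules)

rules = {(frozenset('a'), 'c'): 0.80,   # a -> c
         (frozenset('ab'), 'c'): 0.75}  # ab -> c
print(is_informative(frozenset('ab'), 'c', rules))  # False: 0.75 <= 0.80
```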

9
IR Example 1
  • min sup = min conf = 0.5
  • Transaction DB: 1: {a,b,c}, 2: {a,b,c},
    3: {a,b,c}, 4: {a,b,d}, 5: {a,c,d}, 6: {b,c,d}

10
IR Example 1
  • min sup = min conf = 0.5
  • Transaction DB: 1: {a,b,c}, 2: {a,b,c},
    3: {a,b,c}, 4: {a,b,d}, 5: {a,c,d}, 6: {b,c,d}
  • 12 association rules (sup, conf) exceed the
    thresholds: a -> b (0.67, 0.8), a -> c (0.67, 0.8),
    b -> c (0.67, 0.8), b -> a (0.67, 0.8),
    c -> a (0.67, 0.8), c -> b (0.67, 0.8),
    ab -> c (0.5, 0.75), ac -> b (0.5, 0.75),
    bc -> a (0.5, 0.75), a -> bc (0.5, 0.6),
    b -> ac (0.5, 0.6), c -> ab (0.5, 0.6)

11
IR Example 1
  • min sup = min conf = 0.5
  • Transaction DB: 1: {a,b,c}, 2: {a,b,c},
    3: {a,b,c}, 4: {a,b,d}, 5: {a,c,d}, 6: {b,c,d}
  • 12 association rules (sup, conf) exceed the
    thresholds: a -> b (0.67, 0.8), a -> c (0.67, 0.8),
    b -> c (0.67, 0.8), b -> a (0.67, 0.8),
    c -> a (0.67, 0.8), c -> b (0.67, 0.8),
    ab -> c (0.5, 0.75), ac -> b (0.5, 0.75),
    bc -> a (0.5, 0.75), a -> bc (0.5, 0.6),
    b -> ac (0.5, 0.6), c -> ab (0.5, 0.6)
  • Informative rule set: a -> b (0.67, 0.8),
    a -> c (0.67, 0.8), b -> c (0.67, 0.8),
    b -> a (0.67, 0.8), c -> a (0.67, 0.8),
    c -> b (0.67, 0.8)

12
IR Example 2
  • min sup = min conf = 0.5
  • Rule set (sup, conf): a -> b (0.25, 1.0),
    a -> c (0.2, 0.7), ab -> c (0.2, 0.7),
    b -> d (0.3, 1.0), a -> d (0.25, 1.0)

13
IR Example 2
  • min sup = min conf = 0.5
  • Rule set (sup, conf): a -> b (0.25, 1.0),
    a -> c (0.2, 0.7), ab -> c (0.2, 0.7),
    b -> d (0.3, 1.0), a -> d (0.25, 1.0)
  • In this case the IR is identical to the above
    rule set, because
  • ab -> c cannot be omitted, because the more
    general rule a -> c has the same confidence, and
  • a -> d cannot be omitted, as transitive
    reasoning is not intended.

14
Lemmas and properties of IR
  • There exists a unique IR for any given rule set.
  • IR is the smallest subset of AR fulfilling (4).
  • To predict, select matching rules in decreasing
    order of confidence; stop when satisfied or when
    no rules are left (a minimal sketch follows
    below).
  • Using confidence priority, the IR predicts items
    in the same order as the association rule set.
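A minimal sketch of confidence-priority prediction, reusing the same rule
storage format as the earlier sketch (an illustrative assumption):

```python
def predict(transaction, rules, k=1):
    """Return up to k predicted items: match antecedents against the
    transaction and try consequents in decreasing order of confidence."""
    matching = [(c, y) for (X, y), c in rules.items()
                if X <= transaction and y not in transaction]
    matching.sort(key=lambda pair: -pair[0])  # highest confidence first
    return [y for _, y in matching[:k]]

rules = {(frozenset('a'), 'b'): 0.8,
         (frozenset('a'), 'c'): 0.8,
         (frozenset('ab'), 'd'): 0.6}
print(predict({'a'}, rules, k=2))  # ['b', 'c']
```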

15
Candidate tree
16
Algorithm: mine the informative rule set
  • Input: database D, the minimum support and the
    minimum confidence thresholds.
  • Output: the informative rule set R.
  • Set the informative rule set R = Ø
  • Count support of 1-itemsets
  • Initialize candidate tree T
  • Generate new candidates as leaves of T
  • While the new candidate set is non-empty:
  • Count support of the new candidates
  • Prune the new candidate set
  • Include qualified rules from T in R
  • Generate new candidates as leaves of T
  • Return rule set R
  • (A simplified Python sketch follows below.)
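The paper's candidate-tree miner is more involved; the following hedged
sketch replaces the tree with plain level-wise (Apriori-style) counting,
restricts consequents to single items, and then applies the IR exclusion
criterion from slide 8. Run on the IR Example 1 database it prints exactly
the six informative rules listed there:

```python
from itertools import combinations

def frequent_itemsets(db, min_sup):
    """Level-wise support counting (a stand-in for the candidate tree)."""
    n = len(db)
    items = sorted({i for t in db for i in t})
    sup, level = {}, [frozenset([i]) for i in items]
    while level:
        nxt = set()
        for cand in level:
            s = sum(1 for t in db if cand <= t) / n
            if s >= min_sup:
                sup[cand] = s
                nxt.update(cand | {i} for i in items if i not in cand)
        # Keep a candidate only if all its one-smaller subsets are frequent.
        level = [c for c in nxt if all(frozenset(sub) in sup
                                       for sub in combinations(c, len(c) - 1))]
    return sup

def informative_rules(db, min_sup=0.5, min_conf=0.5):
    sup = frequent_itemsets(db, min_sup)
    # Form confident rules X -> y with single-item consequents.
    rules = {}
    for itemset, s in sup.items():
        for y in itemset:
            X = itemset - {y}
            if X and s / sup[X] >= min_conf:
                rules[(X, y)] = (s, s / sup[X])
    # IR pruning: drop any rule that does not beat every more general rule.
    return {(X, y): sc for (X, y), sc in rules.items()
            if all(sc[1] > rules[(frozenset(g), y)][1]
                   for k in range(1, len(X))
                   for g in combinations(sorted(X), k)
                   if (frozenset(g), y) in rules)}

db = [frozenset(t) for t in ({'a','b','c'}, {'a','b','c'}, {'a','b','c'},
                             {'a','b','d'}, {'a','c','d'}, {'b','c','d'})]
for (X, y), (s, c) in informative_rules(db).items():
    print(f"{set(X)} -> {y}  (sup={s:.2f}, conf={c:.2f})")
```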

17
IR - Conclusions
  • The IR makes the same predictions as the AR set.
  • The IR set is significantly smaller than the AR
    set when the minimum support is small.
  • The IR can be generated efficiently.
  • The IR does not make use of transitive reasoning.

18
Part 2: Unbalanced classes
19
Introductory problem illustration
  • Consider the following cases
  • Here P denotes a pattern that is observed in two
    different classes c1 and c2.

20
Introductory problem illustration
  • Consider the following cases

Example 1: Prob(P|c2) = 0.6, Prob(P|c1) = 0.3, so
Prob(P|c2) / Prob(P|c1) = 2
Example 2: Prob(P|c2) = 0.95, Prob(P|c1) = 0.8, so
Prob(P|c2) / Prob(P|c1) ≈ 1.19
21
Nature of interestingness metrics
  • The metrics should be fair for both large and
    small classes.
  • More generally, they should be fair regardless of
    the class distribution.

22
Interestingness metrics
  • Lift:
  • lift(X -> Y) = sup(XY) / (sup(X) · sup(Y))
  • Local support (reverse conditional probability):
  • lsup(X -> Y) = sup(XY) / sup(Y) = Prob(X|Y)
  • Exclusiveness
  • (A small sketch of lift and lsup follows below.)
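A hedged sketch of the two metrics defined above, on the Example 1 database;
the exclusiveness metric is omitted since its formula is not given here:

```python
def sup(itemset, db):
    return sum(1 for t in db if itemset <= t) / len(db)

def lift(X, Y, db):
    # Symmetric: > 1 means X and Y co-occur more often than independence.
    return sup(X | Y, db) / (sup(X, db) * sup(Y, db))

def lsup(X, Y, db):
    # Local support = Prob(X | Y): normalizes by the (possibly small)
    # consequent class, so small classes are not penalized.
    return sup(X | Y, db) / sup(Y, db)

db = [{'a','b','c'}, {'a','b','c'}, {'a','b','c'},
      {'a','b','d'}, {'a','c','d'}, {'b','c','d'}]
print(lift({'a'}, {'b'}, db))  # 0.96
print(lsup({'a'}, {'b'}, db))  # 0.8
```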

23
Application example
  • Identify groups of patients with a high risk of
    adverse drug reactions to certain drugs.
  • Both the patients with adverse drug reactions
    and those taking the drugs in question are
    underrepresented.

24
Feature selection
  • Interpret the m classes as the independent
    variables.
  • The other variables are then the dependent
    variables.
  • One now has to decide which of the dependent
    variables have the strongest influence on the
    independent ones.

25
Feature selection method
  • Calculate a statistical measure on the joint
    distribution of dependent and independent
    variables.
  • Compare the value of dependent-independent
    variable pairs to a certain cut-off value.

26
Feature selection method: χ²
  • Bivariate analysis
  • Calculate the χ² value of the dependent and
    independent variables.
  • Compare it to the cut-off value for m-1 degrees
    of freedom at a required p value (a scipy sketch
    follows below).
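A hedged sketch of this screen with scipy, using hypothetical column names
(gender against a binary class flag); the cut-off comparison is expressed
through the p-value:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data; in the application these would be the extracted patient variables.
df = pd.DataFrame({
    "gender": ["F", "F", "M", "F", "M", "M", "F", "M"],
    "adverse_reaction": [1, 1, 0, 1, 0, 0, 1, 0],
})
table = pd.crosstab(df["gender"], df["adverse_reaction"])
chi2, p, dof, _ = chi2_contingency(table)
# Keep the feature if p is below the required significance level.
print(f"chi2={chi2:.3f}, dof={dof}, p={p:.3f}")
```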

27
Feature selection method: Logistic regression
  • Do a regression of the form
  • ln(p / (1 - p)) = α + β1x1 + β2x2 + ... + βnxn
  • Use the coefficients βi to compare the odds
    ratios ORi = e^βi to the cutoff 1 (a statsmodels
    sketch follows below).
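A hedged sketch of this step with statsmodels, on synthetic stand-in data;
the odds ratio for each feature is exp(βi):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                   # two candidate features
logit = -0.5 + 1.2 * X[:, 0] - 0.1 * X[:, 1]    # assumed true coefficients
y = (rng.random(200) < 1 / (1 + np.exp(-logit))).astype(int)

model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
odds_ratios = np.exp(model.params[1:])          # skip the intercept
# Features whose odds ratio is well away from the cutoff 1 are kept.
print(odds_ratios)
```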

28
Data
  • The Queensland Linked Data Set covers the period
    July 1995 to June 1999 and comprises
  • de-identified patient-level hospital separation
    data,
  • Medicare Benefits Scheme data, and
  • Pharmaceutical Benefits Scheme (PBS) data.
  • Initially extracted variables: age, gender,
    indigenous status, postcode, total number of bed
    days, and 8 hospital flags. From the PBS: 15 drug
    flags (number of ACE inhibitor scripts, plus 14
    other ATC level-1 drug flags).

29
Results of feature selection
  • Selected the 15 most discriminating features.
    Among them:
  • Age
  • Gender
  • Hospital flags
  • Flags for exposure to other drugs
  • The selected data consist of 132,000 records.

30
Results of data mining
Rule 1:
  • Gender = Female
  • Age = 60
  • Took genito-urinary system and sex hormone
    drugs = Yes
  • Took antineoplastic and immunomodulating agent
    drugs = Yes
  • Took musculo-skeletal system drugs = Yes
31
Results of data mining
Rule 2:
  • Gender = Female
  • Had circulatory disease = Yes
  • Took systemic hormonal preparation drugs = Yes
  • Took musculo-skeletal system drugs = Yes
  • Took various other drugs = Yes
32
Results of data mining
Rule 3:
  • Gender = Female
  • Had circulatory disease = Yes
  • Had respiratory disease = Yes
  • Took systemic hormonal preparation drugs = Yes
  • Took various other drugs = Yes
33
(No Transcript)
34
Concluding remarks
  • Fair interestingness measures make it possible
    to find rules for underrepresented classes.
  • They help identify key areas in the data that
    are worthy of exploration and explanation.
  • Using the informative rule set leads to a
    compact selection of rules.

35
We have reached the end...
  • Any questions?