Title: Extending Naïve Bayes Classifiers Using Long Itemsets
1. Extending Naïve Bayes Classifiers Using Long Itemsets
- Dimitris Meretakis and Beat Wüthrich
- Computer Science Department
- Hong Kong University of Science and Technology
2. Introduction
- Intuition: Association mining reveals local properties of the data but has not dealt with using the discovered local patterns for classification. Why not use them for classification as well?
- Discovered itemsets describe strong patterns in the data but provide no class-specific information:
  - pregnant < 6.5, age < 28.5 (support 47.3%)
- Use labeled itemsets instead (a counting sketch follows this list):
  - pregnant < 6.5, age < 28.5 (diabetes: 39.5%, no-diabetes: 7.8%)
  - P(pregnant < 6.5, age < 28.5, diabetes) = 39.5%
  - P(pregnant < 6.5, age < 28.5, no-diabetes) = 7.8%
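To make labeled itemsets concrete, here is a minimal counting sketch in Python; the records, item names, and numbers are illustrative stand-ins, not the Pima diabetes data behind the slide's figures:

    from collections import Counter

    # Toy dataset: each record is (set of discretized attribute-value items, class label).
    records = [
        ({"pregnant<6.5", "age<28.5"}, "diabetes"),
        ({"pregnant<6.5", "age<28.5"}, "no-diabetes"),
        ({"pregnant<6.5", "age>=28.5"}, "diabetes"),
        ({"pregnant>=6.5", "age<28.5"}, "diabetes"),
    ]

    def labeled_support(itemset, records):
        """Per-class support of `itemset`, i.e. P(itemset, class) as a fraction of records."""
        counts = Counter(label for items, label in records if itemset <= items)
        return {label: n / len(records) for label, n in counts.items()}

    print(labeled_support({"pregnant<6.5", "age<28.5"}, records))
    # {'diabetes': 0.25, 'no-diabetes': 0.25}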
3. Large Bayes: An overview
- A classifier built from long (large) labeled itemsets.
- Learning: Use an Apriori-like method to discover some labeled itemsets.
- No classification model is built; store the raw itemsets.
- In between lazy and eager learning.
- Classification: Given a new case A = {a1, a2, ..., an}, estimate P(ci | A) for each class ci and choose the most probable class. Probabilistically combine the stored itemsets for the estimation, e.g.
  P(a1,a2,a3,a4,a5 | ci) ≈ P(a1,a2,a3 | ci) · P(a4 | a2,ci) · P(a5 | a3,ci)
- Called "Large Bayes" because it reduces to Naïve Bayes when only 1-itemsets are discovered and used (see the sketch after this list):
  P(a1,a2,a3,a4,a5 | ci) ≈ P(a1 | ci) · P(a2 | ci) · P(a3 | ci) · P(a4 | ci) · P(a5 | ci)
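A hedged sketch of the combination step, with illustrative names and numbers: each factor is a local conditional P(new attributes | already covered attributes, ci), and when every stored itemset has length 1 each factor degenerates to P(aj | ci), so the product is exactly the Naïve Bayes score:

    import math

    def product_approximation(prior, factors):
        """Estimate P(A, ci) as P(ci) times a product of local factors.

        Each factor is a conditional P(new attrs | covered attrs, ci); together
        the factors must cover every attribute of A exactly once. Logs are
        summed for numerical stability.
        """
        return math.exp(math.log(prior) + sum(math.log(f) for f in factors))

    # Degenerate case: only 1-itemsets stored, so every factor is P(aj | ci)
    # and the estimate is the Naïve Bayes joint P(ci) * prod_j P(aj | ci).
    p_ci = 0.6                          # P(ci), illustrative
    one_item_factors = [0.5, 0.3, 0.8]  # P(a1|ci), P(a2|ci), P(a3|ci)
    print(product_approximation(p_ci, one_item_factors))  # 0.6*0.5*0.3*0.8 = 0.072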
4. Large Bayes: Learning Phase
- Generate a set of frequent, interesting, and preferably long labeled itemsets:
  - frequent: support > a user-defined minimum threshold
  - interesting: their support cannot be accurately approximated from their direct subsets
  - long: to discover higher-order interactions
- Use an association miner (e.g. Apriori) to discover the itemsets (a level-wise sketch follows this list):
  1. Discover all 1-itemsets.
  2. Generate promising 2-itemsets and select the most frequent and interesting ones.
  3. Use the selected 2-itemsets to generate some 3-itemsets.
  4. Repeat until no more itemsets are generated.
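A minimal level-wise sketch of steps 1-4, assuming the record format from the earlier counting sketch; it keeps only min-support pruning, whereas the actual learner also prunes by the interestingness measure of the next slide:

    from collections import Counter
    from itertools import combinations

    def mine_labeled_itemsets(records, min_support=0.1, max_len=3):
        """Apriori-like, level-wise discovery of frequent labeled itemsets.

        `records` is a list of (set-of-items, class-label) pairs. Returns
        {frozenset(itemset): {class: P(itemset, class)}}.
        """
        n = len(records)
        found = {}
        # Level 1: one candidate per distinct item.
        level = list({frozenset([i]) for attrs, _ in records for i in attrs})
        k = 1
        while level and k <= max_len:
            frequent = []
            for cand in level:
                per_class = Counter(label for attrs, label in records if cand <= attrs)
                if sum(per_class.values()) / n > min_support:
                    found[cand] = {c: v / n for c, v in per_class.items()}
                    frequent.append(cand)
            # Generate (k+1)-candidates by joining frequent k-itemsets sharing k-1 items.
            level = list({a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1})
            k += 1
        return found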
5. Interestingness Measure
- l = {a1, ..., an} is interesting if P(l, ci) cannot be accurately approximated from the subsets of l. Quantification proceeds in two steps: approximate P(l, ci) from direct subsets of l, then measure how far the approximation deviates from the actual support.
- Itemset l is interesting if I(l) > a user-defined interestingness threshold.
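The transcript loses the exact formula, so the sketch below is only one plausible reading of the two steps: estimate P(l, ci) from a pair of direct subsets, then score l by how badly even the best such estimate fits the actual support. Both the estimator and the deviation measure are assumptions, not the paper's definitions:

    from itertools import combinations

    def interestingness(itemset, supports):
        """A plausible deviation-based I(l) for an itemset of length >= 2.

        Step 1: estimate P(l, ci) from direct subsets, here via
                P(l, ci) ~ P(l - {a}, ci) * P(l - {b}, ci) / P(l - {a, b}, ci).
        Step 2: I(l) = deviation of the *best* such estimate from the actual
                support; l is interesting when even its best approximation is poor.
        `supports` maps frozenset -> {class: P(itemset, class)} and must include
        frozenset() -> class priors, so that 2-itemsets can be scored too.
        """
        actual = supports[itemset]
        best = float("inf")
        for a, b in combinations(sorted(itemset), 2):
            s_a, s_b = supports[itemset - {a}], supports[itemset - {b}]
            s_ab = supports[itemset - {a, b}]
            deviation = 0.0
            for c, p in actual.items():
                denom = s_ab.get(c, 0.0)
                est = s_a.get(c, 0.0) * s_b.get(c, 0.0) / denom if denom else 0.0
                deviation = max(deviation, abs(p - est))
            best = min(best, deviation)
        return best  # keep l when interestingness(l, supports) > threshold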
6. Large Bayes: Learned classifier
7. Large Bayes: Classification Phase
- Given a new case A to be classified:
  - Select the longest subsets of A among the stored itemsets.
  - Incrementally construct the approximation of P(A, ci), adding one itemset at a time (a greedy sketch follows the example below).
  - Select the most probable class and assign it to A.

A = {a1, a3, a7, a9, a11}
P(a1,a3,a7,a9,a11,ci) ≈ P(ci) · P(a1,a11 | ci) · P(a3 | a11,ci) · P(a7 | a3,ci) · P(a9 | a7,a11,ci)
P(A,c1) > P(A,c2) ⇒ A ∈ c1
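A hedged sketch of the incremental construction, reusing the `supports` table shape from the learning sketches; selection is simplified to "longest subset first", while the paper's full heuristic (next slide's conditions) also weighs interestingness:

    def classify(case, supports):
        """Greedy sketch of the LB classification phase.

        `supports` maps frozenset -> {class: P(itemset, class)} and is assumed
        to hold frozenset() -> class priors P(ci). A missing conditioning
        subset simply zeroes the factor here; the paper's selection avoids that.
        """
        priors = supports[frozenset()]
        scores = dict(priors)              # running estimates of P(A, ci)
        covered = frozenset()
        for l in sorted((s for s in supports if s and s <= case), key=len, reverse=True):
            new = l - covered
            if not new:
                continue                   # contributes nothing new; would double-count
            given = l & covered            # condition only on attributes covered so far
            for c in scores:
                num = supports[l].get(c, 0.0)
                den = (supports.get(given, {}) if given else priors).get(c, 0.0)
                # factor P(new | given, ci) = P(l, ci) / P(given, ci)
                scores[c] *= num / den if den else 0.0
            covered |= new
        return max(scores, key=scores.get)

    # Usage: classify(frozenset({"a1", "a3", "a7", "a9", "a11"}), supports)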
8. Constructing a Product Approximation
- Idea: Approximate longer marginals using the stored shorter ones.
- Condition 1: Do not allow cycles (a validity check is sketched after this list). E.g.
  P(a1,a2,a3 | ci) · P(a4 | a1,ci) · P(a2 | a4,ci) is WRONG: a2 is introduced twice.
- Condition 2: Maximize the number of itemsets used for the approximation, i.e. reduce the independence assumptions. E.g.
  P(a1,a2,a3 | ci) · P(a4 | a1,ci) · P(a5 | a4,ci) is better than P(a1,a2,a3 | ci) · P(a4,a5 | ci).
- Condition 3: Prefer higher-order interactions.
- Condition 4: Prefer the most interesting itemsets.
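To make Condition 1 concrete, a small illustrative checker (not from the paper) that validates a proposed sequence of factors P(new | given, ci):

    def is_valid_chain(factors, attributes):
        """Condition 1 check: the ordered (new, given) pairs form a valid,
        cycle-free product approximation iff every attribute is introduced
        exactly once and each `given` part uses only earlier attributes."""
        covered = set()
        for new, given in factors:
            if new & covered or not given <= covered:
                return False               # re-introduced attribute, or forward/cyclic reference
            covered |= new
        return covered == set(attributes)

    # The slide's WRONG example: a2 is introduced by the first and the third factor.
    bad = [({"a1", "a2", "a3"}, set()), ({"a4"}, {"a1"}), ({"a2"}, {"a4"})]
    good = [({"a1", "a2", "a3"}, set()), ({"a4"}, {"a1"}), ({"a5"}, {"a4"})]
    attrs = {"a1", "a2", "a3", "a4", "a5"}
    print(is_valid_chain(bad, attrs), is_valid_chain(good, attrs))  # False True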
9. Classification: An example
(Figure: the lattice of candidate itemsets over {a1, ..., a5}, from the five 1-itemsets through the pairs and triples up to the full 5-itemset a1a2a3a4a5.)
10. LB: Creating local models
Approximation for the classification of {a1, a3, a7, a9, a11}:
P(a1,a3,a7,a9,a11,ci) ≈ P(ci) · P(a1,a11 | ci) · P(a3 | a11,ci) · P(a7 | a3,ci) · P(a9 | a7,a11,ci)
(Figure: the equivalent network of local assumptions over the nodes a1, a3, a7, a9, a11, in the context of {a1, a3, a7, a9, a11}.)
11. Local (LB) vs. Global (TAN) Independencies
(Figure: the local network built by LB alongside the global network built by TAN.)
To classify the case <-6.5, 99.5, -27.35, 0.5285, -28.5, pos>, or equivalently <a1, a3, a7, a9, a11, c1>.
12. Experiments: Accuracy
13. Effect of varying the interestingness threshold
14. CPU time for learning and classifying
15. Future Work
- Employ LB in high-dimensional problem spaces (deal with Apriori's performance degradation in such domains).
- Come up with an evaluation of the interestingness measure and the product-approximation heuristic.
- Class-specific product approximations.
- Eliminate the interestingness threshold.