Covering Algorithms

About This Presentation

Title:

Covering Algorithms

Description:

Antecedent contains a condition for every node on the path from ... But: decision tree inducer maximizes overall purity. Each new test reduces rule's coverage. ... – PowerPoint PPT presentation

Slides: 17

Provided by: alext8

Transcript and Presenter's Notes

1
Covering Algorithms
2
Trees vs. rules

From trees to rules
Easy: converting a tree into a set of rules
One rule for each leaf
Antecedent contains a condition for every node on the path from the root to the leaf
Consequent is the class assigned by the leaf
From rules to trees
More difficult: transforming a rule set into a tree
A tree cannot easily express the disjunction between rules
Example
If a and b then x
If c and d then x
Corresponding tree contains identical subtrees (⇒ replicated subtree problem)
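The conversion from trees to rules can be sketched in a few lines of Python. The sketch assumes a hypothetical nested-tuple representation: a leaf is a class label, and an internal node is a pair (attribute, {value: subtree}). `tree_to_rules` is an illustrative name, not an existing API. The example tree encodes the disjunction above over boolean attributes, with a hypothetical default class y. Because the test on c (and d below it) has to appear under both branches of a, the replicated subtree shows up directly.

```python
from typing import Union

# Hypothetical representation: a leaf is a class label (str); an internal node
# is a pair (attribute, {value: subtree}).
Tree = Union[str, tuple]


def tree_to_rules(tree: Tree, path: tuple = ()) -> list:
    """One rule per leaf: the antecedent is the list of (attribute, value) tests
    on the path from the root, and the consequent is the leaf's class."""
    if isinstance(tree, str):
        return [(list(path), tree)]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules


# "If a and b then x; if c and d then x" as a tree over boolean attributes,
# with a hypothetical default class y. The same c/d subtree is reused under
# both branches of a: this is the replicated subtree.
cd = ("c", {True: ("d", {True: "x", False: "y"}), False: "y"})
tree = ("a", {True: ("b", {True: "x", False: cd}), False: cd})

for antecedent, cls in tree_to_rules(tree):
    if cls == "x":
        print("If", " and ".join(f"{a} = {v}" for a, v in antecedent), "then", cls)
```

Reading the rules back off this tree gives three rules for x instead of the original two, because c and d are tested in both branches of a.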

3
A tree for a simple disjunction
4
Covering algorithms

Strategy for generating a rule set directly: for each class in turn, find a rule set that covers all instances in it (excluding instances not in the class)
This approach is called a covering approach because at each stage a rule is identified that "covers" some of the instances
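The covering strategy can be written as a loop over one class. It learns a rule, sets aside what the rule covers, and repeats until every instance of the class is covered. This is a minimal sketch, not PRISM itself. The names `covers`, `covering` and `naive_learner` are hypothetical. The learner is a deliberately naive stand-in that copies all attribute values of one still-uncovered instance, and the toy data is invented for illustration.

```python
def covers(rule, instance):
    """A rule is a dict {attribute: value}; it covers an instance when every test matches."""
    return all(instance[a] == v for a, v in rule.items())


def covering(instances, label, target, learn_rule):
    """Covering loop for one class: keep learning rules until no instance of
    the target class is left uncovered."""
    remaining = list(instances)
    rules = []
    while any(x[label] == target for x in remaining):
        rule = learn_rule(remaining, label, target)
        rules.append(rule)
        # Drop everything the new rule covers before learning the next one.
        remaining = [x for x in remaining if not covers(rule, x)]
    return rules


def naive_learner(instances, label, target):
    """Naive stand-in learner: the rule tests every attribute value of the
    first still-uncovered instance of the target class."""
    seed = next(x for x in instances if x[label] == target)
    return {a: v for a, v in seed.items() if a != label}


# Hypothetical toy data.
data = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "rainy", "windy": True, "play": "no"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
]
for rule in covering(data, "play", "yes", naive_learner):
    print(rule, "-> play = yes")
```

The loop always terminates, because each learned rule covers at least the instance it was built from. The naive learner simply produces one maximally specific rule per instance. A real covering algorithm instead grows each rule one test at a time so that it covers as many instances of the class as possible.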

5
Example: generating a rule

Possible rule set for class b:
More rules could be added for a perfect rule set
If x ≤ 1.2 then class b
If x > 1.2 and y ≤ 2.6 then class b
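Here is a small sketch of how the two rules act together for class b. The rule set is a disjunction, so a point belongs to b as soon as one rule fires. The rules are the two above. `rules_for_b` and `predict_b` are hypothetical names, and the test points are invented.

```python
# Each rule is a (description, test) pair; a test takes a point (x, y).
rules_for_b = [
    ("x <= 1.2", lambda x, y: x <= 1.2),
    ("x > 1.2 and y <= 2.6", lambda x, y: x > 1.2 and y <= 2.6),
]


def predict_b(x, y):
    """Return the description of the first rule that fires, or None if no rule does."""
    for description, test in rules_for_b:
        if test(x, y):
            return description
    return None


# Hypothetical points; the last one is not covered by either rule.
for point in [(0.5, 3.0), (2.0, 1.0), (2.0, 3.0)]:
    print(point, "->", predict_b(*point))
```

If the rules are tried strictly in order, the x > 1.2 test in the second rule is redundant: any point that reaches it has already failed x ≤ 1.2. The test matters only when the rules are read independently of their order.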

6
A simple covering algorithm

Generates a rule by adding tests that maximize the rule's accuracy
Similar to the situation in decision trees: the problem of selecting an attribute to split on
But: a decision tree inducer maximizes overall purity
Each new test reduces the rule's coverage

7
Selecting a test

Goal: maximizing accuracy
t: total number of instances covered by the rule
p: positive examples of the class covered by the rule
t - p: number of errors made by the rule
⇒ Select the test that maximizes the ratio p/t
We are finished when p/t = 1 or the set of instances can't be split any further
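The selection criterion fits in one function. Each candidate test is scored by p/t, and ties go to the larger p, which is the tie-breaking rule PRISM uses. This is a sketch with hypothetical names (`accuracy`, `select_test`) and invented counts. Exact fractions avoid floating-point ties being missed.

```python
from fractions import Fraction


def accuracy(p, t):
    """Fraction of covered instances that are positive; t - p are the rule's errors."""
    return Fraction(p, t)


def select_test(candidates):
    """candidates: (test, p, t) triples. Maximize p/t, breaking ties by the larger p.
    Tests that cover nothing (t == 0) are skipped."""
    usable = [c for c in candidates if c[2] > 0]
    return max(usable, key=lambda c: (accuracy(c[1], c[2]), c[1]))


# Hypothetical counts for four candidate tests.
candidates = [("A = a1", 2, 2), ("A = a2", 3, 3), ("B = b1", 4, 6), ("B = b2", 0, 0)]
test, p, t = select_test(candidates)
print(test, f"{p}/{t}", "perfect" if p == t else "refine further")
```

A = a1 and A = a2 both reach p/t = 1, so the larger p makes A = a2 the winner. Because p = t, the rule is perfect and refinement stops.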

8
Example: contact lenses data
9
Example: contact lenses data
The numbers on the right show the fraction of correct instances in the set singled out by that choice. In this case, "correct" means that their recommendation is hard.
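The table on this slide was an image and did not survive. As a sketch, the fractions for the first step (rule "If ? then recommendation = hard") can be recomputed. This assumes the standard 24-instance contact lenses dataset. `recommendation` is a hypothetical helper that encodes that table compactly instead of listing all 24 rows, and `scores` is likewise an illustrative name.

```python
from itertools import product

# Assumed encoding of the standard contact lenses data (3 x 2 x 2 x 2 = 24 rows).
ATTRIBUTES = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacle prescription": ["myope", "hypermetrope"],
    "astigmatism": ["no", "yes"],
    "tear production rate": ["reduced", "normal"],
}


def recommendation(age, prescription, astigmatism, tears):
    """Hypothetical helper reproducing the class column of the table."""
    if tears == "reduced":
        return "none"
    if astigmatism == "no":
        return "none" if (age, prescription) == ("presbyopic", "myope") else "soft"
    return "none" if prescription == "hypermetrope" and age != "young" else "hard"


DATA = [dict(zip(ATTRIBUTES, values), recommendation=recommendation(*values))
        for values in product(*ATTRIBUTES.values())]


def scores(instances, target="hard"):
    """(attribute, value, p, t) for every candidate test attribute = value."""
    table = []
    for attribute, values in ATTRIBUTES.items():
        for value in values:
            covered = [x for x in instances if x[attribute] == value]
            p = sum(x["recommendation"] == target for x in covered)
            table.append((attribute, value, p, len(covered)))
    return table


for attribute, value, p, t in scores(DATA):
    print(f"{attribute} = {value}: {p}/{t}")
```

On this data, astigmatism = yes and tear production rate = normal tie at 4/12, with the same p. Either choice gives a rule that covers 12 instances, only 4 of which are hard.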
10
Modified rule and resulting data
The rule isn't very accurate: only 4 of the 12 instances it covers are hard. So it needs further refinement.
11
Further refinement
12
Modified rule and resulting data
Should we stop here? Perhaps. But let's say we are going for exact rules, no matter how complex they become. So, let's refine further.
13
Further refinement
14
The result
15
Pseudo-code for PRISM

For each class C
Initialize E to the instance set
While E contains instances in class C
Create a rule R with an empty left-hand side that
predicts class C
Until R is perfect (or there are no more
attributes to use) do
For each attribute A not mentioned in R, and each
value v,
Consider adding the condition A v to the
left-hand side of R
Select A and v to maximize the accuracy p/t
(break ties by choosing the condition with the
largest p)
Add A v to R
Remove the instances covered by R from E
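A minimal Python sketch of the pseudo-code follows. The comments map each part back to the corresponding pseudo-code line. It runs on an assumed encoding of the standard contact lenses data. The names `prism`, `covers`, `recommendation`, `ATTRIBUTES` and `DATA` are hypothetical. The pseudo-code does not say how to break a tie that remains after comparing p, so this sketch assumes the candidate listed first wins.

```python
from fractions import Fraction
from itertools import product

# Assumed encoding of the standard 24-instance contact lenses data.
ATTRIBUTES = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacle prescription": ["myope", "hypermetrope"],
    "astigmatism": ["no", "yes"],
    "tear production rate": ["reduced", "normal"],
}


def recommendation(age, prescription, astigmatism, tears):
    """Hypothetical helper reproducing the class column of the table."""
    if tears == "reduced":
        return "none"
    if astigmatism == "no":
        return "none" if (age, prescription) == ("presbyopic", "myope") else "soft"
    return "none" if prescription == "hypermetrope" and age != "young" else "hard"


DATA = [dict(zip(ATTRIBUTES, values), recommendation=recommendation(*values))
        for values in product(*ATTRIBUTES.values())]


def covers(rule, x):
    """A rule's left-hand side is a list of (attribute, value) tests."""
    return all(x[a] == v for a, v in rule)


def prism(instances, attributes, label):
    rules = []
    for cls in sorted({x[label] for x in instances}):        # For each class C
        E = list(instances)                                   # Initialize E
        while any(x[label] == cls for x in E):               # While E contains class C
            rule, covered = [], E                             # Empty left-hand side
            # Until R is perfect (or there are no more attributes to use)
            while any(x[label] != cls for x in covered) and len(rule) < len(attributes):
                used = {a for a, _ in rule}
                best = None
                for a, values in attributes.items():          # Each unused A, each v
                    if a in used:
                        continue
                    for v in values:
                        subset = [x for x in covered if x[a] == v]
                        if not subset:
                            continue
                        p = sum(x[label] == cls for x in subset)
                        key = (Fraction(p, len(subset)), p)  # p/t, then larger p
                        if best is None or key > best[0]:
                            best = (key, a, v, subset)
                _, a, v, covered = best
                rule.append((a, v))                           # Add A = v to R
            rules.append((rule, cls))
            E = [x for x in E if not covers(rule, x)]         # Remove covered instances
    return rules


for rule, cls in prism(DATA, ATTRIBUTES, "recommendation"):
    print("If", " and ".join(f"{a} = {v}" for a, v in rule), "then", cls)
```

For class hard, the first rule this returns is "If astigmatism = yes and tear production rate = normal and spectacle prescription = myope then hard". At the last step, age = young (2/2) and spectacle prescription = myope (3/3) tie at p/t = 1, and the larger-p rule picks myope. A second rule is still needed for the remaining hard instance: "If age = young and astigmatism = yes and tear production rate = normal then hard".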

16
Separate and conquer

Methods like PRISM (for dealing with one class) are separate-and-conquer algorithms:
First, a rule is identified
Then, all instances covered by the rule are separated out
Finally, the remaining instances are "conquered"
Difference from divide-and-conquer methods:
Subset covered by the rule doesn't need to be explored any further