1
Data Mining using Decision Trees
  • Professor J. F. Baldwin

2
Decision Trees from Data Base
Ex Num   Size    Colour   Shape    Concept Satisfied
1        med     blue     brick    yes
2        small   red      wedge    no
3        small   red      sphere   yes
4        large   red      wedge    no
5        large   green    pillar   yes
6        large   red      pillar   no
7        large   green    sphere   yes

Choose the target concept Satisfied. Use all attributes except Ex Num.
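For the worked examples later in the transcript, the table can be written down as a small Python structure (an added convenience, not part of the original slides):

# The seven training examples from the table above, one record per row.
EXAMPLES = [
    {"ex": 1, "size": "med",   "colour": "blue",  "shape": "brick",  "satisfied": "yes"},
    {"ex": 2, "size": "small", "colour": "red",   "shape": "wedge",  "satisfied": "no"},
    {"ex": 3, "size": "small", "colour": "red",   "shape": "sphere", "satisfied": "yes"},
    {"ex": 4, "size": "large", "colour": "red",   "shape": "wedge",  "satisfied": "no"},
    {"ex": 5, "size": "large", "colour": "green", "shape": "pillar", "satisfied": "yes"},
    {"ex": 6, "size": "large", "colour": "red",   "shape": "pillar", "satisfied": "no"},
    {"ex": 7, "size": "large", "colour": "green", "shape": "sphere", "satisfied": "yes"},
]
# Target concept: "satisfied". Every attribute except "ex" (Ex Num) may be used.
ATTRIBUTES = ["size", "colour", "shape"]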
3
CLS - Concept Learning System - Hunt et al.
Tree Structure
A parent node containing a mixture of +ve and -ve examples is split on an
attribute V with values v1, v2, v3; one branch per value gives the children nodes.
4
CLS ALGORITHM
1. Initialise the tree T by setting it to consist of one node containing all
   the examples, both +ve and -ve, in the training set.
2. If all the examples in T are +ve, create a YES node and HALT.
3. If all the examples in T are -ve, create a NO node and HALT.
4. Otherwise, select an attribute F with values v1, ..., vn. Partition T into
   subsets T1, ..., Tn according to the values of F. Create branches with F as
   parent and T1, ..., Tn as child nodes.
5. Apply the procedure recursively to each child node.
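A minimal Python sketch of the CLS procedure above, under the assumption that examples are dictionaries of attribute values with a yes/no label and that attributes are simply taken in the order they are supplied:

# A sketch of CLS; attributes are used in whatever order they are given,
# and it is assumed they always suffice to separate the classes.
def cls(examples, attributes):
    """examples: list of (attribute_dict, label) pairs with label 'yes' or 'no'."""
    labels = {label for _, label in examples}
    if labels == {"yes"}:          # step 2: all +ve -> YES node
        return "YES"
    if labels == {"no"}:           # step 3: all -ve -> NO node
        return "NO"
    f, rest = attributes[0], attributes[1:]   # step 4: select an attribute F
    partitions = {}
    for atts, label in examples:
        partitions.setdefault(atts[f], []).append((atts, label))
    # step 5: recurse on each child node
    return {f: {value: cls(subset, rest) for value, subset in partitions.items()}}

# The seven training examples (size, colour, shape, satisfied):
ROWS = [("med", "blue", "brick", "yes"), ("small", "red", "wedge", "no"),
        ("small", "red", "sphere", "yes"), ("large", "red", "wedge", "no"),
        ("large", "green", "pillar", "yes"), ("large", "red", "pillar", "no"),
        ("large", "green", "sphere", "yes")]
DATA = [({"size": s, "colour": c, "shape": sh}, sat) for s, c, sh, sat in ROWS]
print(cls(DATA, ["size", "shape", "colour"]))

With the ordering size, shape, colour this reproduces the tree expanded on the next two slides.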
5
Data Base Example
Using attribute SIZE, the root node {1, 2, 3, 4, 5, 6, 7} is split:

  SIZE = med   -> {1}            YES
  SIZE = small -> {2, 3}         Expand
  SIZE = large -> {4, 5, 6, 7}   Expand
6
Expanding
Expanding the small and large branches of the SIZE tree:

  SIZE = med   -> {1}   YES
  SIZE = small -> {2, 3}   split on SHAPE:
      wedge  -> {2}   No
      sphere -> {3}   Yes
  SIZE = large -> {4, 5, 6, 7}   split on SHAPE:
      wedge  -> {4}   No
      sphere -> {7}   Yes
      pillar -> {5, 6}   split on COLOUR:
          red   -> {6}   No
          green -> {5}   Yes
7
Rules from Tree
IF (SIZE = large AND ((SHAPE = wedge) OR (SHAPE = pillar AND COLOUR = red)))
   OR (SIZE = small AND SHAPE = wedge)
THEN NO

IF (SIZE = large AND ((SHAPE = pillar AND COLOUR = green) OR (SHAPE = sphere)))
   OR (SIZE = small AND SHAPE = sphere)
   OR (SIZE = medium)
THEN YES
8
Disjunctive Normal Form - DNF
IF (SIZE = medium)
   OR (SIZE = small AND SHAPE = sphere)
   OR (SIZE = large AND SHAPE = sphere)
   OR (SIZE = large AND SHAPE = pillar AND COLOUR = green)
THEN CONCEPT = satisfied
ELSE CONCEPT = not satisfied
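The DNF rule translates directly into a predicate; a small sketch (the Python names are illustrative, not from the slides):

# Concept "satisfied" in disjunctive normal form, read straight off the rule above.
def satisfied(size, shape, colour):
    return ((size == "medium")
            or (size == "small" and shape == "sphere")
            or (size == "large" and shape == "sphere")
            or (size == "large" and shape == "pillar" and colour == "green"))

# Example 5 (large green pillar) is satisfied; example 6 (large red pillar) is not.
print(satisfied("large", "pillar", "green"), satisfied("large", "pillar", "red"))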
9
ID3 - Quinlan
In the CLS algorithm attributes can be chosen in any order, which can result in
large decision trees if the ordering is poor. An optimal ordering would give the
smallest decision tree, but no method is known for determining it. Instead a
heuristic is used to provide an efficient ordering that is usually near optimal.

ID3 = CLS + efficient ordering of attributes.
Entropy is used to order the attributes.
10
Entropy
For a random variable V which can take values v1, v2, ..., vn with Pr(vi) = pi
for all i, the entropy of V is given by

    S(V) = - Σ_i pi ln(pi)

Entropy for a fair dice: 1.7917
Entropy for a fair dice known to show an even score: 1.0986

The difference between the entropies is the information gain:
1.7917 - 1.0986 = 0.6931
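These figures are natural-log entropies (ln 6 ≈ 1.7917, ln 3 ≈ 1.0986, ln 2 ≈ 0.6931); a quick check:

import math

def entropy(probs):
    # Entropy S(V) = -sum_i p_i ln(p_i), in natural-log units (nats).
    return -sum(p * math.log(p) for p in probs if p > 0)

fair_die = [1/6] * 6     # six equally likely scores
even_only = [1/3] * 3    # fair die known to show an even score

print(entropy(fair_die))                         # 1.7917...
print(entropy(even_only))                        # 1.0986...
print(entropy(fair_die) - entropy(even_only))    # information gain 0.6931...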
11
Attribute Expansion
Expanding attribute Ai:

The node holds a probability distribution Pr(A1, ..., Ai, ..., An, T) over the
attributes and the target T; the examples are taken as equally likely unless
specified otherwise.

For each value ai1, ..., aim of Ai create a branch. Down the branch for Ai = aik
pass the probabilities corresponding to aik and re-normalise, giving
Pr(A1, ..., Ai-1, Ai+1, ..., An, T | Ai = aik). If the probabilities above were
equally likely, the re-normalised ones are equally likely again.
12
Expected Entropy for an Attribute
For attribute Ai and target T:

Down the branch for each value aik of Ai, pass the probabilities corresponding
to Ai = aik and re-normalise, giving Pr(T | Ai = aik). Let S(aik) be the entropy
of this distribution over the target values.

The expected entropy for Ai is

    S(Ai) = Σ_k Pr(Ai = aik) S(aik)
13
How to choose attribute and Information gain
Determine the expected entropy for each attribute, i.e. S(Ai) for all i.
Choose s such that S(As) = min_i S(Ai), and expand attribute As.

By choosing attribute As the information gain is S - S(As), where S is the
entropy of the target T at the node:

    S = - Σ_k Pr(T = tk) ln Pr(T = tk)

Minimising the expected entropy is equivalent to maximising the information gain.
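A sketch of this selection rule, assuming the node's information is supplied as a joint table Pr(Ai = a, T = t); the function names are illustrative:

import math
from collections import defaultdict

def entropy(probs):
    # S = -sum_i p_i ln(p_i)
    return -sum(p * math.log(p) for p in probs if p > 0)

def expected_entropy(joint):
    """joint maps (attribute_value, target_value) -> probability.
       Returns S(Ai) = sum_k Pr(Ai = aik) * S(Pr(T | Ai = aik))."""
    by_value = defaultdict(dict)
    for (a, t), p in joint.items():
        by_value[a][t] = by_value[a].get(t, 0.0) + p
    total = 0.0
    for t_probs in by_value.values():
        p_a = sum(t_probs.values())                   # Pr(Ai = aik)
        cond = [p / p_a for p in t_probs.values()]    # Pr(T | Ai = aik), re-normalised
        total += p_a * entropy(cond)
    return total

def best_attribute(joints):
    # Choose s such that S(As) is minimal; this maximises the gain S - S(As).
    return min(joints, key=lambda name: expected_entropy(joints[name]))

# Joint table for Size built from the seven equally likely examples:
size_joint = {("med", "yes"): 1/7, ("small", "yes"): 1/7, ("small", "no"): 1/7,
              ("large", "yes"): 2/7, ("large", "no"): 2/7}
print(expected_entropy(size_joint))   # (6/7)*ln 2 = 0.594 nats; in bits this is 6/7 = 0.86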
14
Previous Example
Ex Num   Size    Colour   Shape    Concept Satisfied   Pr
1        med     blue     brick    yes                 1/7
2        small   red      wedge    no                  1/7
3        small   red      sphere   yes                 1/7
4        large   red      wedge    no                  1/7
5        large   green    pillar   yes                 1/7
6        large   red      pillar   no                  1/7
7        large   green    sphere   yes                 1/7

For the target, Concept Satisfied:  Pr(yes) = 4/7,  Pr(no) = 3/7

    S = -(4/7) log2(4/7) - (3/7) log2(3/7) = 0.99
15
Entropy for attribute Size
Distribution over (Size, Concept Satisfied):

  Size    Satisfied   Pr
  med     yes         1/7
  small   no          1/7
  small   yes         1/7
  large   no          2/7
  large   yes         2/7

Branch probabilities:  Pr(med) = 1/7,  Pr(small) = 2/7,  Pr(large) = 4/7

  med:    Pr(yes) = 1                   ->  S(med)   = 0
  small:  Pr(no) = 1/2, Pr(yes) = 1/2   ->  S(small) = 1
  large:  Pr(no) = 1/2, Pr(yes) = 1/2   ->  S(large) = 1

Expected entropy:
  S(Size) = (2/7)*1 + (1/7)*0 + (4/7)*1 = 6/7 = 0.86

Information gain for Size = 0.99 - 0.86 = 0.13
16
First Expansion
Attribute    Information Gain
SIZE         0.13
COLOUR       0.52
SHAPE        0.70   <- maximum, so choose SHAPE

Splitting {1, 2, 3, 4, 5, 6, 7} on SHAPE:

  brick  -> {1}      YES
  wedge  -> {2, 4}   NO
  sphere -> {3, 7}   YES
  pillar -> {5, 6}   Expand
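The gains in the table can be reproduced directly from the seven examples; the quoted values correspond to base-2 logarithms. A standalone check (names and layout are assumptions for illustration):

import math
from collections import Counter, defaultdict

ROWS = [("med", "blue", "brick", "yes"), ("small", "red", "wedge", "no"),
        ("small", "red", "sphere", "yes"), ("large", "red", "wedge", "no"),
        ("large", "green", "pillar", "yes"), ("large", "red", "pillar", "no"),
        ("large", "green", "sphere", "yes")]
ATTS = {"SIZE": 0, "COLOUR": 1, "SHAPE": 2}   # column index of each attribute

def entropy2(labels):
    # Base-2 entropy of the yes/no labels in a list.
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

root = entropy2([r[3] for r in ROWS])
print("S =", round(root, 2))                           # 0.99

for name, col in ATTS.items():
    groups = defaultdict(list)
    for r in ROWS:
        groups[r[col]].append(r[3])                    # target labels per attribute value
    expected = sum(len(g) / len(ROWS) * entropy2(g) for g in groups.values())
    print(name, "gain =", round(root - expected, 2))   # 0.13, 0.52, 0.7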
17
Complete Decision Tree
Splitting {1, 2, 3, 4, 5, 6, 7} on SHAPE:

  brick  -> {1}      YES
  wedge  -> {2, 4}   NO
  sphere -> {3, 7}   YES
  pillar -> {5, 6}   split on COLOUR:
      green -> {5}   YES
      red   -> {6}   NO

Rule: IF (Shape is wedge) OR (Shape is pillar AND Colour is red)
      THEN NO ELSE YES
18
A new case
  Size    Colour   Shape    Concept Satisfied
  med     red      pillar   ?

Following the tree: SHAPE = pillar -> COLOUR = red -> NO.
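Written as a small function, the complete tree classifies the new case (med, red, pillar) as NO, as above; the encoding is illustrative only:

# The complete decision tree from the previous slide, as a function.
# Size is not needed: SHAPE (and COLOUR for pillars) already determines the class.
def classify(size, colour, shape):
    if shape in ("brick", "sphere"):
        return "YES"
    if shape == "wedge":
        return "NO"
    if shape == "pillar":
        return "YES" if colour == "green" else "NO"
    return None   # shape value not seen in the training set

print(classify("med", "red", "pillar"))   # NO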
19
Post Pruning
Consider any node S with N examples, n of which are cases of C, where C is the
class with the most examples (the majority class); C is one of YES, NO.

Suppose we terminate this node and make it a leaf with classification C. What
will be the expected error, E(S), if we use the tree for new cases and we reach
this node?

    E(S) = Pr(class of new case ≠ C)
20
Bayes Updating for Post Pruning
Let p denote the probability of class C for a new case arriving at S. We do not
know p, so let f(p) be a prior probability distribution for p on [0, 1]. We can
update this prior using Bayes updating with the information at node S, namely
that n of the N examples in S are of class C:

    f(p | n C in S) = Pr(n C in S | p) f(p) / ∫[0,1] Pr(n C in S | p) f(p) dp
21
Mathematics of Post Pruning
Assume f(p) to be uniform over [0, 1]. Then

    f(p | n C in S) = p^n (1-p)^(N-n) / ∫[0,1] x^n (1-x)^(N-n) dx

The integrals are evaluated using Beta functions:

    ∫[0,1] p^n (1-p)^(N-n) dp   = n! (N-n)! / (N+1)!
    ∫[0,1] p^n (1-p)^(N-n+1) dp = n! (N-n+1)! / (N+2)!

The expected error is then

    E(S) = E[1 - p]
         = ∫[0,1] (1-p) f(p | n C in S) dp
         = ∫[0,1] p^n (1-p)^(N-n+1) dp / ∫[0,1] p^n (1-p)^(N-n) dp
         = (N - n + 1) / (N + 2)
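A quick check of the closed form under the uniform prior: for N = 10 and n = 6, both the ratio of Beta integrals and (N - n + 1)/(N + 2) give 5/12 ≈ 0.417, the value that appears at node a in the example two slides on.

from math import factorial

def beta_integral(a, b):
    # Integral over [0, 1] of p^a (1-p)^b dp = a! b! / (a + b + 1)!
    return factorial(a) * factorial(b) / factorial(a + b + 1)

def expected_error(N, n):
    # E(S) = E[1 - p] under the posterior proportional to p^n (1-p)^(N-n)
    return beta_integral(n, N - n + 1) / beta_integral(n, N - n)

N, n = 10, 6
print(expected_error(N, n))      # 0.41666...
print((N - n + 1) / (N + 2))     # the closed form, same value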
22
Post Pruning for Binary Case
For leaf nodes Si:   Error(Si) = E(Si)

For any node S which is not a leaf node, with child nodes S1, ..., Sm reached
with probabilities P1, ..., Pm, where

    Pi = (number of examples in Si) / (number of examples in S)

we can calculate the backed-up error

    BackUpError(S) = Σ_i Pi Error(Si)

and take

    Error(S) = MIN( E(S), BackUpError(S) )

Decision: prune at S if BackUpError(S) ≥ E(S).
23
Example of Post Pruning
[x, y] means x YES cases and y NO cases. At each node the first figure is E(S)
and, for internal nodes, the second is BackUpError(S); Error(S) is the smaller
of the two (underlined on the original slide).

Before pruning:

  a [6, 4]   E = 0.417   BackUpError = 0.378
  ├─ b [4, 2]   E = 0.375   BackUpError = 0.413   -> PRUNE
  │   ├─ [1, 0]   E = 0.333
  │   └─ [3, 2]   E = 0.429
  └─ c [2, 2]   E = 0.5   BackUpError = 0.383
      ├─ [1, 0]   E = 0.333
      └─ d [1, 2]   E = 0.4   BackUpError = 0.444   -> PRUNE
          ├─ [1, 1]   E = 0.5
          └─ [0, 1]   E = 0.333

PRUNE means cut the subtree below this point.
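A minimal sketch of the pruning rule from the previous slide, run on this example tree; the node encoding is an assumption for illustration. It computes the same backed-up errors, prunes at b and d, and returns Error = 0.378 at the root.

# Node: (yes_count, no_count, list_of_children); a leaf has no children.
def leaf(y, n): return (y, n, [])

tree_a = (6, 4, [
    (4, 2, [leaf(1, 0), leaf(3, 2)]),                 # node b
    (2, 2, [leaf(1, 0),                               # node c
            (1, 2, [leaf(1, 1), leaf(0, 1)])]),       # node d
])

def E(y, n):
    # Expected error of a leaf with N examples, the majority class having max(y, n).
    N, maj = y + n, max(y, n)
    return (N - maj + 1) / (N + 2)

def prune(node):
    """Return (Error(S), possibly pruned node)."""
    y, n, children = node
    e = E(y, n)
    if not children:
        return e, node
    results = [prune(c) for c in children]
    total = y + n
    backed_up = sum((cy + cn) / total * err for err, (cy, cn, _) in results)
    if backed_up >= e:                  # pruning is no worse: cut the subtree
        return e, leaf(y, n)
    return backed_up, (y, n, [c for _, c in results])

error, pruned = prune(tree_a)
print(round(error, 3))    # 0.378
print(pruned)             # b and d have become leaves, as on the next slide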
24
Result of Pruning
After pruning:

  a [6, 4]
  ├─ [4, 2]   (node b, now a leaf)
  └─ c [2, 2]
      ├─ [1, 0]
      └─ [1, 2]   (node d, now a leaf)
25
Generalisation
For the case in which we have k classes, the generalisation of E(S) is

    E(S) = (N - n + k - 1) / (N + k)

Otherwise the pruning method is the same.
26
Testing
Split the database into a Training Set and a Test Set. Learn the rules using the
Training Set and prune; test the rules on the Training Set and record the
proportion correct, then test the rules on the Test Set and record the
proportion correct.

The accuracy on the test set should be close to that on the training set; this
indicates good generalisation. Over-fitting can occur if noisy data is used or
if too-specific attributes are used. Pruning will overcome noise to some extent
but not completely; too-specific attributes must be dropped.
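A rough illustration of the procedure (not from the slides): hold out part of the data, fit on the rest, and compare the two accuracies; a large gap signals over-fitting. The trivial majority-class learner below is only a stand-in for the real tree learner.

import random

def majority_learner(train):
    # Stand-in "learner": always predict the most common label in the training set.
    labels = [label for _, label in train]
    prediction = max(set(labels), key=labels.count)
    return lambda features: prediction

def accuracy(classifier, data):
    return sum(classifier(f) == label for f, label in data) / len(data)

# Hypothetical labelled data as (features, label) pairs.
data = [({"x": i}, "yes" if i % 3 else "no") for i in range(30)]
random.shuffle(data)
train, test = data[:20], data[20:]

clf = majority_learner(train)
print("accuracy on training set:", accuracy(clf, train))
print("accuracy on test set:    ", accuracy(clf, test))   # should be close if generalising well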