Constructing Decision Trees - PowerPoint PPT Presentation

1
Constructing Decision Trees
2
A Decision Tree Example
The weather data example.
ID code   Outlook    Temperature   Humidity   Windy   Play
a         Sunny      Hot           High       False   No
b         Sunny      Hot           High       True    No
c         Overcast   Hot           High       False   Yes
d         Rainy      Mild          High       False   Yes
e         Rainy      Cool          Normal     False   Yes
f         Rainy      Cool          Normal     True    No
g         Overcast   Cool          Normal     True    Yes
h         Sunny      Mild          High       False   No
i         Sunny      Cool          Normal     False   Yes
j         Rainy      Mild          Normal     False   Yes
k         Sunny      Mild          Normal     True    Yes
l         Overcast   Mild          High       True    Yes
m         Overcast   Hot           Normal     False   Yes
n         Rainy      Mild          High       True    No
3
continues
Decision tree for the weather data.
4
The Process of Constructing a Decision Tree
  • Select an attribute to place at the root of the
    decision tree and make one branch for every
    possible value.
  • Repeat the process recursively for each branch.
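
A minimal Python sketch of this recursive procedure (not the presenter's code; the stopping rule here is simply a pure node or an empty attribute list):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy of the class labels, in bits.
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def information_gain(rows, labels, attr):
        # Entropy reduction obtained by splitting on attribute attr.
        n = len(labels)
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / n * entropy(subset)
        return entropy(labels) - remainder

    def build_tree(rows, labels, attributes):
        # rows: list of dicts {attribute: value}; labels: parallel class list.
        if len(set(labels)) == 1 or not attributes:
            return Counter(labels).most_common(1)[0][0]   # leaf: majority class
        # Place the attribute with the largest information gain at this node.
        best = max(attributes, key=lambda a: information_gain(rows, labels, a))
        tree = {best: {}}
        # Make one branch for every possible value and recurse on each branch.
        for value in set(row[best] for row in rows):
            idx = [i for i, row in enumerate(rows) if row[best] == value]
            tree[best][value] = build_tree([rows[i] for i in idx],
                                           [labels[i] for i in idx],
                                           [a for a in attributes if a != best])
        return tree

Applied to the 14 rows of the weather table (as dicts keyed by Outlook, Temperature, Humidity and Windy, with Play as the label), this places Outlook at the root, as computed on the following slides.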

5
Which Attribute Should Be Placed at a Certain Node
  • One common approach is based on the information
    gained by placing a certain attribute at this
    node.

6
Information Gained by Knowing the Result of a
Decision
  • In the weather data example, there are 9
    instances for which the decision to play is yes
    and 5 instances for which the decision to play is
    no. The information gained by knowing the result
    of the decision is then info([9, 5]) =
    entropy(9/14, 5/14) = 0.940 bits.

7
The General Form for Calculating the Information
Gain
  • Entropy of a decision:
    entropy(p1, p2, ..., pn)
    = -p1 log2 p1 - p2 log2 p2 - ... - pn log2 pn
  • p1, p2, ..., pn are the probabilities of the n
    possible outcomes.
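
For example, evaluating this formula for the 9-yes / 5-no split on the previous slide gives 0.940 bits; a minimal Python sketch:

    from math import log2

    def entropy(*probs):
        # entropy(p1, ..., pn) = -p1*log2(p1) - ... - pn*log2(pn)
        return -sum(p * log2(p) for p in probs if p > 0)

    print(f"{entropy(9/14, 5/14):.3f}")   # 0.940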

8
Information Further Required If Outlook Is
Placed at the Root
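As a sketch of the calculation this slide presents: splitting on Outlook partitions the 14 instances into sunny (2 yes, 3 no), overcast (4 yes, 0 no) and rainy (3 yes, 2 no), so the information still required is the weighted average of the branch entropies,

    info([2,3], [4,0], [3,2]) = (5/14)(0.971) + (4/14)(0.0) + (5/14)(0.971)
                              = 0.693 bits,

which is the 0.693 bits subtracted in Gain(outlook) on the next slide.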
9
Information Gained by Placing Each of the 4
Attributes
  • Gain(outlook) = 0.940 bits - 0.693 bits
    = 0.247 bits.
  • Gain(temperature) = 0.029 bits.
  • Gain(humidity) = 0.152 bits.
  • Gain(windy) = 0.048 bits.
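
These gains can be checked directly from the class counts in the weather table on slide 2; a minimal Python sketch:

    from math import log2

    def info(counts):
        # Entropy, in bits, of a class distribution given as counts.
        n = sum(counts)
        return -sum(c / n * log2(c / n) for c in counts if c)

    # (yes, no) counts for each attribute value, read off the weather table.
    splits = {
        "outlook":     [(2, 3), (4, 0), (3, 2)],   # sunny, overcast, rainy
        "temperature": [(2, 2), (4, 2), (3, 1)],   # hot, mild, cool
        "humidity":    [(3, 4), (6, 1)],           # high, normal
        "windy":       [(6, 2), (3, 3)],           # false, true
    }

    total = info((9, 5))                           # 0.940 bits at the root
    for attr, groups in splits.items():
        remainder = sum(sum(g) / 14 * info(g) for g in groups)
        print(f"Gain({attr}) = {total - remainder:.3f} bits")
    # Gain(outlook) = 0.247, Gain(temperature) = 0.029,
    # Gain(humidity) = 0.152, Gain(windy) = 0.048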

10
The Strategy for Selecting an Attribute to Place
at a Node
  • Select the attribute that gives us the largest
    information gain.
  • In this example, it is the attribute Outlook.

11
The Recursive Procedure for Constructing a
Decision Tree
  • The operation discussed above is applied to each
    branch recursively to construct the decision
    tree.
  • For example, for the branch Outlook = sunny, we
    evaluate the information gained by applying each
    of the remaining 3 attributes.
  • Gain(Outlook=sunny; Temperature) = 0.971 - 0.4
    = 0.571
  • Gain(Outlook=sunny; Humidity) = 0.971 - 0 = 0.971
  • Gain(Outlook=sunny; Windy) = 0.971 - 0.951 = 0.02
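
These branch gains can be checked the same way from the class counts inside the five sunny instances (rows a, b, h, i, k of the weather table); a minimal Python sketch:

    from math import log2

    def info(counts):
        n = sum(counts)
        return -sum(c / n * log2(c / n) for c in counts if c)

    branch_total = info((2, 3))                    # 0.971 bits for Outlook = sunny
    # (yes, no) counts inside the sunny branch for each remaining attribute.
    splits = {
        "Temperature": [(0, 2), (1, 1), (1, 0)],   # hot, mild, cool
        "Humidity":    [(0, 3), (2, 0)],           # high, normal
        "Windy":       [(1, 2), (1, 1)],           # false, true
    }
    for attr, groups in splits.items():
        remainder = sum(sum(g) / 5 * info(g) for g in groups)
        print(f"Gain(Outlook=sunny; {attr}) = {branch_total - remainder:.3f}")
    # 0.571, 0.971, 0.020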

12
  • Similarly, we also evaluate the information
    gained by applying each of the remaining 3
    attributes for the branch Outlook = rainy.
  • Gain(Outlook=rainy; Temperature) = 0.971 - 0.951
    = 0.02
  • Gain(Outlook=rainy; Humidity) = 0.971 - 0.951
    = 0.02
  • Gain(Outlook=rainy; Windy) = 0.971 - 0 = 0.971

13
The Over-fitting Issue
  • Over-fitting is caused by creating decision rules
    that fit the training set accurately but are
    based on an insufficient number of samples.
  • As a result, these decision rules may not work
    well in more general cases.

14
Example of the Over-fitting Problem in Decision
Tree Construction
15
  • Hence, with the binary split, we gain more
    information.
  • However, if we look at the pessimistic error
    rate, i.e., the upper bound of the confidence
    interval of the error rate, we may reach a
    different conclusion.
  • The formula for the pessimistic error rate is
    sketched below.
  • Note that the pessimistic error rate is a
    function of the confidence level used.
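
One common form of this upper confidence bound, the form used in C4.5-style pessimistic pruning and assumed here, is

    e = \frac{f + \frac{z^2}{2N} + z \sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}}

where N is the number of training instances reaching the node, f is the observed error rate on those instances, and z is the standard-normal quantile for the chosen confidence level. A higher confidence level means a larger z and therefore a larger (more pessimistic) error estimate.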

16
  • The pessimistic error rates under 95% confidence
    are

17
  • Therefore, the average pessimistic error rate at
    the children is
  • Since the pessimistic error rate increases with
    the split, we do not want to keep the children.
    This practice is called tree pruning.

18
Tree Pruning based on the χ2 Test of Independence
  • We construct the corresponding contingency table

        Ai0   Ai1   Total
Yes       3     8      11
No        0     9       9
Total     3    17      20
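
As a check on the conclusion drawn on the next slide, a minimal Python sketch of the χ2 statistic for this table (no continuity correction assumed):

    # Chi-square test of independence for the 2x2 table above.
    observed = [[3, 8],    # Yes row
                [0, 9]]    # No row

    row_totals = [sum(r) for r in observed]          # [11, 9]
    col_totals = [sum(c) for c in zip(*observed)]    # [3, 17]
    total = sum(row_totals)                          # 20

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total   # expected count
            chi2 += (obs - exp) ** 2 / exp

    print(f"chi2 = {chi2:.3f}")   # chi2 = 2.888

Since 2.888 is below the 5% critical value 3.841 for (2-1)(2-1) = 1 degree of freedom, the split fails the independence test.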
19
  • Therefore, we should not split the subroot node
    if we require that the χ2 statistic be larger
    than the critical value χ2(k, 0.05), where k is
    the number of degrees of freedom of the
    corresponding contingency table.

20
Constructing Decision Trees Based on the χ2 Test of
Independence
  • Using the following example, we can construct a
    contingency table accordingly.

(Figure: a subroot containing 75 Yes's out of 100
samples, with prediction Yes, is split on attribute
Ai into three branches, Ai0, Ai1 and Ai2; the three
children contain 45 Yes's out of 50 samples, 20 Yes's
out of 25 samples, and 10 Yes's out of 25 samples.)
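
A minimal sketch of the corresponding test, using scipy's chi2_contingency on the 2x3 table implied by the figure (rows are the class Yes/No, columns are the three children; the statistic does not depend on which branch label is paired with which child):

    from scipy.stats import chi2_contingency

    observed = [[45, 20, 10],   # Yes counts in the three children
                [ 5,  5, 15]]   # No counts in the three children

    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.1e}")
    # chi2 = 22.667 with dof = 2; p is on the order of 1e-5

Since 22.667 is far above the 5% critical value 5.991 for 2 degrees of freedom, the split passes the independence test.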
21
  • Therefore, we may say that the split is
    statistically robust.

22
Assume that we have another attribute Aj to
consider
        Aj0   Aj1   Total
Yes      25    50      75
No        0    25      25
Total    25    75     100
23
  • Now, both Ai and Aj pass our criterion. How
    should we make our selection?
  • We can make our selection based on the
    significance levels of the two contingency
    tables.
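
A minimal Python sketch of this comparison, using the 2x3 table for Ai from the earlier figure and the 2x2 table for Aj above (correction=False is assumed so that the 2x2 table is not Yates-corrected):

    from scipy.stats import chi2_contingency

    tables = {
        "Ai": [[45, 20, 10], [5, 5, 15]],   # 2x3 table for the split on Ai
        "Aj": [[25, 50], [0, 25]],          # 2x2 table for the split on Aj
    }

    for name, observed in tables.items():
        chi2, p, dof, _ = chi2_contingency(observed, correction=False)
        print(f"{name}: chi2 = {chi2:.2f}, dof = {dof}, p = {p:.1e}")
    # Ai: chi2 = 22.67, dof = 2, p on the order of 1e-5
    # Aj: chi2 = 11.11, dof = 1, p around 9e-4

Both statistics exceed their 5% critical values, but the p-value (significance level) for Ai is smaller, which is why Ai is preferred on the next slide.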

24
  • Therefore, Ai is preferred over Aj.

25
Termination of Split due to Low Significance level
  • If a subtree is as follows
  • χ2 = 4.543 < 5.991 = χ2(2, 0.05)
  • In this case, we do not want to carry out the
    split.

26
A More Realistic Example and Some Remarks
  • In the following example, a bank wants to derive
    a credit evaluation tree for future use based on
    the records of existing customers.
  • As the data set shows, it is highly likely that
    the training data set contains inconsistencies.
  • Furthermore, some values may be missing.
  • Therefore, for most cases, it is impossible to
    derive perfect decision trees, i.e., decision
    trees with 100% accuracy.

27
continues
The first five columns are attributes; the last
column is the class.

Education     Annual Income   Age     Own House   Sex      Credit ranking
College       High            Old     Yes         Male     Good
High school   -----           Middle  Yes         Male     Good
High school   Middle          Young   No          Female   Good
College       High            Old     Yes         Male     Poor
College       High            Old     Yes         Male     Good
College       Middle          Young   No          Female   Good
High school   High            Old     Yes         Male     Poor
College       Middle          Middle  -----       Female   Good
High school   Middle          Young   No          Male     Poor
28
continues
  • A quality measure of decision trees can be based
    on accuracy. There are alternative measures,
    depending on the nature of the application.
  • Overfitting is a problem caused by fitting the
    derived decision tree too closely to the training
    set. As a result, the decision tree may work less
    accurately in the real world.

29
continues
  • There are two situations in which overfitting may
    occur:
  • an insufficient number of samples at the subroot;
  • some attributes are highly branched.
  • A conventional practice for handling missing
    values is to treat them as possible attribute
    values. That is, each attribute has one
    additional attribute value corresponding to the
    missing value.
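
A minimal Python sketch of this convention, mapping the missing marker used in the table on slide 27 (-----) to an extra, hypothetical attribute value "Missing":

    MISSING = "-----"

    def fill_missing(record):
        # Replace each missing entry with the extra attribute value "Missing".
        return {attr: ("Missing" if value == MISSING else value)
                for attr, value in record.items()}

    record = {"Education": "High school", "Annual Income": MISSING,
              "Age": "Middle", "Own House": "Yes", "Sex": "Male"}
    print(fill_missing(record))   # 'Annual Income' becomes 'Missing'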

30
Alternative Measures of Quality of Decision Trees
  • The recall rate and precision are two widely used
    measures:
    recall = |C ∩ C'| / |C|
    precision = |C ∩ C'| / |C'|
  • where C is the set of samples in the class and C'
    is the set of samples which the decision tree
    puts into the class.
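
A minimal Python sketch of the two measures, using hypothetical sample IDs for illustration only:

    def recall_and_precision(C, C_pred):
        # C: samples truly in the class; C_pred: samples the tree puts into it.
        hit = len(C & C_pred)              # samples correctly put into the class
        return hit / len(C), hit / len(C_pred)

    C      = {"a", "c", "d", "e", "g"}     # the class (hypothetical IDs)
    C_pred = {"c", "d", "e", "f"}          # what the tree assigns to the class
    recall, precision = recall_and_precision(C, C_pred)
    print(recall, precision)               # 0.6 0.75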

31
continues
  • A situation in which the recall rate is the main
    concern
  • A bank wants to find all the potential credit
    card customers.
  • A situation in which precision is the main
    concern
  • A bank wants to find a decision tree for credit
    approval.