Artificial Intelligence 7. Decision trees - PowerPoint PPT Presentation

Artificial Intelligence 7. Decision trees. Japan Advanced Institute of Science and Technology (JAIST), Yoshimasa Tsuruoka.

Slides: 19
Provided by: tsu53
Transcript and Presenter's Notes


1
Artificial Intelligence 7. Decision trees
  • Japan Advanced Institute of Science and
    Technology (JAIST)
  • Yoshimasa Tsuruoka

2
Outline
  • What is a decision tree?
  • How to build a decision tree
  • Entropy
  • Information Gain
  • Overfitting
  • Generalization performance
  • Pruning
  • Lecture slides
  • http://www.jaist.ac.jp/tsuruoka/lectures/

3
Decision trees
Chapter 3 of Mitchell, T., Machine Learning (1997)
  • Decision Trees
  • Disjunction of conjunctions
  • Successfully applied to a broad range of tasks
  • Diagnosing medical cases
  • Assessing credit risk of loan applications
  • Nice characteristics
  • Understandable to humans
  • Robust to noise

4
A decision tree
  • Concept: PlayTennis

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

5
Classification by a decision tree
  • Instance
  • <Outlook = Sunny, Temperature = Hot, Humidity =
    High, Wind = Strong>

[Figure: the PlayTennis decision tree from slide 4, used to classify this instance]

6
Disjunction of conjunctions
  • (Outlook = Sunny ∧ Humidity = Normal)
  • ∨ (Outlook = Overcast)
  • ∨ (Outlook = Rain ∧ Wind = Weak)

[Figure: the PlayTennis decision tree from slide 4; each path to a Yes leaf corresponds to one conjunction]

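
Each path from the root to a Yes leaf is one conjunction, and the tree as a whole is their disjunction. As a sketch (the function name `play_tennis` is illustrative), the expression on this slide can be written directly as a Boolean formula:

```python
def play_tennis(outlook, humidity, wind):
    """Disjunction of the three root-to-Yes paths of the PlayTennis tree."""
    return (
        (outlook == "Sunny" and humidity == "Normal")
        or outlook == "Overcast"
        or (outlook == "Rain" and wind == "Weak")
    )


print(play_tennis("Sunny", "High", "Strong"))  # False
print(play_tennis("Rain", "High", "Weak"))     # True
```
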
7
Problems suited to decision trees
  • Instances are represented by attribute-value
    pairs
  • The target function has discrete target values
  • Disjunctive descriptions may be required
  • The training data may contain errors
  • The training data may contain missing attribute
    values

8
Training data
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

9
Which attribute should be tested at each node?
  • We want to build a small decision tree
  • Information gain
  • How well a given attribute separates the training
    examples according to their target classification
  • Reduction in entropy
  • Entropy
  • (im)purity of an arbitrary collection of examples

10
Entropy
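  • If there are only two classes,
    Entropy(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}
    where p_{+} and p_{-} are the proportions of positive and negative examples in S
  • In general, with c classes,
    Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i
    where p_i is the proportion of examples in S belonging to class i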
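
As a rough illustration, entropy can be computed from per-class example counts by summing -p log2 p over the classes, where p is each class's share of the examples. The function name `entropy` and the count-list input are choices made for this sketch; the counts [9, 5] are the Yes/No totals of the 14 training examples on slide 8.

```python
import math


def entropy(class_counts):
    """Entropy in bits of a collection described by its per-class example counts."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count > 0:  # 0 * log2(0) is taken to be 0
            p = count / total
            result -= p * math.log2(p)
    return result


print(round(entropy([9, 5]), 3))  # 0.94
print(entropy([7, 7]))            # 1.0
print(entropy([14, 0]))           # 0.0
```

An evenly mixed collection has the maximum entropy of 1 bit for two classes, and a pure collection has entropy 0.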

11
Information Gain
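  • The expected reduction in entropy achieved by
    splitting the training examples
  • Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)
    where S_v is the subset of S for which attribute A has value v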
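
A small sketch of this quantity, assuming each split is summarised by [Yes, No] counts: the gain is the entropy of the whole collection minus the size-weighted average entropy of the subsets produced by the split. The helper names are illustrative, and the counts are tallied from the training data on slide 8.

```python
import math


def entropy(class_counts):
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


def information_gain(parent_counts, subset_counts):
    """Entropy of the parent minus the size-weighted entropy of the subsets."""
    total = sum(parent_counts)
    remainder = sum(sum(counts) / total * entropy(counts) for counts in subset_counts)
    return entropy(parent_counts) - remainder


# Splitting the 14 examples on Wind: Weak -> [6 Yes, 2 No], Strong -> [3 Yes, 3 No]
print(round(information_gain([9, 5], [[6, 2], [3, 3]]), 3))  # 0.048
```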

12
Example
13
Computing Information Gain
[Figure: the training examples split by Humidity (High, Normal) and by Wind (Weak, Strong)]

14
Which attribute is the best classifier?
  • Information gain
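
To make the comparison concrete, the sketch below computes the information gain of every attribute on the 14 training examples from slide 8 and ranks them. The data layout and helper names are assumptions of this sketch, not part of the lecture.

```python
import math
from collections import Counter

COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
ROWS = """\
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""
DATA = [dict(zip(COLUMNS, line.split())) for line in ROWS.splitlines()]


def entropy(examples, target="PlayTennis"):
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(examples, attribute):
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder


gains = {a: information_gain(DATA, a) for a in COLUMNS[:-1]}
for attribute, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{attribute:12s} {gain:.3f}")
```

On this data Outlook has the largest gain (about 0.247), followed by Humidity (about 0.152), Wind (about 0.048) and Temperature (about 0.029), so Outlook is the attribute to test at the root.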

15
Splitting training data with Outlook
{D1, D2, ..., D14} [9+, 5-]
Outlook
  Sunny -> {D1, D2, D8, D9, D11} [2+, 3-] -> ?
  Overcast -> {D3, D7, D12, D13} [4+, 0-] -> Yes
  Rain -> {D4, D5, D6, D10, D14} [3+, 2-] -> ?

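
The remaining "?" nodes can be filled in by repeating the same attribute selection on each subset. The following is a simplified sketch of such a greedy recursive procedure, in the style of the ID3 algorithm; the slide text itself does not name the algorithm, and the function names and dictionary encoding are assumptions of this sketch. It handles neither missing values nor pruning.

```python
import math
from collections import Counter

COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
TARGET = "PlayTennis"
ROWS = """\
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""
DATA = [dict(zip(COLUMNS, line.split())) for line in ROWS.splitlines()]


def entropy(examples):
    counts = Counter(e[TARGET] for e in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(examples, attribute):
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder


def build_tree(examples, attributes):
    labels = [e[TARGET] for e in examples]
    if len(set(labels)) == 1:  # pure subset: make a leaf
        return labels[0]
    if not attributes:         # nothing left to test: use the majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({e[best] for e in examples}):
        subset = [e for e in examples if e[best] == value]
        branches[value] = build_tree(subset, remaining)
    return {best: branches}


tree = build_tree(DATA, COLUMNS[:-1])
print(tree)
```

On the slide 8 data this yields Humidity under the Sunny branch and Wind under the Rain branch, which matches the tree shown on slide 4.
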
16
Overfitting
  • Growing each branch of the tree deeply enough to
    perfectly classify the training examples is not a
    good strategy.
  • The resulting tree may overfit the training data
  • Overfitting
  • The tree can explain the training data very well
    but performs poorly on new data

17
Alleviating the overfitting problem
  • Several approaches
  • Stop growing the tree earlier
  • Post-prune the tree
  • How can we evaluate the classification
    performance of the tree for new data?
  • The available data are separated into two sets of
    examples: a training set and a validation
    (development) set

18
Validation (development) set
  • Use a portion of the original training data to
    estimate the generalization performance.

[Figure: the original training set is divided into a training set and a validation set; the test set stays separate]
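
One simple way to carve out a validation set is to shuffle the original training examples and hold a fraction of them back. The sketch below assumes a 30% hold-out and a fixed random seed; both values, and the function name `split_train_validation`, are illustrative choices rather than anything prescribed by the lecture.

```python
import random


def split_train_validation(examples, validation_fraction=0.3, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


days = [f"D{i}" for i in range(1, 15)]
train, validation = split_train_validation(days)
print(len(train), len(validation))  # 10 4
```

The tree is grown on the training part only, and the validation part is used to estimate how well it generalizes, for example when deciding where to stop growing or what to prune.
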