Title: Artificial Intelligence 7. Decision trees
Japan Advanced Institute of Science and Technology (JAIST)
Yoshimasa Tsuruoka
Outline
- What is a decision tree?
- How to build a decision tree
- Entropy
- Information Gain
- Overfitting
- Generalization performance
- Pruning
- Lecture slides
- http://www.jaist.ac.jp/tsuruoka/lectures/
Decision trees (Chapter 3 of Mitchell, T., Machine Learning, 1997)
- Decision Trees
- Disjunction of conjunctions
- Successfully applied to a broad range of tasks
- Diagnosing medical cases
- Assessing credit risk of loan applications
- Nice characteristics
- Understandable to humans
- Robust to noise
A decision tree

- Outlook = Sunny → test Humidity
  - Humidity = High → No
  - Humidity = Normal → Yes
- Outlook = Overcast → Yes
- Outlook = Rain → test Wind
  - Wind = Strong → No
  - Wind = Weak → Yes
Classification by a decision tree

- Instance
  - <Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>
- The tree sorts the instance down the branch Outlook = Sunny → Humidity = High, so it is classified as No
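As an illustration (my own sketch, not from the slides), the tree above can be written as a nested Python dict, and classification becomes a short loop that tests one attribute per node:

    # The PlayTennis tree as a nested dict: an inner node maps an attribute
    # name to {value: subtree}; a leaf is just the class label.
    tree = {"Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }}

    def classify(node, instance):
        """Walk from the root to a leaf, following the instance's values."""
        while isinstance(node, dict):
            attribute = next(iter(node))      # attribute tested at this node
            node = node[attribute][instance[attribute]]
        return node

    instance = {"Outlook": "Sunny", "Temperature": "Hot",
                "Humidity": "High", "Wind": "Strong"}
    print(classify(tree, instance))           # -> No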
Disjunction of conjunctions

The tree above corresponds to the formula:

(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
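The same formula can be evaluated directly in Python (a small sketch of mine; the name plays_tennis is not from the slides):

    def plays_tennis(x):
        """The tree's positive class, written as a disjunction of conjunctions."""
        return ((x["Outlook"] == "Sunny" and x["Humidity"] == "Normal")
                or x["Outlook"] == "Overcast"
                or (x["Outlook"] == "Rain" and x["Wind"] == "Weak"))

    # The slide-5 instance falls outside all three conjunctions:
    print(plays_tennis({"Outlook": "Sunny", "Humidity": "High",
                        "Wind": "Strong"}))   # -> False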
Problems suited to decision trees

- Instances are represented by attribute-value pairs
- The target function has discrete output values
- Disjunctive descriptions may be required
- The training data may contain errors
- The training data may contain missing attribute values
Training data
Day Outlook Temperature Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
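For the computations sketched on the following slides, the table can be kept as a list of (attributes, label) pairs; COLUMNS, ROWS, and DATA are names of my own choosing:

    COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind"]
    ROWS = [  # D1..D14, in the order of the table above; last field is PlayTennis
        ("Sunny", "Hot", "High", "Weak", "No"),
        ("Sunny", "Hot", "High", "Strong", "No"),
        ("Overcast", "Hot", "High", "Weak", "Yes"),
        ("Rain", "Mild", "High", "Weak", "Yes"),
        ("Rain", "Cool", "Normal", "Weak", "Yes"),
        ("Rain", "Cool", "Normal", "Strong", "No"),
        ("Overcast", "Cool", "Normal", "Strong", "Yes"),
        ("Sunny", "Mild", "High", "Weak", "No"),
        ("Sunny", "Cool", "Normal", "Weak", "Yes"),
        ("Rain", "Mild", "Normal", "Weak", "Yes"),
        ("Sunny", "Mild", "Normal", "Strong", "Yes"),
        ("Overcast", "Mild", "High", "Strong", "Yes"),
        ("Overcast", "Hot", "Normal", "Weak", "Yes"),
        ("Rain", "Mild", "High", "Strong", "No"),
    ]
    DATA = [(dict(zip(COLUMNS, row[:-1])), row[-1]) for row in ROWS]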
Which attribute should be tested at each node?

- We want to build a small decision tree
- Information gain
  - How well a given attribute separates the training examples according to their target classification
  - The reduction in entropy
- Entropy
  - The (im)purity of an arbitrary collection of examples
Entropy

- If there are only two classes:
  Entropy(S) = -p_+ log2(p_+) - p_- log2(p_-)
  where p_+ and p_- are the proportions of positive and negative examples in S
- In general, for c classes:
  Entropy(S) = -sum_{i=1..c} p_i log2(p_i)
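A minimal Python sketch of this definition (the function name entropy is my own); on the 14 training examples, which contain 9 positive and 5 negative labels, it gives about 0.940:

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Entropy of a collection of class labels, in bits."""
        total = len(labels)
        return -sum((n / total) * log2(n / total)
                    for n in Counter(labels).values())

    print(entropy(["Yes"] * 9 + ["No"] * 5))   # -> 0.940 (approximately)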
Information Gain

- The expected reduction in entropy achieved by splitting the training examples on an attribute A:
  Gain(S, A) = Entropy(S) - sum_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)
  where S_v is the subset of S for which A has value v
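A sketch of the same formula over (attributes, label) pairs as in the slide-8 block; entropy is repeated so the snippet stands alone, and the function names are mine:

    from collections import Counter
    from math import log2

    def entropy(labels):
        total = len(labels)
        return -sum((n / total) * log2(n / total)
                    for n in Counter(labels).values())

    def information_gain(examples, attribute):
        """Gain(S, A) for examples given as (attribute_dict, label) pairs."""
        labels = [label for _, label in examples]
        gain = entropy(labels)
        for v in {attrs[attribute] for attrs, _ in examples}:
            subset = [lab for attrs, lab in examples if attrs[attribute] == v]
            gain -= (len(subset) / len(examples)) * entropy(subset)
        return gain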
Example

- The 14 training examples contain 9 positive and 5 negative examples, so
  Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Computing Information Gain

- Splitting on Humidity:
  - High: [3+, 4-], Entropy = 0.985; Normal: [6+, 1-], Entropy = 0.592
  - Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151
- Splitting on Wind:
  - Weak: [6+, 2-], Entropy = 0.811; Strong: [3+, 3-], Entropy = 1.000
  - Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.000) = 0.048
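With the DATA and information_gain sketches from the earlier slides, these values can be reproduced:

    # Assumes DATA (slide 8 sketch) and information_gain (slide 11 sketch).
    print(information_gain(DATA, "Humidity"))   # -> 0.151 (approximately)
    print(information_gain(DATA, "Wind"))       # -> 0.048 (approximately)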
Which attribute is the best classifier?

- Gain(S, Outlook) = 0.246
- Gain(S, Humidity) = 0.151
- Gain(S, Wind) = 0.048
- Gain(S, Temperature) = 0.029
- Outlook gives the largest information gain, so it is tested at the root
Splitting training data with Outlook

- Whole set {D1, D2, ..., D14}: [9+, 5-]
- Outlook = Sunny → {D1, D2, D8, D9, D11}: [2+, 3-] → ? (split further)
- Outlook = Overcast → {D3, D7, D12, D13}: [4+, 0-] → Yes
- Outlook = Rain → {D4, D5, D6, D10, D14}: [3+, 2-] → ? (split further)
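The branch counts can be reproduced in a few lines, again assuming the DATA sketch from slide 8:

    from collections import Counter, defaultdict

    branches = defaultdict(list)
    for attrs, label in DATA:
        branches[attrs["Outlook"]].append(label)
    for value, labels in sorted(branches.items()):
        print(value, Counter(labels))
    # Overcast Counter({'Yes': 4})
    # Rain Counter({'Yes': 3, 'No': 2})
    # Sunny Counter({'No': 3, 'Yes': 2})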
Overfitting

- Growing each branch of the tree deeply enough to perfectly classify the training examples is not a good strategy
- The resulting tree may overfit the training data
- Overfitting
  - The tree explains the training data very well but performs poorly on new data
Alleviating the overfitting problem

- Several approaches
  - Stop growing the tree earlier
  - Post-prune the tree
- How can we evaluate the classification performance of the tree on new data?
  - The available data are separated into two sets of examples: a training set and a validation (development) set
Validation (development) set

- Use a portion of the original training data to estimate the generalization performance
- The original training set is divided into a (smaller) training set and a validation set, while the test set is kept separate; see the sketch below
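A minimal sketch of holding out a validation set; the 80/20 ratio, the fixed seed, and the function name split are assumptions of mine, not from the slides:

    import random

    def split(data, validation_fraction=0.2, seed=0):
        """Shuffle a copy of the data and hold out a validation portion."""
        data = data[:]                        # copy, so the original is untouched
        random.Random(seed).shuffle(data)
        n_val = int(len(data) * validation_fraction)
        return data[n_val:], data[:n_val]     # (training set, validation set)

    # e.g., with DATA from the slide-8 sketch:
    # train, val = split(DATA)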