1
Iterative Dichotomiser (ID3) Algorithm
  • By: Phuong H. Nguyen
  • Professor: Lee, Sin-Min
  • Course: CS 157B
  • Section: 2
  • Date: 05/08/07
  • Spring 2007

2
Overview
  • Introduction
  • Entropy
  • Information Gain
  • Detailed Example Walkthrough
  • Conclusion
  • References

3
Introduction
  • The ID3 algorithm is a greedy algorithm for decision
    tree construction, developed by Ross Quinlan
    (published in 1986).
  • ID3 uses information gain to select the best
    attribute for the root node and each decision node.
  • It follows a Max-Gain approach: the attribute with
    the highest information gain is chosen at each split.

4
Entropy
  • Measures the impurity or randomness of a collection
    of examples.
  • A quantitative measurement of the homogeneity of
    a set of examples.
  • Basically, it tells us how mixed the given
    examples are with respect to the target
    classification.

5
Entropy (cont.)
  • Entropy(S) = -P_positive log2(P_positive) - P_negative log2(P_negative)
  • Where
  •   - P_positive = proportion of positive examples
  •   - P_negative = proportion of negative examples
  • Example (checked in the snippet below)
  • If S is a collection of 14 examples with 9 YES
    and 5 NO, then
  • Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
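
As a quick sanity check of this arithmetic, here is a minimal Python sketch; the function name and two-argument form are just for illustration and not part of the slides:

```python
from math import log2

def entropy_two_class(p_pos, p_neg):
    """Two-class entropy; a 0-proportion term contributes nothing."""
    return sum(-p * log2(p) for p in (p_pos, p_neg) if p > 0)

# 14 examples: 9 YES and 5 NO
print(round(entropy_two_class(9/14, 5/14), 3))  # 0.94
```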

6
Entropy (cont.)
  • For more than two classification classes:
  • Entropy(S) = Σ -p(i) log2 p(i), summed over every class i
  • For two classes, the result of the entropy
    calculation lies between 0 and 1 (with c classes
    the maximum is log2 c).
  • Two special cases (reproduced in the sketch below):

If Entropy(S) = 1 (max value): members are split
equally between the two classes (min uniformity,
max randomness)
If Entropy(S) = 0: all members in S belong to
strictly one class (max uniformity, min
randomness)
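
A short sketch of the general formula, taking the per-class example counts as a list (an assumed representation); it also reproduces the two special cases above:

```python
from math import log2

def entropy(counts):
    """Entropy of a collection, from the number of examples in each class."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            result -= p * log2(p)
    return result

print(entropy([7, 7]))   # 1.0 -> equal split between two classes (max randomness)
print(entropy([14, 0]))  # 0.0 -> all members in one class (max uniformity)
```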
7
Information Gain
  • A statistical property that measures how well a given
    attribute separates a collection of examples into the
    target classes.
  • The ID3 algorithm uses the Max-Gain approach (highest
    information gain) to select the best attribute for the
    root node and each decision node.

8
Information Gain (cont.)
  • Gain(S, A) = Entropy(S) - Σ ((|Sv| / |S|) × Entropy(Sv)),
    summed over each value v of attribute A
  • Where
  •   - A is an attribute of collection S
  •   - Sv = subset of S for which attribute A has value v
  •   - |Sv| = number of elements in Sv
  •   - |S| = number of elements in S

9
Information Gain (cont.)
  • Example (verified in the sketch below)
  • Collection S: 14 examples (9 YES, 5 NO)
  • Wind speed is one attribute of S, with values Weak, Strong
  • Weak: 8 occurrences (6 YES, 2 NO)
  • Strong: 6 occurrences (3 YES, 3 NO)
  • Calculation
  • Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
  • Entropy(S_weak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811
  • Entropy(S_strong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.00
  • Gain(S, Wind) = Entropy(S) - (8/14) Entropy(S_weak) - (6/14) Entropy(S_strong)
  •   = 0.940 - (8/14)(0.811) - (6/14)(1.00)
  •   = 0.048
  • The information gain is then calculated in the same way
    for each remaining attribute in S.
  • The attribute with the highest gain is used at the root
    node or decision node.
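
A self-contained sketch of this calculation; representing each attribute value by its (YES, NO) count pair and the helper names are assumptions, not part of the slides:

```python
from math import log2

def entropy(counts):
    """Entropy from per-class example counts."""
    total = sum(counts)
    e = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            e -= p * log2(p)
    return e

def gain(total_counts, subsets):
    """Gain(S, A) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv)) over the values v of A."""
    n = sum(total_counts)
    return entropy(total_counts) - sum(sum(sv) / n * entropy(sv) for sv in subsets)

# S: 9 YES, 5 NO; Wind = Weak (6 YES, 2 NO) or Strong (3 YES, 3 NO)
print(round(gain([9, 5], [[6, 2], [3, 3]]), 3))  # 0.048
```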

10
Example Walkthrough
  • An example: a company sends out promotions to various
    houses and records a few facts about each house, along
    with whether the occupants responded or not.

11
Example Walkthrough (cont.)
The target classification is Outcome, which can
be Responded or Nothing. The attributes in the
collection are District, House Type, Income, and
Previous Customer. They have the following values:
  - District: Suburban, Rural, Urban
  - House Type: Detached, Semi-detached, Terrace
  - Income: High, Low
  - Previous Customer: Yes, No
  - Outcome (target): Nothing, Responded
12
Example Walkthrough (cont.)
Detailed calculation for Gain(S, District), checked in the sketch below:

Entropy(S): 9/14 responses, 5/14 no responses
  = -(9/14) log2(9/14) - (5/14) log2(5/14)
  = 0.40978 + 0.5305 = 0.9403

Entropy(S_District=Suburban): 2/5 responses, 3/5 no responses
  = -(2/5) log2(2/5) - (3/5) log2(3/5)
  = 0.5288 + 0.4422 = 0.9709

Entropy(S_District=Rural): 4/4 responses, 0/4 no responses
  = -(4/4) log2(4/4) = 0

Entropy(S_District=Urban): 3/5 responses, 2/5 no responses
  = -(3/5) log2(3/5) - (2/5) log2(2/5)
  = 0.4422 + 0.5288 = 0.9709

Gain(S, District) = Entropy(S) - ((5/14) Entropy(S_District=Suburban)
  + (4/14) Entropy(S_District=Rural) + (5/14) Entropy(S_District=Urban))
  = 0.9403 - ((5/14)(0.9709) + (4/14)(0) + (5/14)(0.9709))
  = 0.9403 - 0.34675 - 0 - 0.34675 = 0.2468
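
The same kind of check reproduces this result; the sketch below is self-contained, with class counts written as (Responded, Nothing) pairs (an assumed representation):

```python
from math import log2

def entropy(counts):
    """Entropy from per-class example counts (Responded, Nothing)."""
    total = sum(counts)
    e = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            e -= p * log2(p)
    return e

s = [9, 5]  # whole collection: 9 Responded, 5 Nothing
district = {"Suburban": [2, 3], "Rural": [4, 0], "Urban": [3, 2]}

gain_district = entropy(s) - sum(
    sum(sv) / sum(s) * entropy(sv) for sv in district.values()
)
print(round(gain_district, 3))  # 0.247, matching the slide's 0.2468 up to rounding
```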
13
Example Walkthrough (cont.)
  • So we now have Gain(S, District) = 0.2468
  • Applying the same process to the remaining three
    attributes of S, we get:
  •   - Gain(S, House Type) = 0.049
  •   - Gain(S, Income) = 0.151
  •   - Gain(S, Previous Customer) = 0.048
  • Comparing the information gain of the four
    attributes, we see that District has the
    highest value.
  • District will be the root node of the decision
    tree.
  • So far the decision tree looks like the following:

District
 ├─ Suburban → ???
 ├─ Rural    → ???
 └─ Urban    → ???
14
Example Walkthrough (cont.)
  • Applying the same process to the left branch of the
    root node (Suburban), we get (see the check below):
  •   - Entropy(S_Suburban) = 0.970
  •   - Gain(S_Suburban, House Type) = 0.570
  •   - Gain(S_Suburban, Income) = 0.970
  •   - Gain(S_Suburban, Previous Customer) = 0.019
  • The information gain of Income is the highest, so
    Income becomes the decision node for this branch.
  • The decision tree now looks like the following:

District
 ├─ Suburban → Income
 ├─ Rural    → ???
 └─ Urban    → ???
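
A minimal check for the Suburban branch, using the per-leaf counts given later in the walkthrough (High → 3 Nothing, Low → 2 Responded); the count-pair representation is the same assumption as before:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    e = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            e -= p * log2(p)
    return e

# Suburban subset: 2 Responded, 3 Nothing (from the Gain(S, District) calculation).
suburban = [2, 3]
# Split on Income: High -> (0 Responded, 3 Nothing), Low -> (2 Responded, 0 Nothing).
income_subsets = [[0, 3], [2, 0]]

gain_income = entropy(suburban) - sum(
    sum(sv) / sum(suburban) * entropy(sv) for sv in income_subsets
)
print(round(gain_income, 3))  # 0.971 (the slide truncates to 0.970)
```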
15
Example Walkthrough (cont.)
For the center branch of the root node (Rural), there is a
special case:
  - Entropy(S_Rural) = 0 → all members of S_Rural belong to
    strictly one target classification class, which is
    Responded.
Thus, we skip the calculation and add the corresponding target
classification value to the tree. The decision tree now looks
like the following:
District
 ├─ Suburban → Income
 ├─ Rural    → Responded
 └─ Urban    → ???
16
Example Walkthrough (cont.)
  • Applying the same process to the right branch of the
    root node (Urban), we get:
  •   - Entropy(S_Urban) = 0.970
  •   - Gain(S_Urban, House Type) = 0.019
  •   - Gain(S_Urban, Income) = 0.019
  •   - Gain(S_Urban, Previous Customer) = 0.970
  • The information gain of Previous Customer is the highest,
    so Previous Customer becomes the decision node.
  • The decision tree now looks like the following:

District
 ├─ Suburban → Income
 ├─ Rural    → Responded
 └─ Urban    → Previous Customer
17
For the Income node, we have High → Nothing (3/3), so
Entropy = 0, and Low → Responded (2/2), so Entropy = 0.
For the Previous Customer node, we have No → Responded (3/3),
so Entropy = 0, and Yes → Nothing (2/2), so Entropy = 0.
There is no longer any need to split the tree; therefore, the
final decision tree looks like the following:
District
 ├─ Suburban → Income
 │               ├─ High → Nothing
 │               └─ Low  → Responded
 ├─ Rural    → Responded
 └─ Urban    → Previous Customer
                 ├─ No  → Responded
                 └─ Yes → Nothing
18
  • From the above decision tree, some rules can be
    extracted, for example (a mechanical extraction is
    sketched below):
  • (District = Suburban) AND (Income = Low) →
    (Outcome = Responded)
  • (District = Rural) → (Outcome = Responded)
  • (District = Urban) AND (Previous Customer = Yes) →
    (Outcome = Nothing)
  • and so on
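
One hedged sketch of how such rules can be read off mechanically, with the final tree written as a nested dictionary; the representation and the function name are illustrative, not from the slides:

```python
def extract_rules(node, conditions=()):
    """Walk a nested-dict decision tree and yield one rule per leaf."""
    if isinstance(node, str):          # leaf: a target classification value
        yield conditions, node
        return
    (attribute, branches), = node.items()
    for value, child in branches.items():
        yield from extract_rules(child, conditions + ((attribute, value),))

# Final decision tree from the walkthrough.
tree = {"District": {
    "Suburban": {"Income": {"High": "Nothing", "Low": "Responded"}},
    "Rural": "Responded",
    "Urban": {"Previous Customer": {"No": "Responded", "Yes": "Nothing"}},
}}

for conds, outcome in extract_rules(tree):
    rule = " AND ".join(f"({a} = {v})" for a, v in conds)
    print(f"{rule} -> (Outcome = {outcome})")
```

Running this prints the same rules listed above, one per leaf of the tree.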

19
Conclusion
  • The ID3 algorithm is easy to implement once entropy and
    information gain are understood (a minimal sketch
    follows below).
  • ID3 is one of the foundational decision-tree techniques
    in data mining.
  • It has been used effectively in practice for data
    mining tasks.
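
To make "easy to implement" concrete, here is a minimal, hedged sketch of the ID3 recursion. Examples are assumed to be dictionaries mapping attribute names to values, with the target class under a given key; all names are illustrative, not from the slides:

```python
from collections import Counter
from math import log2

def entropy(examples, target):
    """Entropy of a list of examples with respect to the target attribute."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def gain(examples, attribute, target):
    """Information gain of splitting the examples on one attribute."""
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, attributes, target):
    """Return a nested-dict decision tree built with the Max-Gain split rule."""
    classes = {e[target] for e in examples}
    if len(classes) == 1:                      # pure node -> leaf
        return classes.pop()
    if not attributes:                         # no attributes left -> majority class
        return Counter(e[target] for e in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    tree = {best: {}}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, rest, target)
    return tree
```

Applied to the walkthrough's 14 promotion examples, this recursion would select District at the root and reproduce the final tree shown above.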

20
References
  • Dr. Lee's slides, San Jose State University, Spring 2007,
    http://www.cs.sjsu.edu/~lee/cs157b/cs157b.html
  • "Building Decision Trees with the ID3 Algorithm",
    Andrew Colin, Dr. Dobb's Journal, June 1996
  • "Incremental Induction of Decision Trees",
    Paul E. Utgoff, Kluwer Academic Publishers, 1989
  • http://www.cise.ufl.edu/ddd/cap6635/Fall-97/Short-papers/2.htm
  • http://decisiontrees.net/node/27