Title: Rule induction: Ross Quinlan's ID3 algorithm
- Fredda Weinberg
- CIS 718X
- Fall 2005
- Professor Kopec
- Assignment 3
The learning problem
- You are presented with the data.
- You have a supervised learning problem (that is, there is a target variable).
- In practice, there is no such thing as the correct model.
- You are looking for the best approximating model.
- There is no reason to think that linear models provide the best approximating model.
- SPSS Clementine Users Group
Terms
- General
  - Decision trees
  - Recursive partitioning: apply the same splitting rule to smaller and smaller partitions of the sample space.
- Classification
  - Tree-based classification
  - Classification trees
- ibid.
Rule induction
- 1. For each attribute, compute the entropy of the conclusion after splitting the data on that attribute (the size-weighted average of the entropies of the resulting subsets).
- 2. Select the attribute (say A) with the lowest resulting entropy, i.e. the highest information gain.
- 3. Divide the data into separate sets so that within a set, A has a fixed value (e.g. for eye color, Color = green in one set, Color = brown in another, etc.).
- 4. Build a tree with branches:
  - if A = a1 then ... (subtree 1)
  - if A = a2 then ... (subtree 2)
  - ... etc.
- 5. For each subtree, repeat this process from step 1.
- 6. At each iteration, one attribute is removed from consideration. The process stops when there are no attributes left to consider, or when all the data in a subtree have the same value for the conclusion (e.g. they all say Conclusion = safe from sunburn).
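The six steps above can be sketched in Python. This is a minimal illustration, not Quinlan's code, and it rests on several assumptions:

- Each example is a dict of categorical attribute values plus a target key.
- Steps 1 and 2 are implemented by choosing the attribute whose split leaves the lowest size-weighted entropy.
- When attributes run out before a subset is pure, the leaf takes the majority class. The slide does not specify this.

The function names (`entropy`, `split_entropy`, `id3`) and the sunburn rows are hypothetical, invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def split_entropy(rows, attr, target):
    """Size-weighted entropy of the conclusion after splitting rows on attr."""
    n = len(rows)
    total = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == value]
        total += len(subset) / n * entropy(subset)
    return total


def id3(rows, attrs, target):
    """Return a class label (leaf) or a nested dict {attr: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # step 6: every row has the same conclusion
        return labels[0]
    if not attrs:                      # step 6: no attributes left (majority vote is an assumption)
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute whose split leaves the least entropy.
    best = min(attrs, key=lambda a: split_entropy(rows, a, target))
    remaining = [a for a in attrs if a != best]  # step 6: drop the chosen attribute
    # Steps 3-5: one branch per observed value of best, each built recursively.
    return {best: {v: id3([r for r in rows if r[best] == v], remaining, target)
                   for v in dict.fromkeys(r[best] for r in rows)}}


# Invented toy data (hypothetical, for illustration only).
data = [
    {"hair": "blond", "lotion": "no", "result": "burned"},
    {"hair": "blond", "lotion": "yes", "result": "safe"},
    {"hair": "brown", "lotion": "no", "result": "safe"},
    {"hair": "red", "lotion": "no", "result": "burned"},
    {"hair": "brown", "lotion": "yes", "result": "safe"},
    {"hair": "blond", "lotion": "no", "result": "burned"},
]
tree = id3(data, ["hair", "lotion"], "result")
print(tree)
# {'hair': {'blond': {'lotion': {'no': 'burned', 'yes': 'safe'}}, 'brown': 'safe', 'red': 'burned'}}
```

On this data, splitting on `hair` leaves about 0.459 bits of entropy and splitting on `lotion` leaves about 0.541 bits, so `hair` becomes the root. Each recursive call drops the chosen attribute, so tree depth is bounded by the number of attributes.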
Iterative Dichotomiser
The rule induction algorithm was first used by
Hunt in his CLS (concept learning system) in
1962. Then, with extensions for handling numeric
data too, it was used by Ross Quinlan for his ID3
system in 1979. Quinlan's ID3 tried to cut down
on effort by inducing a set of rules from a small
subset of data, and then testing to see if those
rules explained the other data. Data not explained
were then added to the chosen subset, and new
rules induced. This process continued until all
the data was accounted for. The letters ID stood
for 'iterative dichotomiser', a fancy name for
this simple algorithm.
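The windowing loop described above can be sketched as follows. This is a minimal sketch under several assumptions:

- The initial window is a seeded random sample.
- All unexplained examples are added at each round.
- The loop stops once nothing outside the window is misclassified.

The learner is passed in as a function. To keep the block self-contained, a deliberately trivial lookup-table learner stands in for ID3. It is not ID3, and its names `learn_lookup` and `classify_lookup` are hypothetical, as is `windowing` itself.

```python
import random
from collections import Counter


def windowing(rows, target, learn, classify, window_size, seed=0):
    """Induce a model from a random window, then repeatedly add the examples
    outside the window that the model gets wrong, until none remain."""
    rng = random.Random(seed)
    window = rng.sample(range(len(rows)), min(window_size, len(rows)))
    while True:
        model = learn([rows[i] for i in window], target)
        in_window = set(window)
        missed = [i for i, r in enumerate(rows)
                  if i not in in_window and classify(model, r) != r[target]]
        if not missed:  # every example outside the window is explained
            return model, sorted(window)
        window.extend(missed)  # the window strictly grows, so the loop terminates


# Hypothetical stand-in learner (NOT ID3): one rule per distinct combination
# of attribute values, with the window's majority class as the default.
def learn_lookup(rows, target):
    votes = {}
    for r in rows:
        key = tuple(sorted((k, v) for k, v in r.items() if k != target))
        votes.setdefault(key, Counter())[r[target]] += 1
    rules = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    default = Counter(r[target] for r in rows).most_common(1)[0][0]
    return rules, default, target


def classify_lookup(model, row):
    rules, default, target = model
    key = tuple(sorted((k, v) for k, v in row.items() if k != target))
    return rules.get(key, default)


# Invented toy data (hypothetical, for illustration only).
data = [
    {"hair": "blond", "lotion": "no", "result": "burned"},
    {"hair": "blond", "lotion": "yes", "result": "safe"},
    {"hair": "brown", "lotion": "no", "result": "safe"},
    {"hair": "red", "lotion": "no", "result": "burned"},
    {"hair": "brown", "lotion": "yes", "result": "safe"},
    {"hair": "blond", "lotion": "no", "result": "burned"},
]
model, window = windowing(data, "result", learn_lookup, classify_lookup, window_size=2)
print(len(window), "of", len(data), "examples ended up in the window")
```

The payoff Quinlan was after comes from a learner that generalizes. The final window can then be much smaller than the full data set, and each induction step runs on less data.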
Entropy
- Entropy: $E(S) = -\sum_i p_i \log_2 p_i$, where $p_i$ is the proportion of cases in $S$ that belong to class $i$.
- Information-theoretic criterion: the minimum number of bits needed to encode the classification of an arbitrary case.
- Ranges from 0 to $\log_2 k$ for $k$ classes (0 to 1 for two classes).
- 0 if p is concentrated in one class.
- Maximal if p is uniform across classes.
- Entropy gain (information gain) is the reduction in entropy after a split. Interpretation: the number of bits saved when encoding the target value with knowledge of the predictor.
- Entropy gain is biased in favor of attributes with many values. Gain ratio discourages the selection of attributes with many uniformly distributed values.
- SPSS Clementine Users Group
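The bullets above can be made concrete with the standard definitions. Let $S_v$ be the subset of $S$ where attribute $A$ has value $v$. Then:

- $\mathrm{Gain}(S, A) = E(S) - \sum_v \frac{|S_v|}{|S|} E(S_v)$
- $\mathrm{SplitInfo}(S, A) = -\sum_v \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$
- $\mathrm{GainRatio} = \mathrm{Gain} / \mathrm{SplitInfo}$

The Python sketch below computes these quantities. The attributes `row_id` and `flag` are hypothetical. Returning 0.0 when the split information is zero is an assumption made here to avoid dividing by zero.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """E(S) = -sum_i p_i log2 p_i, in bits."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Entropy of the labels minus the size-weighted entropy of each subset."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Gain divided by split information (the entropy of the attribute's own values)."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
row_id = ["a", "b", "c", "d"]   # hypothetical attribute: a different value for every case
flag = ["t", "t", "f", "f"]     # hypothetical attribute: two values
print(entropy(labels), entropy(row_id))                        # 1.0 2.0
print(info_gain(row_id, labels), gain_ratio(row_id, labels))   # 1.0 0.5
print(info_gain(flag, labels), gain_ratio(flag, labels))       # 1.0 1.0
```

The second line of output shows that entropy exceeds 1 bit when there are more than two classes. The last two lines show the bias. Both attributes separate the classes perfectly, so their gain is equal. Gain ratio, however, penalizes the attribute with one value per case.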
Tech Support toy database: is it the equipment or the commander?
Decision Trees by Computational Intelligence
The decision tree produced by the training data
Testing with new examples: predictions
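To classify a new example with an induced tree, start at the root and follow the branch that matches each attribute value until a leaf is reached. The minimal sketch below assumes the tree is stored as nested dicts of the form `{attribute: {value: subtree_or_label}}`. The tree, the `predict` function, and the `default` parameter are hypothetical illustrations, not the Tech Support tree from the slides. An attribute value never seen in training has no branch, so the sketch falls back to a caller-supplied default.

```python
def predict(tree, example, default=None):
    """Walk a nested-dict tree {attr: {value: subtree_or_label}} down to a leaf."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))  # each internal node tests one attribute
        value = example.get(attr)
        if value not in branches:
            return default  # value never seen in training: no branch to follow
        tree = branches[value]
    return tree  # a leaf is the predicted class


# Hypothetical tree, invented for illustration.
tree = {"hair": {"blond": {"lotion": {"no": "burned", "yes": "safe"}},
                 "brown": "safe",
                 "red": "burned"}}
print(predict(tree, {"hair": "blond", "lotion": "yes"}))                    # safe
print(predict(tree, {"hair": "red", "lotion": "yes"}))                      # burned
print(predict(tree, {"hair": "black", "lotion": "no"}, default="unknown"))  # unknown
```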
Applications
- Predicting Magnetic Properties of Crystals
- Profiling High Income Earners from Census Data
- Assessing Churn Risk
- Detecting Advertisements on the Web
- Identifying Spam
- Diagnosing Hypothyroidism