Title: Inductive Learning (1/2): Decision Tree Method
1. Inductive Learning (1/2): Decision Tree Method
- Russell and Norvig, Chapter 18, Sections 18.1 through 18.4
- CS121 Winter 2003
2. Quotes
- "Our experience of the world is specific, yet we are able to formulate general theories that account for the past and predict the future." (Genesereth and Nilsson, Logical Foundations of AI, 1987)
- "Entities are not to be multiplied without necessity." (Ockham, 1285-1349)
3. Learning Agent
[Diagram: a learning agent; percepts feed a critic and a learning element that updates the KB used by the problem solver to select actions]
4. Contents
- Introduction to inductive learning
- Logic-based inductive learning:
  - Decision tree method
  - Version space method
- Function-based inductive learning:
  - Neural nets
5. Contents
- Introduction to inductive learning
- Logic-based inductive learning:
  - Decision tree method
  - Version space method
- Why inductive learning works
- Function-based inductive learning:
  - Neural nets
6. Inductive Learning Frameworks
- Function-learning formulation
- Logic-inference formulation
7. Function-Learning Formulation
- Goal function f
- Training set: (x_i, f(x_i)), i = 1, …, n
- Inductive inference: find a function h that fits the points well
→ Neural nets
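A minimal sketch of this formulation (the sample points and the linear hypothesis class are our illustrative assumptions, not from the slides):

import numpy as np

# Training set: points (x_i, f(x_i)) sampled from an unknown goal function f
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Restrict the hypothesis space to lines h(x) = a*x + b, fit by least squares
a, b = np.polyfit(xs, ys, deg=1)
h = lambda x: a * x + b

print(h(5.0))  # prediction for an unseen input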
8. Logic-Inference Formulation
- Background knowledge KB
- Training set D (observed knowledge) such that KB ⊭ D and KB ∧ D is satisfiable
- Inductive inference: find h (inductive hypothesis) such that:
  - KB ∧ h is satisfiable
  - KB ∧ h ⊨ D
h ≡ D is a trivial, but uninteresting, solution (data caching)
9. Rewarded Card Example
- Deck of cards, with each card designated by [r,s], its rank and suit, and some cards "rewarded"
- Background knowledge KB:
  ((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
  ((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
  ((s=S) ∨ (s=C)) ⇔ BLACK(s)
  ((s=D) ∨ (s=H)) ⇔ RED(s)
- Training set D:
  REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
10. Rewarded Card Example
- Background knowledge KB:
  ((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
  ((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
  ((s=S) ∨ (s=C)) ⇔ BLACK(s)
  ((s=D) ∨ (s=H)) ⇔ RED(s)
- Training set D:
  REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
- Possible inductive hypothesis:
  h ≡ (NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s]))
There are several possible inductive hypotheses; a quick check of h in code follows.
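A quick sanity check that h agrees with every example in D (our encoding: ranks as integers or 'J'/'Q'/'K', suits as single letters):

# NUM and BLACK from the background knowledge KB
def NUM(r):
    return r in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

def BLACK(s):
    return s in ('S', 'C')

# Inductive hypothesis h: NUM(r) and BLACK(s) <=> REWARD([r,s])
def h(r, s):
    return NUM(r) and BLACK(s)

# Training set D: (rank, suit, observed value of REWARD)
D = [(4, 'C', True), (7, 'C', True), (2, 'S', True),
     (5, 'H', False), ('J', 'S', False)]

assert all(h(r, s) == reward for r, s, reward in D)  # h agrees with every example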
11. Learning a Predicate
- Set E of objects (e.g., cards)
- Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
- Example: CONCEPT describes the precondition of an action, e.g., Unstack(C,A):
  - E is the set of states
  - CONCEPT(x) ⇔ (HANDEMPTY ∈ x ∧ BLOCK(C) ∈ x ∧ BLOCK(A) ∈ x ∧ CLEAR(C) ∈ x ∧ ON(C,A) ∈ x)
- Learning CONCEPT is a step toward learning the action
12. Learning a Predicate
- Set E of objects (e.g., cards)
- Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
- Observable predicates A(x), B(x), … (e.g., NUM, RED)
- Training set: values of CONCEPT for some combinations of values of the observable predicates
13. A Possible Training Set
Ex. A B C D E CONCEPT
1 True True False True False False
2 True False False False False True
3 False False True True True False
4 True True True False True True
5 False True True False False False
6 True True False True True False
7 False False True False True False
8 True False True False True True
9 False False False True True False
10 True True True True False True
Note that the training set does not say whether an observable predicate A, …, E is pertinent or not.
14. Learning a Predicate
- Set E of objects (e.g., cards)
- Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
- Observable predicates A(x), B(x), … (e.g., NUM, RED)
- Training set: values of CONCEPT for some combinations of values of the observable predicates
- Find a representation of CONCEPT in the form CONCEPT(x) ⇔ S(A,B,…), where S(A,B,…) is a sentence built with the observable predicates, e.g.:
  CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
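Evaluating such a sentence S on an example is mechanical; a tiny sketch with our dict encoding of an example:

# An example: truth values of the observable predicates for some object x
example = {'A': True, 'B': False, 'C': True, 'D': False, 'E': True}

def concept(e):
    # CONCEPT(x) <=> A(x) and (not B(x) or C(x))
    return e['A'] and (not e['B'] or e['C'])

print(concept(example))  # True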
15. Learning the Concept of an Arch
ARCH(x) ⇔ HAS-PART(x,b1) ∧ HAS-PART(x,b2) ∧ HAS-PART(x,b3) ∧ IS-A(b1,BRICK) ∧ IS-A(b2,BRICK) ∧ ¬MEET(b1,b2) ∧ (IS-A(b3,BRICK) ∨ IS-A(b3,WEDGE)) ∧ SUPPORTED(b3,b1) ∧ SUPPORTED(b3,b2)
16. Example Set
- An example consists of the values of CONCEPT and the observable predicates for some object x
- An example is positive if CONCEPT is True, else it is negative
- The set E of all examples is the example set
- The training set is a subset of E
17. Hypothesis Space
- A hypothesis is any sentence h of the form CONCEPT(x) ⇔ S(A,B,…), where S(A,B,…) is a sentence built with the observable predicates
- The set of all hypotheses is called the hypothesis space H
- A hypothesis h agrees with an example if it gives the correct value of CONCEPT
18. Inductive Learning Scheme
19. Size of Hypothesis Space
- n observable predicates
- 2^n entries in the truth table
- In the absence of any restriction (bias), there are 2^(2^n) hypotheses to choose from
- n = 6 → about 2 × 10^19 hypotheses!
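The count is easy to check: each of the 2^n truth-table rows can be labeled True or False independently, giving 2^(2^n) hypotheses. A one-line check (our illustration):

n = 6
print(2 ** 2 ** n)  # 18446744073709551616, i.e. about 2 x 10^19 hypotheses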
20. Multiple Inductive Hypotheses
Need for a system of preferences, called a bias, to compare possible hypotheses:
h1 ≡ NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])
h2 ≡ BLACK(s) ∧ ¬(r=J) ⇔ REWARD([r,s])
h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇔ REWARD([r,s])
h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s])
All four agree with all the examples in the training set.
21. Keep-It-Simple (KIS) Bias
- Motivation:
  - If a hypothesis is too complex, it may not be worth learning it (data caching might do the job just as well)
  - There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller
- Examples:
  - Use far fewer observable predicates than suggested by the training set
  - Constrain the learnt predicate, e.g., to use only high-level observable predicates such as NUM, FACE, BLACK, and RED, and/or to have a simple syntax (e.g., a conjunction of literals)
If the bias allows only sentences S that are conjunctions of k << n predicates picked from the n observable predicates, then the size of H is O(n^k) (see the count sketched below).
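A sketch of the count under such a bias, assuming (our convention, not stated on the slide) that each of the k chosen predicates may appear positive or negated:

from math import comb

n, k = 6, 2
count = comb(n, k) * 2 ** k  # choose k of n predicates, then a sign per literal
print(count)  # 60 hypotheses, versus 2**(2**6), about 1.8e19, without the bias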
22. Putting Things Together
23. Predicate-Learning Methods
- Decision tree
- Version space
24. Predicate as a Decision Tree
The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) can be represented by the following decision tree:
- Example: a mushroom is poisonous iff it is yellow and small, or yellow, big, and spotted
- x is a mushroom
- CONCEPT = POISONOUS
- A = YELLOW
- B = BIG
- C = SPOTTED
25. Predicate as a Decision Tree
The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) can be represented by the following decision tree (written as code below):
- Example: a mushroom is poisonous iff it is yellow and small, or yellow, big, and spotted
- x is a mushroom
- CONCEPT = POISONOUS
- A = YELLOW
- B = BIG
- C = SPOTTED
- D = FUNNEL-CAP
- E = BULKY
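The same tree written as nested tests, one observable predicate per internal node (a sketch of this mushroom example only):

# Decision tree for CONCEPT(x) <=> A(x) and (not B(x) or C(x))
def poisonous(yellow, big, spotted):  # A = YELLOW, B = BIG, C = SPOTTED
    if not yellow:       # test A first
        return False
    if not big:          # A is True: test B
        return True      # yellow and small
    return spotted       # yellow and big: poisonous iff spotted

print(poisonous(yellow=True, big=True, spotted=False))  # False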
26. Training Set
Ex. A B C D E CONCEPT
1 False False True False True False
2 False True False False False False
3 False True True True True False
4 False False True False False False
5 False False False True True False
6 True False True False False True
7 True False False True False True
8 True False True False True True
9 True True True False True True
10 True True True True True True
11 True True False False False False
12 True True False False True False
13 True False True True True True
27. Possible Decision Tree
28. Possible Decision Tree
CONCEPT ⇔ (D ∧ (¬E ∨ A)) ∨ (C ∧ (B ∨ ((E ∧ ¬A) ∨ A)))
KIS bias → build the smallest decision tree
Finding the smallest tree is computationally intractable → use a greedy algorithm
29. Getting Started
The distribution of the training set is:
True: 6, 7, 8, 9, 10, 13
False: 1, 2, 3, 4, 5, 11, 12
Ex. A B C D E CONCEPT
1 False False True False True False
2 False True False False False False
3 False True True True True False
4 False False True False False False
5 False False False True True False
6 True False True False False True
7 True False False True False True
8 True False True False True True
9 True True True False True True
10 True True True True True True
11 True True False False False False
12 True True False False True False
13 True False True True True True
30. Getting Started
The distribution of the training set is:
True: 6, 7, 8, 9, 10, 13
False: 1, 2, 3, 4, 5, 11, 12
Without testing any observable predicate, we could report that CONCEPT is False (majority rule), with an estimated probability of error P(E) = 6/13
31. Getting Started
The distribution of the training set is:
True: 6, 7, 8, 9, 10, 13
False: 1, 2, 3, 4, 5, 11, 12
Without testing any observable predicate, we could report that CONCEPT is False (majority rule), with an estimated probability of error P(E) = 6/13
Assuming that we will include only one observable predicate in the decision tree, which predicate should we test to minimize the probability of error?
32. Assume It's A
33. Assume It's B
34. Assume It's C
35. Assume It's D
36. Assume It's E
So the best predicate to test is A; the computation is sketched below.
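This choice can be reproduced by computing, for each candidate predicate, the probability of error under the majority rule after splitting the slide-26 training set on it (a sketch; the tuple encoding is ours):

T, F = True, False
# Rows from the slide-26 training set: (A, B, C, D, E, CONCEPT)
data = [(F,F,T,F,T,F), (F,T,F,F,F,F), (F,T,T,T,T,F), (F,F,T,F,F,F),
        (F,F,F,T,T,F), (T,F,T,F,F,T), (T,F,F,T,F,T), (T,F,T,F,T,T),
        (T,T,T,F,T,T), (T,T,T,T,T,T), (T,T,F,F,F,F), (T,T,F,F,T,F),
        (T,F,T,T,T,T)]

def split_error(i):
    """Probability of error if we test predicate i and apply the
    majority rule in each branch."""
    errors = 0
    for value in (True, False):
        labels = [row[-1] for row in data if row[i] == value]
        errors += min(labels.count(True), labels.count(False))
    return errors / len(data)

for i, name in enumerate('ABCDE'):
    print(name, split_error(i))  # A yields the smallest error (2/13)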
37. Choice of Second Predicate
[Tree so far: test A; if A is False → False; if A is True → test C]
The majority rule gives the probability of error Pr(E|A) = 1/8 and Pr(E) = 1/13
38. Choice of Third Predicate
[Tree so far: test A; if A is False → False; if A is True → test C; if C is True → True; if C is False → test B]
39. Final Tree
CONCEPT ⇔ A ∧ (C ∨ ¬B)
40. Learning a Decision Tree
DTL(D, Predicates):
- If all examples in D are positive, then return True
- If all examples in D are negative, then return False
- If Predicates is empty, then return failure
- A ← the most discriminating predicate in Predicates
- Return the tree whose:
  - root is A,
  - left branch is DTL(D+A, Predicates − A),
  - right branch is DTL(D−A, Predicates − A)
where D+A (resp. D−A) is the subset of D on which A is True (resp. False). A runnable sketch follows.
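A runnable sketch of DTL, using the majority-rule error as the "most discriminating" criterion (the next slide refines this with information theory); the data representation is our choice:

def dtl(examples, predicates):
    """Greedy decision-tree learning.
    examples: list of (assignment dict, label); returns True, False,
    'failure', or a triple (predicate, true-branch, false-branch)."""
    labels = [lab for _, lab in examples]
    if all(labels):           # all positive (also covers an empty branch)
        return True
    if not any(labels):       # all negative
        return False
    if not predicates:
        return 'failure'

    def errors(p):  # majority-rule errors after splitting on p
        e = 0
        for v in (True, False):
            ls = [lab for ex, lab in examples if ex[p] == v]
            e += min(ls.count(True), ls.count(False))
        return e

    a = min(predicates, key=errors)       # most discriminating predicate
    rest = [p for p in predicates if p != a]
    pos = [(ex, lab) for ex, lab in examples if ex[a]]
    neg = [(ex, lab) for ex, lab in examples if not ex[a]]
    return (a, dtl(pos, rest), dtl(neg, rest))

Run on the slide-26 training set (encoded as in the sketch after slide 36), this returns ('A', ('C', True, ('B', False, True)), False): test A first, then C, then B, which is exactly the final tree of slide 39, CONCEPT ⇔ A ∧ (C ∨ ¬B).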
41. Using Information Theory
- Rather than minimizing the probability of error, most existing learning procedures try to minimize the expected number of questions needed to decide whether an object x satisfies CONCEPT
- This minimization is based on a measure of the quantity of information contained in the truth value of an observable predicate (a sketch follows)
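The usual measure is Shannon entropy; a minimal sketch of the resulting "information gain" criterion (our encoding, same example format as the dtl sketch above):

from math import log2

def entropy(labels):
    """Shannon information (in bits) in a list of boolean labels."""
    if not labels:
        return 0.0
    p = labels.count(True) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def information_gain(examples, pred):
    """Expected reduction in entropy from testing predicate pred."""
    labels = [lab for _, lab in examples]
    remainder = 0.0
    for v in (True, False):
        branch = [lab for ex, lab in examples if ex[pred] == v]
        remainder += len(branch) / len(examples) * entropy(branch)
    return entropy(labels) - remainder

Replacing the error count in the dtl sketch with a choice that maximizes information_gain gives the classical ID3-style tree learner.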
42. Miscellaneous Issues
- Assessing performance
- Training set and test set
- Learning curve
43. Miscellaneous Issues
- Assessing performance
- Training set and test set
- Learning curve
- Overfitting
- Tree pruning
44. Miscellaneous Issues
- Assessing performance
- Training set and test set
- Learning curve
- Overfitting
- Tree pruning
- Cross-validation
- Missing data
45. Miscellaneous Issues
- Assessing performance
- Training set and test set
- Learning curve
- Overfitting
- Tree pruning
- Cross-validation
- Missing data
- Multi-valued and continuous attributes
These issues occur with virtually any learning
method
46. Multi-Valued Attributes
WillWait predicate (Russell and Norvig)
47. Applications of Decision Trees
- Medical diagnosis / drug design
- Evaluation of geological systems for assessing gas and oil basins
- Early detection of problems (e.g., jamming) during oil-drilling operations
- Automatic generation of rules in expert systems
48. Summary
- Inductive learning frameworks
- Logic inference formulation
- Hypothesis space and KIS bias
- Inductive learning of decision trees
- Assessing performance
- Overfitting