Inductive Learning (1/2) Decision Tree Method - PowerPoint PPT Presentation

Slides: 49
Provided by: JeanClaud80
1
Inductive Learning (1/2) Decision Tree Method
  • Russell and Norvig: Chapter 18, Sections 18.1
    through 18.4
  • Chapter 18, Sections 18.1 through 18.3
  • CS121 Winter 2003

2
Quotes
  • "Our experience of the world is specific, yet we
    are able to formulate general theories that
    account for the past and predict the future."
    (Genesereth and Nilsson, Logical Foundations of
    AI, 1987)
  • "Entities are not to be multiplied without
    necessity." (Ockham, 1285-1349)

3
Learning Agent
[Figure: learning agent diagram with components Critic, Percepts, Problem solver, Learning element, KB, Actions]
5
Contents
  • Introduction to inductive learning
  • Logic-based inductive learning
  • Decision tree method
  • Version space method
  • Why inductive learning works
  • Function-based inductive learning
  • Neural nets

6
Inductive Learning Frameworks
  1. Function-learning formulation
  2. Logic-inference formulation

7
Function-Learning Formulation
  • Goal function f
  • Training set: (x_i, f(x_i)), i = 1, ..., n
  • Inductive inference: Find a function h that
    fits the points well

→ Neural nets
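
The function-learning formulation can be illustrated with a minimal sketch. It assumes, purely for illustration, that hypotheses are straight lines and that "fits the points well" means least squares; the goal function f and the helper fit_line are hypothetical names, not taken from the slides.

```python
def fit_line(points):
    """Return the least-squares line h(x) = a*x + b for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def f(x):
    """Hypothetical goal function; the learner only sees samples of it."""
    return 2 * x + 1


# Training set (x_i, f(x_i)), i = 1, ..., n
training_set = [(x, f(x)) for x in range(5)]

# Inductive inference: pick the hypothesis h that fits the points
h = fit_line(training_set)
print(h(10))
```

Because the goal function here happens to lie inside the hypothesis class, h reproduces it on unseen inputs; with a goal function outside the class, h would only approximate it.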
8
Logic-Inference Formulation
  • Background knowledge KB
  • Training set D (observed knowledge) such that
    KB ⊭ D and {KB, D} is satisfiable
  • Inductive inference: Find h (inductive
    hypothesis) such that
  • {KB, h} is satisfiable
  • KB, h ⊨ D

h = D is a trivial, but uninteresting solution
(data caching)
10
Rewarded Card Example
  • Background knowledge KB:
    ((r=1) ∨ ... ∨ (r=10)) ⇔ NUM(r)
    ((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
    ((s=S) ∨ (s=C)) ⇔ BLACK(s)
    ((s=D) ∨ (s=H)) ⇔ RED(s)
  • Training set D:
    REWARD(4,C) ∧ REWARD(7,C) ∧ REWARD(2,S) ∧
    ¬REWARD(5,H) ∧ ¬REWARD(J,S)
  • Possible inductive hypothesis:
    h ≡ (NUM(r) ∧ BLACK(s) ⇔ REWARD(r,s))

There are several possible inductive hypotheses
11
Learning a Predicate
  • Set E of objects (e.g., cards)
  • Goal predicate CONCEPT(x), where x is an object
    in E, that takes the value True or False (e.g.,
    REWARD)
  • Example: CONCEPT describes the precondition of
    an action, e.g., Unstack(C,A)
  • E is the set of states
  • CONCEPT(x) ⇔ HANDEMPTY ∈ x, BLOCK(C) ∈ x,
    BLOCK(A) ∈ x, CLEAR(C) ∈ x, ON(C,A) ∈ x
  • Learning CONCEPT is a step toward learning the
    action

12
Learning a Predicate
  • Set E of objects (e.g., cards)
  • Goal predicate CONCEPT(x), where x is an object
    in E, that takes the value True or False (e.g.,
    REWARD)
  • Observable predicates A(x), B(x), ... (e.g., NUM,
    RED)
  • Training set: values of CONCEPT for some
    combinations of values of the observable
    predicates

13
A Possible Training Set
Ex.  A      B      C      D      E      CONCEPT
1    True   True   False  True   False  False
2    True   False  False  False  False  True
3    False  False  True   True   True   False
4    True   True   True   False  True   True
5    False  True   True   False  False  False
6    True   True   False  True   True   False
7    False  False  True   False  True   False
8    True   False  True   False  True   True
9    False  False  False  True   True   False
10   True   True   True   True   False  True
Note that the training set does not say whether
an observable predicate A, ..., E is pertinent or
not
14
Learning a Predicate
  • Find a representation of CONCEPT in the form
    CONCEPT(x) ⇔ S(A,B,...), where S(A,B,...) is a
    sentence built with the observable predicates,
    e.g., CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))

15
Learning the concept of an Arch
ARCH(x) ⇔ HAS-PART(x,b1) ∧ HAS-PART(x,b2) ∧
HAS-PART(x,b3) ∧ IS-A(b1,BRICK) ∧
IS-A(b2,BRICK) ∧ ¬MEET(b1,b2) ∧
(IS-A(b3,BRICK) ∨ IS-A(b3,WEDGE)) ∧
SUPPORTED(b3,b1) ∧ SUPPORTED(b3,b2)
16
Example set
  • An example consists of the values of CONCEPT and
    the observable predicates for some object x
  • An example is positive if CONCEPT is True, else
    it is negative
  • The set E of all examples is the example set
  • The training set is a subset of E

17
Hypothesis Space
  • A hypothesis is any sentence h of the form
    CONCEPT(x) ⇔ S(A,B,...), where S(A,B,...) is
    a sentence built with the observable predicates
  • The set of all hypotheses is called the
    hypothesis space H
  • A hypothesis h agrees with an example if it
    gives the correct value of CONCEPT

18
Inductive Learning Scheme
19
Size of Hypothesis Space
  • n observable predicates
  • 2^n entries in truth table
  • In the absence of any restriction (bias), there
    are 2^(2^n) hypotheses to choose from
  • n = 6 ⇒ 2×10^19 hypotheses!
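
The count on this slide can be checked with a short sketch: each of the 2^n rows of the truth table can independently map to True or False, so there are 2^(2^n) distinct Boolean functions. The function name hypothesis_count is illustrative, not from the slides.

```python
def hypothesis_count(n):
    """Number of distinct Boolean functions of n observable predicates."""
    rows = 2 ** n        # entries in the truth table
    return 2 ** rows     # each entry can be labelled True or False


count = hypothesis_count(6)
print(count)             # 18446744073709551616, i.e. about 2 x 10^19
```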

20
Multiple Inductive Hypotheses
h1 ≡ NUM(r) ∧ BLACK(s) ⇔ REWARD(r,s)
h2 ≡ BLACK(s) ∧ ¬(r=J) ⇔ REWARD(r,s)
h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S])
     ⇔ REWARD(r,s)
h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD(r,s)
all agree with all the examples in the training set
Hence the need for a system of preferences, called
a bias, to compare possible hypotheses
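
A small sketch can confirm that the four hypotheses listed on this slide all agree with the rewarded-card training set, yet disagree on cards they have not seen. The card encoding (rank and suit as strings), the labels h1 to h4 used as dictionary keys, and the helper names are assumptions made for this illustration.

```python
NUM_RANKS = {str(i) for i in range(1, 11)}   # ranks 1..10
BLACK_SUITS = {"S", "C"}                      # spades, clubs

# Training set D: ((rank, suit), REWARD value)
TRAINING = [
    (("4", "C"), True), (("7", "C"), True), (("2", "S"), True),
    (("5", "H"), False), (("J", "S"), False),
]

HYPOTHESES = {
    "h1": lambda r, s: r in NUM_RANKS and s in BLACK_SUITS,
    "h2": lambda r, s: s in BLACK_SUITS and r != "J",
    "h3": lambda r, s: (r, s) in {("4", "C"), ("7", "C"), ("2", "S")},
    "h4": lambda r, s: (r, s) not in {("5", "H"), ("J", "S")},
}


def agrees(h, examples):
    """True if hypothesis h gives the correct REWARD value on every example."""
    return all(h(r, s) == reward for (r, s), reward in examples)


agreement = {name: agrees(h, TRAINING) for name, h in HYPOTHESES.items()}

# On an unseen card, the queen of clubs, the hypotheses no longer coincide
unseen = {name: h("Q", "C") for name, h in HYPOTHESES.items()}
print(agreement, unseen)
```

The training set alone cannot separate these hypotheses, which is why a preference among them has to come from somewhere else.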
21
Keep-It-Simple (KIS) Bias
  • Motivation:
  • If a hypothesis is too complex, it may not be
    worth learning (data caching might do the job
    just as well)
  • There are far fewer simple hypotheses than
    complex ones, hence the hypothesis space is
    smaller
  • Examples:
  • Use far fewer observable predicates than
    suggested by the training set
  • Constrain the learnt predicate, e.g., to use only
    high-level observable predicates such as NUM,
    FACE, BLACK, and RED and/or to have simple syntax
    (e.g., conjunction of literals)

If the bias allows only sentences S that
are conjunctions of k ≪ n predicates picked
from the n observable predicates, then the size
of H is O(n^k)
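
To make the bound on the size of H concrete, the sketch below counts conjunctions of k predicates picked from n. Whether negated predicates are allowed is not stated on the slide, so both counts are shown; conjunction_count is a hypothetical helper, and math.comb requires Python 3.8 or later.

```python
from math import comb


def conjunction_count(n, k, allow_negation=False):
    """Number of conjunctions of k distinct predicates chosen among n."""
    count = comb(n, k)
    if allow_negation:
        count *= 2 ** k   # each chosen predicate may appear negated or not
    return count


n, k = 6, 2
restricted = conjunction_count(n, k)
with_negation = conjunction_count(n, k, allow_negation=True)
unrestricted = 2 ** (2 ** n)   # no bias at all
print(restricted, with_negation, unrestricted)
```

Either count is at most (2n)^k, which is O(n^k) for fixed k, and both are tiny next to the unrestricted hypothesis space.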
22
Putting Things Together
23
Predicate-Learning Methods
  • Decision tree
  • Version space

25
Predicate as a Decision Tree
The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
can be represented by the following decision
tree
  • Example: A mushroom is poisonous iff it is
    yellow and small, or yellow, big and spotted
  • x is a mushroom
  • CONCEPT = POISONOUS
  • A = YELLOW
  • B = BIG
  • C = SPOTTED
  • D = FUNNEL-CAP
  • E = BULKY
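
The tree itself did not survive in this transcript, so the sketch below shows one way to write the predicate A ∧ (¬B ∨ C) as a decision tree and checks it against the formula on all eight truth assignments. The nested-tuple encoding (predicate, subtree if True, subtree if False) is an assumption of this sketch, not the slide's notation.

```python
from itertools import product

# (predicate, subtree_if_true, subtree_if_false); leaves are booleans
TREE = ("A",
        ("B",
         ("C", True, False),   # yellow and big: poisonous only if spotted
         True),                # yellow and small: poisonous
        False)                 # not yellow: not poisonous


def evaluate(tree, x):
    """Walk the tree, testing one predicate of x per internal node."""
    while not isinstance(tree, bool):
        name, if_true, if_false = tree
        tree = if_true if x[name] else if_false
    return tree


def formula(x):
    """CONCEPT(x) <=> A(x) and (not B(x) or C(x))."""
    return x["A"] and (not x["B"] or x["C"])


assignments = [dict(zip("ABC", values))
               for values in product([False, True], repeat=3)]
equivalent = all(evaluate(TREE, x) == formula(x) for x in assignments)
print(equivalent)
```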

26
Training Set
Ex.  A      B      C      D      E      CONCEPT
1    False  False  True   False  True   False
2    False  True   False  False  False  False
3    False  True   True   True   True   False
4    False  False  True   False  False  False
5    False  False  False  True   True   False
6    True   False  True   False  False  True
7    True   False  False  True   False  True
8    True   False  True   False  True   True
9    True   True   True   False  True   True
10   True   True   True   True   True   True
11   True   True   False  False  False  False
12   True   True   False  False  True   False
13   True   False  True   True   True   True
27
Possible Decision Tree
28
Possible Decision Tree
CONCEPT ⇔ (D ∧ (¬E ∨ A)) ∨
(C ∧ (B ∨ ((E ∧ ¬A) ∨ A)))
KIS bias ⇒ Build smallest decision tree
Computationally intractable problem ⇒ greedy
algorithm
29
Getting Started
The distribution of the training set is
True: 6, 7, 8, 9, 10, 13
False: 1, 2, 3, 4, 5, 11, 12
31
Getting Started
The distribution of the training set is
True: 6, 7, 8, 9, 10, 13
False: 1, 2, 3, 4, 5, 11, 12
Without testing any observable predicate,
we could report that CONCEPT is False (majority
rule) with an estimated probability of error
P(E) = 6/13
Assuming that we will only include one observable
predicate in the decision tree, which
predicate should we test to minimize the
probability of error?
32
Assume It's A
33
Assume It's B
34
Assume It's C
35
Assume It's D
36
Assume It's E
So, the best predicate to test is A
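
The per-predicate figures are missing from this transcript, but the comparison can be recomputed from the 13-example training set. The sketch below applies the majority rule in each branch after testing a single predicate; the compact row encoding and the function names are choices made for this illustration.

```python
from fractions import Fraction

NAMES = "ABCDE"
# Rows 1..13 of the training set: values of A B C D E, then CONCEPT
ROWS = [
    "FFTFT F", "FTFFF F", "FTTTT F", "FFTFF F", "FFFTT F",
    "TFTFF T", "TFFTF T", "TFTFT T", "TTTFT T", "TTTTT T",
    "TTFFF F", "TTFFT F", "TFTTT T",
]
EXAMPLES = [({n: c == "T" for n, c in zip(NAMES, r[:5])}, r[6] == "T")
            for r in ROWS]


def majority_errors(examples):
    """Examples misclassified when always answering the majority label."""
    positives = sum(label for _, label in examples)
    return min(positives, len(examples) - positives)


def errors_after_test(examples, name):
    """Errors when testing one predicate, then using the majority rule per branch."""
    yes = [e for e in examples if e[0][name]]
    no = [e for e in examples if not e[0][name]]
    return majority_errors(yes) + majority_errors(no)


total = len(EXAMPLES)
baseline = Fraction(majority_errors(EXAMPLES), total)
error_rates = {n: Fraction(errors_after_test(EXAMPLES, n), total) for n in NAMES}
best = min(error_rates, key=error_rates.get)
print(baseline, error_rates, best)
```

With no test the estimated error is 6/13; testing A brings it down to 2/13, lower than any other single predicate.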
37
Choice of Second Predicate
[Figure: partial decision tree. Root A: if A is False, return False; if A is True, test C]
The majority rule gives the probability of error
Pr(E|A) = 1/8 and Pr(E) = 1/13
38
Choice of Third Predicate
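[Figure: decision tree. Root A: if A is False, return False; if A is True, test C. If C is True, return True; if C is False, test B]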
39
Final Tree
CONCEPT ⇔ A ∧ (C ∨ ¬B)
40
Learning a Decision Tree
  • DTL(D, Predicates)
  • If all examples in D are positive then return
    True
  • If all examples in D are negative then return
    False
  • If Predicates is empty then return failure
  • A ← most discriminating predicate in Predicates
  • Return the tree whose
  • - root is A,
  • - left branch is DTL(D+A, Predicates − A),
  • - right branch is DTL(D−A, Predicates − A)
  • where D+A (resp. D−A) is the subset of examples
    in D that satisfy (resp. do not satisfy) A
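
The DTL procedure can be sketched in Python as follows. The slides do not define "most discriminating" at this point; this sketch assumes it means the predicate leaving the fewest majority-rule errors after the split, with ties broken by the order of the predicate list. The tuple encoding of trees and all function names are likewise assumptions of the sketch.

```python
NAMES = "ABCDE"
# Rows 1..13 of the training set: values of A B C D E, then CONCEPT
ROWS = [
    "FFTFT F", "FTFFF F", "FTTTT F", "FFTFF F", "FFFTT F",
    "TFTFF T", "TFFTF T", "TFTFT T", "TTTFT T", "TTTTT T",
    "TTFFF F", "TTFFT F", "TFTTT T",
]
EXAMPLES = [({n: c == "T" for n, c in zip(NAMES, r[:5])}, r[6] == "T")
            for r in ROWS]


def split(examples, name):
    """Return (examples where the predicate is True, examples where it is False)."""
    yes = [e for e in examples if e[0][name]]
    no = [e for e in examples if not e[0][name]]
    return yes, no


def majority_errors(examples):
    positives = sum(label for _, label in examples)
    return min(positives, len(examples) - positives)


def dtl(examples, predicates):
    """Greedy decision-tree learning; returns True, False, None (failure) or a tuple."""
    if all(label for _, label in examples):
        return True
    if not any(label for _, label in examples):
        return False
    if not predicates:
        return None  # failure: examples disagree and nothing is left to test
    # Assumed criterion: fewest majority-rule errors after the split
    best = min(predicates,
               key=lambda p: sum(majority_errors(s) for s in split(examples, p)))
    yes, no = split(examples, best)
    rest = [p for p in predicates if p != best]
    return (best, dtl(yes, rest), dtl(no, rest))  # (root, True branch, False branch)


def classify(tree, x):
    while isinstance(tree, tuple):
        name, if_true, if_false = tree
        tree = if_true if x[name] else if_false
    return tree


tree = dtl(EXAMPLES, list(NAMES))
print(tree)
```

On this training set the sketch tests A, then C, then B, which corresponds to CONCEPT ⇔ A ∧ (C ∨ ¬B). At the third level B and D separate the remaining examples equally well, so the choice of B depends on the tie-breaking order.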

41
Using Information Theory
  • Rather than minimizing the probability of error,
    most existing learning procedures try to minimize
    the expected number of questions needed to decide
    if an object x satisfies CONCEPT
  • This minimization is based on a measure of the
    quantity of information that is contained in
    the truth value of an observable predicate
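
The slide does not name the measure. One widely used choice, assumed here, is Shannon entropy, with the information gain of a predicate defined as the drop in entropy of CONCEPT once that predicate's truth value is known. The sketch below computes it on the 13-example training set; function names are illustrative.

```python
from math import log2

NAMES = "ABCDE"
# Rows 1..13 of the training set: values of A B C D E, then CONCEPT
ROWS = [
    "FFTFT F", "FTFFF F", "FTTTT F", "FFTFF F", "FFFTT F",
    "TFTFF T", "TFFTF T", "TFTFT T", "TTTFT T", "TTTTT T",
    "TTFFF F", "TTFFT F", "TFTTT T",
]
EXAMPLES = [({n: c == "T" for n, c in zip(NAMES, r[:5])}, r[6] == "T")
            for r in ROWS]


def entropy(examples):
    """Shannon entropy, in bits, of the CONCEPT labels in examples."""
    if not examples:
        return 0.0
    p = sum(label for _, label in examples) / len(examples)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(examples, name):
    """Entropy before the test minus the expected entropy after it."""
    yes = [e for e in examples if e[0][name]]
    no = [e for e in examples if not e[0][name]]
    remainder = sum(len(s) / len(examples) * entropy(s) for s in (yes, no))
    return entropy(examples) - remainder


gains = {n: information_gain(EXAMPLES, n) for n in NAMES}
best = max(gains, key=gains.get)
print(gains, best)
```

On this data the entropy-based criterion selects the same first predicate, A, as the error-minimizing criterion; the two criteria need not agree in general.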

45
Miscellaneous Issues
  • Assessing performance
  • Training set and test set
  • Learning curve
  • Overfitting
  • Tree pruning
  • Cross-validation
  • Missing data
  • Multi-valued and continuous attributes

These issues occur with virtually any learning
method
46
Multi-Valued Attributes
WillWait predicate (Russell and Norvig)
47
Applications of Decision Tree
  • Medical diagnosis / drug design
  • Evaluation of geological systems for assessing
    gas and oil basins
  • Early detection of problems (e.g., jamming)
    during oil drilling operations
  • Automatic generation of rules in expert systems

48
Summary
  • Inductive learning frameworks
  • Logic inference formulation
  • Hypothesis space and KIS bias
  • Inductive learning of decision trees
  • Assessing performance
  • Overfitting