Title: Combining Inductive and Analytical Learning
1. Combining Inductive and Analytical Learning
- Ch. 12 in Machine Learning
- Tom M. Mitchell
- July 9, 1999
2. Contents
- Motivation
- Inductive-Analytical Approaches to Learning
- Using Prior Knowledge to Initialize the Hypothesis
  - The KBANN Algorithm
- Using Prior Knowledge to Alter the Search Objective
  - The TANGENTPROP Algorithm
  - The EBNN Algorithm
- Using Prior Knowledge to Augment Search Operators
  - The FOCL Algorithm
3. Motivation (1/2)
- Inductive vs. Analytical Learning
  - Goal: hypothesis fits the data (inductive) vs. hypothesis fits the domain theory (analytical)
  - Justification: statistical inference vs. deductive inference
  - Advantages: requires little prior knowledge vs. learns from scarce data
  - Pitfalls: scarce data, incorrect bias vs. imperfect domain theory
- A spectrum of learning tasks
  - Most practical learning problems lie somewhere between these two extremes of the spectrum.
4. Motivation (2/2)
- What kinds of learning algorithms can we devise that make use of approximate prior knowledge, together with available data, to form general hypotheses?
  - domain-independent algorithms that employ explicitly input, domain-dependent knowledge
- Desirable Properties
  - no domain theory → learn as well as purely inductive methods
  - perfect domain theory → learn as well as purely analytical methods
  - imperfect domain theory, imperfect training data → combine the two to outperform either inductive or analytical methods
  - accommodate arbitrary and unknown errors in the domain theory
  - accommodate arbitrary and unknown errors in the training data
5. The Learning Problem
- Given
  - A set of training examples D, possibly containing errors
  - A domain theory B, possibly containing errors
  - A space of candidate hypotheses H
- Determine
  - A hypothesis that best fits both the training examples and the domain theory
6. Hypothesis Space Search
- Learning as a task of searching through a hypothesis space
  - hypothesis space H
  - initial hypothesis
  - the set of search operators O
    - define individual search steps
  - the goal criterion G
    - specifies the search objective
- Methods for using prior knowledge
  - Use prior knowledge to
    - derive an initial hypothesis from which to begin the search
    - alter the objective G of the hypothesis space search
    - alter the available search steps O
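To make the three entry points concrete, here is a minimal, purely illustrative Python sketch of a greedy hypothesis-space search loop (not from the chapter); the parameter names initial_hypothesis, operators, and objective are hypothetical stand-ins for the initial hypothesis, O, and G above.

```python
from typing import Callable, Iterable, TypeVar

H = TypeVar("H")  # a hypothesis in any representation (network weights, Horn clauses, ...)

def greedy_search(
    initial_hypothesis: H,                        # prior knowledge can supply this (KBANN)
    operators: Iterable[Callable[[H], list[H]]],  # prior knowledge can add operators (FOCL)
    objective: Callable[[H], float],              # prior knowledge can reshape G (TANGENTPROP, EBNN)
    max_steps: int = 100,
) -> H:
    """Generic greedy hypothesis-space search: repeatedly apply the operators
    and keep the neighbor that most improves the objective."""
    ops = list(operators)
    current = initial_hypothesis
    for _ in range(max_steps):
        neighbors = [h for op in ops for h in op(current)]
        if not neighbors:
            break
        best = max(neighbors, key=objective)
        if objective(best) <= objective(current):
            break  # no operator improves the objective: local optimum
        current = best
    return current
```

In these terms, KBANN supplies the initial hypothesis, TANGENTPROP and EBNN reshape the objective, and FOCL extends the operator set.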
7. Using Prior Knowledge to Initialize the Hypothesis
- Two Steps
  - 1. Initialize the hypothesis to perfectly fit the domain theory.
  - 2. Inductively refine this initial hypothesis as needed to fit the training data.
- KBANN (Knowledge-Based Artificial Neural Network)
  - 1. Analytical step: create an initial network equivalent to the domain theory.
  - 2. Inductive step: refine the initial network (using BACKPROP).
- Given
  - A set of training examples
  - A domain theory consisting of nonrecursive, propositional Horn clauses
- Determine
  - An artificial neural network that fits the training examples, biased by the domain theory
→ Table 12.2 (p. 341)
8. Example: The Cup Learning Task
Neural Net Equivalent to Domain Theory
Result of refining the network
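Below is a minimal Python sketch of KBANN's analytical step only, assuming a toy, one-clause-per-head Horn theory written in the spirit of the Cup example; the weight W, the helper names, and the feature vector are illustrative choices rather than the book's code, and both the low-weight links to unused inputs and the BACKPROP refinement of the inductive step are omitted.

```python
import math

# Toy Horn-clause theory in the spirit of the Cup example
# (head: list of antecedents; a leading "~" would mark a negated literal).
DOMAIN_THEORY = {
    "Stable":     ["BottomIsFlat"],
    "Graspable":  ["HasHandle"],
    "Liftable":   ["Graspable", "Light"],
    "OpenVessel": ["HasConcavity", "ConcavityPointsUp"],
    "Cup":        ["Stable", "Liftable", "OpenVessel"],
}
W = 10.0  # weight large enough that cascaded sigmoid units stay close to 0 or 1

def kbann_initial_network(theory):
    """Analytical step (sketch): one sigmoid unit per clause head, weight +W from
    each antecedent (-W if negated), and a bias chosen so the unit fires only
    when all non-negated antecedents are true (a soft AND gate)."""
    units = {}
    for head, antecedents in theory.items():
        weights, n_positive = {}, 0
        for lit in antecedents:
            if lit.startswith("~"):
                weights[lit[1:]] = -W
            else:
                weights[lit] = W
                n_positive += 1
        units[head] = (weights, -(n_positive - 0.5) * W)  # (input weights, bias)
    return units

def forward(units, features):
    """Evaluate the network on 0/1 input features; the toy theory is listed in a
    valid topological order, so one pass in insertion order suffices."""
    values = dict(features)
    for head, (weights, bias) in units.items():
        net = bias + sum(w * values.get(name, 0.0) for name, w in weights.items())
        values[head] = 1.0 / (1.0 + math.exp(-net))  # sigmoid activation
    return values

net = kbann_initial_network(DOMAIN_THEORY)
example = {"BottomIsFlat": 1, "HasHandle": 1, "Light": 1,
           "HasConcavity": 1, "ConcavityPointsUp": 1}
print(round(forward(net, example)["Cup"], 3))  # near 1: the initial net mirrors the theory
```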
9. Remarks
- KBANN vs. Backpropagation
  - When given an approximately correct domain theory and scarce training data, KBANN generalizes more accurately than Backpropagation.
  - Classifying promoter regions in DNA
    - Backpropagation error rate: 8/106
    - KBANN error rate: 4/106
- Bias
  - KBANN: domain-specific theory
  - Backpropagation: domain-independent syntactic bias toward small weight values
10. Using Prior Knowledge to Alter the Search Objective
- Use of prior knowledge
  - Incorporate it into the error criterion minimized by gradient descent.
  - The network must then fit a combined function of the training data and the domain theory.
- Form of prior knowledge
  - derivatives of the target function
  - Certain types of prior knowledge can be expressed quite naturally in this form.
  - Example: recognizing handwritten characters
    - The identity of the character is independent of small translations and rotations of the image.
11. The TANGENTPROP Algorithm
- Domain Knowledge
  - expressed as derivatives of the target function with respect to transformations of its inputs
- Training Derivatives
  - TANGENTPROP assumes various training derivatives of the target function are provided.
- Error Function
  - E = \sum_i \left[ \left(f(x_i) - \hat{f}(x_i)\right)^2 + \mu \sum_j \left( \frac{\partial f(s_j(\alpha, x_i))}{\partial \alpha}\Big|_{\alpha=0} - \frac{\partial \hat{f}(s_j(\alpha, x_i))}{\partial \alpha}\Big|_{\alpha=0} \right)^2 \right]
  - s_j(\alpha, x): the jth transformation (e.g., rotation or translation) of input x by amount \alpha
  - \mu: constant that determines the relative importance of fitting training values vs. fitting training derivatives
→ Table 12.4 (p. 349)
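To make the combined objective concrete, here is a small Python sketch (not the book's Table 12.4 implementation): it evaluates a TANGENTPROP-style error for a single example, approximating the network's derivative along each transformation with a finite difference; the names net, translate, mu, and eps are hypothetical.

```python
import numpy as np

def tangentprop_loss(net, x, target_value, transforms, target_derivs, mu=0.1, eps=1e-3):
    """TANGENTPROP-style objective for one example (sketch): squared error on the
    training value plus mu-weighted squared error between the network's derivative
    along each transformation s_j and the desired derivative (0 for invariances
    such as small rotations or translations).

    net:            callable mapping an input array to a scalar prediction f_hat(x)
    transforms:     list of callables s_j(alpha, x) returning the transformed input
    target_derivs:  desired d f(s_j(alpha, x))/d alpha at alpha = 0, one per transform
    """
    loss = (target_value - net(x)) ** 2
    for s_j, d_target in zip(transforms, target_derivs):
        # central finite-difference estimate of d f_hat(s_j(alpha, x))/d alpha at alpha = 0
        d_net = (net(s_j(eps, x)) - net(s_j(-eps, x))) / (2 * eps)
        loss += mu * (d_target - d_net) ** 2
    return loss

# Example: a linear "network" and a translation transform; the prior knowledge says
# the output should be invariant to translation, so the desired derivative is 0.
w = np.array([0.5, -0.2, 0.1])
net = lambda x: float(w @ x)
translate = lambda alpha, x: x + alpha   # hypothetical translation of the input
x = np.array([1.0, 2.0, 0.5])
print(tangentprop_loss(net, x, target_value=1.0, transforms=[translate],
                       target_derivs=[0.0], mu=0.1))
```

The actual algorithm differentiates the network analytically and minimizes this objective by gradient descent; the finite difference here is only a stand-in to show what the extra error term measures.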
12. Remarks
- TANGENTPROP combines the prior knowledge with observed training data by minimizing an objective function that measures both
  - the network's error with respect to the training example values
  - the network's error with respect to the desired derivatives
- TANGENTPROP is not robust to errors in the prior knowledge.
  - The tradeoff weight \mu needs to be selected automatically → the EBNN Algorithm.
13. The EBNN Algorithm (1/2)
- Input
  - A set of training examples of the form ⟨x_i, f(x_i)⟩
  - A domain theory represented by a set of previously trained neural networks
- Output
  - A new neural network that approximates the target function
- Algorithm
  - Create a new, fully connected feedforward network to represent the target function.
  - For each training example, determine the corresponding training derivatives.
  - Use the TANGENTPROP algorithm to train the target network.
14. The EBNN Algorithm (2/2)
- Computation of training derivatives
  - EBNN computes them itself for each observed training example:
    - explain each training example in terms of the given domain theory
    - extract training derivatives from this explanation
    - the derivatives provide important information for distinguishing relevant from irrelevant features
- Weighting the relative importance of the inductive and analytical components of learning
  - \mu_i is chosen independently for each training example,
  - based on how accurately the domain theory predicts the training value for that particular example.
- Error Function
  - E = \sum_i \left[ \left(f(x_i) - \hat{f}(x_i)\right)^2 + \mu_i \sum_j \left( \frac{\partial A(x)}{\partial x^j}\Big|_{x=x_i} - \frac{\partial \hat{f}(x)}{\partial x^j}\Big|_{x=x_i} \right)^2 \right], \qquad \mu_i \equiv 1 - \frac{|A(x_i) - f(x_i)|}{c}
  - A(x): domain theory prediction for input x
  - x_i: the ith training instance; x^j: the jth component of the vector x
  - c: normalizing constant (so that 0 \le \mu_i \le 1)
→ Figure 12.7 (p. 353)
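A small Python sketch of how EBNN turns the domain theory into a training signal, under stated assumptions: the book's algorithm extracts exact derivatives by backpropagating through the domain-theory network, whereas this sketch uses finite differences as a stand-in, and the names ebnn_training_signal, domain_theory_net, and A are hypothetical.

```python
import numpy as np

def ebnn_training_signal(domain_theory_net, x, f_x, c, eps=1e-4):
    """EBNN-style extraction of training derivatives (sketch).
    domain_theory_net: callable A(x), the prediction of the previously trained
                       domain-theory network for input x
    Returns (target derivatives dA/dx^j at x, mu_i), where mu_i downweights the
    analytical component when the domain theory predicts f(x_i) poorly."""
    x = np.asarray(x, dtype=float)
    derivs = np.empty_like(x)
    for j in range(x.size):
        # finite-difference estimate of the jth component of dA/dx at x
        e = np.zeros_like(x)
        e[j] = eps
        derivs[j] = (domain_theory_net(x + e) - domain_theory_net(x - e)) / (2 * eps)
    # mu_i = 1 - |A(x_i) - f(x_i)| / c, clipped to [0, 1]
    mu_i = float(np.clip(1.0 - abs(domain_theory_net(x) - f_x) / c, 0.0, 1.0))
    return derivs, mu_i

# Example with a hypothetical, already-trained "domain theory" network.
A = lambda x: float(1.0 / (1.0 + np.exp(-(0.8 * x[0] - 0.3 * x[1]))))
derivs, mu = ebnn_training_signal(A, x=[1.0, 2.0], f_x=1.0, c=1.0)
print(derivs, mu)  # derivatives become TANGENTPROP targets, weighted by mu
```

The derivatives and per-example weight returned here are exactly what the error function above consumes: they play the roles of ∂A/∂x^j and \mu_i for one training instance.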
15. Remarks
- EBNN vs. symbolic explanation-based learning
  - The domain theory consists of neural networks rather than Horn clauses.
  - Relevant dependencies take the form of derivatives.
  - EBNN accommodates imperfect domain theories.
  - EBNN learns a fixed-size neural network,
    - so it requires constant time to classify new instances,
    - but it may be unable to represent sufficiently complex functions.
16. Using Prior Knowledge to Augment Search Operators
- The FOCL Algorithm
  - Two operators for generating candidate specializations (see the sketch below):
    - 1. Add a single new literal.
    - 2. Add a set of literals that constitute logically sufficient conditions for the target concept, according to the domain theory.
      - Select one of the domain theory clauses whose head matches the target concept.
      - Unfolding: each nonoperational literal is replaced (again using the domain theory) until the sufficient conditions have been restated in terms of operational literals.
      - Pruning: each literal is removed unless its removal reduces classification accuracy over the training examples.
  - FOCL selects among all these candidate specializations based on their performance over the data.
    - The domain theory is used in a fashion that biases the learner,
    - but final search choices are left to be made based on performance over the training data.
→ Figure 12.8 (p. 358)
→ Figure 12.9 (p. 361)
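A minimal Python sketch of the two specialization operators, assuming a simplified propositional setting with one domain-theory clause per head and a caller-supplied accuracy function; the names unfold, prune, and focl_candidates are illustrative, not FOCL's actual code.

```python
def unfold(literals, theory, operational):
    """Repeatedly replace each nonoperational literal by the body of its
    domain-theory clause until only operational literals remain (unfolding).
    Assumes the theory defines every nonoperational literal."""
    result = []
    for lit in literals:
        if lit in operational:
            result.append(lit)
        else:
            result.extend(unfold(theory[lit], theory, operational))
    return result

def prune(literals, accuracy):
    """Drop each literal whose removal does not reduce accuracy on the training data."""
    kept = list(literals)
    for lit in list(kept):
        trial = [l for l in kept if l != lit]
        if accuracy(trial) >= accuracy(kept):
            kept = trial
    return kept

def focl_candidates(current_body, single_literals, theory, target, operational, accuracy):
    """FOCL-style candidate specializations of a clause body (sketch):
    (1) add any single new literal; (2) add the unfolded-and-pruned sufficient
    conditions that the domain theory gives for the target concept."""
    candidates = [current_body + [lit] for lit in single_literals if lit not in current_body]
    if target in theory:
        sufficient = prune(unfold(theory[target], theory, operational), accuracy)
        candidates.append(current_body + sufficient)
    return candidates
```

FOCL then scores every candidate on the training data, exactly as a purely inductive learner would, so the domain theory only proposes specializations; it never overrides the data.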