Title: Combining Inductive and Analytical Learning
1. Combining Inductive and Analytical Learning
- Ch. 12 in Machine Learning
- Tom M. Mitchell
- July 9, 1999
2. Contents
- Motivation
- Inductive-Analytical Approaches to Learning
- Using Prior Knowledge to Initialize the Hypothesis
  - The KBANN Algorithm
- Using Prior Knowledge to Alter the Search Objective
  - The TANGENTPROP Algorithm
  - The EBNN Algorithm
- Using Prior Knowledge to Augment Search Operators
  - The FOCL Algorithm
3. Motivation (1/2)
- Inductive vs. Analytical Learning
  - Goal: hypothesis fits the data (inductive) vs. hypothesis fits the domain theory (analytical)
  - Justification: statistical inference vs. deductive inference
  - Advantages: requires little prior knowledge vs. learns from scarce data
  - Pitfalls: scarce data, incorrect bias vs. imperfect domain theory
- A spectrum of learning tasks
  - Most practical learning problems lie somewhere between these two extremes of the spectrum.
4. Motivation (2/2)
- What kinds of learning algorithms can we devise that make use of approximate prior knowledge, together with available data, to form general hypotheses?
  - domain-independent algorithms that employ explicitly input, domain-dependent knowledge
- Desirable Properties
  - no domain theory → learn as well as purely inductive methods
  - perfect domain theory → learn as well as purely analytical methods
  - imperfect domain theory, imperfect training data → combine the two to outperform either inductive or analytical methods
  - accommodate arbitrary and unknown errors in the domain theory
  - accommodate arbitrary and unknown errors in the training data
5. The Learning Problem
- Given
  - A set of training examples D, possibly containing errors
  - A domain theory B, possibly containing errors
  - A space of candidate hypotheses H
- Determine
  - A hypothesis that best fits both the training examples and the domain theory
6. Hypothesis Space Search
- Learning as a task of searching through a hypothesis space
  - hypothesis space H
  - initial hypothesis
  - the set of search operators O
    - define individual search steps
  - the goal criterion G
    - specifies the search objective
- Methods for using prior knowledge
  - Use prior knowledge to
    - derive an initial hypothesis from which to begin the search
    - alter the objective G of the hypothesis space search
    - alter the available search steps O
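To make the three entry points concrete, here is a minimal, purely illustrative Python sketch of a greedy hypothesis-space search loop (not from the chapter); the parameter names initial_hypothesis, operators, and objective are hypothetical stand-ins for the initial hypothesis, O, and G above.

```python
from typing import Callable, Iterable, TypeVar

H = TypeVar("H")  # a hypothesis in any representation (network weights, Horn clauses, ...)

def greedy_search(
    initial_hypothesis: H,                        # prior knowledge can supply this (KBANN)
    operators: Iterable[Callable[[H], list[H]]],  # prior knowledge can add operators (FOCL)
    objective: Callable[[H], float],              # prior knowledge can reshape G (TANGENTPROP, EBNN)
    max_steps: int = 100,
) -> H:
    """Generic greedy hypothesis-space search: repeatedly apply the operators
    and keep the neighbor that most improves the objective."""
    ops = list(operators)
    current = initial_hypothesis
    for _ in range(max_steps):
        neighbors = [h for op in ops for h in op(current)]
        if not neighbors:
            break
        best = max(neighbors, key=objective)
        if objective(best) <= objective(current):
            break  # no operator improves the objective: local optimum
        current = best
    return current
```

In these terms, KBANN supplies the initial hypothesis, TANGENTPROP and EBNN reshape the objective, and FOCL extends the operator set.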
7. Using Prior Knowledge to Initialize the Hypothesis
- Two Steps
  - 1. Initialize the hypothesis to perfectly fit the domain theory.
  - 2. Inductively refine this initial hypothesis as needed to fit the training data.
- KBANN (Knowledge-Based Artificial Neural Network)
  - 1. Analytical step: create an initial network equivalent to the domain theory.
  - 2. Inductive step: refine the initial network (using BACKPROP).
- Given
  - A set of training examples
  - A domain theory consisting of nonrecursive, propositional Horn clauses
- Determine
  - An artificial neural network that fits the training examples, biased by the domain theory
→ Table 12.2 (p. 341)
8. Example: The Cup Learning Task
Neural Net Equivalent to Domain Theory
Result of refining the network
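Below is a minimal Python sketch of KBANN's analytical step only, assuming a toy, one-clause-per-head Horn theory written in the spirit of the Cup example; the weight W, the helper names, and the feature vector are illustrative choices rather than the book's code, and both the low-weight links to unused inputs and the BACKPROP refinement of the inductive step are omitted.

```python
import math

# Toy Horn-clause theory in the spirit of the Cup example
# (head: list of antecedents; a leading "~" would mark a negated literal).
DOMAIN_THEORY = {
    "Stable":     ["BottomIsFlat"],
    "Graspable":  ["HasHandle"],
    "Liftable":   ["Graspable", "Light"],
    "OpenVessel": ["HasConcavity", "ConcavityPointsUp"],
    "Cup":        ["Stable", "Liftable", "OpenVessel"],
}
W = 10.0  # weight large enough that cascaded sigmoid units stay close to 0 or 1

def kbann_initial_network(theory):
    """Analytical step (sketch): one sigmoid unit per clause head, weight +W from
    each antecedent (-W if negated), and a bias chosen so the unit fires only
    when all non-negated antecedents are true (a soft AND gate)."""
    units = {}
    for head, antecedents in theory.items():
        weights, n_positive = {}, 0
        for lit in antecedents:
            if lit.startswith("~"):
                weights[lit[1:]] = -W
            else:
                weights[lit] = W
                n_positive += 1
        units[head] = (weights, -(n_positive - 0.5) * W)  # (input weights, bias)
    return units

def forward(units, features):
    """Evaluate the network on 0/1 input features; the toy theory is listed in a
    valid topological order, so one pass in insertion order suffices."""
    values = dict(features)
    for head, (weights, bias) in units.items():
        net = bias + sum(w * values.get(name, 0.0) for name, w in weights.items())
        values[head] = 1.0 / (1.0 + math.exp(-net))  # sigmoid activation
    return values

net = kbann_initial_network(DOMAIN_THEORY)
example = {"BottomIsFlat": 1, "HasHandle": 1, "Light": 1,
           "HasConcavity": 1, "ConcavityPointsUp": 1}
print(round(forward(net, example)["Cup"], 3))  # near 1: the initial net mirrors the theory
```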
9. Remarks
- KBANN vs. Backpropagation
  - When given an approximately correct domain theory and scarce training data, KBANN generalizes more accurately than Backpropagation.
  - Classifying promoter regions in DNA
    - Backpropagation error rate: 8/106
    - KBANN error rate: 4/106
- Bias
  - KBANN: domain-specific theory
  - Backpropagation: domain-independent syntactic bias toward small weight values
10. Using Prior Knowledge to Alter the Search Objective
- Use of prior knowledge
  - Incorporate it into the error criterion minimized by gradient descent.
  - The network must then fit a combined function of the training data and the domain theory.
- Form of prior knowledge
  - derivatives of the target function
  - Certain types of prior knowledge can be expressed quite naturally in this form.
  - Example: recognizing handwritten characters
    - The identity of the character is independent of small translations and rotations of the image.
11. The TANGENTPROP Algorithm
- Domain Knowledge
  - expressed as derivatives of the target function with respect to transformations of its inputs
- Training Derivatives
  - TANGENTPROP assumes various training derivatives of the target function are provided.
- Error Function
  - E = \sum_i \left[ \left(f(x_i) - \hat{f}(x_i)\right)^2 + \mu \sum_j \left( \frac{\partial f(s_j(\alpha, x_i))}{\partial \alpha}\Big|_{\alpha=0} - \frac{\partial \hat{f}(s_j(\alpha, x_i))}{\partial \alpha}\Big|_{\alpha=0} \right)^2 \right]
  - s_j(\alpha, x): the jth transformation (e.g., rotation or translation) of input x by amount \alpha
  - \mu: constant that determines the relative importance of fitting training values vs. fitting training derivatives
→ Table 12.4 (p. 349)
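To make the combined objective concrete, here is a small Python sketch (not the book's Table 12.4 implementation): it evaluates a TANGENTPROP-style error for a single example, approximating the network's derivative along each transformation with a finite difference; the names net, translate, mu, and eps are hypothetical.

```python
import numpy as np

def tangentprop_loss(net, x, target_value, transforms, target_derivs, mu=0.1, eps=1e-3):
    """TANGENTPROP-style objective for one example (sketch): squared error on the
    training value plus mu-weighted squared error between the network's derivative
    along each transformation s_j and the desired derivative (0 for invariances
    such as small rotations or translations).

    net:            callable mapping an input array to a scalar prediction f_hat(x)
    transforms:     list of callables s_j(alpha, x) returning the transformed input
    target_derivs:  desired d f(s_j(alpha, x))/d alpha at alpha = 0, one per transform
    """
    loss = (target_value - net(x)) ** 2
    for s_j, d_target in zip(transforms, target_derivs):
        # central finite-difference estimate of d f_hat(s_j(alpha, x))/d alpha at alpha = 0
        d_net = (net(s_j(eps, x)) - net(s_j(-eps, x))) / (2 * eps)
        loss += mu * (d_target - d_net) ** 2
    return loss

# Example: a linear "network" and a translation transform; the prior knowledge says
# the output should be invariant to translation, so the desired derivative is 0.
w = np.array([0.5, -0.2, 0.1])
net = lambda x: float(w @ x)
translate = lambda alpha, x: x + alpha   # hypothetical translation of the input
x = np.array([1.0, 2.0, 0.5])
print(tangentprop_loss(net, x, target_value=1.0, transforms=[translate],
                       target_derivs=[0.0], mu=0.1))
```

The actual algorithm differentiates the network analytically and minimizes this objective by gradient descent; the finite difference here is only a stand-in to show what the extra error term measures.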
12. Remarks
- TANGENTPROP combines the prior knowledge with observed training data by minimizing an objective function that measures both
  - the network's error with respect to the training example values
  - the network's error with respect to the desired derivatives
- TANGENTPROP is not robust to errors in the prior knowledge.
  - The tradeoff weight \mu needs to be selected automatically → the EBNN Algorithm.
13. The EBNN Algorithm (1/2)
- Input
  - A set of training examples of the form ⟨x_i, f(x_i)⟩
  - A domain theory represented by a set of previously trained neural networks
- Output
  - A new neural network that approximates the target function
- Algorithm
  - Create a new, fully connected feedforward network to represent the target function.
  - For each training example, determine the corresponding training derivatives.
  - Use the TANGENTPROP algorithm to train the target network.
14. The EBNN Algorithm (2/2)
- Computation of training derivatives
  - EBNN computes them itself for each observed training example:
    - explain each training example in terms of the given domain theory
    - extract training derivatives from this explanation
    - the derivatives provide important information for distinguishing relevant from irrelevant features
- Weighting the relative importance of the inductive and analytical components of learning
  - \mu_i is chosen independently for each training example,
  - based on how accurately the domain theory predicts the training value for that particular example.
- Error Function
  - E = \sum_i \left[ \left(f(x_i) - \hat{f}(x_i)\right)^2 + \mu_i \sum_j \left( \frac{\partial A(x)}{\partial x^j}\Big|_{x=x_i} - \frac{\partial \hat{f}(x)}{\partial x^j}\Big|_{x=x_i} \right)^2 \right], \qquad \mu_i \equiv 1 - \frac{|A(x_i) - f(x_i)|}{c}
  - A(x): domain theory prediction for input x
  - x_i: the ith training instance; x^j: the jth component of the vector x
  - c: normalizing constant (so that 0 \le \mu_i \le 1)
→ Figure 12.7 (p. 353)
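A small Python sketch of how EBNN turns the domain theory into a training signal, under stated assumptions: the book's algorithm extracts exact derivatives by backpropagating through the domain-theory network, whereas this sketch uses finite differences as a stand-in, and the names ebnn_training_signal, domain_theory_net, and A are hypothetical.

```python
import numpy as np

def ebnn_training_signal(domain_theory_net, x, f_x, c, eps=1e-4):
    """EBNN-style extraction of training derivatives (sketch).
    domain_theory_net: callable A(x), the prediction of the previously trained
                       domain-theory network for input x
    Returns (target derivatives dA/dx^j at x, mu_i), where mu_i downweights the
    analytical component when the domain theory predicts f(x_i) poorly."""
    x = np.asarray(x, dtype=float)
    derivs = np.empty_like(x)
    for j in range(x.size):
        # finite-difference estimate of the jth component of dA/dx at x
        e = np.zeros_like(x)
        e[j] = eps
        derivs[j] = (domain_theory_net(x + e) - domain_theory_net(x - e)) / (2 * eps)
    # mu_i = 1 - |A(x_i) - f(x_i)| / c, clipped to [0, 1]
    mu_i = float(np.clip(1.0 - abs(domain_theory_net(x) - f_x) / c, 0.0, 1.0))
    return derivs, mu_i

# Example with a hypothetical, already-trained "domain theory" network.
A = lambda x: float(1.0 / (1.0 + np.exp(-(0.8 * x[0] - 0.3 * x[1]))))
derivs, mu = ebnn_training_signal(A, x=[1.0, 2.0], f_x=1.0, c=1.0)
print(derivs, mu)  # derivatives become TANGENTPROP targets, weighted by mu
```

The derivatives and per-example weight returned here are exactly what the error function above consumes: they play the roles of ∂A/∂x^j and \mu_i for one training instance.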
15. Remarks
- EBNN vs. symbolic explanation-based learning
  - The domain theory consists of neural networks rather than Horn clauses.
  - Relevant dependencies take the form of derivatives.
  - EBNN accommodates imperfect domain theories.
  - EBNN learns a fixed-size neural network,
    - so it requires constant time to classify new instances,
    - but it may be unable to represent sufficiently complex functions.
16. Using Prior Knowledge to Augment Search Operators
- The FOCL Algorithm
  - Two operators for generating candidate specializations (see the sketch below):
    - 1. Add a single new literal.
    - 2. Add a set of literals that constitute logically sufficient conditions for the target concept, according to the domain theory.
      - Select one of the domain theory clauses whose head matches the target concept.
      - Unfolding: each nonoperational literal is replaced (again using the domain theory) until the sufficient conditions have been restated in terms of operational literals.
      - Pruning: each literal is removed unless its removal reduces classification accuracy over the training examples.
  - FOCL selects among all these candidate specializations based on their performance over the data.
    - The domain theory is used in a fashion that biases the learner,
    - but final search choices are left to be made based on performance over the training data.
→ Figure 12.8 (p. 358)
→ Figure 12.9 (p. 361)
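A minimal Python sketch of the two specialization operators, assuming a simplified propositional setting with one domain-theory clause per head and a caller-supplied accuracy function; the names unfold, prune, and focl_candidates are illustrative, not FOCL's actual code.

```python
def unfold(literals, theory, operational):
    """Repeatedly replace each nonoperational literal by the body of its
    domain-theory clause until only operational literals remain (unfolding).
    Assumes the theory defines every nonoperational literal."""
    result = []
    for lit in literals:
        if lit in operational:
            result.append(lit)
        else:
            result.extend(unfold(theory[lit], theory, operational))
    return result

def prune(literals, accuracy):
    """Drop each literal whose removal does not reduce accuracy on the training data."""
    kept = list(literals)
    for lit in list(kept):
        trial = [l for l in kept if l != lit]
        if accuracy(trial) >= accuracy(kept):
            kept = trial
    return kept

def focl_candidates(current_body, single_literals, theory, target, operational, accuracy):
    """FOCL-style candidate specializations of a clause body (sketch):
    (1) add any single new literal; (2) add the unfolded-and-pruned sufficient
    conditions that the domain theory gives for the target concept."""
    candidates = [current_body + [lit] for lit in single_literals if lit not in current_body]
    if target in theory:
        sufficient = prune(unfold(theory[target], theory, operational), accuracy)
        candidates.append(current_body + sufficient)
    return candidates
```

FOCL then scores every candidate on the training data, exactly as a purely inductive learner would, so the domain theory only proposes specializations; it never overrides the data.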