1
Combining Inductive and Analytical Learning
  • Ch 12. in Machine Learning
  • Tom M. Mitchell
  • 1999. 7. 9.

2
Contents
  • Motivation
  • Inductive-Analytical Approaches to Learning
  • Using Prior Knowledge to Initialize the
    Hypothesis
  • The KBANN Algorithm
  • Using Prior Knowledge to Alter the Search
    Objective
  • The TANGENTPROP Algorithm
  • The EBNN Algorithm
  • Using Prior Knowledge to Augment Search Operators
  • The FOCL Algorithm

3
Motivation (1/2)
  • Inductive Learning vs. Analytical Learning

                      Inductive Learning                Analytical Learning
    Goal              Hypothesis fits data              Hypothesis fits domain theory
    Justification     Statistical inference             Deductive inference
    Advantages        Requires little prior knowledge   Learns from scarce data
    Pitfalls          Scarce data, incorrect bias       Imperfect domain theory

  • A spectrum of learning tasks
  • Most practical learning problems lie somewhere
    between these two extremes of the spectrum.
4
Motivation(2/2)
  • What kinds of learning algorithms can we devise
    that make use of approximate prior knowledge,
    together with available data, to form general
    hypotheses?
  • domain-independent algorithms that employ
    explicitly input domain-dependent knowledge
  • Desirable Properties
  • no domain theory → learn as well as inductive
    methods
  • perfect domain theory → learn as well as
    analytical methods
  • imperfect domain theory and imperfect training data
    → combine the two to outperform either inductive
    or analytical methods
  • accommodate arbitrary and unknown errors in the
    domain theory
  • accommodate arbitrary and unknown errors in the
    training data

5
The Learning Problem
  • Given
  • A set of training examples D, possibly containing
    errors
  • A domain theory B, possibly containing errors
  • A space of candidate hypotheses H
  • Determine
  • A hypothesis that best fits both the training examples
    and the domain theory
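
One way to make "best fits" precise is as a weighted sum of two error measures, with the weighting reflecting how much each source of information is trusted. A hedged restatement (the notation k_D, k_B, error_D, error_B is assumed here for illustration):

    h^{*} = \operatorname*{arg\,min}_{h \in H} \; k_D \, \mathrm{error}_D(h) + k_B \, \mathrm{error}_B(h)

where error_D(h) is the proportion of training examples in D misclassified by h, error_B(h) measures the degree to which h disagrees with the domain theory B, and the constants k_D and k_B set the relative weight of fitting the data versus fitting the theory.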

6
Hypothesis Space Search
  • Learning as a task of searching through
    hypothesis space
  • hypothesis space H
  • initial hypothesis
  • the set of search operators O
  • define individual search steps
  • the goal criterion G
  • specifies the search objective
  • Methods for using prior knowledge
  • Use prior knowledge to
  • derive an initial hypothesis from which to
    begin the search
  • alter the objective G of the hypothesis space
    search
  • alter the available search steps O
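
A minimal sketch (all names hypothetical) of hypothesis-space search as greedy hill-climbing, showing the three places where prior knowledge can enter: the initial hypothesis, the search operators O, and the goal criterion G:

```python
def search(h0, operators, goal_score, max_steps=100):
    """Greedy hill-climbing over a hypothesis space.

    h0         -- initial hypothesis (may be derived from a domain theory)
    operators  -- function h -> iterable of candidate successor hypotheses (O)
    goal_score -- function h -> float, higher is better; may combine fit to the
                  training data with fit to the domain theory (G)
    """
    h, score = h0, goal_score(h0)
    for _ in range(max_steps):
        candidates = [(goal_score(h2), h2) for h2 in operators(h)]
        if not candidates:
            break
        best_score, best_h = max(candidates, key=lambda c: c[0])
        if best_score <= score:   # no improving step: stop at a local optimum
            break
        h, score = best_h, best_score
    return h
```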

7
Using Prior Knowledge to Initialize the Hypothesis
  • Two Steps
  • 1. initialize the hypothesis to perfectly fit the
    domain theory
  • 2. inductively refine this initial hypothesis as
    needed to fit the training data
  • KBANN(Knowledge-Based Artificial Neural Network)
  • 1. Analytical Step
  • create an initial network equivalent to the
    domain theory
  • 2. Inductive Step
  • refine the initial network (use BACKPROP)

Given:
  • A set of training examples
  • A domain theory consisting of nonrecursive,
    propositional Horn clauses
Determine:
  • An artificial neural network that fits the training
    examples, biased by the domain theory
→ Table 12.2 (p.341)
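
A minimal sketch of the analytical step, assuming the usual KBANN encoding (weight W for positive antecedents, -W for negated ones, and a threshold chosen so the sigmoid unit behaves like the clause body). The attribute names come from the Cup example on the next slide; the helper names are hypothetical:

```python
import numpy as np

W = 4.0  # weight magnitude large enough that each unit approximates its Boolean clause

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def clause_to_unit(antecedents, attribute_index):
    """Build one sigmoid unit per nonrecursive propositional Horn clause.

    antecedents     -- list of (attribute_name, is_positive) literals in the clause body
    attribute_index -- dict mapping attribute name -> input position
    Returns (weights, bias) so that sigmoid(w @ x + b) is high exactly when
    every positive literal is 1 and every negated literal is 0.
    """
    w = np.zeros(len(attribute_index))
    n_pos = 0
    for name, is_positive in antecedents:
        w[attribute_index[name]] = W if is_positive else -W
        n_pos += int(is_positive)
    b = -(n_pos - 0.5) * W   # threshold: unit fires only when the whole body is satisfied
    return w, b

# Clause from the Cup domain theory: Cup <- Stable, Liftable, OpenVessel
attrs = {"Stable": 0, "Liftable": 1, "OpenVessel": 2}
w, b = clause_to_unit([("Stable", True), ("Liftable", True), ("OpenVessel", True)], attrs)
print(sigmoid(w @ np.array([1, 1, 1]) + b))  # high output: clause body satisfied
print(sigmoid(w @ np.array([1, 0, 1]) + b))  # low output: clause body violated
```

The inductive step then refines all of these weights (together with additional near-zero-weight connections to inputs not mentioned in the clauses) using ordinary backpropagation on the training data.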
8
Example: The Cup Learning Task
  • (Figure) Neural net equivalent to the domain theory
  • (Figure) Result of refining the network
9
Remarks
  • KBANN vs. Backpropagation
  • when given an approximately correct domain theory and
    scarce training data,
  • KBANN generalizes more accurately than
    Backpropagation
  • Classifying promoter regions in DNA
  • Backpropagation error rate: 8/106
  • KBANN error rate: 4/106
  • Bias
  • KBANN: a domain-specific theory
  • Backpropagation: a domain-independent syntactic bias
    toward small weight values

10
Using Prior Knowledge to Alter the Search
Objective
  • Use of prior knowledge
  • incorporate it into the error criterion minimized
    by gradient descent
  • the network must fit a combined function of the
    training data and the domain theory
  • Form of prior knowledge
  • derivatives of the target function
  • certain types of prior knowledge can be expressed
    quite naturally this way
  • example: recognizing handwritten characters
  • the identity of the character is independent of
    small translations and rotations of the image.

11
The TANGENTPROP Algorithm
  • Domain Knowledge
  • expressed as derivatives of the target function
    with respect to transformations of its inputs
  • Training Derivatives
  • TANGENTPROP assumes various training derivatives
    of the target function are provided.
  • Error Function

    E = \sum_i \Big[ \big(f(x_i) - \hat{f}(x_i)\big)^2
        + \mu \sum_j \Big( \frac{\partial f(s_j(\alpha, x_i))}{\partial \alpha}
        - \frac{\partial \hat{f}(s_j(\alpha, x_i))}{\partial \alpha} \Big)^2_{\alpha = 0} \Big]

    where s_j is the j-th transformation (rotation or translation) and \mu is a
    constant that determines the relative importance of fitting training values
    versus fitting training derivatives.
→ Table 12.4 (p.349)
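
A minimal sketch of this objective (the helper names and the finite-difference estimate are assumptions for illustration; TANGENTPROP itself propagates the derivative terms analytically through the network during gradient descent):

```python
def tangentprop_loss(f_hat, xs, ys, dy_dalpha, transforms, mu=0.1, eps=1e-4):
    """Combined error over training values and training derivatives.

    f_hat      -- the learned network, a function R^n -> R
    xs, ys     -- training inputs and target values f(x_i)
    dy_dalpha  -- dy_dalpha[i][j]: known derivative of the target function at x_i
                  under the j-th transformation (the training derivative)
    transforms -- list of functions s_j(alpha, x) with s_j(0, x) == x
    mu         -- relative importance of fitting derivatives vs. fitting values
    """
    loss = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        loss += (y - f_hat(x)) ** 2
        for j, s in enumerate(transforms):
            # finite-difference estimate of d f_hat(s_j(alpha, x)) / d alpha at alpha = 0
            d_net = (f_hat(s(eps, x)) - f_hat(s(-eps, x))) / (2.0 * eps)
            loss += mu * (dy_dalpha[i][j] - d_net) ** 2
    return loss
```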
12
Remarks
  • TANGENTPROP combines the prior knowledge with the
    observed training data by minimizing an
    objective function that measures both
  • the network's error with respect to the training
    example values
  • the network's error with respect to the desired
    derivatives
  • TANGENTPROP is not robust to errors in the prior
    knowledge
  • need to automatically select the relative weight μ
  • → EBNN Algorithm

13
The EBNN Algorithm(1/2)
  • Input
  • A set of training examples of the form ⟨x_i, f(x_i)⟩
  • A domain theory represented by a set of
    previously trained neural networks
  • Output
  • A new neural network that approximates the target function
  • Algorithm
  • Create a new, fully connected feedforward network
    to represent the target function
  • For each training example, determine the
    corresponding training derivatives
  • Use the TANGENTPROP algorithm to train the target
    network (see the sketch below)
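
A minimal sketch of this top-level loop; the three helpers are passed in as parameters because their details (network construction, derivative extraction, TANGENTPROP training) belong to the other slides, and all names are hypothetical:

```python
def ebnn(examples, domain_theory_net, make_network, extract_derivatives, tangentprop_train):
    """EBNN outline: build a target network, get derivatives from the theory, train.

    examples            -- list of (x, f(x)) training pairs
    domain_theory_net   -- previously trained network A(x) encoding the prior knowledge
    make_network        -- () -> new fully connected feedforward target network
    extract_derivatives -- (A, x) -> derivatives of A at x, used as training derivatives
    tangentprop_train   -- fits the target net to values and derivatives (slide 11)
    """
    target_net = make_network()
    derivatives = [extract_derivatives(domain_theory_net, x) for x, _ in examples]
    return tangentprop_train(target_net, examples, derivatives)
```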

14
The EBNN Algorithm(2/2)
  • Computation of training derivatives
  • compute them itself for each observed training
    example
  • explain each training example in terms of a given
    domain theory
  • extract training derivatives from this
    explanation
  • provide important information for distinguishing
    relevant from irrelevant features
  • How to weight the relative importance of the
    inductive and analytical components of learning
  • the weight μ_i is chosen independently for each training
    example
  • consider how accurately the domain theory
    predicts the training value for this particular
    example
  • Error Function

    E = \sum_i \Big[ \big(f(x_i) - \hat{f}(x_i)\big)^2
        + \mu_i \sum_j \Big( \frac{\partial A(x)}{\partial x^j}
        - \frac{\partial \hat{f}(x)}{\partial x^j} \Big)^2_{x = x_i} \Big],
    \qquad \mu_i = 1 - \frac{|A(x_i) - f(x_i)|}{c}

    where A(x) is the domain theory prediction for input x, x_i is the i-th
    training instance, x^j is the j-th component of the vector x, and c is a
    normalizing constant.
→ Figure 12.7 (p.353)
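
A minimal sketch of the two per-example quantities this error function needs: the trust weight μ_i and the domain-theory derivatives ∂A/∂x^j (finite differences stand in here for the analytic differentiation EBNN actually performs; the names are assumptions):

```python
import numpy as np

def ebnn_weights(A, xs, ys, c):
    """Per-example weight mu_i = 1 - |A(x_i) - f(x_i)| / c on the derivative term.

    The more accurately the domain theory network A predicts example i, the
    larger mu_i, so its derivatives are trusted more for that example.
    c is a normalizing constant keeping 0 <= mu_i <= 1.
    """
    return [1.0 - abs(A(x) - y) / c for x, y in zip(xs, ys)]

def theory_derivatives(A, x, eps=1e-4):
    """Derivatives of the domain-theory prediction A with respect to each input x^j."""
    x = np.asarray(x, dtype=float)
    grads = np.zeros_like(x)
    for j in range(len(x)):
        step = np.zeros_like(x)
        step[j] = eps
        grads[j] = (A(x + step) - A(x - step)) / (2.0 * eps)
    return grads
```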
15
Remarks
  • EBNN vs. Symbolic Explanation-Based Learning
  • domain theory consisting of NNs rather than Horn
    clauses
  • relevant dependencies take the form of
    derivatives
  • accommodates imperfect domain theories
  • learns a fixed-size neural network
  • requires constant time to classify new instances
  • but may be unable to represent sufficiently complex functions

16
Using Prior Knowledge to Augment Search Operators
  • The FOCL Algorithm
  • Two operators for generating candidate
    specializations
  • 1. Add a single new literal
  • 2. Add a set of literals that constitute
    logically sufficient conditions for the target
    concept, according to the domain theory
  • select one of the domain theory clauses whose
    head matches the target concept
  • Unfolding: each nonoperational literal is replaced,
    using the domain theory, until the sufficient
    conditions have been restated in terms of
    operational literals
  • Pruning: each literal is removed unless its
    removal reduces classification accuracy over the
    training examples (a sketch of unfolding and
    pruning follows below)
  • FOCL selects among all these candidate
    specializations based on their performance over
    the data
  • the domain theory is used in a fashion that biases
    the learner
  • final search choices are left to be made based on
    performance over the training data

→ Figure 12.8 (p.358)
→ Figure 12.9 (p.361)
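
A minimal sketch of the unfolding and pruning steps for the second operator (the clause representation, dict-based domain theory, and the accuracy callback are assumptions for illustration):

```python
def unfold(literal, domain_theory, operational):
    """Rewrite a literal using the domain theory until only operational literals remain.

    domain_theory -- dict mapping a nonoperational literal to the body (list of
                     literals) of a clause whose head matches it
    operational   -- set of literals that can be evaluated directly on the examples
    """
    if literal in operational:
        return [literal]
    body = domain_theory[literal]          # pick a matching clause and expand its body
    expanded = []
    for lit in body:
        expanded.extend(unfold(lit, domain_theory, operational))
    return expanded

def prune(literals, accuracy):
    """Remove each literal unless removing it reduces accuracy over the training examples.

    accuracy -- callback: list of literals -> classification accuracy on the data
    """
    kept = list(literals)
    for lit in list(kept):
        trial = [l for l in kept if l != lit]
        if accuracy(trial) >= accuracy(kept):
            kept = trial
    return kept
```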