Title: CS 9633 Machine Learning: Explanation-Based Learning
1. CS 9633 Machine Learning: Explanation-Based Learning
2. Analytical Learning
- Inductive learning: given a large set of examples, generalize to find features that distinguish positive and negative examples
- Examples include neural networks, genetic algorithms, decision trees, support vector machines, etc.
- The problem is that these methods perform poorly with very small training sets
- Analytical learning combines examples with a domain model
3. Learning by People
- People can often learn a concept from a single example.
- They appear to do this by analyzing the example in terms of previous knowledge to determine the most relevant features.
- Some inductive algorithms use domain knowledge to increase the hypothesis space.
- Explanation-based learning uses domain knowledge to decrease the size of the hypothesis space.
4. Example
A positive example of a chess position in which black will lose its queen within two moves
5. Inductive versus Analytical Learning
- Inductive learning: given a hypothesis space H and a set of training examples D, the desired output is a hypothesis consistent with the training examples.
- Analytical learning: given a hypothesis space H, a set of training examples D, and a domain theory B, the desired output is a hypothesis consistent with both B and D.
6. SafeToStack Problem: Instances
- Instance space: each instance describes a pair of objects represented by the predicates
- Type (e.g., Box, Endtable, ...)
- Color
- Volume
- Owner
- Material
- Density
- On
7. SafeToStack Hypothesis Space
- The hypothesis space H is a set of Horn clause rules.
- The head of each rule is a literal containing the target predicate SafeToStack.
- The body of each rule is a conjunction of literals based on
- The predicates used to describe the instances
- Additional general-purpose predicates such as
- LessThan
- Equal
- Greater
- Additional general-purpose functions such as
- Plus
- Minus
- Times
- SafeToStack(x,y) ← Volume(x,vx) ∧ Volume(y,vy) ∧ LessThan(vx,vy)
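To make this representation concrete, here is a minimal Python sketch of a hypothesis in this space, using the example rule above; the Literal and HornClause classes are illustrative names, not part of any particular library.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Literal:
    """A predicate applied to terms; terms are constants ("Obj1") or variables ("x")."""
    predicate: str
    args: Tuple[str, ...]

@dataclass(frozen=True)
class HornClause:
    """head <- body, where the body is a conjunction of literals."""
    head: Literal
    body: Tuple[Literal, ...]

# The example hypothesis from this slide:
# SafeToStack(x,y) <- Volume(x,vx) AND Volume(y,vy) AND LessThan(vx,vy)
example_hypothesis = HornClause(
    head=Literal("SafeToStack", ("x", "y")),
    body=(
        Literal("Volume", ("x", "vx")),
        Literal("Volume", ("y", "vy")),
        Literal("LessThan", ("vx", "vy")),
    ),
)
```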
8. SafeToStack Target Concept
- Target concept: SafeToStack(x,y)
9. SafeToStack Training Examples
- SafeToStack(Obj1, Obj2)
- On(Obj1, Obj2)
- Type(Obj1, Box)
- Type(Obj2, Endtable)
- Color(Obj1, Red)
- Color(Obj2, Blue)
- Volume(Obj1, 2)
- Owner(Obj1, Fred)
- Owner(Obj2, Louise)
- Density(Obj1, 0.3)
- Material(Obj1, Cardboard)
- Material(Obj2, Wood)
10. SafeToStack Domain Theory B
- SafeToStack(x,y) ← ¬Fragile(y)
- SafeToStack(x,y) ← Lighter(x,y)
- Lighter(x,y) ← Weight(x,wx) ∧ Weight(y,wy) ∧ LessThan(wx,wy)
- Weight(x,w) ← Volume(x,v) ∧ Density(x,d) ∧ Equal(w, times(v,d))
- Weight(x,5) ← Type(x,Endtable)
- Fragile(x) ← Material(x,Glass)
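As a minimal sketch, the training example from slide 9 and this domain theory could be written down as plain Python data; the tuple encoding below is illustrative (lowercase strings stand for variables, and a nested tuple such as ("times", "v", "d") stands for the function term times(v,d)).

```python
# Ground facts describing the training example (slide 9).
training_example = [
    ("SafeToStack", "Obj1", "Obj2"),   # the (positive) target-concept label
    ("On", "Obj1", "Obj2"),
    ("Type", "Obj1", "Box"),
    ("Type", "Obj2", "Endtable"),
    ("Color", "Obj1", "Red"),
    ("Color", "Obj2", "Blue"),
    ("Volume", "Obj1", 2),
    ("Owner", "Obj1", "Fred"),
    ("Owner", "Obj2", "Louise"),
    ("Density", "Obj1", 0.3),
    ("Material", "Obj1", "Cardboard"),
    ("Material", "Obj2", "Wood"),
]

# Domain theory B as (head, body) Horn clauses; ("not", L) marks a negated literal.
domain_theory = [
    (("SafeToStack", "x", "y"), [("not", ("Fragile", "y"))]),
    (("SafeToStack", "x", "y"), [("Lighter", "x", "y")]),
    (("Lighter", "x", "y"),
     [("Weight", "x", "wx"), ("Weight", "y", "wy"), ("LessThan", "wx", "wy")]),
    (("Weight", "x", "w"),
     [("Volume", "x", "v"), ("Density", "x", "d"),
      ("Equal", "w", ("times", "v", "d"))]),
    (("Weight", "x", 5), [("Type", "x", "Endtable")]),
    (("Fragile", "x"), [("Material", "x", "Glass")]),
]
```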
11. Analytical Learning Problem
- We must provide a domain theory sufficient to explain why the observed positive examples satisfy the target concept.
- The domain theory is a set of Horn clauses.
12. Learning with Perfect Domain Theories
- Prolog-EBG is an example system.
- The domain theory must be
- Correct
- Complete with respect to the target concept and instance space
13. Reasonableness of Perfect Domain Theories
- In some cases it is feasible to develop a perfect domain theory (chess is an example). This can help improve the performance of search-intensive planning and optimization problems.
- It is often not feasible to develop a perfect domain theory. In that case the learner must be able to generate plausible explanations.
14. Prolog-EBG (see Table 11.2 for details)
- For each new positive training example not yet covered by a learned Horn clause, form a new Horn clause by
- Explaining the new positive training example by proving its truth
- Analyzing this explanation to determine an appropriate generalization
- Refining the current hypothesis by adding a new Horn clause that covers this positive example as well as other similar instances
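A minimal sketch of this outer loop in Python, with the per-example work delegated to assumed helper functions (covers, explain, and generalize are placeholders for the three steps above, not a specific library API):

```python
def prolog_ebg(target_predicate, positive_examples, domain_theory,
               covers, explain, generalize):
    """Sketch of the Prolog-EBG outer loop (after Table 11.2).

    covers(rule, example)     -> True if the learned rule already covers the example
    explain(example, theory)  -> an explanation (proof) of the example from the theory
    generalize(target, proof) -> body literals of the weakest preimage of the target
                                 concept with respect to the proof
    """
    learned_rules = []                                  # the current hypothesis
    for example in positive_examples:
        if any(covers(rule, example) for rule in learned_rules):
            continue                                    # already covered
        proof = explain(example, domain_theory)         # 1. explain the example
        body = generalize(target_predicate, proof)      # 2. analyze (regress)
        learned_rules.append((target_predicate, body))  # 3. refine the hypothesis
    return learned_rules
```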
15. 1. Explaining the Training Example
- Provide a proof that the training example satisfies the target concept.
- If the domain theory is correct and complete, use a proof procedure such as resolution.
- If the domain theory is not correct and complete, the proof procedure must be extended to allow plausible, approximate arguments.
16. Explanation of SafeToStack(Obj1, Obj2)
[Figure: proof tree for the training example. SafeToStack(Obj1,Obj2) follows from Lighter(Obj1,Obj2); Lighter(Obj1,Obj2) follows from Weight(Obj1,0.6), LessThan(0.6,5), and Weight(Obj2,5); Weight(Obj1,0.6) follows from Volume(Obj1,2), Density(Obj1,0.3), and Equal(0.6, times(2,0.3)); Weight(Obj2,5) follows from Type(Obj2,Endtable).]
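As a minimal sketch, this proof tree could be recorded as nested (conclusion, sub-proofs) pairs in Python; the encoding is illustrative, not part of Prolog-EBG itself.

```python
# Explanation of SafeToStack(Obj1, Obj2): each node is (conclusion, [sub-proofs]);
# leaves are ground facts from the training example or arithmetic checks.
explanation = (("SafeToStack", "Obj1", "Obj2"), [
    (("Lighter", "Obj1", "Obj2"), [
        (("Weight", "Obj1", 0.6), [
            (("Volume", "Obj1", 2), []),
            (("Density", "Obj1", 0.3), []),
            (("Equal", 0.6, ("times", 2, 0.3)), []),   # 0.6 = 2 * 0.3
        ]),
        (("LessThan", 0.6, 5), []),
        (("Weight", "Obj2", 5), [
            (("Type", "Obj2", "Endtable"), []),
        ]),
    ]),
])
```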
17. The Training Example as a Graph
[Figure: the training example drawn as a graph of objects and their attributes. Obj1: Type Box, Material Cardboard, Color Red, Owner Fred, Density 0.3, Volume 2. Obj2: Type Endtable, Material Wood, Color Blue, Owner Louise. On(Obj1, Obj2).]
18. 2. Generating a General Rule
- General rule from the domain theory:
- SafeToStack(x,y) ← Volume(x,2) ∧ Density(x,0.3) ∧ Type(y,Endtable)
- Note that we omitted the leaf nodes that are always satisfied independent of x and y:
- Equal(0.6, times(2,0.3))
- LessThan(0.6, 5)
- However, we would like an even more general rule.
19. Weakest Preimage
- The goal is to compute the most general rule that can be justified by the explanation.
- We do this by computing the weakest preimage.
- Definition: the weakest preimage of a conclusion C with respect to a proof P is the most general set of assertions A such that A entails C according to P.
20. Most General Rule
- The most general rule that can be justified by the explanation is
- SafeToStack(x,y) ← Volume(x,vx) ∧ Density(x,dx) ∧ Equal(wx, times(vx,dx)) ∧ LessThan(wx,5) ∧ Type(y,Endtable)
- Use a general procedure called regression to generate this rule:
- Start with the target concept with respect to the final step in the explanation
- Generate the weakest preimage of the target concept with respect to the preceding step
- Terminate after iterating over all steps in the explanation
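A minimal sketch of one regression step, reusing the tuple encoding from the earlier sketches: unify the regressed literal with the head of the rule that concluded it, then apply the unifier to the remaining frontier and to the rule body. The helpers are simplified for illustration (no occurs check, and the lowercase-means-variable convention assumes function symbols such as times never need to be unified).

```python
def is_var(t):
    # Convention: variables are lowercase strings ("x", "wx"); constants are
    # capitalised strings or numbers; compound terms and literals are tuples.
    return isinstance(t, str) and t[:1].islower()

def substitute(t, s):
    """Apply substitution s (a dict: variable -> term) to a term or literal t."""
    if is_var(t):
        return substitute(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return tuple(substitute(a, s) for a in t)
    return t

def unify(t1, t2, s=None):
    """Most general unifier of t1 and t2 extending s, or None (no occurs check)."""
    s = dict(s or {})
    t1, t2 = substitute(t1, s), substitute(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return {**s, t1: t2}
    if is_var(t2):
        return {**s, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def regress_step(frontier, literal, rule):
    """Weakest preimage of `frontier` through one explanation step: `literal`
    was concluded by `rule` = (head, body), whose variables are assumed to be
    standardized apart from the frontier's."""
    head, body = rule
    mgu = unify(head, literal)
    if mgu is None:
        return None                       # the rule does not apply at this step
    rest = [lit for lit in frontier if lit != literal]
    return [substitute(lit, mgu) for lit in rest + list(body)]

# Final regression step of the SafeToStack example: regress Weight(y, wy)
# through Weight(x2, 5) <- Type(x2, Endtable)   (rule variable renamed to x2).
frontier = [("Volume", "x", "vx"), ("Density", "x", "dx"),
            ("Equal", "wx", ("times", "vx", "dx")),
            ("LessThan", "wx", "wy"), ("Weight", "y", "wy")]
rule = (("Weight", "x2", 5), [("Type", "x2", "Endtable")])
print(regress_step(frontier, ("Weight", "y", "wy"), rule))
# -> Volume(x,vx), Density(x,dx), Equal(wx,times(vx,dx)), LessThan(wx,5), Type(y,Endtable)
```

The printed frontier is exactly the body of the most general rule above; the slides that follow trace the same regression steps graphically.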
21. Regressing Through the Explanation (step 1)
[Figure: explanation tree annotated with the regression frontier {SafeToStack(x,y)}, shown beside the root node SafeToStack(Obj1,Obj2).]
22. Regressing Through the Explanation (step 2)
[Figure: regressing through SafeToStack(x,y) ← Lighter(x,y) gives the frontier {Lighter(x,y)}, beside Lighter(Obj1,Obj2).]
23. Regressing Through the Explanation (step 3)
[Figure: regressing through Lighter(x,y) ← Weight(x,wx) ∧ Weight(y,wy) ∧ LessThan(wx,wy) gives the frontier {Weight(x,wx), LessThan(wx,wy), Weight(y,wy)}, beside Weight(Obj1,0.6), LessThan(0.6,5), Weight(Obj2,5).]
24. Regressing Through the Explanation (step 4)
[Figure: regressing Weight(x,wx) through Weight(x,w) ← Volume(x,v) ∧ Density(x,d) ∧ Equal(w, times(v,d)) gives the frontier {Volume(x,vx), Density(x,dx), Equal(wx, times(vx,dx)), LessThan(wx,wy), Weight(y,wy)}, beside Volume(Obj1,2), Density(Obj1,0.3), Equal(0.6, times(2,0.3)).]
25. Regressing Through the Explanation (step 5)
[Figure: regressing Weight(y,wy) through Weight(x,5) ← Type(x,Endtable) gives the final frontier {Volume(x,vx), Density(x,dx), Equal(wx, times(vx,dx)), LessThan(wx,5), Type(y,Endtable)}, beside Type(Obj2,Endtable). This frontier is the body of the most general rule on slide 20.]
26. 3. Refine the Current Hypothesis
- The current hypothesis is the set of Horn clauses learned so far.
- At each stage, a new positive example is picked that is not yet covered by the current hypothesis, and a new rule is developed to cover it.
- Only positive examples are covered by the rules.
- Instances not covered by the rules are classified as negative (negation-as-failure approach).
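A minimal sketch of this negation-as-failure classification, assuming each learned clause has been compiled into a Python predicate over an instance's attributes (the single rule shown corresponds to the clause learned above; all names are illustrative):

```python
# Each learned Horn clause is represented here as a boolean function over an
# instance, where an instance maps each object to a dict of its attributes.
def learned_rule_1(x, y, instance):
    # SafeToStack(x,y) <- Volume(x,vx) AND Density(x,dx) AND
    #     Equal(wx, times(vx,dx)) AND LessThan(wx,5) AND Type(y,Endtable)
    try:
        wx = instance[x]["Volume"] * instance[x]["Density"]
        return wx < 5 and instance[y]["Type"] == "Endtable"
    except KeyError:
        return False        # a required attribute is missing: the body is not proven

def classify(x, y, instance, learned_rules):
    """Negation as failure: positive iff some learned rule covers the pair (x, y)."""
    return "positive" if any(r(x, y, instance) for r in learned_rules) else "negative"

# The slide-9 training example, restricted to the attributes the rule needs.
instance = {
    "Obj1": {"Volume": 2, "Density": 0.3, "Type": "Box"},
    "Obj2": {"Type": "Endtable"},
}
print(classify("Obj1", "Obj2", instance, [learned_rule_1]))  # -> positive
print(classify("Obj2", "Obj1", instance, [learned_rule_1]))  # -> negative (unproven)
```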
27. EBL Summary
- Individual examples are explained (proven) using prior knowledge.
- Attributes included in the proof are considered relevant.
- Regression is used to generalize the rule.
- The generality of the learned clauses depends on the formulation of the domain theory, the order in which examples are encountered, and which other instances share the same explanation.
- Assumes the domain theory is complete and correct.
28. Different Perspectives on EBL
- EBL is a theory-guided generalization of examples.
- EBL is an example-guided reformulation of theories: rules are created that
- Follow deductively from the domain theory
- Classify the observed training examples in a single inference step
- EBL is just a restatement of what the learner already knows (knowledge compilation).
29. Inductive Bias of EBL
- The domain theory
- The algorithm (sequential covering) used to choose among alternative Horn clauses
- The generalization procedure, which favors small sets of Horn clauses
30. EBL for Search Strategies
- The requirement for a correct and complete domain theory is often difficult to meet, but it can often be met in complex search tasks.
- This type of learning is called speedup learning.
- EBL can be used to learn efficient sequences of operators (evolve meta-operators).