Learning Agents Laboratory - PowerPoint PPT Presentation

Slides: 69
Provided by: Gheorgh
Learn more at: http://lalab.gmu.edu
Transcript and Presenter's Notes

1
CS 782 Machine Learning
3. Inductive Learning from Examples: Version Space Learning
Prof. Gheorghe Tecuci
Learning Agents Laboratory, Computer Science Department, George Mason University
2
Overview
Instances, concepts and generalization
Concept learning from examples
Version spaces and the candidate elimination
algorithm
The LEX system
The learning bias
Discussion
Recommended reading
3
Basic ontological elements: instances and concepts
An instance is a representation of a particular
entity from the application domain.
A concept is a representation of a set of
instances.
[Figure: government_of_US_1943 and government_of_Britain_1943, each linked to state_government by instance_of.]
instance_of is the relationship between an
instance and the concept to which it belongs.
state_government represents the set of all
entities that are governments of states. This set
includes government_of_US_1943 and
government_of_Britain_1943 which are called
positive examples.
An entity which is not an instance of a concept
is called a negative example of that concept.
4
An instance is a representation of a specific
entity, such as US_1943 or the
government_of_US_1943. A concept is a
representation of a set of instances. For
example, state_government represents the set of
all entities that are governments of states. This
set includes government_of_US_1943. One may
use a concept name to refer to an unspecified
individual from the set represented by the
concept. For example, one could say that ?X is a
state government, meaning that ?X might be
government_of_US_1943 or government_of_Britain_1943
or any other instance from the set
state_government. The relationship between an
instance and the concept to which it belongs is
called instance_of.
5
Concept generality
A concept P is more general than another concept
Q if and only if the set of instances represented
by P includes the set of instances represented by
Q.
Example:
[Figure: fragment of a generalization hierarchy containing state_government, democratic_government, totalitarian_government, representative_democracy and parliamentary_democracy; democratic_government is linked to state_government by subconcept_of.]
subconcept_of is the relationship between a
concept and a more general concept.
6
A concept represents a set of instances. The
larger this set is, the more general the concept
is said to be. For example, democratic_government
represents the set of all the state governments
that are democratic. The set of democratic
governments is included in the set of state
governments. Therefore the concept
democratic_government is said to be less
general than the concept state_government. The
formal relationship between them is called
subconcept_of. Similarly, the concept
state_government is said to be more general
than democratic_government.
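The set-inclusion definition of generality can be sketched directly when the instance sets are small and fully enumerated. The following is a minimal illustration under that assumption; the particular instance sets and the function name `more_general` are hypothetical and chosen only for this example.

```python
# Illustrative instance sets (an assumption: a real concept may have far more,
# possibly infinitely many, instances, so this check is not always feasible).
state_government = frozenset({
    "government_of_US_1943",
    "government_of_Britain_1943",
    "government_of_Germany_1943",
})
democratic_government = frozenset({
    "government_of_US_1943",
    "government_of_Britain_1943",
})


def more_general(p: frozenset, q: frozenset) -> bool:
    """Return True if the instance set of P includes the instance set of Q."""
    return q <= p


print(more_general(state_government, democratic_government))  # True
print(more_general(democratic_government, state_government))  # False
```

Because the check enumerates instances, it only works for finite sets; for concepts described by expressions, generality has to be established through rules instead.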
7
A generalization hierarchy
[Figure: generalization hierarchy rooted at governing_body.
Concepts shown: ad_hoc_governing_body, established_governing_body, other_type_of_governing_body, state_government, group_governing_body, feudal_god_king_government, other_state_government, dictator, other_group_governing_body, democratic_government, monarchy, deity_figure, representative_democracy, parliamentary_democracy, democratic_council_or_board, autocratic_leader, totalitarian_government, chief_and_tribal_council, theocratic_government, military_dictatorship, police_state, fascist_state, religious_dictatorship, theocratic_democracy, communist_dictatorship.
Instances shown: government_of_Italy_1943, government_of_US_1943, government_of_Britain_1943, government_of_Germany_1943, government_of_USSR_1943.]
8
The instances and the concepts are organized into
generalization hierarchies like this hierarchy of
governing bodies. Notice, however, that the
generalization hierarchies are not always as
strict as this one, where each concept is a
subconcept of only one concept. For instance,
the concept strategic_raw_material is both a
subconcept of raw_material and a subconcept of
strategically_essential_resource_or_infrastructure_element.
10
Empirical inductive concept learning from examples
Approach: Compare the positive and the negative
examples of a concept, in terms of their
similarities and differences, and learn the
concept as a generalized description of the
similarities of the positive examples.
Why is Concept Learning important?
Concept Learning allows the agent to recognize
other entities as being instances of the learned
concept.
11
The learning problem

Given:
  - a language of instances
  - a language of generalizations
  - a set of positive examples (E1, ..., En) of a concept
  - a set of negative examples (C1, ..., Cm) of the same concept
  - a learning bias
  - other background knowledge
Determine:
  - a concept description which is a generalization of the positive examples and does not cover any of the negative examples.
Purpose of concept learning: Predict if an instance is an example of the learned concept.
12
Generalization and specialization rules
Learning a concept from examples is based on
generalization and specialization rules.
A generalization rule is a rule that transforms
an expression into a more general expression.
A specialization rule is a rule that transforms
an expression into a less general expression.
The reverse of any generalization rule is a
specialization rule.
13
Discussion
Indicate several generalizations of the following
sentence: Students who have lived in Fairfax for
more than 3 years.
Indicate several specializations of the following
sentence: Students who have lived in Fairfax for
more than 3 years.
14
Generalization (and specialization) rules
Turning constants into variables
Climbing the generalization hierarchy
Dropping condition
Generalizing numbers
Adding alternatives
15
Turning constants into variables
Generalizes an expression by replacing a constant
with a variable.
The set of multi_group_forces with 5 subgroups:
  ?O1 is multi_group_force
      number_of_subgroups 5
generalization: 5 → ?N1
specialization: ?N1 → 5
The set of multi_group_forces with any number of subgroups:
  ?O1 is multi_group_force
      number_of_subgroups ?N1
Instances shown in the figure: Japan_1944_Armed_Forces, Axis_forces_Sicily, Allied_forces_operation_Husky.
16
The top expression represents the following
concept: the set of multi-group forces with 5
subgroups. This set contains, for instance,
Axis_forces_Sicily from the Sicily_1943 scenario
(the invasion of Sicily by the Allied Forces in
1943). By replacing 5 with a variable ?N1 that
can take any value, we generalize this concept to
the following one: the set of multi-group forces
with any number of subgroups. In particular ?N1
could be 5. Therefore the second concept includes
the first one. Conversely, by replacing ?N1 with
5, we specialize the bottom concept to the top
one. The important thing to notice here is that
by a simple syntactic operation (transforming a
number into a variable) we can generalize a
concept. This is one way in which an agent
generalizes concepts.
17
Climbing the generalization hierarchies
Generalizes an expression by replacing a concept
with a more general one.
Hierarchy fragment shown: democratic_government, representative_democracy, parliamentary_democracy.
The set of single state forces governed by representative democracies:
  ?O1 is single_state_force
      has_as_governing_body ?O2
  ?O2 is representative_democracy
generalization: representative_democracy → democratic_government
specialization: democratic_government → representative_democracy
The set of single state forces governed by democracies:
  ?O1 is single_state_force
      has_as_governing_body ?O2
  ?O2 is democratic_government
18
One can also generalize an expression by
replacing a concept from its description with a
more general concept, according to some
generalization hierarchy. The reverse
operation, of replacing a concept with a less
general one, leads to the specialization of an
expression. The agent can also generalize a
concept by dropping a condition, that is, by
dropping a constraint that its instances must
satisfy. This rule is illustrated in the next
slide.
19
Dropping conditions
Generalizes an expression by removing a
constraint from its description.
The set of multi-member forces that have international legitimacy:
  ?O1 is multi_member_force
      has_international_legitimacy yes
generalization: drop has_international_legitimacy
specialization: add has_international_legitimacy
  ?O1 is multi_member_force
The set of multi-member forces (that may or may
not have international legitimacy).
20
Extending intervals
Generalizes an expression by replacing a number
with an interval, or by replacing an interval
with a larger interval.
The set of multi_group_forces with exactly 5 subgroups:
  ?O1 is multi_group_force
      number_of_subgroups 5
generalization: 5 → 3 .. 7
specialization: 3 .. 7 → 5
The set of multi_group_forces with at least 3 subgroups and at most 7 subgroups:
  ?O1 is multi_group_force
      number_of_subgroups ?N1
  ?N1 is-in 3 .. 7
generalization: 3 .. 7 → 2 .. 10
specialization: 2 .. 10 → 3 .. 7
The set of multi_group_forces with at least 2 subgroups and at most 10 subgroups:
  ?O1 is multi_group_force
      number_of_subgroups ?N1
  ?N1 is-in 2 .. 10
21
A concept may also be generalized by replacing a
number with an interval containing it, or by
replacing an interval with a larger interval. The
reverse operations specialize the concept. Yet
another generalization rule, which is illustrated
in the next slide, is to add alternatives.
According to the expression at the top of that slide,
?O1 is any alliance. Therefore this expression
represents the following concept: the set of all
alliances. This concept can be generalized by
adding another alternative for ?O1, namely the
alternative of being a coalition. Now ?O1 could
be either an alliance or a coalition. Consequently,
the expression at the bottom of that slide
represents the following more general concept:
the set of all alliances and coalitions.
22
Adding alternatives
Generalizes an expression by replacing a concept
C1 with the union (C1 U C2), which is a more
general concept.
The set of alliances:
  ?O1 is alliance
      has_as_member ?O2
generalization: alliance → alliance OR coalition
specialization: alliance OR coalition → alliance
  ?O1 is alliance OR coalition
      has_as_member ?O2
The set including both the alliances and the
coalitions.
23
Generalization and specialization rules
Turning constants into variables
Turning variables into constants
Climbing the generalization hierarchies
Descending the generalization hierarchies
Dropping conditions
Adding conditions
Extending intervals
Reducing intervals
Adding alternatives
Dropping alternatives
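The five generalization rules listed above can be sketched as simple syntactic operations on an expression. This is a minimal illustration, assuming a concept description is encoded as a Python dict from feature names to values; the encoding and all function names are hypothetical and are not taken from the presentation. Each specialization rule is the reverse operation.

```python
from typing import Any, Dict

Expr = Dict[str, Any]  # hypothetical encoding: feature name -> value


def turn_constant_into_variable(expr: Expr, feature: str, var: str) -> Expr:
    """Replace the constant value of a feature with a variable such as '?N1'."""
    return {**expr, feature: var}


def climb_hierarchy(expr: Expr, feature: str, parent: Dict[str, str]) -> Expr:
    """Replace a concept with its direct generalization in the hierarchy."""
    return {**expr, feature: parent[expr[feature]]}


def drop_condition(expr: Expr, feature: str) -> Expr:
    """Remove one constraint from the description."""
    return {k: v for k, v in expr.items() if k != feature}


def extend_interval(expr: Expr, feature: str, low: int, high: int) -> Expr:
    """Replace a number or interval with a larger interval (low, high)."""
    old = expr[feature]
    old_low, old_high = old if isinstance(old, tuple) else (old, old)
    if not (low <= old_low and old_high <= high):
        raise ValueError("the new interval must contain the old value or interval")
    return {**expr, feature: (low, high)}


def add_alternative(expr: Expr, feature: str, concept: str) -> Expr:
    """Replace concept C1 with the union of C1 and another concept."""
    old = expr[feature]
    alternatives = old if isinstance(old, frozenset) else frozenset([old])
    return {**expr, feature: alternatives | {concept}}


force = {"is": "multi_group_force", "number_of_subgroups": 5}
as_variable = turn_constant_into_variable(force, "number_of_subgroups", "?N1")
as_interval = extend_interval(force, "number_of_subgroups", 3, 7)
wider = extend_interval(as_interval, "number_of_subgroups", 2, 10)
climbed = climb_hierarchy(
    {"is": "representative_democracy"},
    "is",
    {"representative_democracy": "democratic_government"},
)
dropped = drop_condition(
    {"is": "multi_member_force", "has_international_legitimacy": "yes"},
    "has_international_legitimacy",
)
either = add_alternative({"is": "alliance"}, "is", "coalition")
```

Every function returns a new dict and leaves its argument unchanged, so the original and the generalized expression can be compared side by side.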
24
Types of generalizations and specializations
Operational definition of generalization/specialization
Generalization/specialization of two concepts
Minimally general generalization of two concepts
Maximally general specialization of two concepts
Least general generalization of two concepts
25
Operational definition of generalization
Why isn't this an operational definition?
This definition is not operational because it
requires showing that each instance I from a
potentially infinite set Q is also in the set P.
26
Generalization of two concepts
How would you define this?
Is the above definition operational?
27
Generalization of two concepts: example
C1:
  ?O1 IS COURSE-OF-ACTION
      TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS 10
      TYPE OFFENSIVE
C2:
  ?O1 IS COURSE-OF-ACTION
      TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS 5
From C1 to C: generalize 10 to 5 .. 10; drop ?O1 TYPE OFFENSIVE
From C2 to C: generalize 5 to 5 .. 10
C:
  ?O1 IS COURSE-OF-ACTION
      TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS ?N1
  ?N1 IS-IN 5 .. 10
Remark: COA = Course of Action
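The generalization of C1 and C2 shown in this example can be reproduced with a short sketch. It assumes, for illustration only, that a concept is encoded as a dict of features; the function name `generalize` is hypothetical. It applies two of the rules: extending numbers to an interval and dropping conditions that the two concepts do not share.

```python
from numbers import Number


def generalize(c1: dict, c2: dict) -> dict:
    """Build a description that covers both c1 and c2."""
    general = {}
    for feature, v1 in c1.items():
        if feature not in c2:
            continue  # dropping condition: the feature constrains only c1
        v2 = c2[feature]
        if v1 == v2:
            general[feature] = v1  # shared constraint is kept unchanged
        elif isinstance(v1, Number) and isinstance(v2, Number):
            # extending intervals: smallest interval containing both numbers
            general[feature] = (min(v1, v2), max(v1, v2))
        # Simplification of this sketch: differing symbolic values are
        # dropped rather than generalized by climbing a hierarchy.
    return general


C1 = {
    "IS": "COURSE-OF-ACTION",
    "TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS": 10,
    "TYPE": "OFFENSIVE",
}
C2 = {"IS": "COURSE-OF-ACTION", "TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS": 5}

C = generalize(C1, C2)
print(C)
```

The result keeps IS COURSE-OF-ACTION, replaces 10 and 5 with the interval (5, 10), and omits TYPE, which matches the concept C on the slide.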
28
Specialization of two concepts
29
Other useful definitions
Minimally general generalization
The concept G is a minimally general
generalization of A and B if and only if G is a
generalization of A and B, and G is not more
general than any other generalization of A and B.
Least general generalization
If there is only one minimally general
generalization of two concepts A and B, then this
generalization is called the least general
generalization of A and B.
Maximally general specialization
The concept C is a maximally general
specialization of two concepts A and B if and
only if C is a specialization of A and B and no
other specialization of A and B is more general
than C.
Specialization of a concept with a negative
example
30
Concept learning another illustration
31
Discussion
What could be said about the predictions of a
cautious learner?
32
There are many different generalizations of the
positive examples that do not cover the negative
examples. For instance, a cautious learner might
attempt to learn the most specific
generalization. When such a learner classifies an
instance as a positive example of a concept, this
classification is most likely to be
correct. However, the learner may more easily
make mistakes when classifying an instance as a
negative example (this type of error is called an
error of omission, because some positive
examples are omitted, that is, classified as negative
examples).
33
Concept learning yet another illustration
34
Discussion
What could be said about the predictions of an
aggressive learner?
35
A more aggressive learner, on the other hand,
might attempt to learn the most general
generalization. When such a learner classifies an
instance as a negative example of a concept, this
classification is most likely to be
correct. However, the learner may more easily
make mistakes when classifying an instance as a
positive example (this type of error is called an
error of commission, because some negative
examples are committed, that is, classified as
positive examples).
36
Discussion
How could one synergistically integrate a
cautious learner with an aggressive learner to
take advantage of their qualities and compensate
for each other's weaknesses?
38
Basic idea of version space concept learning
Consider the examples E1, E2, ... in sequence.
39
The main idea of the version space concept
learning is to combine the relative strengths of
the cautious and aggressive learners, to
compensate for each other's weaknesses. It
attempts to always find both the most general
generalization and the least general
generalization. It has been demonstrated that if
there are enough negative and positive examples,
and if a generalization exists, then it will
eventually be found as the convergence of the two
generalizations.
40
The candidate elimination algorithm (Mitchell,
1978)
Let us suppose that we have an example e1 of a
concept to be learned. Then, any sentence of the
representation language which is more general
than this example, is a plausible hypothesis for
the concept.
The version space is
H = { h | h is more general than e1 }
41
The candidate elimination algorithm (cont.)
As new examples and counterexamples are presented
to the program, candidate concepts are eliminated
from H. This is practically done by updating
the set G (which is the set of the most general
elements in H) and the set S (which is the set of
the most specific elements in H).
42
Version spaces and the candidate elimination algorithm: general presentation

This is a concept learning method based on exhaustive search. It was developed by Mitchell and his colleagues. Let us suppose that we have an example e1 of a concept to be learned. Then any sentence of the representation language which is more general than this example is a plausible hypothesis for the concept.

The set H of all the plausible hypotheses for the concept to be learned is called the version space:
  H = { h | h is more general than e1 }
Let S be the set containing the example e1, and G be the set containing the most general description of the representation language which is more general than e1:
  S = {e1}, G = {eg}
The following figure is an intuitive representation of the version space H (each hypothesis being represented as a point in the network):
[Figure: H is the set of the concepts covering the example e1.]

Because the more-general-than relation is a partial ordering relation, one may represent the version space H by its boundaries:
  H = { h | h is more general than e1 and h is less general than eg }
or
  H = {S, G}
As new examples and counterexamples are presented to the program, candidate concepts are eliminated from H. In practice this is done by updating the set G (which is the set of the most general elements in H) and the set S (which is the set of the most specific elements in H).

Thus the version space H is the set of all concept descriptions that are consistent with all the training instances seen so far. When the set H contains only one candidate concept, the desired concept has been found.
43
The candidate elimination algorithm
  1. Initialize S to the first positive example and G
     to its most general generalization.
  2. Accept a new training instance I.
     • If I is a positive example then
       - remove from G all the concepts that do not
         cover I
       - generalize the elements in S as little as
         possible to cover I but remain less
         general than some concept in G
       - keep in S the minimally general concepts.
     • If I is a negative example then
       - remove from S all the concepts that cover I
       - specialize the elements in G as little as
         possible to uncover I and be more general than
         at least one element from S
       - keep in G the maximally general concepts.
  3. Repeat step 2 until G = S and they contain a single
     concept C (this is the learned concept).

44
Illustration of the candidate elimination
algorithm
Language of generalizations: (shape, size)
  shape: ball, brick, cube, any-shape
  size: large, small, any-size
Language of instances: (shape, size)
  shape: ball, brick, cube
  size: large, small
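The candidate elimination algorithm can be sketched for this (shape, size) language as follows. This is a minimal illustration, not the implementation used in the course: the training examples are hypothetical (the examples from the slide's figure are not in the transcript), all function names are assumptions, and the generalization hierarchy is flat (each attribute value generalizes directly to any-shape or any-size).

```python
VALUES = (("ball", "brick", "cube"), ("large", "small"))
ANY = ("any-shape", "any-size")  # wildcard symbol for each attribute


def covers(h, x):
    """True if hypothesis h covers x (an instance or another hypothesis)."""
    return all(a == any_ or a == b for a, b, any_ in zip(h, x, ANY))


def generalize(s, x):
    """Minimal generalization of s that also covers instance x."""
    return tuple(a if a == b else any_ for a, b, any_ in zip(s, x, ANY))


def specialize(g, x):
    """Minimal specializations of g that no longer cover instance x."""
    out = []
    for i, a in enumerate(g):
        if a == ANY[i]:
            out += [g[:i] + (v,) + g[i + 1:] for v in VALUES[i] if v != x[i]]
    return out


def candidate_elimination(examples):
    """Return the boundary sets (S, G) after processing all examples."""
    (first, positive), rest = examples[0], examples[1:]
    if not positive:
        raise ValueError("the first example must be positive")
    S, G = {first}, {ANY}
    for x, positive in rest:
        if positive:
            G = {g for g in G if covers(g, x)}
            S = {generalize(s, x) for s in S}
            S = {s for s in S if any(covers(g, s) for g in G)}
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                candidates = specialize(g, x) if covers(g, x) else [g]
                new_G.update(h for h in candidates
                             if any(covers(h, s) for s in S))
            # Keep only the maximally general elements.
            G = {g for g in new_G
                 if not any(h != g and covers(h, g) for h in new_G)}
    return S, G


# Hypothetical training sequence: (instance, is_positive)
EXAMPLES = [
    (("ball", "large"), True),
    (("brick", "small"), False),
    (("cube", "large"), False),
    (("ball", "small"), True),
]
S, G = candidate_elimination(EXAMPLES)
print(S, G)
```

With these four examples both boundaries converge to the single concept (ball, any-size). In this conjunctive language S always holds one element, so the step "keep in S the minimally general concepts" needs no extra code here.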
45
Another illustration of the learning algorithm

Let us suppose that the positive and the negative examples are objects used in the process of manufacturing loudspeakers. All that is known about these objects (the background knowledge) is the following generalization hierarchy.

Let us also consider that the concept to be learned represents the set of objects that could be used to clean the membrane of a loudspeaker.
The positive examples of this concept are objects which can be used to clean a membrane: alcohol, acetone, air-press.
The negative examples of this concept are objects which cannot be used to clean a membrane: ventilator, emery-paper.
The problem is to determine the concept that covers all the positive examples (i.e. alcohol, acetone, air-press) and none of the negative examples (i.e. ventilator, emery-paper).

The version space method is iterative, that is, it analyzes the examples one after the other, in the order in which they are presented. The first example should be a positive one. Let us suppose that the examples are presented in the following order:
  instance1(+) alcohol
  instance2(+) acetone
  instance3(-) ventilator
  instance4(+) air-press
  instance5(-) emery-paper

Step 1: instance1(+) alcohol
Any concept which is more general than this example is a plausible hypothesis for the concept to be learned.
46
This set H of all the plausible hypotheses is the version space. Because this space is partially ordered, one may represent it by its boundaries:
  G (upper bound) = {something}, the most general generalization of the example
  S (lower bound) = {alcohol}, the example

Step 2: instance2(+) acetone
  G covers 'acetone', therefore it is not changed.
  There are two minimally general generalizations of acetone and alcohol:
    g(acetone, alcohol) = {solvent, inflammable-obj}
  Therefore the new S = {solvent, inflammable-obj}.

Step 2: instance3(-) ventilator
  No concept from S covers 'ventilator', therefore S remains unchanged.
  G covers 'ventilator', therefore it has to be specialized. There are four possible specializations of G:
    s1 = air-jet-device: not acceptable because it does not cover any element of S
    s2 = cleaner: acceptable
    s3 = inflammable-object: acceptable
    s4 = loudspeaker-component: not acceptable because it does not cover any element of S
  Neither of s2 and s3 is more general than the other, therefore both are kept in G:
    G = {cleaner, inflammable-obj}

Step 2: instance4(+) air-press
  Remove 'inflammable-object' from G because it does not cover 'air-press':
    G = {cleaner}
  Generalize the elements of S so as to cover the new positive example:
    S = g(old-S, air-press) = {g(solvent, air-press), g(inflammable-obj, air-press)} = {soft-cleaner, something}
  Remove 'something' from S because it is more general than 'soft-cleaner':
    S = {soft-cleaner}

Step 2: instance5(-) emery-paper
  S does not cover 'emery-paper' and is not changed.
  G covers 'emery-paper' and has to be specialized. The only possible specialization is
    G = {soft-cleaner}

Step 3: S = G = {soft-cleaner}
  Therefore, the concept that covers all the positive examples and none of the negative examples is soft-cleaner.
47
The performed specializations and generalizations
are shown in the following figure
49
The LEX system
Lex is a system that uses the version space
method to learn heuristics for suggesting when
the integration operators should be applied for
solving symbolic integration problems.
The problem of learning control heuristics
Given: Operators for symbolic integration
  OP1: ∫ r f(x) dx --> r ∫ f(x) dx
  OP2: ∫ u dv --> uv - ∫ v du, where u = f1(x) and dv = f2(x) dx
  OP3: 1 * f(x) --> f(x)
  OP4: ∫ (f1(x) + f2(x)) dx --> ∫ f1(x) dx + ∫ f2(x) dx
  OP5: ∫ sin(x) dx --> -cos(x) + C
  OP6: ∫ cos(x) dx --> sin(x) + C
Find: Heuristics for applying the operators as, for instance, the following one:
  To solve ∫ rx transc(x) dx apply OP2 with u = rx and dv = transc(x) dx
50
Remarks: The integration operators assure a
satisfactory level of competence to the LEX
system. That is, LEX is able in principle to
solve a significant class of symbolic integration
problems. However, in practice, it may not be
able to solve many of these problems because this
would require too many resources of time and
space. The description of an operator shows when
the operator is applicable, while a heuristic
associated with an operator shows when the
operator should be applied, in order to solve a
problem. LEX tries to discover, for each
operator OPi, the definition of the
concept "situations in which OPi should be used".
51
The architecture of LEX
[Figure: the four modules of LEX (PROBLEM GENERATOR, PROBLEM SOLVER, CRITIC, LEARNER), annotated with the following questions:
  1. What search strategy to use for problem solving?
  2. How to characterize individual problem solving steps?
  3. How to learn from these steps? How is the initial VS defined?
  4. How to generate a new problem?
Problem shown: ∫ 3x cos(x) dx
Solution trace shown:
  ∫ 3x cos(x) dx
    OP2 with u = 3x, dv = cos(x) dx
  3x sin(x) - ∫ 3 sin(x) dx
    OP1
  3x sin(x) - 3 ∫ sin(x) dx
    OP5
  3x sin(x) + 3cos(x) + C
One of the suggested positive training instances:
  ∫ 3x cos(x) dx --> Apply OP2 with u = 3x, dv = cos(x) dx
Version space of a proposed heuristic:
  G: ∫ f1(x) f2(x) dx --> Apply OP2 with u = f1(x), dv = f2(x) dx
  S: ∫ 3x cos(x) dx --> Apply OP2 with u = 3x, dv = cos(x) dx]
52
The problem solver: This module uses the operators and heuristics to solve a given problem as, for instance, ∫ 3x cos(x) dx. It conducts a uniform-cost search: at each step it chooses the one expansion of the search tree that has the smallest estimated cost (in terms of time and space). For each integration problem it has a time and space limit. If it exceeds these limits, it gives up. The output is a detailed trace of the search performed in attempting to solve the problem.

The critic: This module examines the trace to assign credit or blame to the individual decisions made by the problem solver. It labels as a positive instance every search step along the minimum-cost solution path. It labels as a negative instance every step that (a) leads from a node on the minimum-cost path to a node not on this path and (b) leads to a solution path whose length is greater than or equal to 1.15 times the length of the minimum-cost path. For instance,
  ∫ 3x cos(x) dx --> 3x sin(x) - ∫ 3 sin(x) dx
is a positive example for the application of the operator OP2 with u = 3x and dv = cos(x) dx. The trace also shows positive examples for OP1 and OP5:
  ∫ 3 sin(x) dx --> 3 ∫ sin(x) dx   (positive example for OP1)
  ∫ sin(x) dx --> -cos(x)   (positive example for OP5)

The learner: Learns heuristics from examples by using the version space method.

The problem generator: Inspects the current content of the knowledge base (i.e. operators and heuristics) and generates problems to solve that are useful for learning. Strategies for problem generation:
  - find an operator for which the version space is still unrefined and select a problem that matches only half of the patterns in S and G
  - take a solved problem and slightly modify it, guided by the generalization hierarchy
  - if the version spaces for two operators are overlapping, choose a problem for which both are considered to be applicable, in order to learn a preference for one of them.
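The solution path that the critic labels can be written out as a short derivation. This is only a worked check of the trace described above, using the operators OP2, OP1 and OP5 as they are stated in the transcript; C is the usual constant of integration.

```latex
\begin{align*}
\int 3x\cos(x)\,dx
  &= 3x\sin(x) - \int 3\sin(x)\,dx
     && \text{(OP2 with } u = 3x,\ dv = \cos(x)\,dx\text{)} \\
  &= 3x\sin(x) - 3\int \sin(x)\,dx
     && \text{(OP1)} \\
  &= 3x\sin(x) + 3\cos(x) + C
     && \text{(OP5)}
\end{align*}
```

Differentiating the last line gives 3 sin(x) + 3x cos(x) - 3 sin(x) = 3x cos(x), which confirms that each of the three steps lies on a correct solution path.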
54
Illustration of the learning process
Continue learning of the heuristic for applying
OP2
The problem generator: Generates a new problem to solve that is useful for learning.
The problem solver: Solves this problem.
The critic: Extracts positive and negative examples from the problem solving tree.
The learner: Refines the version space of the heuristic.
55
Illustration of the learning process

The initial positive example is:
  ∫ 3x cos(x) dx --> Apply OP2 with u = 3x and dv = cos(x) dx
The initial version space of this heuristic was shown in the architecture figure. Notice that S is the training instance, and G is the most general pattern for which OP2 is legal. Similarly, initial version spaces are defined for OP1 and OP5.
Let us suppose that the next problem generated by the problem generator is ∫ 5x sin(x) dx. The problem solving tree built by the problem solver is the following one.
This tree shows a positive and a negative example for OP2:
  ∫ 5x sin(x) dx --> Apply OP2 with u = 5x and dv = sin(x) dx   (positive example)
  ∫ 5x sin(x) dx --> Apply OP2 with u = sin(x) and dv = 5x dx   (negative example)
Consequently, the version space for OP2 is modified as indicated in the following figure.
56
[Figure: update of the version space for OP2.
Initial G:
  ∫ f1(x) f2(x) dx --> Apply OP2 with u = f1(x), dv = f2(x) dx
Negative example:
  ∫ 5x sin(x) dx --> Apply OP2 with u = sin(x), dv = 5x dx
New G:
  ∫ poly(x) f2(x) dx --> Apply OP2 with u = poly(x), dv = f2(x) dx
  ∫ f1(x) transc(x) dx --> Apply OP2 with u = f1(x), dv = transc(x) dx
New S:
  ∫ kx trig(x) dx --> Apply OP2 with u = kx, dv = trig(x) dx
Positive examples:
  ∫ 3x cos(x) dx --> Apply OP2 with u = 3x, dv = cos(x) dx
  ∫ 5x sin(x) dx --> Apply OP2 with u = 5x, dv = sin(x) dx]
With a few more training instances, the heuristic for OP2 converges to the form:
  ∫ f1(x) transc(x) dx --> Apply OP2 with u = f1(x) and dv = transc(x) dx
58
The learning bias
A bias is any basis for choosing one
generalization over another, other than strict
consistency with the observed training examples.
Types of bias:
  - restricted hypothesis space bias
  - preference bias
59
Restricted hypothesis space bias
The hypothesis space H (i.e. the space containing
all the possible concept descriptions) is defined
by the generalization language. This language may
not be capable of expressing all possible classes
of instances. Consequently, the hypothesis space
in which the concept description is searched is
restricted.
Some of the restricted spaces investigated:
  - logical conjunctions (i.e. the learning system will look for a concept description in the form of a conjunction)
  - linear threshold functions (for exemplar-based representations)
  - three-layer neural networks with a fixed number of hidden units.
60
Restricted hypothesis space bias example
The language of instances consists of triples of
bits as, for example, (0, 1, 1), (1, 0, 1). How
many concepts are in this space?
There are 2^3 = 8 instances, so the total number of subsets of instances is 2^8 = 256.
The language of generalizations consists of
triples of 0, 1, and *, where * means any bit,
for example (0, *, 1), (*, 0, 1).
How many concepts could be represented in this
language?
This hypothesis space consists of 3 x 3 x 3 = 27
elements.
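The two counts on this slide can be checked by brute-force enumeration. The sketch below assumes the "any bit" symbol is written as "*"; the helper name `extension` is hypothetical.

```python
from itertools import product

# All instances: triples of bits.
instances = list(product((0, 1), repeat=3))

# Every subset of the instances is a possible concept.
num_concepts = 2 ** len(instances)

# All generalizations: triples over 0, 1 and the wildcard "*".
hypotheses = list(product((0, 1, "*"), repeat=3))


def extension(h):
    """The set of instances covered by hypothesis h."""
    return frozenset(
        x for x in instances
        if all(a == "*" or a == b for a, b in zip(h, x))
    )


# Distinct concepts that the generalization language can express.
representable = {extension(h) for h in hypotheses}

print(len(instances), num_concepts, len(hypotheses), len(representable))
```

Each of the 27 patterns denotes a different set of instances, so the language expresses only 27 of the 256 possible concepts. This gap is exactly the restricted hypothesis space bias.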
61
Preference bias
A preference bias places a preference ordering
over the hypotheses in the hypothesis space H.
The learning algorithm can then choose the most
preferred hypothesis f in H that is consistent
with the training examples, and produce this
hypothesis as its output.
Most preference biases attempt to minimize some
measure of syntactic complexity of the hypothesis
representation (e.g. shortest logical expression,
smallest decision tree). These are variants of
Occam's Razor, which is the bias first defined by
William of Occam (1300-1349): given two
explanations of data, all other things being
equal, the simpler explanation is preferable.
62
Preference bias representation
How could the preference bias be represented?
In general, the preference bias may be
implemented as an order relationship 'better(f1,
f2)' over the hypothesis space H. Then, the
system will choose the "best" hypothesis f,
according to the "better" relationship. An
example of such a relationship is "less-general-than",
which produces the least general expression
consistent with the data.
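A preference bias of this kind can be sketched as an explicit `better(f1, f2)` relation. The example below is a minimal illustration that assumes a bit-triple hypothesis language with "*" as the wildcard and uses "less-general-than" as the preference; the training examples and all helper names are hypothetical.

```python
from itertools import product

ANY = "*"


def covers(h, x):
    """True if hypothesis h covers instance x."""
    return all(a == ANY or a == b for a, b in zip(h, x))


def consistent(h, positives, negatives):
    """h must cover every positive example and no negative example."""
    return (all(covers(h, x) for x in positives)
            and not any(covers(h, x) for x in negatives))


def better(f1, f2):
    """Preference relation: f1 is strictly less general than f2."""
    return f1 != f2 and all(b == ANY or a == b for a, b in zip(f1, f2))


def most_preferred(hypotheses):
    """Hypotheses that no other hypothesis is better than."""
    return [f for f in hypotheses
            if not any(better(g, f) for g in hypotheses)]


positives = [(0, 1, 1), (0, 0, 1)]
negatives = [(1, 0, 1)]

candidates = [h for h in product((0, 1, ANY), repeat=3)
              if consistent(h, positives, negatives)]
best = most_preferred(candidates)
print(candidates, best)
```

Two hypotheses are consistent with these examples, (0, *, 1) and (0, *, *), and the preference selects the less general one, (0, *, 1). Because "less-general-than" is only a partial order, `most_preferred` returns a list: several incomparable hypotheses may be equally preferred.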
64
Problem
Language of instances: An instance is defined by a triplet of the form (specific-color, specific-shape, specific-size).
Language of generalization: (color-concept, shape-concept, size-concept)
Set of examples:
        color    shape      size    class
  i1    orange   square     large   +
  i2    blue     ellipse    small   -
  i3    red      triangle   small   +
  i4    green    rectangle  small   -
  i5    yellow   circle     large   +
Background knowledge
Task
Apply the candidate elimination algorithm to
learn the concept represented by the above
examples.
65
Solution
+i1: (color = orange) (shape = square) (size = large)
  S = { [(color = orange) (shape = square) (size = large)] }
  G = { [(color = any-color) (shape = any-shape) (size = any-size)] }
-i2: (color = blue) (shape = ellipse) (size = small)
  S = { [(color = orange) (shape = square) (size = large)] }
  G = { [(color = warm-color) (shape = any-shape) (size = any-size)],
        [(color = any-color) (shape = polygon) (size = any-size)],
        [(color = any-color) (shape = any-shape) (size = large)] }
+i3: (color = red) (shape = triangle) (size = small)
  S = { [(color = warm-color) (shape = polygon) (size = any-size)] }
  G = { [(color = warm-color) (shape = any-shape) (size = any-size)],
        [(color = any-color) (shape = polygon) (size = any-size)] }
-i4: (color = green) (shape = rectangle) (size = small)
  S = { [(color = warm-color) (shape = polygon) (size = any-size)] }
  G = { [(color = warm-color) (shape = any-shape) (size = any-size)] }
+i5: (color = yellow) (shape = circle) (size = large)
  S = { [(color = warm-color) (shape = any-shape) (size = any-size)] }
  G = { [(color = warm-color) (shape = any-shape) (size = any-size)] }
The concept is (color = warm-color) (shape = any-shape) (size = any-size): a warm-color object.
66
Does the order of the examples count? Why and
how? Consider the following order:
        color    shape      size    class
  i1    orange   square     large   +
  i3    red      triangle   small   +
  i5    yellow   circle     large   +
  i2    blue     ellipse    small   -
  i4    green    rectangle  small   -
67
Discussion
What happens if there are not enough examples for
S and G to become identical?
Could we still learn something useful?
How could we classify a new instance?
When could we be sure that the classification is
the same as the one made if the concept were
completely learned?
Could we be sure that the classification is
correct?
68
What happens if there are not enough examples for
S and G to become identical? Let us assume that
one learns only from the first 3 examples:
        color    shape      size    class
  i1    orange   square     large   +
  i2    blue     ellipse    small   -
  i3    red      triangle   small   +
The final version space will be
  S = { [(color = warm-color) (shape = polygon) (size = any-size)] }
  G = { [(color = warm-color) (shape = any-shape) (size = any-size)],
        [(color = any-color) (shape = polygon) (size = any-size)] }
69
Assume that the final version space is
G: (color warm-color) (shape any-shape) (size any-size),
   (color any-color) (shape polygon) (size any-size)
S: (color warm-color) (shape polygon) (size any-size)
How could we classify the following examples, how certain are we about the classification, and why?
color    shape    size   class
blue     circle   large  -
orange   square   small  +
red      ellipse  large  don't know
blue     polygon  small  don't know
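The classification rule behind these answers can be sketched in Python: an instance covered by every description in S is positive, an instance covered by no description in G is negative, and anything in between cannot be decided yet. The PARENT hierarchy and the function names are assumptions made for illustration, since the background-knowledge figure is not part of the transcript.

```python
# Assumed background knowledge: child -> parent links for each attribute.
PARENT = {
    "color": {"red": "warm-color", "orange": "warm-color", "yellow": "warm-color",
              "blue": "cold-color", "green": "cold-color", "black": "cold-color",
              "warm-color": "any-color", "cold-color": "any-color"},
    "shape": {"triangle": "polygon", "rectangle": "polygon", "square": "polygon",
              "polygon": "any-shape", "circle": "any-shape", "ellipse": "any-shape"},
    "size": {"large": "any-size", "small": "any-size"},
}
ATTRS = ("color", "shape", "size")

def ancestors(attr, value):
    """Return value followed by all of its ancestors, most specific first."""
    chain = [value]
    while chain[-1] in PARENT[attr]:
        chain.append(PARENT[attr][chain[-1]])
    return chain

def covers(h, x):
    """True if description h is more general than or equal to x."""
    return all(hv in ancestors(a, xv) for a, hv, xv in zip(ATTRS, h, x))

def classify(S, G, x):
    """Classify x with a partially learned version space bounded by S and G."""
    if all(covers(s, x) for s in S):
        return "+"            # every concept in the version space covers x
    if not any(covers(g, x) for g in G):
        return "-"            # no concept in the version space covers x
    return "don't know"       # the remaining concepts disagree on x

S = [("warm-color", "polygon", "any-size")]
G = [("warm-color", "any-shape", "any-size"),
     ("any-color", "polygon", "any-size")]

answers = [classify(S, G, x) for x in [
    ("blue", "circle", "large"),
    ("orange", "square", "small"),
    ("red", "ellipse", "large"),
    ("blue", "polygon", "small"),
]]
```

A "+" or "-" answer is the same as the one that would be given once the concept is completely learned, because every concept still in the version space agrees on it. That does not by itself guarantee that the answer is correct; it also requires error-free examples and a target concept that is expressible in the generalization language.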
70
Discussion
Could the examples contain errors?
What kind of errors could be found in an example?
What will be the result of the learning algorithm if there are errors in examples?
What could we do if we know that there are errors?
71
Discussion
Could the examples contain errors?
What kind of errors could be found in an example?
- Classification errors
  - positive examples labeled as negative
  - negative examples labeled as positive
- Measurement errors
  - errors in the values of the attributes
72
What will be the result of the learning algorithm if there are errors in examples?
Let us assume that the 4th example is incorrectly classified:
color    shape      size   class
orange   square     large  +  i1
blue     ellipse    small  -  i2
red      triangle   small  +  i3
green    rectangle  small  +  i4 (incorrect classification)
yellow   circle     large  +  i5
The version space after the first three examples is
S: (color warm-color) (shape polygon) (size any-size)
G: (color warm-color) (shape any-shape) (size any-size),
   (color any-color) (shape polygon) (size any-size)
Continue learning
73
What could we do if we know that there might be
errors in the examples?
If we cannot find a concept consistent with all
the training examples, then we may try to find a
concept that is consistent with all but one of
the examples.
If this fails, then we may try to find a concept
that is consistent with all but two of the
examples, and so on.
What is a problem with this approach?
Combinatorial explosion.
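The size of this explosion can be estimated with a short Python sketch. It counts how many subsets of the training set would have to be examined if up to k of the n examples may be discarded. The function name candidate_subsets is hypothetical, and the count assumes that every such subset is tried.

```python
from math import comb

def candidate_subsets(n, k):
    """Number of training subsets obtained by discarding at most k of n examples."""
    return sum(comb(n, j) for j in range(k + 1))

few = candidate_subsets(5, 2)      # 1 + 5 + 10 subsets
many = candidate_subsets(100, 5)   # grows roughly like n**k / k!
```

For 5 examples and at most 2 discarded examples this gives 1 + 5 + 10 = 16 subsets, each requiring its own run of the learning algorithm. The count grows quickly with both n and k.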
74
What happens if we extend the generalization
language to include conjunction, disjunction and
negation of examples?
Set of examples
color    shape      size   class
orange   square     large  +  i1
blue     ellipse    small  -  i2
red      triangle   small  +  i3
green    rectangle  small  -  i4
yellow   circle     large  +  i5
Background knowledge
Task
Learn the concept represented by the above
examples by applying the Version Space method.
75
Set of examples
color    shape      size   class
orange   square     large  +  i1
blue     ellipse    small  -  i2
red      triangle   small  +  i3
green    rectangle  small  -  i4
yellow   circle     large  +  i5
+i1: G = all the examples, S = i1
These are the minimal generalizations and specializations.
-i2: G = not i2 (all the examples except i2), S = i1
+i3: G = not i2, S = i1 or i3
-i4: G = not (i2 or i4) (all examples except i2 and i4), S = i1 or i3
+i5: G = not (i2 or i4) (all examples except i2 and i4), S = i1 or i3 or i5
76
The futility of bias-free learning
A learner that makes no a priori assumptions
regarding the identity of the target concept has
no rational basis for classifying any unseen
instance.
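A small Python sketch can make this concrete for the unrestricted language of the previous slide, where S is the disjunction of the positive examples seen so far and G is everything except the negative examples seen so far. The attribute value sets below are an assumption, and classify is an illustrative name.

```python
from itertools import product

# Assumed instance language: every combination of these attribute values.
COLORS = ("red", "orange", "yellow", "blue", "green", "black")
SHAPES = ("triangle", "rectangle", "square", "circle", "ellipse")
SIZES = ("large", "small")
INSTANCES = set(product(COLORS, SHAPES, SIZES))

positives = {("orange", "square", "large"),      # i1
             ("red", "triangle", "small"),       # i3
             ("yellow", "circle", "large")}      # i5
negatives = {("blue", "ellipse", "small"),       # i2
             ("green", "rectangle", "small")}    # i4

S = set(positives)           # i1 or i3 or i5
G = INSTANCES - negatives    # everything except i2 and i4

def classify(x):
    """Classify x using the bias-free version space bounded by S and G."""
    if x in S:
        return "+"
    if x not in G:
        return "-"
    return "don't know"

unseen = INSTANCES - positives - negatives
verdicts = {classify(x) for x in unseen}
```

Every unseen instance lies in G but not in S, so verdicts contains only "don't know". The learner has memorized the training examples and nothing more.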
77
What happens if we extend the generalization
language to include internal disjunction? Does
the algorithm still generalize over the observed
data?
Generalization(i1, i3) = (orange or red, square or
triangle, large or small). Is it different from
i1 or i3?
Set of examples
color    shape      size   class
orange   square     large  +  i1
blue     ellipse    small  -  i2
red      triangle   small  +  i3
green    rectangle  small  -  i4
yellow   circle     large  +  i5
Background knowledge
Task
Learn the concept represented by the above
examples by applying the Version Space method.
78
How is the generalization language extended by
the internal disjunction? Consider the following
generalization hierarchy:
any-shape
  polygon
    triangle
    rectangle
  circle
79
How is the generalization language extended by
the internal disjunction?
any-shape
  polygon
    triangle
    rectangle
  circle
The above hierarchy is replaced with the
following one (most general level first):
triangle or rectangle or circle (polygon or circle)
triangle or rectangle (polygon), rectangle or circle, triangle or circle
triangle, rectangle, circle
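The replacement hierarchy contains one node for every non-empty subset of the leaf values. The following Python sketch generates its levels from the most general to the most specific; disjunction_lattice is a hypothetical helper name.

```python
from itertools import combinations

def disjunction_lattice(leaves):
    """Return the levels of the internal-disjunction hierarchy, most general first."""
    return [[" or ".join(subset) for subset in combinations(leaves, k)]
            for k in range(len(leaves), 0, -1)]

shape_levels = disjunction_lattice(["triangle", "rectangle", "circle"])
node_count = sum(len(level) for level in shape_levels)
```

With n leaf values the hierarchy has 2**n - 1 nodes, so three shapes give 7 nodes. Named concepts such as polygon become aliases for one of these nodes, here "triangle or rectangle".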
80
Consider now the following generalization
hierarchy:
any-color
  warm-color
    red
    orange
    yellow
  cold-color
    blue
    green
    black
Which is the corresponding hierarchy containing
disjunctions?
81
Could you think of another approach to learning a
disjunctive concept with the candidate
elimination algorithm?
Find a concept1 that is consistent with some of
the positive examples and none of the negative
examples. Remove the covered positive examples
from the training set and repeat the procedure
for the remaining examples, computing another
concept2 that covers some positive examples, and
so on, until there is no positive example
left. The learned concept is concept1 or
concept2 or ...
Could you specify this algorithm better?
Hint: Initialize S with the first positive
example, ...
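One possible way to make this procedure concrete is sketched below in Python. It is a greedy simplification, not the full candidate elimination algorithm: it tracks only the specific bound S, seeds it with the first uncovered positive example, and accepts a generalization only if it covers no negative example. The PARENT hierarchy, the function names and the small training set are assumptions made for illustration.

```python
# Assumed background knowledge: child -> parent links for each attribute.
PARENT = {
    "color": {"red": "warm-color", "orange": "warm-color", "yellow": "warm-color",
              "blue": "cold-color", "green": "cold-color", "black": "cold-color",
              "warm-color": "any-color", "cold-color": "any-color"},
    "shape": {"triangle": "polygon", "rectangle": "polygon", "square": "polygon",
              "polygon": "any-shape", "circle": "any-shape", "ellipse": "any-shape"},
    "size": {"large": "any-size", "small": "any-size"},
}
ATTRS = ("color", "shape", "size")

def ancestors(attr, value):
    """Return value followed by all of its ancestors, most specific first."""
    chain = [value]
    while chain[-1] in PARENT[attr]:
        chain.append(PARENT[attr][chain[-1]])
    return chain

def covers(h, x):
    """True if description h is more general than or equal to x."""
    return all(hv in ancestors(a, xv) for a, hv, xv in zip(ATTRS, h, x))

def generalize(s, x):
    """Minimal generalization of s that covers x (unique in a tree hierarchy)."""
    return tuple(next(v for v in ancestors(a, sv) if v in ancestors(a, xv))
                 for a, sv, xv in zip(ATTRS, s, x))

def learn_disjunction(positives, negatives):
    """Return conjunctive concepts whose disjunction covers all positives."""
    remaining, concepts = list(positives), []
    while remaining:
        s = remaining[0]                          # seed S with an uncovered positive
        for x in remaining[1:]:
            candidate = generalize(s, x)
            if not any(covers(candidate, n) for n in negatives):
                s = candidate                     # still excludes every negative
        concepts.append(s)
        remaining = [x for x in remaining if not covers(s, x)]
    return concepts

# Hypothetical training set that needs two disjuncts.
positives = [("orange", "square", "large"), ("red", "triangle", "small"),
             ("blue", "circle", "large")]
negatives = [("green", "rectangle", "small"), ("blue", "ellipse", "small")]
concepts = learn_disjunction(positives, negatives)
```

On this data the sketch returns two disjuncts: warm-colored polygons of any size, and the single large blue circle. The result depends on the order of the positive examples, which is one of the points a better specification of the algorithm would have to address.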
82
Exercise
Consider the following:
Instance language: color red, orange, yellow, blue, green, black
Generalization language: color red, orange, yellow, blue, green, black,
warm-color, cold-color, any-color
a sequence of positive and negative examples of a concept, and the
background knowledge represented by the following hierarchy.
Apply the candidate elimination algorithm to
learn the concept represented by the above
examples.
83
Features of the version space method
  • In its original form learns only conjunctive
    descriptions.
  • However, it could be applied successively to
    learn disjunctive descriptions.
  • Requires an exhaustive set of examples.
  • Conducts an exhaustive bi-directional
    breadth-first search.
  • The sets S and G can be very large for complex
    problems.
  • It is very important from a theoretical point of
    view, clarifying the process of inductive concept
    learning from examples.
  • Has very limited practical applicability because
    of the combinatorial explosion of the S and G
    sets.
  • It is at the basis of the powerful Disciple
    multistrategy learning method which has practical
    applications.

84
Recommended reading
Mitchell T.M., Machine Learning, Chapter 2: Concept learning and the general to specific ordering, pp. 20-51, McGraw Hill, 1997.
Mitchell, T.M., Utgoff P.E., Banerji R., Learning by Experimentation: Acquiring and Refining Problem-Solving Heuristics, in Readings in Machine Learning.
Tecuci, G., Building Intelligent Agents, Chapter 3: Knowledge representation and reasoning, pp. 31-75, Academic Press, 1998.
Barr A. and Feigenbaum E. (Eds.), The Handbook of Artificial Intelligence, vol III, pp. 385-400, pp. 484-493.