Title: Ch 2. Concept Learning and General-to-Specific Ordering
1. Ch 2. Concept Learning and General-to-Specific Ordering
2. Objectives
- Concept Learning and Terminology
- General-to-Specific Ordering
- Version Space
- Find-S
- Candidate-Elimination
- Inductive Bias
3. Concept Learning
- Inferring a boolean-valued function from training examples of its inputs and outputs
  - Classification
- Acquiring the definition of a general category from a sample of positive and negative training examples of the category
  - Inductive learning, not deductive learning
4. Terminology
- Instance: a set of attributes
- Instance Space X: the set of all distinct instances
- Hypothesis Space H: the set of all distinct hypotheses h
  - h: X → {0, 1}
- Training examples D = {<x, y> : x ∈ X, y ∈ {0, 1}}
- Target concept c
  - c: X → {0, 1}
5. Example
- Hypothesis representation (see the encoding sketch below)
  - A conjunction of constraints over the instance attributes
  - Each constraint is ?, a single value, or ∅
- <Sunny, Warm, Normal, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>
- <?, ?, ?, ?, ?, ?>, <∅, ∅, ∅, ∅, ∅, ∅>
- <∅, ?, ?, ?, ?, ?>
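One way to encode such hypotheses in code (a minimal sketch; the 6-tuple encoding and the string 'EMPTY' standing in for the ∅ constraint are my own choices, not from the slides):

  # A conjunctive hypothesis as a 6-tuple whose entries are '?', a single
  # attribute value, or 'EMPTY' (the constraint satisfied by no value).
  def classify(h, x):
      """h(x) = 1 iff every constraint in h is satisfied by instance x."""
      return int(all(c == '?' or c == v for c, v in zip(h, x)))

  h = ('Sunny', 'Warm', '?', '?', '?', '?')
  x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
  print(classify(h, x))  # 1

Since 'EMPTY' equals neither '?' nor any attribute value, a hypothesis containing it classifies every instance as negative, matching the semantics of ∅.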
6. Example (Cont'd)
- |X| = 3×2×2×2×2×2 = 96 distinct instances
- |H| (counts verified in the sketch below)
  - 5×4×4×4×4×4 = 5120 syntactically distinct hypotheses
  - 1 + 4×3×3×3×3×3 = 973 semantically distinct hypotheses (every hypothesis containing ∅ classifies all instances negative, so they collapse into one)
- D
  - Positive examples: x1, x2, x4
  - Negative examples: x3
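A quick arithmetic check of these counts (a sketch; the attribute sizes are those of the EnjoySport example, Sky with 3 values plus five binary attributes):

  instances = 3 * 2**5       # 96 distinct instances
  syntactic = 5 * 4**5       # each attribute also allows '?' and 'EMPTY': 5120
  semantic = 1 + 4 * 3**5    # all EMPTY-containing hypotheses collapse to one: 973
  print(instances, syntactic, semantic)  # 96 5120 973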
7. Assumption
- Learning: find an h ∈ H such that h(x) = c(x) for all x ∈ X
- Inductive learning assumption
  - If we find an h with h(x) = c(x) for all given examples, this h will also be correct for unseen data.
8. Concept Learning as Search
- Concept learning can be viewed as the task of searching H for the best hypothesis
- The best hypothesis is the one that best fits the examples
  - e.g., search among the 1 + 4×3×3×3×3×3 = 973 semantically distinct hypotheses
9. General-to-Specific Ordering
- Let hj, hk ∈ H. hj ≥g hk iff (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
  - Figure 2.1 (p. 25)
- hj >g hk iff (hj ≥g hk) ∧ ¬(hk ≥g hj)
- The ≥g relation is a partial order (p. 24)
- The relation is important because it provides a useful structure over the hypothesis space H for any concept learning problem (a syntactic test for conjunctive hypotheses is sketched below).
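For conjunctive hypotheses the ordering can be tested attribute by attribute (a minimal sketch under the tuple encoding above; 'EMPTY' again stands in for ∅):

  def more_general_or_equal(hj, hk):
      """hj >=g hk: every constraint of hj admits at least the values hk admits."""
      if 'EMPTY' in hk:      # hk matches no instance, so any hj is >=g hk
          return True
      return all(a == '?' or a == b for a, b in zip(hj, hk))

  print(more_general_or_equal(('Sunny', '?'), ('Sunny', 'Warm')))  # True
  print(more_general_or_equal(('Sunny', 'Warm'), ('Sunny', '?')))  # False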
10Find-S
h ? the most specific hypothesis in H for each
positive training example x for each
attribute constraint a1 in h if the constraint
a1 is not satisfied by x replace a1 in h by
the next more general output h
- Example at figure 2.2 (p. 27)
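A runnable sketch of Find-S on the EnjoySport data (the four examples are Table 2.1 of the textbook; seeding h with the first positive example is equivalent to generalizing the all-'EMPTY' hypothesis once):

  def find_s(examples):
      """examples: list of (instance_tuple, label) pairs, label 1 or 0."""
      positives = [x for x, y in examples if y == 1]
      h = list(positives[0])           # first positive example as the seed
      for x in positives[1:]:
          for i, (c, v) in enumerate(zip(h, x)):
              if c != v:               # constraint not satisfied by x:
                  h[i] = '?'           # replace with the next more general constraint
      return tuple(h)

  D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 1),
       (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 1),
       (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 0),
       (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 1)]
  print(find_s(D))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')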
11. Remarks on Find-S
- Find-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
- If the target concept is in H and there are no errors in the training examples, the hypothesis it finds is also consistent with the negative training examples.
- Why prefer the most specific hypothesis?
- What if there are several, or no, maximally specific hypotheses?
- What if there is an error in the training examples?
- Has the learner converged to the correct target concept?
12. Version Space
- Consistent: A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D (sketched in code below).
  - Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)
- Version Space: The version space VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
  - VS_H,D ≡ {h ∈ H | Consistent(h, D)}
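The consistency test in code (a minimal sketch under the tuple encoding used earlier):

  def classify(h, x):
      return int(all(c == '?' or c == v for c, v in zip(h, x)))

  def consistent(h, D):
      """Consistent(h, D): h(x) = c(x) for every <x, c(x)> in D."""
      return all(classify(h, x) == y for x, y in D)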
13. List-Then-Eliminate Algorithm
- VS ← H
- for each training example <x, c(x)>
  - for each h in VS
    - if h(x) ≠ c(x)
      - VS ← VS − {h}
- output VS
- The List-Then-Eliminate algorithm outputs the set of hypotheses consistent with D (see the sketch below)
- Weakness: requires exhaustively enumerating all hypotheses in H
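Enumerating H is feasible for this toy space (only 5×4^5 = 5120 syntactic hypotheses), so List-Then-Eliminate can be run directly (a sketch; the attribute value lists are those of EnjoySport):

  from itertools import product

  values = [('Sunny', 'Rainy', 'Cloudy'), ('Warm', 'Cold'), ('Normal', 'High'),
            ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

  def classify(h, x):
      return int(all(c == '?' or c == v for c, v in zip(h, x)))

  def list_then_eliminate(D):
      vs = list(product(*[v + ('?', 'EMPTY') for v in values]))  # VS <- H
      for x, y in D:
          vs = [h for h in vs if classify(h, x) == y]            # eliminate inconsistent h
      return vs

Run on the four EnjoySport examples, this leaves exactly the six hypotheses of Figure 2.3 (p. 33).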
14. Specific and General Boundaries
- The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
  - G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃g′ ∈ H) [(g′ >g g) ∧ Consistent(g′, D)]}
- The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.
  - S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃s′ ∈ H) [(s >g s′) ∧ Consistent(s′, D)]}
15. Version Space Representation Theorem
- p. 32
- VS_H,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s)} (a mechanical check is sketched below)
- Proof
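The theorem can be checked mechanically on the toy space: keep each h bounded between some s ∈ S and some g ∈ G, then compare the result with the consistent set from List-Then-Eliminate (a sketch; H, S, and G are lists of hypothesis tuples as in the earlier sketches):

  def more_general_eq(hj, hk):
      if 'EMPTY' in hk:
          return True
      return all(a == '?' or a == b for a, b in zip(hj, hk))

  def vs_from_boundaries(H, S, G):
      """VS = {h in H | (exists s in S)(exists g in G) g >=g h >=g s}."""
      return [h for h in H
              if any(more_general_eq(h, s) for s in S)
              and any(more_general_eq(g, h) for g in G)]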
16. Intuition on the VS Representation Theorem
- Any hypothesis more general than S covers all past positive examples.
- Any hypothesis more specific than G covers no past negative examples.
17. Candidate-Elimination Algorithm
- S ← the set of maximally specific hypotheses in H
- G ← the set of maximally general hypotheses in H
- For each training example d ∈ D, do
  - if d is a positive example
    - G ← G − {g ∈ G | ¬C(g, d)}   (C = Consistent)
    - for each s ∈ S with ¬C(s, d)
      - S ← S − {s}
      - S ← S ∪ {h ∈ H | h is a minimal generalization of s ∧ C(h, d) ∧ (∃g ∈ G) g ≥g h}
    - S ← S − {si ∈ S | (∃sj ∈ S) si >g sj}
  - if d is a negative example
    - S ← S − {s ∈ S | ¬C(s, d)}
    - for each g ∈ G with ¬C(g, d)
      - G ← G − {g}
      - G ← G ∪ {h ∈ H | h is a minimal specialization of g ∧ C(h, d) ∧ (∃s ∈ S) h ≥g s}
    - G ← G − {gi ∈ G | (∃gj ∈ G) gj >g gi}
- Examples: pp. 34-36
- Why is each of the boundary-update steps justified? (A runnable sketch follows below.)
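A runnable sketch of Candidate-Elimination for this conjunctive space (my own straightforward implementation, not the textbook's code; 'EMPTY' stands in for ∅ and 'domains' lists each attribute's values):

  domains = [('Sunny', 'Rainy', 'Cloudy'), ('Warm', 'Cold'), ('Normal', 'High'),
             ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

  def classify(h, x):
      return int(all(c == '?' or c == v for c, v in zip(h, x)))

  def more_general_eq(hj, hk):
      if 'EMPTY' in hk:                            # hk covers no instance
          return True
      return all(a == '?' or a == b for a, b in zip(hj, hk))

  def min_generalizations(s, x):
      """Minimal generalizations of s that cover the positive instance x."""
      if 'EMPTY' in s:
          return [tuple(x)]
      return [tuple(c if c == v else '?' for c, v in zip(s, x))]

  def min_specializations(g, x):
      """Minimal specializations of g that exclude the negative instance x."""
      out = []
      for i, c in enumerate(g):
          if c == '?':
              out += [g[:i] + (v,) + g[i+1:] for v in domains[i] if v != x[i]]
      return out

  def candidate_elimination(D):
      S = [('EMPTY',) * len(domains)]
      G = [('?',) * len(domains)]
      for x, y in D:
          if y == 1:                               # positive example
              G = [g for g in G if classify(g, x) == 1]
              for s in [s for s in S if classify(s, x) != 1]:
                  S.remove(s)
                  S += [h for h in min_generalizations(s, x)
                        if any(more_general_eq(g, h) for g in G)]
              S = [s for s in S if not any(s != t and more_general_eq(s, t) for t in S)]
          else:                                    # negative example
              S = [s for s in S if classify(s, x) == 0]
              for g in [g for g in G if classify(g, x) != 0]:
                  G.remove(g)
                  G += [h for h in min_specializations(g, x)
                        if any(more_general_eq(h, s) for s in S)]
              G = [g for g in G if not any(g != t and more_general_eq(t, g) for t in G)]
      return S, G

On the four EnjoySport examples this returns S = [('Sunny', 'Warm', '?', 'Strong', '?', '?')] and G = [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')], matching the boundaries on the next slide.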
18. Apply to Example
- Final version space for EnjoySport (Figure 2.3, p. 33):
  - S: {<Sunny, Warm, ?, Strong, ?, ?>}
  - Intermediate: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>
  - G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
19. New Example
- H = {ha | ha(x, y) = 1 iff y > ax}, with 0 < x, y < 1 and a > 0
- D: four numbered points in the unit square (the slide's figure; point (1,1) among them, with points 3 and 4 marked negative)
- S, G, VS? (a sketch with stand-in points follows below)
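A sketch of the boundaries for this continuous space (the four labeled points are hypothetical stand-ins for the slide's figure):

  # h_a(x, y) = 1 iff y > a*x. A positive (x, y) requires a < y/x;
  # a negative (x, y) requires a >= y/x. The version space is an interval of slopes.
  D = [((0.3, 0.9), 1), ((0.5, 0.8), 1),   # positive examples
       ((0.6, 0.3), 0), ((0.9, 0.4), 0)]   # negative examples

  pos = [y / x for (x, y), label in D if label == 1]
  neg = [y / x for (x, y), label in D if label == 0]
  a_general = max(neg)   # G: the smallest consistent slope covers the most instances
  a_specific = min(pos)  # S: the most specific consistent hypotheses approach this slope
  print(a_general, a_specific)  # VS = slopes a with a_general <= a < a_specific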
20. Remarks on C-E
- The C-E algorithm converges to the hypothesis that correctly describes the target concept when:
  - 1) there are no errors in the training examples, and
  - 2) there is some hypothesis in H that correctly describes the target concept.
- Positive training examples force S to become more general.
- Negative training examples force G to become more specific.
21. What Next?
- Which training example should an active learner request next?
- Ask for an example that satisfies exactly half the hypotheses in the current version space: either answer then eliminates half of the version space, so the target concept can be found in about log2 |VS| queries (a selection sketch follows below).
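A sketch of this query-selection rule (the helper names are my own; 'version_space' and 'candidates' are lists of hypothesis tuples and instance tuples):

  def classify(h, x):
      return int(all(c == '?' or c == v for c, v in zip(h, x)))

  def best_query(version_space, candidates):
      """Pick the instance splitting the version space closest to half-and-half."""
      half = len(version_space) / 2
      return min(candidates,
                 key=lambda x: abs(sum(classify(h, x) for h in version_space) - half))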
22. Expressiveness of Hypothesis Space
23. Unbiased Learner
- |X| = 3×2×2×2×2×2 = 96
- Hypotheses: disjunctions of conjunctions
  - |H| = 2^96
  - e.g., <Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>
- S = {(x1 ∨ x2 ∨ x3)}, G = {¬(x4 ∨ x5)}
- Not learning, i.e., memorizing
- Voting is futile (demonstrated below)
- A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
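A tiny demonstration of why voting is futile for an unbiased learner (a sketch on a 4-instance space of my own; with H = all subsets of X, every unseen instance is classified positive by exactly half of the version space):

  from itertools import chain, combinations

  X = ['a', 'b', 'c', 'd']
  H = list(chain.from_iterable(combinations(X, r) for r in range(len(X) + 1)))

  D = [('a', 1), ('b', 0)]                  # one positive, one negative example
  VS = [h for h in H if all((x in h) == bool(y) for x, y in D)]
  for x in ['c', 'd']:                      # unseen instances
      votes = sum(x in h for h in VS)
      print(x, votes, len(VS) - votes)      # always a tie: 2 positive vs 2 negative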
24. Inductive Bias
- L: a learning algorithm
- X: the instance space
- c: an arbitrary concept defined over X
- Dc = {<x, c(x)>}: the training examples
- L(xi, Dc): the classification that L assigns to xi after learning from the training data Dc
- The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples Dc:
  - (∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]