Title: IT/CS 811 Principles of Machine Learning and Inference

1. IT/CS 811 Principles of Machine Learning and Inference
Exercises
Prof. Gheorghe Tecuci
Learning Agents Laboratory, Computer Science Department, George Mason University
2. Overview
Your exercises
Some general questions and exercises
Sample questions on version space learning
Sample questions on decision tree learning
Sample questions on other learning strategies
3. Version Spaces
Select the correct answers and justify your solution:
- The version space for a set of examples given incrementally (for which there is a concept covering the positive examples and not covering the negative examples) will decrease (i.e., will contain strictly fewer concepts) when:
  - Always when a negative example is given
  - Always when a positive example is given
  - Always when a positive example is not covered by any concept from the lower bound
  - Always when a negative example is covered by all the concepts from the upper bound
Mihai Boicu
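To make the question concrete, here is a minimal candidate elimination sketch in Python (an illustration, not part of the original exercise; the three binary attributes and the examples are made up). Tracing it shows exactly when S, G, and hence the version space actually shrink:

```python
# Minimal candidate elimination sketch for conjunctive attribute-value
# hypotheses, where '?' matches any value. Illustrative data only.

DOMAINS = [('sunny', 'rainy'), ('warm', 'cold'), ('normal', 'high')]

def covers(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def update(S, G, x, positive):
    if positive:
        # Positive example: prune G, minimally generalize S.
        G = [g for g in G if covers(g, x)]
        S = [tuple(sv if sv == xv else '?' for sv, xv in zip(s, x))
             for s in S]
        S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
    else:
        # Negative example: prune S, minimally specialize G.
        S = [s for s in S if not covers(s, x)]
        newG = []
        for g in G:
            if not covers(g, x):
                newG.append(g)
                continue
            for i, gv in enumerate(g):
                if gv != '?':
                    continue
                for v in DOMAINS[i]:
                    if v != x[i]:
                        cand = g[:i] + (v,) + g[i + 1:]
                        if any(more_general_or_equal(cand, s) for s in S):
                            newG.append(cand)
        G = newG
    return S, G

# Initial bounds: S = first positive example, G = most general hypothesis.
S = [('sunny', 'warm', 'normal')]
G = [('?', '?', '?')]
S, G = update(S, G, ('rainy', 'cold', 'high'), positive=False)
S, G = update(S, G, ('sunny', 'warm', 'high'), positive=True)
print('S =', S)   # the lower bound generalizes only when forced
print('G =', G)   # the upper bound specializes only on negatives
```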
4. Explanation-based learning
- Given:
  - The axioms of plane Euclidean geometry
  - Several problem solving examples consisting of geometry problems with their axiomatic solutions
- Questions:
  - What will Explanation-Based Learning generate from one of the examples?
  - Are the learned theorems useful and generally applicable?
  - How could one learn useful theorems from these examples?
Cristina Boicu
5. Decision-tree learning
- Give an example of a training set on which ID3 does not generate the smallest possible decision tree. Show the result of applying ID3 and also show a smaller tree. (Hint: the information gain of an attribute is 0 if the ratio p_i/(p_i + n_i) is the same for all i; otherwise the information gain is strictly positive.)
- How would you extend the ID3 algorithm to learn from examples belonging to more than two classes? What is the formula for computing the information gain of an attribute?
Bogdan Stanescu
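Since the exercise asks for the information gain formula, the following is a minimal Python sketch (an illustration, not part of the original exercise) of entropy and gain for a two-class sample, using the same p/(p + n) notation as the hint:

```python
import math

def entropy(p, n):
    """Entropy of a sample with p positive and n negative examples."""
    if p == 0 or n == 0:
        return 0.0
    pp, pn = p / (p + n), n / (p + n)
    return -pp * math.log2(pp) - pn * math.log2(pn)

def information_gain(p, n, branches):
    """Gain of an attribute splitting (p, n) into (p_i, n_i) branches:
    E(p, n) - sum_i ((p_i + n_i) / (p + n)) * E(p_i, n_i)."""
    remainder = sum((pi + ni) / (p + n) * entropy(pi, ni)
                    for pi, ni in branches)
    return entropy(p, n) - remainder

# When every branch keeps the parent's ratio p_i/(p_i + n_i), the gain
# is 0, as in the hint; pure branches give strictly positive gain.
print(information_gain(2, 2, [(1, 1), (1, 1)]))  # 0.0
print(information_gain(2, 2, [(2, 0), (0, 2)]))  # 1.0
```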
6. Decision-tree learning
Give a counterexample to the heuristic used by the ID3 algorithm for picking attributes.
Gabriel Balan
7. Decision-tree learning
[Figure: Training examples for the target concept PlayTennis]
[Figure: A decision tree for the concept PlayTennis]
Yan Sun
(continues)
8. Answer the following questions true or false, and explain the answer:
1. Is it possible to get ID3 to further elaborate the tree below the rightmost leaf (and make no other changes to the tree) by adding a single new correct training example to the original fourteen examples?
2. Is it possible to get ID3 to learn an incorrect tree (i.e., a tree that is not equivalent to the target concept) by adding new correct training examples to the original fourteen?
3. Is it possible to produce some set of correct training examples that will get ID3 to include the attribute Temperature in the learned tree, even though the true target concept is independent of Temperature?
9. Suppose we want to classify whether a given balloon is inflated based on four attributes: color, size, the act of the person holding the balloon, and the age of the person holding the balloon. Show the decision tree that ID3 would build to learn this classification. Display the information gain for each candidate attribute at the root of the tree.
Color Size Act Age Inflated?
Yellow Small Stretch Adult F
Yellow Small Stretch Child T
Yellow Small Dip Adult T
Yellow Small Dip Child T
Yellow Small Dip Child F
Yellow Large Stretch Adult T
Yellow Large Stretch Child T
Yellow Large Dip Adult T
Yellow Large Dip Child F
Yellow Large Dip Child F
Purple Small Stretch Adult T
Purple Small Stretch Child T
Purple Small Dip Adult T
Purple Small Dip Child F
Purple Small Dip Child F
Purple Large Stretch Adult T
Purple Large Stretch Child T
Purple Large Dip Adult T
Purple Large Dip Child F
Purple Large Dip Child F
Discussion: In this problem there are situations where the information gain is the same for every attribute, so we cannot decide which attribute to choose. Are there any methods for handling these situations?
Xianjun Hao
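As a starting point for this exercise, here is a short Python sketch (an illustration, not part of the original slides) that computes the root-level information gain of each attribute on the table above:

```python
import math

# The twenty balloon examples: (color, size, act, age, inflated).
DATA = [
    ('Yellow','Small','Stretch','Adult','F'), ('Yellow','Small','Stretch','Child','T'),
    ('Yellow','Small','Dip','Adult','T'),     ('Yellow','Small','Dip','Child','T'),
    ('Yellow','Small','Dip','Child','F'),     ('Yellow','Large','Stretch','Adult','T'),
    ('Yellow','Large','Stretch','Child','T'), ('Yellow','Large','Dip','Adult','T'),
    ('Yellow','Large','Dip','Child','F'),     ('Yellow','Large','Dip','Child','F'),
    ('Purple','Small','Stretch','Adult','T'), ('Purple','Small','Stretch','Child','T'),
    ('Purple','Small','Dip','Adult','T'),     ('Purple','Small','Dip','Child','F'),
    ('Purple','Small','Dip','Child','F'),     ('Purple','Large','Stretch','Adult','T'),
    ('Purple','Large','Stretch','Child','T'), ('Purple','Large','Dip','Adult','T'),
    ('Purple','Large','Dip','Child','F'),     ('Purple','Large','Dip','Child','F'),
]
ATTRS = ['Color', 'Size', 'Act', 'Age']

def entropy(rows):
    """Two-class entropy of a list of example rows (label in last field)."""
    t = sum(1 for r in rows if r[-1] == 'T')
    if t == 0 or t == len(rows):
        return 0.0
    p = t / len(rows)
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gain(rows, i):
    """Information gain of attribute i over the given rows."""
    rem = 0.0
    for v in {r[i] for r in rows}:
        sub = [r for r in rows if r[i] == v]
        rem += len(sub) / len(rows) * entropy(sub)
    return entropy(rows) - rem

for i, name in enumerate(ATTRS):
    print(f'Gain({name}) = {gain(DATA, i):.4f}')
```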
10. Imagine the following attributes related to weather:
a. "wind degree" - windy, calm
b. "sun degree" - sunny, cloudy
c. "rain degree" - raining, not-raining
There are 2^3 = 8 possible "weathers" described by these attributes. Ascribe + or - to each of the combinations in such a way that in every decision tree the depth of each branch must be equal to the number of attributes (3). How many such trees exist? How many such trees exist for n attributes? Why?
Zbigniew Skolicki
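This exercise can be checked empirically. Under the reading that a consistent decision tree can stop a branch early exactly when all examples reaching it share a label, full depth in every tree means no assignment to two of the three attributes leaves a pure pair. A brute-force Python check of this condition (an illustration, not from the original exercise):

```python
from itertools import product

# All 8 weather combinations over 3 binary attributes.
COMBOS = list(product([0, 1], repeat=3))

def forces_full_depth(labels):
    """True if no restriction fixing two attributes is pure, i.e. every
    branch of every consistent decision tree must reach depth 3."""
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        for vi, vj in product([0, 1], repeat=2):
            cell = [labels[c] for c in COMBOS if c[i] == vi and c[j] == vj]
            if len(set(cell)) == 1:   # pure pair -> an early leaf exists
                return False
    return True

count = 0
for assignment in product('+-', repeat=8):
    labels = dict(zip(COMBOS, assignment))
    if forces_full_depth(labels):
        count += 1
        print(assignment)
print('number of such labelings:', count)
```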
11. Consider the following data:
Height (inches) Hair Color Eye Color Class
61 Brown Brown 1
63 Brown Brown 1
69 Brown Brown 1
74 Brown Brown 1
67 Brown Blue 0
63 Blonde Blue 0
71 Blonde Brown 0
73 Blonde Blue 0
Just looking at the table, what concept do you think defines class 1? Use the ID3 algorithm taught in class to build a decision tree. (Helpful hints: The entropy of a set whose members all have the same value for the attribute in question is 0. The entropy of a set which has exactly equal numbers of each value for the attribute in question is 1.)
Charles Day
(continues)
12. Write out the concept represented by this tree. Does this rule match your intuitive sense of the concept represented by the data? Are you happy with the concept learned using the decision tree? Why? Do you think this decision tree would do well in classifying other instances of the concept represented by the data? What can you say about attributes with a lot of values? Another method for choosing attributes to split a node uses gain ratio. Gain ratio is defined as Gain / SplitInformation, where the term SplitInformation is defined as

SplitInformation(S, A) = - Σ_{i=1..c} (|S_i| / |S|) log2(|S_i| / |S|)

where S_1, ..., S_c are the subsets of S produced by partitioning S on the c values of attribute A.
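A minimal Python sketch of gain ratio with this SplitInformation term (an illustration, not from the original slides), showing how it penalizes many-valued attributes:

```python
import math

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def gain_and_ratio(values, labels):
    """Information gain and gain ratio of an attribute, given the
    per-example attribute values and the class labels."""
    n = len(labels)
    rem, split_info = 0.0, 0.0
    for v in set(values):
        sub = [l for x, l in zip(values, labels) if x == v]
        w = len(sub) / n
        rem += w * entropy(sub)
        split_info -= w * math.log2(w)
    g = entropy(labels) - rem
    return g, (g / split_info if split_info > 0 else float('inf'))

# An attribute with a unique value per example gets maximal gain but a
# large SplitInformation, so its gain ratio is modest.
labels = ['1', '1', '0', '0']
print(gain_and_ratio(['a', 'b', 'c', 'd'], labels))  # gain 1.0, ratio 0.5
print(gain_and_ratio(['x', 'x', 'y', 'y'], labels))  # gain 1.0, ratio 1.0
```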
13. In ID3, when an attribute has continuous values, one approach for handling it is to categorize the values into a discrete set of bins. Sometimes, however, an attribute may have a large set of finite discrete values that do not lend themselves to binning; for example, an attribute like retail store name, where each example may have a different value for the attribute. How should a decision tree algorithm deal with such a situation? Decision trees have often been applied in data mining applications. A marketing company may use consumer data to target a specific group of people earning a certain amount of income or higher. Below is a set of attributes and associated possible values. What attributes should be used to create a decision tree that will predict a person's salary being above 50K? Remember that some attributes contain continuous values and some contain a large set of nominal values.
Simon Liu
(continues)
14. [Image-only slide: the table of attributes and possible values referenced in exercise 13]
15. Overview
Your exercises
Some general questions and exercises
Sample questions on version space learning
Sample questions on decision tree learning
Sample questions on other learning strategies
16. Questions
What is an instance? What is a concept? What is a positive example of a concept? What is a negative example of a concept?
Give an intuitive definition of generalization.
What does it mean for concept A to be more general than concept B?
Indicate a simple way to prove that a concept is not more general than another concept.
Given two concepts C1 and C2, from a generalization point of view, what are all the different possible relations between them?
17. What is a generalization rule? What is a specialization rule? What is a reformulation rule?
Name all the generalization rules you know.
Briefly describe and illustrate with an example the "turning constants into variables" generalization rule.
Define and illustrate the "dropping conditions" generalization rule.
18. Questions
Indicate various generalizations of the following sentence: "A student who has lived in Fairfax for 3 years."
What could be said about the predictions of a cautious learner?
What could be said about the predictions of an aggressive learner?
How could one synergistically integrate a cautious learner with an aggressive learner, taking advantage of their qualities to compensate for each other's weaknesses?
19. Questions
What is learning bias? What are the different types of bias?
20. Exercise
Consider the background knowledge represented by the following generalization hierarchies (shown as figures in the original slides) and theorem:

∀x ∀y, (ON x y) => (NEAR x y)

Show that E1 is more general than E2:

E1: (COLOR x warm-color) ∧ (SHAPE x round) ∧ (COLOR y red) ∧ (SHAPE y polygon) ∧ (NEAR x y)

E2: (COLOR u yellow) ∧ (SHAPE u circle) ∧ (COLOR v red) ∧ (SHAPE v triangle) ∧ (ON u v) ∧ (ISA u toy) ∧ (ISA v toy)
21. Consider the background knowledge represented by the following generalization hierarchies (shown as figures in the original slides) and theorem:

∀x ∀y, (ON x y) => (NEAR x y)

Consider also the following concept:

E: (COLOR u yellow) ∧ (SHAPE u circle) ∧ (COLOR v red) ∧ (SHAPE v triangle) ∧ (ON u v) ∧ (ISA u toy) ∧ (ISA v toy) ∧ (HEIGHT u 5)

Indicate six different generalization rules. For each such rule, determine an expression Eg which is more general than E according to that rule.
22. Consider the following two concepts (shown as figures in the original slides). Indicate different generalizations of them.
23. Define the following:
- a generalization of two concepts
- a minimally general generalization of two concepts
- the least general generalization of two concepts
- the maximally general specialization of two concepts
24. Consider the following concepts and the following generalization hierarchies (shown as figures in the original slides). Indicate four specializations of G1 and G2 (including two maximally general specializations).
25. Overview
Your exercises
Some general questions and exercises
Sample questions on version space learning
Sample questions on decision tree learning
Sample questions on other learning strategies
26. Version Space questions
What happens if there are not enough examples for S and G to become identical? Could we still learn something useful?
How could we classify a new instance? When could we be sure that the classification is the same as the one that would be made if the concept were completely learned? Could we be sure that the classification is correct?
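One common answer to the classification question uses the two bounds directly: an instance covered by every hypothesis in S is classified positive by the whole version space, one covered by no hypothesis in G is classified negative, and anything in between is ambiguous (a vote over the space is then possible). A small sketch under an assumed '?'-wildcard conjunctive representation (not from the slides):

```python
def covers(h, x):
    """'?'-wildcard conjunctive hypothesis h covers instance x."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def classify(S, G, x):
    """Classify with a possibly unconverged version space:
    unanimous answers are safe, everything else is ambiguous."""
    if all(covers(s, x) for s in S):
        return '+'   # every hypothesis in the space says positive
    if not any(covers(g, x) for g in G):
        return '-'   # every hypothesis in the space says negative
    return '?'       # the version space disagrees; one could vote

# Hypothetical bounds, e.g. as produced by the earlier sketch.
S = [('sunny', 'warm', '?')]
G = [('sunny', '?', '?'), ('?', 'warm', '?')]
print(classify(S, G, ('sunny', 'warm', 'high')))    # '+'
print(classify(S, G, ('rainy', 'cold', 'normal')))  # '-'
print(classify(S, G, ('sunny', 'cold', 'high')))    # '?' (ambiguous)
```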
27. Version Space questions
Could the examples contain errors? What kinds of errors could be found in an example?
What will be the result of the learning algorithm if there are errors in the examples?
What could we do if we know that at most one example is wrong?
28. Overview
Your exercises
Some general questions and exercises
Sample questions on version space learning
Sample questions on decision tree learning
Sample questions on other learning strategies
29. Questions
What induction hypothesis is made in decision tree learning?
What are some reasons for transforming a decision tree into a set of rules?
How could the ID3 algorithm be changed to deal with noise in the examples?
What is overfitting and how could it be avoided? Compare tree pruning with rule post-pruning.
How could one use continuous attributes with decision tree learning? (A sketch follows this list.)
How should missing attribute values be dealt with?
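For the continuous-attribute question, a standard C4.5-style approach is to sort the values and evaluate candidate thresholds midway between adjacent examples of different classes, keeping the one with the highest information gain. A Python sketch (an illustration; the data reuses the heights and classes from exercise 11):

```python
import math

def entropy(labels):
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_threshold(values, labels):
    """Pick the binary split value <= t with the highest information gain.
    Candidate thresholds sit midway between adjacent differing-class values."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best = (0.0, None)
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 == l2 or v1 == v2:
            continue                  # only class boundaries matter
        t = (v1 + v2) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        g = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if g > best[0]:
            best = (g, t)
    return best

# Heights and classes from the table in exercise 11.
heights = [61, 63, 69, 74, 67, 63, 71, 73]
classes = [1, 1, 1, 1, 0, 0, 0, 0]
print(best_threshold(heights, classes))   # (gain, threshold)
```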
30. Questions
Compare the candidate elimination algorithm with the decision tree learning algorithm from the point of view of the generalization language, the bias, the search strategy, and the use of the examples.
What problems are appropriate for decision tree learning?
What are the main features of decision tree learning?
31. Overview
Your exercises
Some general questions and exercises
Sample questions on version space learning
Sample questions on decision tree learning
Sample questions on other learning strategies
32. Questions
Questions are in the lecture notes corresponding
to each learning strategy.