Title: Introduction to Machine Learning and Knowledge Representation
1Introduction to Machine Learning and Knowledge
Representation
- Florian Gyarfas
- COMP 790-072 (Robotics)
2Outline
- Introduction, definitions
- Common knowledge representations
- Types of learning
- Deductive Learning (rules of inference)
- Explanation-based learning
- Inductive Learning: some approaches
- Concept Learning
- Decision-Tree Learning
- Clustering
- Summary, references
3Introduction
- What is Knowledge Representation?
- Formalisms that represent knowledge (facts about the world) and mechanisms to manipulate such knowledge (for example, derive new facts from existing knowledge)
- What is Machine Learning?
- Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
- Example: T = playing checkers, P = percent of games won against opponents, E = playing practice games against itself
4Common Knowledge Representations
- Logic
- propositional
- predicate
- other
- Structured Knowledge Representations
- Frames
- Semantic nets
-
5Propositional Logic
- Consists of
- Constants true/false
- Set of elements called symbols, variables or atomic formulas (typically letters a, b, c, ...)
- Operators (¬, ∧, ∨, →, ↔)
- Axioms
- Examples
- a ∧ b → a
- a ∨ ¬a ↔ true
- Inference rule(s)
- Modus ponens
6Predicate logic
- In many cases, propositional logic is too weak.
- For example, how can we express something like this in propositional logic?
- Every person is mortal. Tom is a person. Tom is mortal.
- How can we represent these sentences in such a way that we can infer the third sentence from the first two?
- Need quantifiers and predicates
- ∀x Person(x) → Mortal(x)
- Person(Tom)
- We can infer Mortal(Tom)
- In addition to propositional logic, predicate logic has
- Functions
- Predicates
- Quantifiers (∀, ∃)
- More axioms
- One more inference rule (Generalization)
7Logic Inference Rules
- Used for deductive learning
- Propositional logic: Modus Ponens is all you need
- If P, then Q. P. Therefore, Q.
- Meta-rule, not the same as axioms
- Predicate logic: Modus Ponens and the Generalization Rule
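The Modus Ponens step above can be sketched as a small forward-chaining loop. This is a minimal illustration, not part of the original slides: rules are assumed to be already ground (no variables), the function name `forward_chain` is made up for this sketch, and the second rule is hypothetical.

```python
def forward_chain(facts, rules):
    """Apply Modus Ponens repeatedly until no new fact can be derived.

    facts: set of atoms known to be true.
    rules: list of (premises, conclusion) pairs, read as
           "if all premises hold, then the conclusion holds".
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Modus Ponens: from "P implies Q" and P, conclude Q.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


rules = [
    # Ground instance of "every person is mortal" for Tom.
    (["Person(Tom)"], "Mortal(Tom)"),
    # Hypothetical rule whose second premise is never established.
    (["Mortal(Tom)", "Born(Tom)"], "HasLifespan(Tom)"),
]
derived = forward_chain({"Person(Tom)"}, rules)
print(sorted(derived))
```

Only `Mortal(Tom)` is added, because the second rule has a premise that is not among the known facts.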
8Structured Knowledge Representations
- Semantic nets
- Really just graphs that represent knowledge
- Nodes represent concepts
- Arcs represent binary relationships between concepts
- Frames
- Extension of semantic nets
- Entities have attributes
- Class/subclass hierarchy that supports inheritance: classes inherit attributes from superclasses
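A frame system with inheritance can be sketched in a few lines. The `Frame` class and the slot names and values (`movable`, `legs`, `color`) are assumptions made up for this illustration; the slides do not prescribe any particular implementation.

```python
class Frame:
    """A frame with named slots and an optional parent (superclass) frame."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = slots

    def get(self, slot):
        # Look in this frame first, then walk up the class hierarchy.
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


# Hypothetical hierarchy: My-chair is a Chair, a Chair is Furniture.
furniture = Frame("Furniture", movable=True)
chair = Frame("Chair", parent=furniture, legs=4)
my_chair = Frame("My-chair", parent=chair, color="Tan")

print(my_chair.get("color"), my_chair.get("legs"), my_chair.get("movable"))
```

The lookup for `legs` and `movable` succeeds even though `my_chair` does not store those slots itself, which is the inheritance behavior described above.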
9Semantic Nets
- Example (E. Rich, Artificial Intelligence)
[Figure: semantic net with the nodes Furniture, Chair, Person, Seat, My-chair, Me, Tan, Leather and Brown, connected by arcs labeled is-a, is-part, owner, color and covering]
10Types of Learning
- Deductive vs. inductive
- Supervised vs. unsupervised
- Symbolic vs. non-symbolic
11Deductive vs. inductive
- Deductive learning
- Knowledge is deduced from existing knowledge by means of truth-preserving transformations (this is nothing more than reformulation of existing knowledge).
- If the premises are true, the conclusion must be true!
- Example: propositional/predicate logic with rules of inference
12Deductive vs. inductive
- Inductive learning
- Generalization from examples
- Example: All observed crows are black ⇒ All crows are black
- Process of reasoning in which the premises of an argument are believed to support the conclusion but do not ensure it
13Unsupervised vs. Supervised
- Supervised Learning
- There exists a teacher that for each training example tells the learner how it is classified (training data consists of pairs of input vectors and desired outputs)
- Reinforcement Learning
- No input/output pairs; a reward function tells the agent how good its action was
- Unsupervised Learning
- No a priori output and no reward; the training data are just feature vectors, and the system needs to form concepts (classes) by itself
14Learning approaches
- Deductive/Analytical
- Explanation-based learning
- Inductive
- Supervised
- Concept Learning
- Decision-Tree Learning
- Neural networks
- Naive Bayes classifier
- Support Vector Machines
-
- Unsupervised
- Clustering
- Neural networks
- Expectation-Maximization
-
15Explanation-based learning (EBL)
- Deductive
- Assumes prior knowledge (domain theory) in addition to training examples
- Assumes the domain theory is given as a set of Horn clauses
- Horn clause: disjunction of literals with at most one positive literal
- Example: NOT a OR NOT b OR NOT c OR d, which is the same as (a AND b AND c) → d
- Tries to explain training examples using the domain theory
16EBL example
- Consider multiple physical objects
- Which are the pairs of objects such that one can be stacked safely on the other?
- Target concept
- premise(x,y) → SafeToStack(x,y)
- where premise is a conjunctive expression containing the variables x and y.
- Domain Theory
- SafeToStack(x,y) ← NOT Fragile(y)
- SafeToStack(x,y) ← Lighter(x,y)
- Lighter(x,y) ← Weight(x,wx) AND Weight(y,wy) AND LessThan(wx,wy)
- Weight(x,w) ← Volume(x,v) AND Density(x,d) AND Equal(w,times(v,d))
- Weight(x,5) ← Type(x,Table)
- Fragile(x) ← Material(x,Glass)
17EBL example (2)
- Training example
- On(Obj1,Obj2)
- Type(Obj1,Box)
- Type(Obj2,Table)
- Color(Obj1,Red)
- Color(Obj2,Blue)
- Volume(Obj1,2)
- Density(Obj1,0.3)
- Material(Obj1,Cardboard)
- Material(Obj2,Wood)
- SafeToStack(Obj1,Obj2)
18EBL example (3)
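- Explanation (proof tree) of the training example
- SafeToStack(Obj1,Obj2) follows from Lighter(Obj1,Obj2)
- Lighter(Obj1,Obj2) follows from Weight(Obj1,0.6), Weight(Obj2,5) and LessThan(0.6,5)
- Weight(Obj1,0.6) follows from Volume(Obj1,2), Density(Obj1,0.3) and Equal(0.6,times(2,0.3))
- Weight(Obj2,5) follows from Type(Obj2,Table)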
19EBL algorithm
- (1) Explain the training example
- (2) Analyze/generalize the explanation
- (3) Add the explanation to the learned rules (domain theory)
- Use, for example, the REGRESS algorithm (Mitchell, p. 318) for step (2)
- In our example, the most general rule that can be justified by the explanation is
- SafeToStack(x,y) ← Volume(x,vx) AND Density(x,dx) AND Equal(wx,times(vx,dx)) AND LessThan(wx,5) AND Type(y,Table)
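To show why the learned rule is "operational", here is a minimal sketch that encodes it directly as a Python predicate and applies it to the training example. Representing objects as dictionaries and the function name `safe_to_stack` are assumptions of this sketch, not something the slides specify.

```python
def safe_to_stack(x, y):
    """Learned EBL rule: Volume(x,vx), Density(x,dx), wx = vx*dx,
    LessThan(wx,5), Type(y,Table).

    Returns True when the rule fires. False only means that this one
    rule does not apply, not that stacking is unsafe.
    """
    if "volume" not in x or "density" not in x:
        return False
    wx = x["volume"] * x["density"]  # Equal(wx, times(vx, dx))
    return wx < 5 and y.get("type") == "Table"


# The training example from the slides, as attribute dictionaries.
obj1 = {"type": "Box", "color": "Red", "volume": 2,
        "density": 0.3, "material": "Cardboard"}
obj2 = {"type": "Table", "color": "Blue", "material": "Wood"}

print(safe_to_stack(obj1, obj2))
print(safe_to_stack(obj2, obj1))
```

The rule looks only at directly observable attributes (volume, density, type), so no chain of inferences through Lighter and Weight is needed at classification time.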
20EBL - Remarks
- Knowledge reformulation: EBL just restates what the learner already knows
- You don't really gain new knowledge!
- Why do we need it then? In principle, we can compute everything we need using just the domain theory
- In practice, however, this might not work. Consider chess: does knowing all the rules make you a perfect player?
- So EBL reformulates existing knowledge into a more operational form, which might be much more effective, especially under certain constraints
21Concept Learning
- Inductive
- Learn a general concept definition from specific training examples
- Search through a predefined space of potential hypotheses for the target concept
- Pick the one that best fits the training examples
22Concept Learning Example
- Taken from Tom Mitchell's book Machine Learning
- Concept to learn: days on which Tom enjoys his favorite water sport
- Training examples D (every row is an instance, every column an attribute)
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport (classification)
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
23Concept Learning
- X: set of all instances (instance = combination of attribute values), D: set of all training examples (D ⊆ X)
- The target concept is a function (target function) c(x) that for a given instance x is either 0 or 1 (in our example 0 if EnjoySport = No, 1 if EnjoySport = Yes)
- We do not know the target concept, but we can come up with hypotheses. We would like to find a hypothesis h(x) such that h(x) = c(x) at least for all x in D (D = training examples)
- Hypothesis representation
- Let us assume that the target concept is expressed as a conjunction of constraints on the instance attributes. Then we can write a hypothesis like this: AirTemp = Cold ∧ Humidity = High. Or, in short, <?,Cold,High,?,?,?> where ? means any value is acceptable for this attribute. We use ∅ to indicate that no value is acceptable for an attribute.
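The hypothesis representation above translates almost directly into code. In this sketch a hypothesis is a tuple of constraints, `"?"` stands for "any value" and `None` stands for the empty constraint ∅; these encodings and the name `matches` are choices made for this illustration only.

```python
ANY = "?"     # any value is acceptable for this attribute
EMPTY = None  # no value is acceptable (the empty constraint)


def matches(h, x):
    """Return True if hypothesis h classifies instance x as positive.

    EMPTY never equals an attribute value, so a hypothesis containing
    it rejects every instance.
    """
    return all(c == ANY or c == v for c, v in zip(h, x))


# Hypothesis: AirTemp = Cold AND Humidity = High
h = (ANY, "Cold", "High", ANY, ANY, ANY)

x1 = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
x3 = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")

print(matches(h, x1), matches(h, x3))
```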
24Concept Learning
- Most general hypothesis
- <?,?,?,?,?,?>
- Most specific hypothesis
- <∅,∅,∅,∅,∅,∅>
- General-to-specific ordering of hypotheses: ≥g (more general than or equal to) defines a partial order over the hypothesis space H
- FIND-S algorithm
- Finds the most specific hypothesis consistent with the positive training examples
- For our example: <Sunny,Warm,?,Strong,?,?>
25Concept Learning (FIND-S)
- FIND-S algorithm
- (1) Initialize h to the most specific hypothesis in H
- (2) For each positive training instance x
- For each attribute constraint ai in h: if the constraint ai is satisfied by x, then do nothing; else replace ai in h by the next more general constraint that is satisfied by x
- (3) Output hypothesis h
- Why can we ignore negative examples?
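The FIND-S steps can be sketched as follows for conjunctive hypotheses over discrete attributes. `None` encodes the empty constraint and `"?"` encodes "any value"; both encodings and the function name `find_s` are assumptions of this sketch.

```python
def find_s(examples):
    """FIND-S for conjunctive hypotheses.

    examples: list of (instance, label) pairs, label True for positive.
    """
    n = len(examples[0][0])
    h = [None] * n  # most specific hypothesis: every constraint is empty
    for x, positive in examples:
        if not positive:
            continue  # negative examples are ignored
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value  # first positive example: adopt its value
            elif h[i] != value:
                h[i] = "?"    # conflicting values: generalize to "any"
    return tuple(h)


# EnjoySport training examples from the slides.
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(D))
```

On this data the result is the hypothesis <Sunny,Warm,?,Strong,?,?> given on the earlier slide.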
26Concept Learning
- FIND-S only computes the most specific hypothesis
- Another approach to concept learning: CANDIDATE-ELIMINATION
- CANDIDATE-ELIMINATION finds all hypotheses in the version space
- The version space, denoted $VS_{H,D}$, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
- $VS_{H,D} = \{h \in H \mid Consistent(h,D)\}$
27Concept Learning (Version space)
- Version space for our example
S: <Sunny,Warm,?,Strong,?,?>
<Sunny,?,?,Strong,?,?>   <Sunny,Warm,?,?,?,?>   <?,Warm,?,Strong,?,?>
G: <Sunny,?,?,?,?,?>, <?,Warm,?,?,?,?>
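For a hypothesis space this small, the version space can also be found by brute force: enumerate every conjunctive hypothesis and keep those consistent with all training examples. This is not the CANDIDATE-ELIMINATION algorithm, only a sketch for checking its result; restricting candidate values to those that occur in the data is an assumption that does not change the outcome here.

```python
from itertools import product

# EnjoySport training examples from the slides.
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def version_space(examples):
    n = len(examples[0][0])
    # Candidate constraints per attribute: "?" plus every observed value.
    options = [["?"] + sorted({x[i] for x, _ in examples}) for i in range(n)]
    # Consistent(h, D): h agrees with the label of every training example.
    return [h for h in product(*options)
            if all(matches(h, x) == label for x, label in examples)]


vs = version_space(D)
for h in vs:
    print(h)
```

The enumeration returns the six hypotheses of the version space shown above.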
28Concept Learning: Candidate-Elimination algorithm
- Initialize G to the set of maximally general hypotheses in H
- Initialize S to the set of maximally specific hypotheses in H
- For each training example d, do
- If d is a positive example
- Remove from G any hypothesis inconsistent with d
- For each hypothesis s in S that is not consistent with d
- Remove s from S
- Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
- Remove from S any hypothesis that is more general than another hypothesis in S
29Concept Learning: Candidate-Elimination algorithm
- If d is a negative example
- Remove from S any hypothesis inconsistent with d
- For each hypothesis g in G that is not consistent with d
- Remove g from G
- Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
- Remove from G any hypothesis that is less general than another hypothesis in G
- How to use the version space for classification of new instances?
- Both algorithms can't handle noisy training data, i.e. they assume none of the training examples is incorrect
- For more complex concept learning algorithms see Mitchell, Machine Learning, Chapter 10.
30Inductive bias
- For both algorithms, we assumed that the target concept was contained in the hypothesis space
- Our hypothesis space was the set of all hypotheses that can be expressed as a conjunction of attribute constraints
- Such an assumption is called an inductive bias
- What if the target concept is not a conjunction of constraints? Why not consider all possible hypotheses?
31Decision Tree Learning
- Another inductive, supervised learning method for approximating discrete-valued target functions
- Learned function is represented by a tree
- Leaf nodes provide the classification
- Each non-leaf node specifies a test of some attribute of the instance
32Decision Tree Learning -Example
- Decision tree for the concept PlayTennis
33Decision Tree Learning
- Classification starts at the root node
- Example tree corresponds to the expression
- (Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak)
- Using the tree to classify new instances is easy, but how do we construct a decision tree from training examples?
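Classification with a tree can be sketched with nested dictionaries. The tree below is derived from the expression on this slide (every path not listed there leads to "No"); the dictionary encoding and the function name `classify` are assumptions of this sketch.

```python
# Inner nodes: {attribute: {value: subtree}}; leaves: class labels.
tree = {"Outlook": {
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}


def classify(node, instance):
    """Start at the root and follow the branch matching each tested attribute."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[instance[attribute]]
    return node


print(classify(tree, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}))
print(classify(tree, {"Outlook": "Rain", "Humidity": "High", "Wind": "Weak"}))
```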
34Decision Tree Learning ID3 Algorithm
- Constructs the tree top-down
- Order of attributes?
- Evaluate each attribute using a statistical test (see next slide) to determine how well it alone classifies the training examples
- Use the attribute that best classifies the training examples as the test at the root node. Create a descendant for each possible value of that attribute.
- Repeat the process for each descendant, using the training examples associated with it
35Decision Tree Learning Entropy/Gain
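- Given a collection S, containing positive and negative examples of some target concept, the entropy of S is
- $Entropy(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}$
- where $p_{+}$ and $p_{-}$ are the proportions of positive and negative examples in S
- Information gain is then defined as
- $Gain(S,A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$
- where $S_v$ is the subset of S for which attribute A has value v
- ID3 algorithm picks the attribute with the highest gain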
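Entropy and information gain are easy to compute from class counts. The sketch below works on (positive, negative) count pairs; the function names and this count-based interface are assumptions of the illustration. The numbers used are the Wind split of the PlayTennis data from Mitchell: 9 positive and 5 negative examples overall, 6/2 for Wind = Weak and 3/3 for Wind = Strong.

```python
from math import log2


def entropy(pos, neg):
    """Entropy of a collection with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # 0 * log2(0) is taken to be 0
            p = count / total
            result -= p * log2(p)
    return result


def gain(parent, subsets):
    """Information gain of a split.

    parent:  (pos, neg) counts of the whole collection S.
    subsets: list of (pos, neg) counts, one per attribute value.
    """
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in subsets)
    return entropy(*parent) - remainder


print(round(entropy(9, 5), 3))
print(round(gain((9, 5), [(6, 2), (3, 3)]), 3))
```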
36Decision Trees - Remarks
- The tree represents a hypothesis; thus, ID3 determines only a single hypothesis, unlike Candidate-Elimination
- Unlike both Concept Learning approaches, ID3 makes no assumptions regarding the hypothesis space; every possible hypothesis can be represented by some tree
- Inductive bias: shorter trees are preferred over larger trees
37Decision Trees Example (Mitchell, p. 59)
Day Outlook Temp. Humid. Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
38Decision Tree - Example
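- $Gain(S, Outlook) = 0.246$
- $Gain(S, Humidity) = 0.151$
- $Gain(S, Wind) = 0.048$
- $Gain(S, Temperature) = 0.029$
- So Outlook is the first attribute of the tree
- $S_{sunny} = \{D1, D2, D8, D9, D11\}$
- $Gain(S_{sunny}, Humidity) = 0.97 - (3/5) \cdot 0 - (2/5) \cdot 0 = 0.97$
- $Gain(S_{sunny}, Temperature) = 0.97 - (2/5) \cdot 0 - (2/5) \cdot 1 - (1/5) \cdot 0 = 0.57$
- $Gain(S_{sunny}, Wind) = 0.97 - (2/5) \cdot 1 - (3/5) \cdot 0.918 = 0.019$
[Figure: partial tree with Outlook at the root; the Sunny and Rain branches are still open (?), the Overcast branch is a leaf labeled Yes]
What next?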
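One way to answer "What next?" is to let the procedure recurse on each open branch. The following is a compact sketch of ID3 on the PlayTennis table, not Mitchell's exact pseudocode: the nested-dictionary tree format, the helper names and the majority-vote fallback when attributes run out are assumptions of this illustration.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]
ROWS = """Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No""".splitlines()
DATA = [dict(zip(ATTRS + ["PlayTennis"], r.split())) for r in ROWS]


def entropy(rows):
    counts = Counter(r["PlayTennis"] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())


def gain(rows, attr):
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attrs):
    labels = {r["PlayTennis"] for r in rows}
    if len(labels) == 1:  # pure node: make it a leaf
        return labels.pop()
    if not attrs:         # no attribute left: majority vote (assumption)
        return Counter(r["PlayTennis"] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest)
                   for v in sorted({r[best] for r in rows})}}


tree = id3(DATA, ATTRS)
print(tree)
```

On this table the Sunny branch is split on Humidity and the Rain branch on Wind, which completes the partial tree above.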
39Learning algorithms in robotics
- EBL has been used for planning algorithms (example: PRODIGY system (Carbonell et al., 1990))
- Could use EBL, concept learning, decision tree learning for things like object recognition (CL and DTL can easily be extended to more than 2 categories)
- However, while symbolic learning algorithms are simple and easy to understand, they are not very flexible and powerful
- In many applications, robots need to learn to classify something they perceive (using cameras, sensors etc.)
- Sensor inputs etc. are normally numeric
- ⇒ Non-symbolic approaches such as NN, Reinforcement Learning or HMM are better suited and thus more commonly used in practice
- Combined approaches exist, for example EBNN (Explanation-based Neural Networks) (see for example the paper "Explanation Based Learning for Mobile Robot Perception" by J. O'Sullivan, T. Mitchell and S. Thrun)
40Unsupervised Learning
- Training data consists only of feature vectors and does not include classifications
- Unsupervised learning algorithms try to find patterns in the data
- Classic examples
- Clustering
- Fitting Gaussian density functions to data
- Dimensionality reduction
41Clustering k-means algorithm
- Clusters objects based on attributes into k partitions
- Tries to minimize intra-cluster variance
- Algorithm
- Starts by partitioning the input points into k initial sets, either at random or using some heuristic.
- Then calculates the mean point, or centroid, of each set.
- Constructs a new partition by associating each point with the closest centroid.
- Recalculates the centroids for the new clusters.
- Repeats this until convergence, which is when points no longer switch clusters.
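The k-means steps above can be sketched in plain Python. To keep the example deterministic, the initial partition assigns points round-robin, which is just one simple stand-in for "some heuristic"; the function names, the squared Euclidean distance and the sample points are assumptions of this sketch.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    if len(points) < k:
        raise ValueError("need at least k points")
    # Initial partition: round-robin assignment (assumed heuristic).
    labels = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Mean point (centroid) of each current set.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
        # New partition: each point goes to its closest centroid.
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no point switched clusters: converged
            break
        labels = new_labels
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

k-means only finds a local optimum, so on harder data the result depends on the initial partition.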
42COBWEB algorithm
- Incremental clustering algorithm
- Data structure is a tree (Categorization tree)
- Root node represents entire dataset
- Leaves represent instances
- Inner nodes are clusters, subclusters etc.
- Add instances one by one (instances are feature
vectors)
43COBWEB algorithm
- For every example e, the algorithm starts at the root node. Then for every node, one of the following 4 alternatives is chosen based on the category utility function
- Insert e into the best successor node
- Create a new leaf for e and make it a successor of the current node.
- Generate a new node n, which is the predecessor of the two best successors of the current node, and insert n between the current node and the two successors. Example e is inserted into n.
- Choose the best successor node, delete it and make its successors direct successors of the current node. Afterwards insert example e into the best successor node.
- Category utility function
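The formula from the slide is not preserved in this text. For orientation, the category utility of a partition into clusters $C_1, \dots, C_n$ is commonly stated for COBWEB as below, where $A_i$ ranges over the attributes and $V_{ij}$ over the values of attribute $A_i$; the notation is an assumption of this note and may differ from the original slide.

```latex
CU(C_1, \dots, C_n) = \frac{1}{n} \sum_{k=1}^{n} P(C_k)
  \left[ \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2
       - \sum_i \sum_j P(A_i = V_{ij})^2 \right]
```

Read informally, it rewards partitions in which knowing the cluster makes attribute values more predictable than they are in the data as a whole.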
44Summary
- This presentation mainly covered symbol-based learning algorithms
- Deductive
- EBL
- Inductive
- Concept Learning
- Decision-Tree Learning
- Unsupervised
- Clustering (not necessarily symbol-based), COBWEB
- Non symbol-based learning algorithms such as neural networks: part of the next lecture?
45References (Books)
- Tom Mitchell: Machine Learning. McGraw-Hill, 1997.
- Stuart Russell, Peter Norvig: Artificial Intelligence: A Modern Approach. Prentice Hall, 2003.
- George Luger: Artificial Intelligence. Addison-Wesley, 2002.
- Elaine Rich: Artificial Intelligence. McGraw-Hill, 1983.