Introduction to Machine Learning and Knowledge Representation - PowerPoint PPT Presentation

Provided by: florian47

Learn more at: http://gamma.cs.unc.edu

1
Introduction to Machine Learning and Knowledge
Representation

Florian Gyarfas
COMP 790-072 (Robotics)

2
Outline

Introduction, definitions
Common knowledge representations
Types of learning
Deductive Learning (rules of inference)
Explanation-based learning
Inductive Learning: some approaches
Concept Learning
Decision-Tree Learning
Clustering
Summary, references

3
Introduction

What is Knowledge Representation?
Formalisms that represent knowledge (facts about
the world) and mechanisms to manipulate such
knowledge (for example, deriving new facts from
existing knowledge)
What is Machine Learning?
Mitchell: "A computer program is said to learn
from experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E."
Example: T = playing checkers, P = percent of
games won against opponents, E = playing practice
games against itself

4
Common Knowledge Representations

Logic
propositional
predicate
other
Structured Knowledge Representations
Frames
Semantic nets

5
Propositional Logic

Consists of
Constants: true, false
Set of elements called symbols, variables or
atomic formulas (typically letters a, b, c, ...)
Operators (¬, ∧, ∨, →, ↔)
Axioms
Examples
a ∧ b → a
a ∨ ¬a ↔ true
Inference rule(s)
Modus ponens

6
Predicate logic

In many cases, propositional logic is too weak.
For example, how can we express something like
this in propositional logic?
Every person is mortal.
Tom is a person.
Tom is mortal.
How can we represent these sentences in such a
way that we can infer the third sentence from the
first two?
Need quantifiers and predicates:
∀x Person(x) → Mortal(x)
Person(Tom)
We can infer Mortal(Tom)
In addition to propositional logic, predicate
logic has
Functions
Predicates
Quantifiers (∀, ∃)
More axioms
One more inference rule (Generalization)

7
Logic Inference Rules

Used for deductive learning
Propositional logic: modus ponens is all you need
If P, then Q. P. Therefore, Q.
Meta-rule, not the same as axioms
Predicate logic: modus ponens and the
generalization rule
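
The modus ponens rule above can be sketched as a small forward-chaining loop. This is only an illustration: the function name forward_chain and the rule encoding are assumptions, and the universally quantified rule from the previous slide is instantiated for Tom by hand rather than by unification.

```python
def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new fact can be derived.

    facts: set of atoms known to be true.
    rules: list of (premises, conclusion) pairs, read as
           "if all premises hold, then the conclusion holds".
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)  # P and (P -> Q), therefore Q
                changed = True
    return known


# Hand-instantiated rule: Person(Tom) -> Mortal(Tom)
rules = [(("Person(Tom)",), "Mortal(Tom)")]
derived = forward_chain({"Person(Tom)"}, rules)
print(sorted(derived))
```

Starting from Person(Tom) alone, the loop adds Mortal(Tom); with no facts at all it derives nothing.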

8
Structured Knowledge Representations

Semantic nets
Really just graphs that represent knowledge
Nodes represent concepts
Arcs represent binary relationships between
concepts
Frames
Extension of semantic nets
Entities have attributes
Class/subclass hierarchy that supports
inheritance: classes inherit attributes from
superclasses

9
Semantic Nets

Example (E. Rich, Artificial Intelligence)

[Figure: semantic net with the nodes Furniture, Chair, Person, Seat, My-chair, Me, Tan, Leather and Brown, connected by arcs labeled is-a, is-part, owner, color and covering]

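
A semantic net is just a labeled graph, so it can be sketched as a list of (source, label, target) triples. The arc endpoints below are an assumed reading of the example figure, whose layout did not survive in this transcript; the helper names targets and is_a are made up for the illustration.

```python
# Arcs as (source, label, target) triples.
# ASSUMPTION: which nodes each arc connects is reconstructed, not taken
# verbatim from the slide.
ARCS = [
    ("Chair", "is-a", "Furniture"),
    ("Seat", "is-part", "Chair"),
    ("My-chair", "is-a", "Chair"),
    ("My-chair", "owner", "Me"),
    ("My-chair", "color", "Tan"),
    ("My-chair", "covering", "Leather"),
    ("Me", "is-a", "Person"),
    ("Tan", "is-a", "Brown"),
]


def targets(node, label):
    """All nodes reached from node over an arc with the given label."""
    return [t for s, l, t in ARCS if s == node and l == label]


def is_a(node, category):
    """True if category is reachable from node via is-a arcs (or is node itself)."""
    frontier, seen = [node], set()
    while frontier:
        current = frontier.pop()
        if current == category:
            return True
        if current not in seen:
            seen.add(current)
            frontier.extend(targets(current, "is-a"))
    return False


print(is_a("My-chair", "Furniture"))
print(targets("My-chair", "owner"))
```

Following is-a arcs transitively is what lets a query about My-chair reach Furniture even though no arc connects them directly.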
10
Types of Learning

Deductive vs. inductive
Supervised vs. unsupervised
Symbolic vs. non-symbolic

11
Deductive vs. inductive

Deductive learning
Knowledge is deduced from existing knowledge by
means of truth-preserving transformations (this
is nothing more than reformulation of existing
knowledge).
If the premises are true, the conclusion must be
true!
Example: propositional/predicate logic with rules
of inference

12
Deductive vs. inductive

Inductive learning
Generalization from examples
Example: All observed crows are black ⇒ All
crows are black
A process of reasoning in which the premises of an
argument are believed to support the conclusion
but do not ensure it

13
Unsupervised vs. Supervised

Supervised Learning
There exists a teacher that for each training
example tells the learner how it is classified
(training data consists of pairs of input vectors
and desired outputs)
Reinforcement Learning
No input/output pairs; a reward function tells the
agent how good its action was
Unsupervised Learning
No a priori output and no reward; the training data
are just feature vectors, so the system needs to form
concepts (classes) by itself

14
Learning approaches

Deductive/Analytical
Explanation-based learning
Inductive
Supervised
Concept Learning
Decision-Tree Learning
Neural networks
Naive Bayes classifier
Support Vector Machines
Unsupervised
Clustering
Neural networks
Expectation-Maximization

15
Explanation-based learning (EBL)

Deductive
assumes prior knowledge (domain theory) in
addition to training examples
assumes domain theory is given as a set of Horn
clauses
Horn clause: disjunction of literals with at most
one positive literal
Example: NOT a OR NOT b OR NOT c OR d, which is
the same as (a AND b AND c) → d
Tries to explain training examples using the
domain theory
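
The claimed equivalence between the disjunctive and the implication form of the Horn clause example can be checked by brute force over all 16 truth assignments. This is a small sanity check, not part of EBL itself; the function names are made up for the illustration.

```python
from itertools import product


def as_disjunction(a, b, c, d):
    # NOT a OR NOT b OR NOT c OR d
    return (not a) or (not b) or (not c) or d


def as_implication(a, b, c, d):
    # (a AND b AND c) -> d: false only when the body holds and d does not
    return d if (a and b and c) else True


assignments = list(product([False, True], repeat=4))
equivalent = all(as_disjunction(*v) == as_implication(*v) for v in assignments)
print(len(assignments), equivalent)
```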

16
EBL example

Consider multiple physical objects
Which are the pairs of objects such that one can
be stacked safely on the other?
Target concept
premise(x,y) → SafeToStack(x,y)
where premise is a conjunctive expression
containing the variables x and y.
Domain Theory
SafeToStack(x,y) ← NOT Fragile(y)
SafeToStack(x,y) ← Lighter(x,y)
Lighter(x,y) ← Weight(x,wx) AND Weight(y,wy) AND
LessThan(wx,wy)
Weight(x,w) ← Volume(x,v) AND Density(x,d) AND
Equal(w,times(v,d))
Weight(x,5) ← Type(x,Table)
Fragile(x) ← Material(x,Glass)

17
EBL example (2)

Training example
On(Obj1,Obj2)
Type(Obj1,Box)
Type(Obj2,Table)
Color(Obj1,Red)
Color(Obj2,Blue)
Volume(Obj1,2)
Density(Obj1,0.3)
Material(Obj1,Cardboard)
Material(Obj2,Wood)
SafeToStack(Obj1,Obj2)

18
EBL example (3)

Explanation

SafeToStack(Obj1,Obj2)
  Lighter(Obj1,Obj2)
    Weight(Obj1,0.6)
      Volume(Obj1,2)
      Density(Obj1,0.3)
      Equal(0.6,2*0.3)
    Weight(Obj2,5)
      Type(Obj2,Table)
    LessThan(0.6,5)

19
EBL algorithm

(1) Explain the training example
(2) Analyze/generalize the explanation
(3) Add the generalized explanation to the learned
rules (domain theory)
For step (2), use for example the REGRESS algorithm
(Mitchell, p. 318)
In our example, the most general rule that can be
justified by the explanation is
SafeToStack(x,y) ← Volume(x,vx) AND Density(x,dx)
AND Equal(wx,times(vx,dx)) AND LessThan(wx,5) AND
Type(y,Table)
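
To make the learned rule concrete, the sketch below encodes the training example as plain dictionaries and compares the Lighter branch of the domain theory with the learned rule. The dictionary layout and the function names are assumptions made for this illustration; only the predicates and values come from the slides.

```python
# Facts of the training example, stored as predicate -> {object: value}.
facts = {
    "Type": {"Obj1": "Box", "Obj2": "Table"},
    "Volume": {"Obj1": 2},
    "Density": {"Obj1": 0.3},
    "Material": {"Obj1": "Cardboard", "Obj2": "Wood"},
}


def weight(obj):
    """Domain theory: weight = volume * density, or 5 for a table."""
    if obj in facts["Volume"] and obj in facts["Density"]:
        return facts["Volume"][obj] * facts["Density"][obj]
    if facts["Type"].get(obj) == "Table":
        return 5
    return None  # weight cannot be derived from the known facts


def lighter(x, y):
    """Lighter(x,y) <- Weight(x,wx) AND Weight(y,wy) AND LessThan(wx,wy)."""
    wx, wy = weight(x), weight(y)
    return wx is not None and wy is not None and wx < wy


def learned_rule(x, y):
    """SafeToStack(x,y) <- Volume, Density, wx = vx*dx, wx < 5, Type(y,Table)."""
    if x not in facts["Volume"] or x not in facts["Density"]:
        return False
    wx = facts["Volume"][x] * facts["Density"][x]
    return wx < 5 and facts["Type"].get(y) == "Table"


print(lighter("Obj1", "Obj2"), learned_rule("Obj1", "Obj2"))
```

The learned rule answers the same question as the chain of domain-theory rules, but it tests directly observable attributes (volume, density, type) in one step, which is the sense in which it is more operational.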

20
EBL - Remarks

Knowledge reformulation: EBL just restates what
the learner already knows
You don't really gain new knowledge!
Why do we need it then? In principle, we can
compute everything we need using just the domain
theory
In practice, however, this might not work.
Consider chess: does knowing all the rules make
you a perfect player?
So EBL reformulates existing knowledge into a
more operational form, which might be much more
effective, especially under certain constraints

21
Concept Learning

Inductive
learn general concept definition from specific
training examples
search through predefined space of potential
hypotheses for target concept
pick the one that best fits training examples

22
Concept Learning Example

Taken from Tom Mitchell's book Machine Learning
Concept to learn: days on which Tom enjoys his
favorite water sport
Training examples D (every row is an instance,
every column an attribute; EnjoySport is the classification)

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

23
Concept Learning

X: set of all instances (instance = combination
of attribute values), D: set of all training examples
(D ⊆ X)
The target concept is a function (target
function) c(x) that for a given instance x is
either 0 or 1 (in our example, 0 if EnjoySport =
No, 1 if EnjoySport = Yes)
We do not know the target concept, but we can
come up with hypotheses. We would like to find a
hypothesis h(x) such that h(x) = c(x), at least
for all x in D (D = training examples)
Hypothesis representation
Let us assume that the target concept is
expressed as a conjunction of constraints on the
instance attributes. Then we can write a
hypothesis like this: AirTemp = Cold ∧ Humidity =
High. Or, in short, <?,Cold,High,?,?,?> where ?
means any value is acceptable for this attribute.
We use ∅ to indicate that no value is
acceptable for an attribute.
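
The tuple notation for hypotheses translates directly into code. In this minimal sketch the string "?" stands for "any value" and Python's None stands for the empty constraint; the function name matches and the two sample days are assumptions chosen for illustration.

```python
ANY = "?"     # any value is acceptable
EMPTY = None  # no value is acceptable


def matches(hypothesis, instance):
    """Return h(x): 1 if every attribute constraint accepts the instance, else 0."""
    for constraint, value in zip(hypothesis, instance):
        if constraint is EMPTY:
            return 0
        if constraint != ANY and constraint != value:
            return 0
    return 1


# AirTemp = Cold AND Humidity = High
h = (ANY, "Cold", "High", ANY, ANY, ANY)
rainy_day = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")
sunny_day = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(matches(h, rainy_day), matches(h, sunny_day))
```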

24
Concept Learning

Most general hypothesis
<?,?,?,?,?,?>
Most specific hypothesis
<∅,∅,∅,∅,∅,∅>
General-to-specific ordering of hypotheses: ≥g
(more general than or equal to) defines a partial
order over hypothesis space H
FIND-S algorithm
finds the most specific hypothesis consistent
with the positive training examples
For our example: <Sunny,Warm,?,Strong,?,?>

25
Concept Learning (FIND-S)

FIND-S algorithm
Initialize h to the most specific hypothesis in H
For each positive training instance x
  For each attribute constraint ai in h
    If the constraint ai is satisfied by x, then do
    nothing
    Else replace ai in h by the next more general
    constraint that is satisfied by x
Output hypothesis h
Why can we ignore negative examples?
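
The FIND-S steps above can be sketched in a few lines for conjunctive hypotheses. Here None stands for the empty constraint and "?" for "any value"; the function name find_s and the data layout are assumptions, while the four examples are the EnjoySport rows from the earlier slide.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positives."""
    h = [None] * len(examples[0][0])  # most specific hypothesis
    for x, label in examples:
        if label != "Yes":
            continue  # FIND-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value  # first positive example: copy its value
            elif h[i] != "?" and h[i] != value:
                h[i] = "?"    # conflicting values: generalize to "any"
    return tuple(h)


D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(D))
```

For a conjunction of attribute constraints, the "next more general constraint" after a specific value is simply "?", which is why the update has only two cases.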

26
Concept Learning

FIND-S only computes the most specific hypothesis
Another approach to concept learning
CANDIDATE-ELIMINATION
CANDIDATE-ELIMINATION finds all hypotheses in the
version space
The version space, denoted $VS_{H,D}$, with respect to
hypothesis space H and training examples D, is the
subset of hypotheses from H consistent with the
training examples in D.
$VS_{H,D} = \{ h \in H \mid Consistent(h, D) \}$

27
Concept Learning (Version space)

Version space for our example

S (most specific boundary): <Sunny,Warm,?,Strong,?,?>
In between: <Sunny,?,?,Strong,?,?>, <Sunny,Warm,?,?,?,?>, <?,Warm,?,Strong,?,?>
G (most general boundary): <Sunny,?,?,?,?,?>, <?,Warm,?,?,?,?>

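
For a hypothesis space this small, the version space can also be found by brute force: enumerate every conjunctive hypothesis and keep those consistent with all training examples. This is a list-then-eliminate sketch, not the Candidate-Elimination algorithm; restricting the candidate values to those seen in the training data is a simplification made here, which is safe because a constraint on an unseen value can never cover a positive example.

```python
from itertools import product

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]


def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def consistent(h, examples):
    """h must classify every example exactly as its label does."""
    return all(matches(h, x) == (label == "Yes") for x, label in examples)


# Per attribute: "?" plus every value observed in D.
options = [["?"] + sorted({x[i] for x, _ in D}) for i in range(6)]
version_space = [h for h in product(*options) if consistent(h, D)]

for h in version_space:
    print(h)
```

The enumeration yields the same six hypotheses shown for the example, which is a useful cross-check on the boundary sets S and G.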
28
Concept Learning: Candidate-Elimination algorithm

Initialize G to the set of maximally general
hypotheses in H
Initialize S to the set of maximally specific
hypotheses in H
For each training example d, do
If d is a positive example
  Remove from G any hypothesis inconsistent with d
  For each hypothesis s in S that is not
  consistent with d
    Remove s from S
    Add to S all minimal generalizations h of s such
    that h is consistent with d, and some member of
    G is more general than h
    Remove from S any hypothesis that is more
    general than another hypothesis in S

29
Concept Learning: Candidate-Elimination algorithm

If d is a negative example
  Remove from S any hypothesis inconsistent with d
  For each hypothesis g in G that is not
  consistent with d
    Remove g from G
    Add to G all minimal specializations h of g such
    that h is consistent with d, and some member of
    S is more specific than h
    Remove from G any hypothesis that is less
    general than another hypothesis in G
How to use version space for classification of
new instances?
Neither algorithm can handle noisy training data,
i.e. both assume that none of the training examples is
incorrect
For more complex concept learning algorithms see
Mitchell, Machine Learning, Chapter 10.
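
One common answer to the classification question above is to let the hypotheses in the version space vote: if all of them agree, the new instance can be classified with confidence, otherwise the result is ambiguous. The sketch below hard-codes the six hypotheses of the running example; the three query instances and the wind value Weak are made up for illustration.

```python
VERSION_SPACE = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?", "?", "?"),
    ("?", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "?", "?", "?"),
    ("?", "Warm", "?", "?", "?", "?"),
]


def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def classify(x):
    """Return (label, number of hypotheses voting Yes)."""
    votes = sum(matches(h, x) for h in VERSION_SPACE)
    if votes == len(VERSION_SPACE):
        return "Yes", votes
    if votes == 0:
        return "No", votes
    return "Ambiguous", votes


print(classify(("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))
print(classify(("Rainy", "Cold", "Normal", "Weak", "Warm", "Same")))
print(classify(("Sunny", "Warm", "Normal", "Weak", "Warm", "Same")))
```

A unanimous vote gives the same answer that any consistent hypothesis would give, so it is as reliable as the training data and the inductive bias allow.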

30
Inductive bias

For both algorithms, we assumed that the target
concept was contained in the hypothesis space
Our hypothesis space was the set of all
hypotheses that can be expressed as a conjunction
of attribute constraints
Such an assumption is called an inductive bias
What if the target concept is not a conjunction of
constraints? Why not consider all possible
hypotheses?

31
Decision Tree Learning

Another inductive, supervised learning method for
approximating discrete-valued target functions
Learned function represented by a tree
Leaf nodes provide classification
Each node specifies a test of some attribute of
the instance

32
Decision Tree Learning - Example

Decision tree for the concept PlayTennis

33
Decision Tree Learning

Classification starts at the root node
Example tree corresponds to the expression
(Outlook = Sunny AND Humidity = Normal) OR
(Outlook = Overcast) OR (Outlook = Rain AND Wind =
Weak)
Using the tree to classify new instances is easy,
but how do we construct a decision tree from
training examples?
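
Classifying with a tree is a walk from the root to a leaf. The nested structure below encodes the PlayTennis expression from this slide; since the tree figure itself is missing, the negative leaves (High, Strong) are an assumption that follows from each tested attribute having two values. The names TREE and classify are made up for the sketch.

```python
# Inner node: (attribute, {value: subtree}); leaf: a class label string.
TREE = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})


def classify(tree, instance):
    """Follow the branch matching the instance until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree


day = {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}
print(classify(TREE, day))
```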

34
Decision Tree Learning: ID3 Algorithm

Constructs tree top-down
Order of attributes?
Evaluate each attribute using a statistical test
(see next slide) to determine how well it alone
classifies the training examples
Use the attribute that best classifies the
training examples as the test at the root node.
Create descendants for each possible value of
that attribute.
Repeat the process for each descendant

35
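Decision Tree Learning: Entropy/Gain

Given a collection S, containing positive and
negative examples of some target concept, the
entropy of S is
$Entropy(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}$
where $p_{+}$ and $p_{-}$ are the proportions of positive
and negative examples in S
Information gain is then defined as
$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$
where $S_v$ is the subset of S for which attribute A
has value v
ID3 algorithm picks the attribute with the highest gain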
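
Entropy and gain are easy to compute from class counts alone. The sketch below assumes a two-class problem described by (positive, negative) counts; the function names are made up, and the numbers used are the PlayTennis counts from Mitchell's example (9 positive and 5 negative overall, split by Wind into 6/2 for Weak and 3/3 for Strong).

```python
from math import log2


def entropy(pos, neg):
    """Entropy of a collection with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # the term for an empty class is taken as 0
            p = count / total
            result -= p * log2(p)
    return result


def gain(parent, subsets):
    """Information gain of a split; parent and subsets are (pos, neg) counts."""
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in subsets)
    return entropy(*parent) - remainder


print(round(entropy(9, 5), 3))
print(round(gain((9, 5), [(6, 2), (3, 3)]), 3))
```

A pure collection has entropy 0 and an evenly mixed one has entropy 1, so the gain measures how much purer the subsets are than the collection they came from.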

36
Decision Trees - Remarks

Tree represents a hypothesis; thus, ID3 determines
only a single hypothesis, unlike
Candidate-Elimination
Unlike both Concept Learning approaches, ID3
makes no assumptions regarding the hypothesis
space: every possible hypothesis can be
represented by some tree
Inductive bias: shorter trees are preferred over
larger trees

37
Decision Trees Example (Mitchell, p. 59)
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

38
Decision Tree - Example

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
So Outlook is the first attribute of the tree
$S_{sunny} = \{D1, D2, D8, D9, D11\}$
$Gain(S_{sunny}, Humidity) = 0.97 - (3/5) \cdot 0 - (2/5) \cdot 0 = 0.97$
$Gain(S_{sunny}, Temperature) = 0.97 - (2/5) \cdot 0 - (2/5) \cdot 1 - (1/5) \cdot 0 = 0.57$
$Gain(S_{sunny}, Wind) = 0.97 - (2/5) \cdot 1 - (3/5) \cdot 0.918 = 0.019$

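[Figure: partial decision tree with root Outlook; the Sunny branch leads to ?, the Overcast branch leads to Yes, the Rain branch leads to ?]
What next?
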
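
The remaining question marks are filled by applying the same gain computation recursively to each branch. The following is a compact ID3 sketch over the PlayTennis table; it is a simplified version (no pruning, no handling of unseen attribute values), and the tuple-based tree layout and function names are assumptions made for the illustration.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
# Rows D1..D14; the last field is the PlayTennis label.
DATA = [tuple(line.split()) for line in """\
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No""".splitlines()]


def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())


def gain(rows, i):
    """Information gain of splitting rows on the attribute at index i."""
    subsets = {}
    for r in rows:
        subsets.setdefault(r[i], []).append(r)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder


def id3(rows, attrs):
    labels = {r[-1] for r in rows}
    if len(labels) == 1:
        return labels.pop()  # pure node: make it a leaf
    if not attrs:
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda i: gain(rows, i))
    rest = [a for a in attrs if a != best]
    branches = {
        value: id3([r for r in rows if r[best] == value], rest)
        for value in sorted({r[best] for r in rows})
    }
    return (ATTRIBUTES[best], branches)


tree = id3(DATA, [0, 1, 2, 3])
print(tree)
```

On this data the Sunny branch is split on Humidity and the Rain branch on Wind, which matches the expression given for the PlayTennis tree on the earlier slide.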
39
Learning algorithms in robotics

EBL has been used for planning algorithms
(example: the PRODIGY system (Carbonell et al.,
1990))
Could use EBL, concept learning, decision tree
learning for things like object recognition (CL
and DTL can easily be extended to more than 2
categories)
However, while symbolic learning algorithms are
simple and easy to understand, they are not very
flexible or powerful
In many applications, robots need to learn to
classify something they perceive (using cameras,
sensors etc.)
Sensor inputs etc. are normally numeric
⇒ Non-symbolic approaches such as neural networks (NN),
reinforcement learning or hidden Markov models (HMM)
are better suited and thus more commonly used in practice
Combined approaches exist, for example EBNN
(Explanation-Based Neural Networks) (see for
example the paper "Explanation Based Learning for
Mobile Robot Perception" by J. O'Sullivan, T.
Mitchell and S. Thrun)

40
Unsupervised Learning

Training data only consists of feature vectors,
does not include classifications
Unsupervised Learning algorithms try to find
patterns in the data
Classic examples
Clustering
Fitting Gaussian Density functions to data
Dimensionality reduction

41
Clustering k-means algorithm

Cluster objects based on attributes into k
partitions
Tries to minimize intra-cluster variance
Algorithm
Algorithm starts by partitioning input points
into k initial sets, either at random or using
some heuristic.
Then it calculates the mean point, or centroid,
of each set.
Constructs new partition by associating each
point with the closest centroid.
Recalculates centroids for new clusters
Repeats this until convergence, which is when
points no longer switch clusters.
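
The k-means steps above can be sketched in plain Python. This is a minimal illustration, assuming points are tuples of numbers, squared Euclidean distance, a random balanced initial partition, and at least k points; the function name kmeans and its parameters seed and max_iter are made up for the sketch.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Return (centroids, assignment) after iterating until no point switches."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    assignment = [0] * len(points)
    for rank, idx in enumerate(order):
        assignment[idx] = rank % k  # random partition, no set starts empty

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centroids = [None] * k
    for _ in range(max_iter):
        # Recompute the centroid (mean point) of every cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster became empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Associate each point with its closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # convergence: no point switched clusters
        assignment = new_assignment
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, k=2)
print(sorted(centroids), assignment)
```

k-means only finds a local optimum, so on less clearly separated data the result can depend on the initial partition; running it with several seeds is a common remedy.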

42
COBWEB algorithm

Incremental clustering algorithm
Data structure is a tree (Categorization tree)
Root node represents entire dataset
Leaves represent instances
Inner nodes are clusters, subclusters etc.
Add instances one by one (instances are feature
vectors)

43
COBWEB algorithm

For every example e, the algorithm starts at the root
node. Then for every node, one of the following 4
alternatives is chosen based on the category
utility function:
Insert e into the best successor node
Create a new leaf for e and make it a successor
of the current node.
Generate a new node n, which is the predecessor of
the two best successors of the current node, and
insert n between the current node and the two
successors. Example e is inserted into n.
Choose the best successor node, delete it and
make its successors direct successors of the
current node. Afterwards insert example e into the
best successor node.
Category utility function

44
Summary

This presentation mainly covered symbol-based
learning algorithms
Deductive
EBL
Inductive
Concept Learning
Decision-Tree Learning
Unsupervised
Clustering (not necessarily symbol-based), COBWEB
Non-symbol-based learning algorithms such as
neural networks: part of the next lecture?

45
References (Books)

Tom Mitchell: Machine Learning. McGraw-Hill, 1997.
Stuart Russell, Peter Norvig: Artificial Intelligence: A Modern Approach. Prentice Hall, 2003.
George Luger: Artificial Intelligence. Addison-Wesley, 2002.
Elaine Rich: Artificial Intelligence. McGraw-Hill, 1983.
