1
Introduction to Machine Learning and Knowledge
Representation
  • Florian Gyarfas
  • COMP 790-072 (Robotics)

2
Outline
  • Introduction, definitions
  • Common knowledge representations
  • Types of learning
  • Deductive Learning (rules of inference)
  • Explanation-based learning
  • Inductive Learning: some approaches
  • Concept Learning
  • Decision-Tree Learning
  • Clustering
  • Summary, references

3
Introduction
  • What is Knowledge Representation?
  • Formalisms that represent knowledge (facts about
    the world) and mechanisms to manipulate such
    knowledge (for example, to derive new facts from
    existing knowledge)
  • What is Machine Learning?
  • Mitchell: "A computer program is said to learn
    from experience E with respect to some class of
    tasks T and performance measure P, if its
    performance at tasks in T, as measured by P,
    improves with experience E."
  • Example: T = playing checkers, P = percent of
    games won against opponents, E = playing practice
    games against itself

4
Common Knowledge Representations
  • Logic
  • propositional
  • predicate
  • other
  • Structured Knowledge Representations
  • Frames
  • Semantic nets

5
Propositional Logic
  • Consists of
  • Constants true/false
  • Set of elements called symbols, variables or
    atomic formulas (typically letters a, b, c, ...)
  • Operators (¬, ∧, ∨, →, ↔)
  • Axioms
  • Examples
  • a → (b → a)
  • a ∨ ¬a ↔ true
  • Inference rule(s)
  • Modus ponens

6
Predicate logic
  • In many cases, propositional logic is too weak.
  • For example, how can we express something like
    this in propositional logic?
  • Every person is mortal. Tom is a person.
    Therefore, Tom is mortal.
  • How can we represent these sentences in such a
    way that we can infer the third sentence from the
    first two?
  • Need quantifiers and predicates
  • ∀x Person(x) → Mortal(x); Person(Tom)
  • We can infer Mortal(Tom) (see the sketch below)
  • In addition to propositional logic, predicate
    logic has
  • Functions
  • Predicates
  • Quantifiers (∀, ∃)
  • More axioms
  • One more inference rule (Generalization)
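As a concrete illustration (my own sketch, not on the original slide), the Person/Mortal inference above can be mechanized in a few lines of Python: the rule ∀x Person(x) → Mortal(x) is applied to the fact Person(Tom) by naive forward chaining. The encoding and the name forward_chain are hypothetical.

    # Naive forward chaining over unary predicates (illustrative sketch).
    facts = {("Person", "Tom")}
    rules = [("Person", "Mortal")]  # reads: for all x, Person(x) -> Mortal(x)

    def forward_chain(facts, rules):
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for premise, conclusion in rules:
                for pred, arg in list(derived):
                    # instantiate the rule with x = arg
                    if pred == premise and (conclusion, arg) not in derived:
                        derived.add((conclusion, arg))
                        changed = True
        return derived

    print(forward_chain(facts, rules))  # includes ('Mortal', 'Tom')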

7
Logic Inference Rules
  • Used for deductive learning
  • Propositional Logic: Modus Ponens is all you need
  • If P, then Q. P. Therefore, Q.
  • Meta-rule, not the same as axioms
  • Predicate Logic: Modus Ponens and the
    Generalization rule

8
Structured Knowledge Representations
  • Semantic nets
  • Really just graphs that represent knowledge
  • Nodes represent concepts
  • Arcs represent binary relationships between
    concepts
  • Frames
  • Extension of semantic nets
  • Entities have attributes
  • Class/subclass hierarchy that supports
    inheritance: classes inherit attributes from
    superclasses

9
Semantic Nets
  • Example (E. Rich, Artificial Intelligence)

[Semantic net, shown as node --arc--> node triples:]
  • Chair --is-a--> Furniture
  • Seat --is-part--> Chair
  • My-chair --is-a--> Chair
  • My-chair --owner--> Me
  • Me --is-a--> Person
  • My-chair --color--> Tan
  • Tan --is-a--> Brown
  • My-chair --covering--> Leather
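To make this concrete (my own sketch, not from the slides), the net can be stored as a list of (node, arc, node) triples, with a lookup that climbs is-a arcs so that attributes can be inherited from more general concepts, as frames do:

    # A semantic net as triples, with is-a inheritance (illustrative sketch).
    triples = [
        ("Chair", "is-a", "Furniture"),
        ("Seat", "is-part", "Chair"),
        ("My-chair", "is-a", "Chair"),
        ("My-chair", "owner", "Me"),
        ("Me", "is-a", "Person"),
        ("My-chair", "color", "Tan"),
        ("Tan", "is-a", "Brown"),
        ("My-chair", "covering", "Leather"),
    ]

    def lookup(node, relation):
        # Search the node itself first, then climb is-a links (inheritance).
        while node is not None:
            for s, r, o in triples:
                if s == node and r == relation:
                    return o
            node = next((o for s, r, o in triples
                         if s == node and r == "is-a"), None)
        return None

    print(lookup("My-chair", "color"))  # Tan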
10
Types of Learning
  • Deductive vs. inductive
  • Supervised vs. unsupervised
  • Symbolic vs. non-symbolic

11
Deductive vs. inductive
  • Deductive learning
  • Knowledge is deduced from existing knowledge by
    means of truth-preserving transformations (this
    is nothing more than reformulation of existing
    knowledge).
  • If the premises are true, the conclusion must be
    true!
  • Example: propositional/predicate logic with rules
    of inference

12
Deductive vs. inductive
  • Inductive learning
  • Generalization from examples
  • Example: All observed crows are black ⇒ All
    crows are black
  • Process of reasoning in which the premises of an
    argument are believed to support the conclusion
    but do not ensure it

13
Unsupervised vs. Supervised
  • Supervised Learning
  • There exists a teacher that for each training
    example tells the learner how it is classified
    (training data consists of pairs of input vectors
    and desired outputs)
  • Reinforcement Learning
  • No input/output pairs; a reward function tells
    the agent how good its action was
  • Unsupervised Learning
  • No a priori output and also no reward; training
    data are just feature vectors, and the system
    needs to form concepts (classes) by itself

14
Learning approaches
  • Deductive/Analytical
  • Explanation-based learning
  • Inductive
  • Supervised
  • Concept Learning
  • Decision-Tree Learning
  • Neural networks
  • Naive Bayes classifier
  • Support Vector Machines
  • Unsupervised
  • Clustering
  • Neural networks
  • Expectation-Maximization

15
Explanation-based learning (EBL)
  • Deductive
  • assumes prior knowledge (domain theory) in
    addition to training examples
  • assumes the domain theory is given as a set of
    Horn clauses
  • Horn clause: a disjunction of literals with at
    most one positive literal
  • Example: NOT a OR NOT b OR NOT c OR d, which is
    the same as (a AND b AND c) → d
  • Tries to explain training examples using the
    domain theory

16
EBL example
  • Consider multiple physical objects
  • Which are the pairs of objects such that one can
    be stacked safely on the other?
  • Target concept
  • premise(x,y) → SafeToStack(x,y)
  • where premise is a conjunctive expression
    containing the variables x and y.
  • Domain Theory
  • SafeToStack(x,y) ← NOT Fragile(y)
  • SafeToStack(x,y) ← Lighter(x,y)
  • Lighter(x,y) ← Weight(x,wx) AND Weight(y,wy) AND
    LessThan(wx,wy)
  • Weight(x,w) ← Volume(x,v) AND Density(x,d) AND
    Equal(w,times(v,d))
  • Weight(x,5) ← Type(x,Table)
  • Fragile(x) ← Material(x,Glass)

17
EBL example (2)
  • Training example
  • On(Obj1,Obj2)
  • Type(Obj1,Box)
  • Type(Obj2,Table)
  • Color(Obj1,Red)
  • Color(Obj2,Blue)
  • Volume(Obj1,2)
  • Density(Obj1,0.3)
  • Material(Obj1,Cardboard)
  • Material(Obj2,Wood)
  • SafeToStack(Obj1,Obj2)

18
EBL example (3)
  • Explanation

[Explanation (proof tree), reconstructed from the domain theory:]
SafeToStack(Obj1,Obj2)
 ← Lighter(Obj1,Obj2)
    ← Weight(Obj1,0.6)
       ← Volume(Obj1,2)
       ← Density(Obj1,0.3)
       ← Equal(0.6, times(2,0.3))
    ← LessThan(0.6,5)
    ← Weight(Obj2,5)
       ← Type(Obj2,Table)
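As a sanity check (my own sketch, not part of the original example), the numeric core of this explanation can be replayed directly from the domain theory and the training example:

    # Replaying the explanation chain for SafeToStack(Obj1, Obj2).
    volume_obj1, density_obj1 = 2, 0.3
    weight_obj1 = volume_obj1 * density_obj1   # Weight(Obj1,0.6) via Volume, Density, Equal
    weight_obj2 = 5                            # Weight(Obj2,5) via Type(Obj2,Table)
    lighter = weight_obj1 < weight_obj2        # LessThan(0.6,5) => Lighter(Obj1,Obj2)
    safe_to_stack = lighter                    # SafeToStack <- Lighter
    print(weight_obj1, safe_to_stack)          # 0.6 True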
19
EBL algorithm
  • Explain training example
  • Analyze/Generalize Explanation
  • Add Explanation to Learned Rules (Domain Theory)
  • Use, for example, the REGRESS algorithm
    (Mitchell, p. 318) for step (2)
  • In our example, the most general rule that can be
    justified by the explanation is
  • SafeToStack(x,y) ← Volume(x,vx) AND Density(x,dx)
    AND Equal(wx,times(vx,dx)) AND LessThan(wx,5) AND
    Type(y,Table)

20
EBL - Remarks
  • Knowledge reformulation: EBL just restates what
    the learner already knows
  • You don't really gain new knowledge!
  • Why do we need it then? In principle, we can
    compute everything we need using just the domain
    theory
  • In practice, however, this might not work.
    Consider chess: does knowing all the rules make
    you a perfect player?
  • So EBL reformulates existing knowledge into a
    more operational form, which might be much more
    effective, especially under certain constraints

21
Concept Learning
  • Inductive
  • learn general concept definition from specific
    training examples
  • search through predefined space of potential
    hypotheses for target concept
  • pick the one that best fits training examples

22
Concept Learning Example
  • Taken from Tom Mitchell's book Machine Learning
  • Concept to learn: days on which Tom enjoys his
    favorite water sport
  • Training examples D (every row is an instance,
    every column an attribute):

#  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport (classification)
1  Sunny  Warm     Normal    Strong  Warm   Same      Yes
2  Sunny  Warm     High      Strong  Warm   Same      Yes
3  Rainy  Cold     High      Strong  Warm   Change    No
4  Sunny  Warm     High      Strong  Cool   Change    Yes
23
Concept Learning
  • X = set of all instances (an instance is a
    combination of attribute values); D = set of all
    training examples (D ⊆ X)
  • The target concept is a function (target
    function) c(x) that for a given instance x is
    either 0 or 1 (in our example, 0 if EnjoySport =
    No, 1 if EnjoySport = Yes)
  • We do not know the target concept, but we can
    come up with hypotheses. We would like to find a
    hypothesis h(x) such that h(x) = c(x), at least
    for all x in D (the training examples)
  • Hypotheses representation
  • Let us assume that the target concept is
    expressed as a conjunction of constraints on the
    instance attributes. Then we can write a
    hypothesis like this: AirTemp = Cold ∧ Humidity =
    High. Or, in short, <?, Cold, High, ?, ?, ?>,
    where ? means any value is acceptable for this
    attribute. We use ∅ to indicate that no value is
    acceptable for an attribute.

24
Concept Learning
  • Most general hypothesis
  • <?, ?, ?, ?, ?, ?>
  • Most specific hypothesis
  • <∅, ∅, ∅, ∅, ∅, ∅>
  • General-to-specific ordering of hypotheses: ≥g
    (more general than or equal to) defines a partial
    order over the hypothesis space H
  • FIND-S algorithm
  • finds the most specific hypothesis consistent
    with the training examples
  • For our example: <Sunny, Warm, ?, Strong, ?, ?>

25
Concept Learning (FIND-S)
  • FIND-S algorithm
  • Initialize h to the most specific hypothesis in H
  • For each positive training instance x
  • For each attribute constraint ai in h:
    if the constraint ai is satisfied by x,
    then do nothing;
    else replace ai in h by the next more general
    constraint that is satisfied by x
  • Output hypothesis h
  • Why can we ignore negative examples?
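A minimal Python sketch of FIND-S on the EnjoySport data (illustrative only; the encoding and the name find_s are my own). It also makes the answer to the question above visible: negative examples are simply skipped.

    # FIND-S for conjunctive hypotheses; '?' = any value is acceptable.
    data = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
        (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
        (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
    ]

    def find_s(examples):
        h = None                      # None stands for the most specific hypothesis
        for x, positive in examples:
            if not positive:
                continue              # FIND-S ignores negative examples
            if h is None:
                h = list(x)           # first positive example: adopt it verbatim
            else:
                # generalize every constraint the example violates
                h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
        return h

    print(find_s(data))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']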

26
Concept Learning
  • FIND-S only computes the most specific hypothesis
  • Another approach to concept learning
    CANDIDATE-ELIMINATION
  • CANDIDATE-ELIMINATION finds all hypotheses in
    the version space
  • The version space, denoted VS_H,D, with respect
    to hypothesis space H and training examples D, is
    the subset of hypotheses from H consistent with
    the training examples in D.
  • VS_H,D = { h ∈ H | Consistent(h, D) }
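For a hypothesis space this small, the version space can even be computed by brute force. The sketch below (my own illustration, not from the slides) enumerates every conjunctive hypothesis over the observed attribute values plus '?' and keeps those consistent with all four training examples:

    # Brute-force version space for the EnjoySport data (illustrative).
    from itertools import product

    data = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
        (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
        (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
    ]

    def matches(h, x):
        return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

    # candidate constraints per attribute: every observed value, plus '?'
    domains = [sorted({x[i] for x, _ in data} | {"?"}) for i in range(6)]

    version_space = [h for h in product(*domains)
                     if all(matches(h, x) == positive for x, positive in data)]
    for h in version_space:
        print(h)  # the six hypotheses shown on the next slide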

27
Concept Learning (Version space)
  • Version space for our example

S (specific boundary): <Sunny, Warm, ?, Strong, ?, ?>
In between:            <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G (general boundary):  <Sunny, ?, ?, ?, ?, ?>   <?, Warm, ?, ?, ?, ?>
28
Concept Learning: Candidate-Elimination algorithm
  • Initialize G to the set of maximally general
    hypotheses in H
  • Initialize S to set of maximally specific
    hypotheses in H
  • For each training example d, do
  • If d is a positive example
  • Remove from G any hypothesis inconsistent with d
  • For each hypothesis s in S that is not
    consistent with d
  • Remove s from S
  • Add to S all minimal generalizations h of s such
    that
  • h is consistent with d, and some member of G is
    more general than h
  • Remove from S any hypothesis that is more
    general than another hypothesis in S

29
Concept Learning: Candidate-Elimination algorithm
  • If d is a negative example
  • Remove from S any hypothesis inconsistent with d
  • For each hypothesis g in G that is not
    consistent with d
  • Remove g from G
  • Add to G all minimal specializations h of g such
    that
  • h is consistent with d, and some member of S is
    more specific than h
  • Remove from G any hypothesis that is less
    general than another hypothesis in G
  • How to use version space for classification of
    new instances?
  • Both algorithms can't handle noisy training data,
    i.e. they assume that none of the training
    examples is incorrect
  • For more complex Concept Learning algorithms see
    Mitchell Machine Learning, Chapter 10.

30
Inductive bias
  • For both algorithms, we assumed that the target
    concept was contained in the hypothesis space
  • Our hypothesis space was the set of all
    hypotheses that can be expressed as a conjunction
    of attribute constraints
  • Such an assumption is called an inductive bias
  • What if the target concept is not a conjunction
    of constraints? Why not consider all possible
    hypotheses?

31
Decision Tree Learning
  • Another inductive, supervised learning method for
    approximating discrete-valued target functions
  • Learned function represented by a tree
  • Leaf nodes provide classification
  • Each internal node specifies a test of some
    attribute of the instance

32
Decision Tree Learning - Example
  • Decision tree for the concept PlayTennis

33
Decision Tree Learning
  • Classification starts at the root node
  • Example tree corresponds to the expression
  • (Outlook = Sunny AND Humidity = Normal) OR
    (Outlook = Overcast) OR (Outlook = Rain AND
    Wind = Weak)
  • Using the tree to classify new instances is easy,
    but how do we construct a decision tree from
    training examples?

34
Decision Tree Learning: ID3 Algorithm
  • Constructs tree top-down
  • Order of attributes?
  • Evaluate each attribute using a statistical test
    (see next slide) to determine how well it alone
    classifies the training examples
  • Use the attribute that best classifies the
    training examples as the attribute at the root
    node. Create descendants for each possible value
    of that attribute.
  • Repeat process for each descendant

35
Decision Tree Learning: Entropy/Gain
  • Given a collection S, containing positive and
    negative examples of some target concept, the
    entropy of S is
  • Entropy(S) = -(p+) log2(p+) - (p-) log2(p-),
    where p+ and p- are the proportions of positive
    and negative examples in S
  • Information gain is then defined as
  • Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)}
    (|S_v| / |S|) · Entropy(S_v)
  • The ID3 algorithm picks the attribute with the
    highest gain
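A small Python sketch of these two quantities (my own; the names entropy and gain are hypothetical, and binary labels are assumed):

    # Entropy and information gain for boolean-labeled examples.
    from math import log2

    def entropy(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)      # proportion of positive examples
        if p in (0.0, 1.0):
            return 0.0                     # a pure collection has zero entropy
        return -p * log2(p) - (1 - p) * log2(1 - p)

    def gain(examples, attr):
        # examples: list of (attribute_dict, bool_label) pairs
        labels = [y for _, y in examples]
        g = entropy(labels)
        for v in {x[attr] for x, _ in examples}:
            subset = [y for x, y in examples if x[attr] == v]
            g -= len(subset) / len(examples) * entropy(subset)
        return g

Applied to the PlayTennis table two slides below, gain reproduces the worked values there, e.g. Gain(S, Outlook) ≈ 0.246.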

36
Decision Trees - Remarks
  • A tree represents a hypothesis; thus, ID3
    maintains only a single hypothesis, unlike
    Candidate-Elimination
  • Unlike both Concept Learning approaches, ID3
    makes no assumptions regarding the hypothesis
    space: every possible hypothesis can be
    represented by some tree
  • Inductive bias: shorter trees are preferred over
    larger trees

37
Decision Trees: Example (Mitchell, p. 59)
Day Outlook Temp. Humid. Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
38
Decision Tree - Example
  • Gain(S, Outlook) = 0.246
  • Gain(S, Humidity) = 0.151
  • Gain(S, Wind) = 0.048
  • Gain(S, Temperature) = 0.029
  • So Outlook is the attribute at the root of the
    tree
  • S_sunny = {D1, D2, D8, D9, D11}
  • Gain(S_sunny, Humidity) = 0.97 - (3/5)·0 - (2/5)·0
    = 0.97
  • Gain(S_sunny, Temperature) = 0.97 - (2/5)·0
    - (2/5)·1 - (1/5)·0 = 0.57
  • Gain(S_sunny, Wind) = 0.97 - (2/5)·1 - (3/5)·0.918
    = 0.019

[Partial tree so far: root Outlook, with branches
 Sunny → ?, Overcast → Yes, Rain → ?]
What next?
39
Learning algorithms in robotics
  • EBL has been used for planning algorithms
    (example: the PRODIGY system, Carbonell et al.,
    1990)
  • Could use EBL, concept learning, decision tree
    learning for things like object recognition (CL
    and DTL can easily be extended to more than 2
    categories)
  • However, while symbolic learning algorithms are
    simple and easy to understand, they are not very
    flexible or powerful
  • In many applications, robots need to learn to
    classify something they perceive (using cameras,
    sensors etc.)
  • Sensor inputs etc. are normally numeric
  • → Non-symbolic approaches such as neural
    networks (NN), Reinforcement Learning or HMMs are
    better suited and thus more commonly used in
    practice
  • Combined approaches exist, for example EBNN
    (Explanation-Based Neural Networks); see, e.g.,
    the paper "Explanation Based Learning for
    Mobile Robot Perception" by J. O'Sullivan, T.
    Mitchell and S. Thrun

40
Unsupervised Learning
  • Training data consist only of feature vectors;
    they do not include classifications
  • Unsupervised Learning algorithms try to find
    patterns in the data
  • Classic examples
  • Clustering
  • Fitting Gaussian Density functions to data
  • Dimensionality reduction

41
Clustering: k-means algorithm
  • Cluster objects based on attributes into k
    partitions
  • Tries to minimize intra-cluster variance
  • Algorithm
  • The algorithm starts by partitioning the input
    points into k initial sets, either at random or
    using some heuristic.
  • Then it calculates the mean point, or centroid,
    of each set.
  • Constructs new partition by associating each
    point with the closest centroid.
  • Recalculates centroids for new clusters
  • Repeats this until convergence, which is when
    points no longer switch clusters.
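A compact NumPy version of this loop (my own illustration; random initialization from the input points, fixed iteration cap):

    # Minimal k-means sketch (Lloyd's algorithm).
    import numpy as np

    def kmeans(points, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # 1. initial partition: k randomly chosen points act as centroids
        centroids = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(n_iter):
            # 2./3. assign every point to its closest centroid
            dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # 4. recompute each centroid as the mean of its cluster
            new_centroids = np.array([
                points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)])
            if np.allclose(new_centroids, centroids):
                break                      # 5. converged: assignments stable
            centroids = new_centroids
        return labels, centroids

    pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
    print(kmeans(pts, k=2)[0])             # e.g. [1 1 0 0]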

42
COBWEB algorithm
  • Incremental clustering algorithm
  • Data structure is a tree (Categorization tree)
  • Root node represents entire dataset
  • Leaves represent instances
  • Inner nodes are clusters, subclusters etc.
  • Add instances one by one (instances are feature
    vectors)

43
COBWEB algorithm
  • For every example e, the algorithm starts at the
    root node. Then, for every node, one of the
    following four alternatives is chosen based on
    the category utility function:
  • Insert e into the best successor node
  • Create a new leaf for e and make it a successor
    of the current node.
  • Generate a new node n that becomes the parent of
    the two best successors of the current node, and
    insert n between the current node and these two
    successors. Example e is inserted into n.
  • Choose the best successor node, delete it, and
    make its successors direct successors of the
    current node. Afterwards, insert example e into
    the best successor node.
  • Category utility function
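[The formula on the original slide did not survive extraction. For reference, the standard category utility (Gluck & Corter 1985; Fisher 1987), which COBWEB maximizes when choosing among the four alternatives, is

    CU(C1, ..., Cn) = (1/n) Σ_k P(Ck) [ Σ_i Σ_j P(Ai = Vij | Ck)² - Σ_i Σ_j P(Ai = Vij)² ]

where the Ck are the clusters and P(Ai = Vij | Ck) is the probability that attribute Ai has value Vij within cluster Ck. The notation may differ from the original slide.]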

44
Summary
  • This presentation mainly covered symbol-based
    learning algorithms
  • Deductive
  • EBL
  • Inductive
  • Concept Learning
  • Decision-Tree Learning
  • Unsupervised
  • Clustering (not necessarily symbol-based), COBWEB
  • Non-symbol-based learning algorithms such as
    neural networks: part of the next lecture?

45
References (Books)
  • Tom Mitchell: Machine Learning. McGraw-Hill,
    1997.
  • Stuart Russell, Peter Norvig: Artificial
    Intelligence: A Modern Approach. Prentice Hall,
    2003.
  • George Luger: Artificial Intelligence.
    Addison-Wesley, 2002.
  • Elaine Rich: Artificial Intelligence.
    McGraw-Hill, 1983.