Learning through Interactive Behavior Specifications

Provided by: tolga2

Learn more at: http://lac.gmu.edu

Transcript and Presenter's Notes

1
Learning through Interactive Behavior
Specifications

Tolga Konik
CSLI, Stanford University
Douglas Pearson
Three Penny Software
John Laird
University of Michigan

2
Goal

Automatically generate cognitive agents
Reduce the cost of agent development
Reduce the expertise required to develop agents.

3
Domains

Autonomous Cognitive agents
Dynamic Virtual Worlds
Real-time decisions based on knowledge and sensed
data
Soar agent architecture

4
Learning by Observation

Approach
Observe expert behavior
Learn to replicate it
Why?
We may want human-like agents
In complex domains, imitating humans may be easier
than learning from scratch

5
Bottleneck in pure Learning by Observation

PROBLEM
You cannot observe the internal reasoning of the
expert
SOLUTION
Ask the expert for additional information
Goal annotations
Use additional knowledge sources
Task domain knowledge

6
Learning by Observation
Environment
Goal annotations
Actions
Percepts
Additional Task Knowledge
Learner
7
Learning by Observation
Environment
8
Learning by Observation: Critic Mode
Environment
critic
Learner
9
One Body, Two Minds
?
?
Environment

How and when to switch control
How the expert and the agent program communicate

10
Diagrammatic Behavior Specification
Learner
11
Redux

Visual rule editing

Diagrammatic Behavior Specification

12
Goal Hierarchy
Get-item(Item)
Get-item-different-room(Item)

Task-Performance knowledge is represented with a
hierarchy of durative goals.
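
The hierarchy of durative goals can be pictured as a small tree in which a goal and all of its ancestors are active at the same time. The following is only a minimal sketch, not the representation used by the authors: the `Goal` class and its methods are hypothetical, and the goal names and arguments (`i3`, `d1`) are borrowed from the Goal Hierarchy slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Goal:
    """A durative goal: it stays active until it is terminated (hypothetical class)."""
    name: str
    args: Tuple[str, ...] = ()
    # parent is excluded from repr/compare to avoid infinite recursion
    parent: Optional["Goal"] = field(default=None, repr=False, compare=False)
    subgoals: List["Goal"] = field(default_factory=list)

    def add_subgoal(self, name: str, *args: str) -> "Goal":
        """Create a child goal that is pursued while this goal is active."""
        child = Goal(name, args, parent=self)
        self.subgoals.append(child)
        return child

    def active_path(self) -> List[str]:
        """Return the goals from the root down to this goal."""
        path, node = [], self
        while node is not None:
            path.append(f"{node.name}({', '.join(node.args)})")
            node = node.parent
        return list(reversed(path))


# Instance names taken from the Goal Hierarchy slides.
root = Goal("get-item", ("i3",))
different_room = root.add_subgoal("get-item-different-room", "i3")
go_to = different_room.add_subgoal("go-to", "d1")
print(go_to.active_path())
```

Keeping a parent link makes it cheap to ask which goals are active when a decision is made, which is the context a learner would need for each example.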

13
Goal Hierarchy
Get-item(i3)
Item = i3
Get-item-in-room(Item)
Goto-next-room
Get-item-different-room(Item)
Get-item-in-room(i3)
Go-to(Door)
14
Goal Hierarchy
Get-item(i3)
Item = i3
Get-item-different-room(Item)
Get-item-different-room(i3)
Get-item-in-room(Item)
Go-to(Door)
Go-to(d1)
Door = d1
15
Goal Hierarchy
i3
Get-item(i3)
Get-item-in-room(Item)
Get-item-different-room(i3)
Door = d1
16
Goal Hierarchy
i3
Get-item(i3)
Get-item-in-room(Item)
Get-item-different-room(i3)
Door = d3
17
Behavior Specification

Expert draws initial abstract situation
Create a scenario by selecting actions

18
Goal Specification

Goals are explicitly selected
The agent contributes based on the current
situation, current goal and its knowledge

19
Switching Roles

Expert generates behavior if the agent doesn't
know how to pursue the current goal
Agent may propose goals, subgoals and actions
If the agent is correct, the expert observes and
validates
Otherwise rejects, corrects, or takes over
Key to the interaction: shared goals and shared
assumptions about the current situation
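
One step of this role switching might be sketched as follows. This is an illustrative sketch under stated assumptions, not Redux code: `interaction_step`, `toy_agent`, and `toy_expert` are hypothetical names, and situations and goals are reduced to plain strings.

```python
from typing import Callable, Dict, Optional

Propose = Callable[[str, str], Optional[str]]
Review = Callable[[str, str, Optional[str]], str]


def interaction_step(situation: str, goal: str,
                     agent_propose: Propose,
                     expert_review: Review) -> Dict[str, Optional[str]]:
    """Run one decision: the agent proposes, the expert validates or corrects."""
    proposal = agent_propose(situation, goal)       # None if the agent has no idea
    chosen = expert_review(situation, goal, proposal)
    if proposal is None:
        source, rejected = "expert", None           # expert generates the behavior
    elif chosen == proposal:
        source, rejected = "agent-validated", None  # expert observes and validates
    else:
        source, rejected = "expert-correction", proposal  # rejection is kept too
    return {"chosen": chosen, "rejected": rejected, "source": source}


def toy_agent(situation: str, goal: str) -> Optional[str]:
    """Toy agent that knows a single (wrong) decomposition."""
    known = {"get-item(i3)": "get-item-in-room(i3)"}
    return known.get(goal)


def toy_expert(situation: str, goal: str, proposal: Optional[str]) -> str:
    """Toy expert for a scenario in which the item is in a different room."""
    if goal == "get-item(i3)":
        return "get-item-different-room(i3)"
    return "go-to(d1)"


step1 = interaction_step("s0", "get-item(i3)", toy_agent, toy_expert)
step2 = interaction_step("s0", "get-item-different-room(i3)", toy_agent, toy_expert)
print(step1)
print(step2)
```

Recording the rejected proposal alongside the chosen one gives the learner both a positive and a negative example from the same situation.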

20
Goal Hierarchy

Learning by Observation perspective
Unobservable mental reasoning of the expert
Learning Perspective
Bias hypothesis space
The agent-learning problem is reduced to learning goal
selection and termination
MI Perspective
information exchange between the expert and the
agent

21
Relevant Knowledge Specification
Prepare food

Expert can mark important objects in a decision

22
Rich Behavior Trace

Expert-specified undesired actions and goals
Expert-rejected actions and goals of the
approximately learned agent program

Watch TV
23
Rich Behavior Trace

Hypothetical Actions and Goals
Situation history: a tree structure of possible
behaviors

24
Relational Learning by Observation

Input
Relational Situations
Goal and action selections and rejections
Additional annotations (e.g. important objects)
Background knowledge
Output
Rule-based agent program
Learn goal/action selection/termination
generalizing over multiple examples
Inductive Logic Programming to combine rich
knowledge structures

25
Relational Learning by Observation
26
Relational Learning by Observation
Find the common structures in the decision
examples
27
Relational Learning by Observation

Learn relations between what the agent wants,
perceives and knows.

Select a door in the current room, which leads
to a room that contains the item the agent wants
to get
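
The door-selection condition above can be read as a relational rule over facts about the current situation. Below is a minimal sketch of evaluating such a rule; the predicate names (`in-room`, `door-in`, `leads-to`, `contains`) and the toy facts are assumptions made for illustration, and a learned rule in the actual system may look different.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str, str]

# Hypothetical relational situation: the agent is in r1 and wants item i3.
facts: Set[Fact] = {
    ("in-room", "agent", "r1"),
    ("door-in", "d1", "r1"),
    ("door-in", "d2", "r1"),
    ("leads-to", "d1", "r2"),
    ("leads-to", "d2", "r3"),
    ("contains", "r3", "i3"),
}


def select_doors(facts: Set[Fact], agent: str, wanted_item: str) -> List[str]:
    """Doors in the agent's room that lead to a room containing the wanted item.

    Roughly the rule (hypothetical predicates):
      select(Door) :- in-room(Agent, R), door-in(Door, R),
                      leads-to(Door, R2), contains(R2, Item), wants(Agent, Item).
    """
    rooms_here = {r for (p, who, r) in facts if p == "in-room" and who == agent}
    target_rooms = {r for (p, r, item) in facts
                    if p == "contains" and item == wanted_item}
    doors_here = {d for (p, d, r) in facts if p == "door-in" and r in rooms_here}
    return sorted(d for (p, d, r) in facts
                  if p == "leads-to" and d in doors_here and r in target_rooms)


print(select_doors(facts, "agent", "i3"))
```

The rule ties together what the agent wants (the item), what it perceives (doors and rooms), and what it knows (where doors lead), without mentioning any specific door or room.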
28
Comparing Redux to LBO: Advantages of Redux

No real-time constraints on behavior
e.g. no waiting for a 2-hour-long goal
Can be used to describe unlikely but critical
situations
e.g. "Let's assume that there is a nuclear
meltdown."
Richer annotation opportunities
Increase learning speed and quality
Faster focus on where knowledge is most lacking
Immediate expert feedback on how rules behave

29
Comparing Redux to LBO: Disadvantages of Redux

Can't learn low-level behavior.
Contains domain-specific components
Although most of Redux is domain-independent
Generating behavior may be slower.
Additional annotations improve learning but
require extra expert effort

30
Relational Behavior Trace
Behavior Trace: the set of
situations in the execution history

A Situation:
a symbolic snapshot of the observed environment
at a given time

31
Annotated Behavior Traces

Behavior is annotated with actions and goals
goto-room(r1), etc.
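
An annotated trace of this kind might be stored as a list of timestamped situations plus interval annotations for goals and actions. The sketch below is an assumption about one possible encoding, not the format used by the authors; the `Situation` and `Annotation` classes, the `in-room` predicate, and the room `r0` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Fact = Tuple[str, ...]


@dataclass(frozen=True)
class Situation:
    """Symbolic snapshot of the observed environment at one time step."""
    time: int
    facts: FrozenSet[Fact]


@dataclass(frozen=True)
class Annotation:
    """A goal or action label that holds over an inclusive time interval."""
    start: int
    end: int
    label: str


def active_labels(annotations: List[Annotation], time: int) -> List[str]:
    """Labels whose interval covers the given time step."""
    return [a.label for a in annotations if a.start <= time <= a.end]


# Toy trace: the agent moves from r0 to r1 while goto-room(r1) is active.
trace = [
    Situation(0, frozenset({("in-room", "agent", "r0")})),
    Situation(1, frozenset({("in-room", "agent", "r0")})),
    Situation(2, frozenset({("in-room", "agent", "r1")})),
]
annotations = [Annotation(0, 1, "goto-room(r1)")]

for situation in trace:
    print(situation.time, active_labels(annotations, situation.time))
```

Interval annotations fit durative goals: the first and last covered situations give natural examples for learning goal selection and goal termination.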

32
Summary

Diagrammatic behavior specification approach
To extract rich behavior knowledge
Interactive behavior specification
Communication medium between the agents (explicit
goals and assumed situation)
Relational learning by observation approach to
combine multiple complex knowledge sources

33
Future Work