Title: Learning through Interactive Behavior Specifications
1Learning through Interactive Behavior
Specifications
- Tolga Konik
- CSLI, Stanford University
- Douglas Pearson
- Three Penny Software
- John Laird
- University of Michigan
2Goal
- Automatically generate cognitive agents
- Reduce the cost of agent development
- Reduce the expertise required to develop agents.
3Domains
- Autonomous Cognitive agents
- Dynamic Virtual Worlds
- Real-time decisions based on knowledge and sensed data
- Soar agent architecture
4Learning by Observation
- Approach
- Observe expert behavior
- Learn to replicate it
- Why?
- We may want human-like agents
- In complex domains, imitating humans may be easier than learning from scratch
5Bottleneck in pure Learning by Observation
- PROBLEM
- You cannot observe the internal reasoning of the expert
- SOLUTION
- Ask the expert for additional information
- Goal annotations
- Use additional knowledge sources
- Task domain knowledge
6Learning by Observation
Environment
Goal annotations
Actions
Percepts
Additional Task Knowledge
Learner
7Learning by Observation
Environment
8Learning by Observation: Critic Mode
Environment
critic
Learner
9One Body, Two Minds
?
?
Environment
- How and when to switch control
- How the expert and the agent program communicate
10Diagrammatic Behavior Specification
Learner
11Redux
- Diagrammatic Behavior Specification
12Goal Hierarchy
Get-item(Item)
Get-item-different-room(Item)
- Task-Performance knowledge is represented with a
hierarchy of durative goals.
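A minimal sketch of how such a hierarchy of durative goals could be held in memory. The goal names come from the slides; the `Goal` class, its fields, and the `active_path` helper are illustrative assumptions, not the representation used in Redux or Soar.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Goal:
    """A durative goal: it stays active until it is terminated (hypothetical class)."""
    name: str
    args: Tuple[str, ...] = ()
    # Exclude the parent link from repr/compare to avoid walking the tree upward.
    parent: Optional["Goal"] = field(default=None, repr=False, compare=False)
    subgoals: List["Goal"] = field(default_factory=list)

    def add_subgoal(self, name: str, *args: str) -> "Goal":
        """Attach a child goal that is pursued while this goal is active."""
        child = Goal(name, args, parent=self)
        self.subgoals.append(child)
        return child

    def active_path(self) -> List[str]:
        """Return goal labels from the root of the hierarchy down to this goal."""
        path, node = [], self
        while node is not None:
            path.append(f"{node.name}({', '.join(node.args)})")
            node = node.parent
        return list(reversed(path))


# Build the branch shown on the following slides.
root = Goal("Get-item", ("i3",))
different_room = root.add_subgoal("Get-item-different-room", "i3")
go_to = different_room.add_subgoal("Go-to", "d1")
print(" > ".join(go_to.active_path()))
```

Keeping a parent link on each goal makes it cheap to recover the full stack of active goals, which is the context a learner would need when deciding which subgoal to select next.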
13Goal Hierarchy
Get-item(i3)
Item = i3
Get-item-in-room(Item)
Goto-next-room
Get-item-different-room(Item)
Get-item-in-room(i3)
Go-to(Door)
14Goal Hierarchy
Get-item(i3)
Item = i3
Get-item-different-room(Item)
Get-item-different-room(i3)
Get-item-in-room(Item)
Go-to(Door)
Go-to(d1)
Door = d1
15Goal Hierarchy
i3
Get-item(i3)
Get-item-in-room(Item)
Get-item-different-room(i3)
Door = d1
16Goal Hierarchy
i3
Get-item(i3)
Get-item-in-room(Item)
Get-item-different-room(i3)
Door = d3
17Behavior Specification
- Expert draws initial abstract situation
- Expert creates a scenario by selecting actions
18Goal Specification
- Goals are explicitly selected
- The agent contributes based on the current
situation, current goal and its knowledge
19Switching Roles
- The expert generates behavior if the agent doesn't know how to pursue the current goal
- The agent may propose goals, subgoals, and actions
- If the agent is correct, the expert observes and validates
- Otherwise, the expert rejects, corrects, or takes over
- Key to the interaction are shared goals and shared assumptions about the current situation
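The role switching described above can be sketched as a single decision step. This is only an illustration under stated assumptions: the `Decision` record, the `decide` function, and the three callbacks (`agent_propose`, `expert_review`, `expert_choose`) are hypothetical names, and the fact strings are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass
class Decision:
    choice: str                                          # goal or action finally selected
    chosen_by: str                                       # "agent" or "expert"
    rejected: List[str] = field(default_factory=list)   # negative examples for the learner


def decide(
    situation: FrozenSet[str],
    goal: str,
    agent_propose: Callable[[FrozenSet[str], str], Optional[str]],
    expert_review: Callable[[FrozenSet[str], str, str], Tuple[str, Optional[str]]],
    expert_choose: Callable[[FrozenSet[str], str], str],
) -> Decision:
    """One step of shared control over the current goal and situation."""
    proposal = agent_propose(situation, goal)
    if proposal is None:
        # The agent does not know how to pursue the goal: the expert generates behavior.
        return Decision(expert_choose(situation, goal), "expert")
    verdict, correction = expert_review(situation, goal, proposal)
    if verdict == "accept":
        # The expert observes and validates the agent's proposal.
        return Decision(proposal, "agent")
    # The expert rejects, then either corrects directly or takes over.
    choice = correction if correction is not None else expert_choose(situation, goal)
    return Decision(choice, "expert", rejected=[proposal])


# Toy callbacks standing in for the learned agent program and the human expert.
def agent_propose(situation, goal):
    return {"Get-item(i3)": "Get-item-in-room(i3)"}.get(goal)


def expert_review(situation, goal, proposal):
    item_elsewhere = "in(agent, r1)" in situation and "in(i3, r2)" in situation
    if proposal == "Get-item-in-room(i3)" and item_elsewhere:
        return "reject", "Get-item-different-room(i3)"
    return "accept", None


def expert_choose(situation, goal):
    return "Go-to(d1)"


situation = frozenset({"in(agent, r1)", "in(i3, r2)"})
first = decide(situation, "Get-item(i3)", agent_propose, expert_review, expert_choose)
second = decide(situation, "Get-item-different-room(i3)", agent_propose, expert_review, expert_choose)
print(first)
print(second)
```

Both parties receive the same `situation` and `goal` arguments, which mirrors the point that the interaction depends on shared goals and a shared view of the current situation.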
20Goal Hierarchy
- Learning by Observation perspective
- Unobservable mental reasoning of the expert
- Learning Perspective
- Bias hypothesis space
- The problem of learning an agent is reduced to learning goal selection and termination
- Mixed-initiative (MI) perspective
- Information exchange between the expert and the agent
21Relevant Knowledge Specification
Prepare food
- Expert can mark important objects in a decision
22Rich Behavior Trace
- Expert specified undesired actions and goals
- Expert rejected actions and goals of the
approximately learned agent program
Watch TV
23Rich Behavior Trace
- Hypothetical Actions and Goals
- Situation history: a tree structure of possible behaviors
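A rough sketch of a situation history stored as a tree rather than a single linear trace, so that hypothetical branches and rejected choices sit next to the behavior that was actually selected. The `SituationNode` class, the `examples` generator, and the fact strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple


@dataclass
class SituationNode:
    facts: FrozenSet[str]
    # Selected or hypothetical choices, each leading to a follow-up situation.
    children: Dict[str, "SituationNode"] = field(default_factory=dict)
    # Choices the expert marked as undesired in this situation.
    rejected: List[str] = field(default_factory=list)

    def branch(self, choice: str, facts: Iterable[str]) -> "SituationNode":
        """Add a possible continuation of the behavior from this situation."""
        node = SituationNode(frozenset(facts))
        self.children[choice] = node
        return node

    def reject(self, choice: str) -> None:
        """Record a goal or action that should not be selected here."""
        self.rejected.append(choice)


def examples(node: SituationNode) -> Iterator[Tuple[FrozenSet[str], str, bool]]:
    """Walk the tree and yield (facts, choice, is_positive) training examples."""
    for choice in node.rejected:
        yield node.facts, choice, False
    for choice, child in node.children.items():
        yield node.facts, choice, True
        yield from examples(child)


root = SituationNode(frozenset({"in(agent, r1)", "in(i3, r2)"}))
root.branch("Go-to(d1)", {"at(agent, d1)", "in(i3, r2)"})
root.branch("Go-to(d3)", {"at(agent, d3)", "in(i3, r2)"})  # hypothetical alternative
root.reject("Get-item-in-room(i3)")

labels = [(choice, positive) for _, choice, positive in examples(root)]
print(labels)
```

Flattening the tree into labeled examples is one way a learner could consume both the positive selections and the rejections from the same structure.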
24Relational Learning by Observation
- Input
- Relational Situations
- Goal and action selections and rejections
- Additional annotations (e.g., important objects)
- Background knowledge
- Output
- Rule based agent program
- Learn goal/action selection/termination
- generalizing over multiple examples
- Inductive Logic Programming to combine rich
knowledge structures
25Relational Learning by Observation
26Relational Learning by Observation
Find the common structures in the decision
examples
27Relational Learning by Observation
Learn relations between what the agent wants,
perceives and knows.
Select a door in the current room, which leads
to a room that contains the item the agent wants
to get
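To make the door-selection rule above concrete, here is a small sketch that evaluates it over a set of relational facts. The predicate names (`current-room`, `door-in`, `leads-to`, `contains`) and the helper functions are assumptions for illustration; the slides do not give the actual predicates or the learned rule syntax.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, ...]  # (predicate, arg1, arg2, ...)


def query(facts: Set[Fact], predicate: str) -> List[Tuple[str, ...]]:
    """Return the argument tuples of all facts with the given predicate."""
    return [fact[1:] for fact in facts if fact[0] == predicate]


def select_doors(facts: Set[Fact], wanted_item: str) -> List[str]:
    """Doors in the current room that lead to a room containing the wanted item.

    Roughly the relational rule (hypothetical syntax):
      select(Door) :- current-room(R), door-in(Door, R),
                      leads-to(Door, R2), contains(R2, Item), wants(Item).
    """
    current = {room for (room,) in query(facts, "current-room")}
    rooms_with_item = {room for (room, item) in query(facts, "contains") if item == wanted_item}
    leads = set(query(facts, "leads-to"))
    return sorted(
        door
        for (door, room) in query(facts, "door-in")
        if room in current and any((door, target) in leads for target in rooms_with_item)
    )


facts = {
    ("current-room", "r1"),
    ("door-in", "d1", "r1"),
    ("door-in", "d3", "r1"),
    ("leads-to", "d1", "r2"),
    ("leads-to", "d3", "r4"),
    ("contains", "r2", "i3"),
}
print(select_doors(facts, "i3"))
```

The rule links what the agent wants (the item), what it perceives (doors in the current room), and what it knows (which rooms the doors lead to), which is why a relational representation is needed rather than a flat feature vector.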
28Comparing Redux to LBO: Advantages of Redux
- No real-time constraints on behavior
- e.g., no waiting for a 2-hour-long goal
- Can be used to describe unlikely but critical situations
- e.g., "Let's assume that there is a nuclear meltdown."
- Richer annotation opportunities
- Increases learning speed and quality
- Focuses faster on where knowledge is most lacking
- Immediate expert feedback on how rules behave
29Comparing Redux to LBO: Disadvantages of Redux
- Cannot learn low-level behavior
- Contains domain specific components
- Although most of Redux is domain independent
- Generating behavior may be slower.
- Additional annotations improve learning but
require extra expert effort
30Relational Behavior Trace
Behavior trace: the set of situations in the execution history
- A Situation
- a symbolic snapshot of the observed environment
at a time
31Annotated Behavior Traces
- Behavior is annotated with actions and goals, e.g., goto-room(r1)
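A possible encoding of an annotated relational behavior trace, given as a sketch only. The `Situation` class, the fact tuples, and the `goal_intervals` helper are hypothetical; the `goto-room(r1)` label is taken from the slide, while `get-item(i3)` is added in lowercase by analogy.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Fact = Tuple[str, ...]


@dataclass(frozen=True)
class Situation:
    """A symbolic snapshot of the observed environment at one time step."""
    time: int
    facts: FrozenSet[Fact]
    annotations: Tuple[str, ...] = ()  # goals and actions active in this situation


def goal_intervals(trace: List[Situation], label: str) -> List[Tuple[int, int]]:
    """Find (start, end) time spans during which an annotation stays active.

    The start of a span is a candidate selection example and the end a
    candidate termination example for a durative goal.
    """
    intervals, start, last = [], None, None
    for situation in trace:
        if label in situation.annotations:
            if start is None:
                start = situation.time
            last = situation.time
        elif start is not None:
            intervals.append((start, last))
            start = None
    if start is not None:
        intervals.append((start, last))
    return intervals


trace = [
    Situation(0, frozenset({("in", "agent", "r2")}), ("goto-room(r1)",)),
    Situation(1, frozenset({("at", "agent", "d1")}), ("goto-room(r1)",)),
    Situation(2, frozenset({("in", "agent", "r1")}), ("get-item(i3)",)),
]
print(goal_intervals(trace, "goto-room(r1)"))
print(goal_intervals(trace, "get-item(i3)"))
```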
32Summary
- Diagrammatic behavior specification approach
- To extract rich behavior knowledge
- Interactive behavior specification
- Communication medium between the agents (explicit goals and assumed situation)
- Relational learning by observation approach to combine multiple complex knowledge sources
33Future Work
- Improve mixed-initiative interaction of the interface
- Explore domain-independent diagrammatic interface features
- Allow the expert to enter context-sensitive knowledge
34Mixed initiative perspective
- Interactive behavior specification
- Diagrammatic representation of behavior
- communication medium between the agents
- Explicit goals and desired behavior
- Facilitates interaction between the agents