Title: Information Processing Technology Office
1 Information Processing Technology Office Learning Workshop
April 12, 2004
Seedling Overview: Learning Hierarchical Reactive Skills from Reasoning and Experience
Institute for the Study of Learning and Expertise
PI: Pat Langley
Presenter: Ray Mooney
2 Learning Objective
- Develop learning methods that operate over rich knowledge structures which
  - support both reactive control and problem solving
  - are embedded in an integrated cognitive architecture that operates in complex physical environments
- Learning mechanisms can acquire and revise such knowledge more rapidly and effectively than human programmers can create and debug it manually.
3 What is Being Learned?
- ICARUS is an integrated cognitive architecture that learns
  - the logical structure of relational skills and concepts
  - a hierarchical organization over these elements
  - numeric utility functions attached to skills and concepts
- that describe effective means for achieving goals
- that support reactive control of physical agents
- from background knowledge, experience with executing skills in the environment, and problem solving
- in an incremental, cumulative manner that responds to changes in tasks and the environment.
- This relies on the tight integration of execution, problem solving, and learning.
4 What is Being Learned?
- For example, in a driving domain, ICARUS would learn
  - the structure of driving skills like turning and passing
  - the structure of driving concepts (e.g., passable)
  - hierarchical connections (e.g., pass and change-lanes)
  - how to achieve high-level goals (e.g., package delivery)
  - how to get from one place to another (route knowledge)
  - the expected utility of driving skills and subskills
  - the expected utility of driving concepts
- This different content is cast within a unified formalism that ICARUS provides for encoding knowledge.
5 How is Knowledge Being Learned?
- The ICARUS architecture learns
  - value functions using a hierarchical variant on model-based reinforcement learning
  - new skills and concepts based on the cached results of means-ends problem solving.
- Learning and reasoning are integrated in that
  - conceptual inference and hierarchical skills provide high-level descriptions for reinforcement learning
  - problem-solving traces form the basis of new skills and concepts.
- Learning is automatic but could be adapted to benefit from advice and traces of expert behavior.
- Structure learning occurs from single instances; value learning should be much faster than in typical methods.
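The value-learning bullet can be illustrated with a much simpler stand-in than the hierarchical, model-based method the slide names: a plain TD(0) update on a linear value function. This is a minimal sketch, not the ICARUS algorithm; the function names, learning rate, discount factor, and sample numbers are all hypothetical.

```python
def linear_value(weights, features):
    """Value of a state as a weighted sum of its numeric features."""
    return sum(weights[name] * features[name] for name in weights)


def td_update(weights, features, reward, next_features, alpha=0.01, gamma=0.9):
    """Return new weights after one TD(0) step on a linear value function."""
    error = (reward
             + gamma * linear_value(weights, next_features)
             - linear_value(weights, features))
    # Each weight moves in proportion to its own feature's magnitude.
    return {name: w + alpha * error * features[name] for name, w in weights.items()}


# Hypothetical driving features observed before and after one action.
weights = {"dist": 0.0, "speed": 0.0}
before = {"dist": 10.0, "speed": 5.0}
after = {"dist": 8.0, "speed": 4.0}
new_weights = td_update(weights, before, reward=1.0, next_features=after)
```

Restricting the value function to a few high-level features is one reason such learning could be faster than learning over raw state, which is in the spirit of the slide's claim.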
6 How is Knowledge Being Learned?
Means-ends analysis produces hierarchical skills
[Figure: diagram with nodes labeled A-E and 1-11; layout not recoverable]
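One way to picture how means-ends analysis could yield hierarchical skills is a generic backward-chaining solver that caches each solved subgoal as a named skill. The sketch below is a STRIPS-style illustration, not the ICARUS problem solver. The operator definitions, the `in-turn` predicate, and the `LearnedSkill` class are hypothetical, and it ignores interactions among goals.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    preconditions: frozenset
    effects: frozenset


@dataclass
class LearnedSkill:
    objective: str
    ordered: list  # subskill objectives or primitive operator names, in order


def achieve(goal, state, operators, learned, depth=5):
    """Return (new_state, primitive_plan), or None if the goal is unreachable."""
    if goal in state:
        return set(state), []
    if depth == 0:
        return None
    for op in operators:
        if goal not in op.effects:
            continue
        current, plan, steps = set(state), [], []
        for pre in sorted(op.preconditions):
            result = achieve(pre, current, operators, learned, depth - 1)
            if result is None:
                break  # this operator cannot be made applicable; try the next
            current, subplan = result
            plan += subplan
            if subplan:
                # Refer to a cached subskill by its objective when one exists.
                steps += [pre] if pre in learned else subplan
        else:
            if op.preconditions <= current:
                current |= op.effects
                plan.append(op.name)
                steps.append(op.name)
                if len(steps) > 1:
                    learned[goal] = LearnedSkill(goal, steps)
                return current, plan
    return None


operators = [
    Operator("slow-down", frozenset(), frozenset({"at-turning-speed"})),
    Operator("begin-right-turn",
             frozenset({"in-rightmost-lane", "at-turning-speed"}),
             frozenset({"in-turn"})),
    Operator("end-right-turn", frozenset({"in-turn"}),
             frozenset({"behind-right-corner"})),
]
learned = {}
state, plan = achieve("behind-right-corner", {"in-rightmost-lane"}, operators, learned)
```

After the call, `learned` holds two skills: one for `in-turn` built from primitive operators, and one for `behind-right-corner` whose first step is the `in-turn` skill. That nesting is the hierarchy.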
7 How is the Knowledge Represented?
- ICARUS casts both background and learned knowledge as
  - logical relational concepts with linear value functions
  - logical relational skills with linear value functions
  - that are defined in terms of other skills and concepts.
- Background knowledge constrains the learning of value functions and provides components for structure learning.
- ICARUS provides a formalism for encoding knowledge
  - about physical domains with continuous attributes
  - that includes probability of success, expected duration, and resource requirements
  - described at multiple levels of abstraction over both state (with concepts) and time (with skills).
8 How is the Knowledge Represented?
(make-right-turn (?self ?corner)
  objective ((behind-right-corner ?corner))
  start     ((in-rightmost-lane ?self)
             (ahead-right-corner ?corner)
             (at-turning-distance ?corner))
  requires  ((near-block-corner ?corner)
             (at-turning-speed ?self))
  ordered   ((begin-right-turn ?self ?corner)
             (end-right-turn ?self ?corner))
  value     (30.0))

(slow-for-intersection (?self)
  percepts  ((self ?self speed ?speed)
             (corner ?corner street-dist ?dist))
  objective ((slow-enough-intersection ?self))
  requires  ((near-block-corner ?corner))
  actions   ((slow-down))
  value     ((-5.2 ?dist) (20.3 ?speed)))

Some ICARUS driving skills
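To make the linear value function concrete, the sketch below encodes the slow-for-intersection skill as a Python record and evaluates its value for one set of variable bindings. It assumes that each `(coefficient variable)` pair is multiplied out and the products are summed. The `Skill` class, the `skill_value` helper, and the binding values are hypothetical and not part of ICARUS.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    objective: list
    requires: list
    actions: list
    value: list  # (coefficient, variable) pairs; a constant uses variable None


def skill_value(skill, bindings):
    """Sum coefficient * bound value over the skill's value terms."""
    total = 0.0
    for coefficient, variable in skill.value:
        total += coefficient * (1.0 if variable is None else bindings[variable])
    return total


slow_for_intersection = Skill(
    name="slow-for-intersection",
    objective=[("slow-enough-intersection", "?self")],
    requires=[("near-block-corner", "?corner")],
    actions=[("slow-down",)],
    value=[(-5.2, "?dist"), (20.3, "?speed")],
)

# Hypothetical percept values: 10 units from the corner, speed 2.
bindings = {"?dist": 10.0, "?speed": 2.0}
value = skill_value(slow_for_intersection, bindings)
```

A constant value such as the `(30.0)` on make-right-turn fits the same scheme as a single `(30.0, None)` term.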
9 How is the Knowledge Represented?
(corner-ahead-left (?corner)
  percepts  ((corner ?corner r ?r theta ?theta))
  tests     ((lt ?theta 0) (gt ?theta -1.571))
  value     ((5.6 ?r) (3.1 ?theta)))

(in-intersection (?self)
  percepts  ((self ?self)
             (corner ?ncorner street-dist ?sdist))
  positives ((near-block-corner ?ncorner)
             (corner-straight-ahead ?scorner))
  negatives ((far-block-corner ?fcorner))
  tests     ((lt ?sdist 0.0))
  value     (-10.0))

Some ICARUS driving concepts
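The corner-ahead-left concept can be read as a pattern over percepts plus numeric tests. The following sketch shows one way such matching might work for that concept alone. The percept dictionary format, the `match_concept` function, and the sample percepts are assumptions made for illustration, and `positives`/`negatives` conditions are not handled.

```python
import operator

# Map the test names used on the slide to Python comparisons.
TESTS = {"lt": operator.lt, "gt": operator.gt}

corner_ahead_left = {
    "percept": ("corner", "?corner", {"r": "?r", "theta": "?theta"}),
    "tests": [("lt", "?theta", 0.0), ("gt", "?theta", -1.571)],
}


def match_concept(concept, percepts):
    """Return one binding dict per percept that satisfies every test."""
    kind, id_var, attr_vars = concept["percept"]
    matches = []
    for percept in percepts:
        if percept["type"] != kind:
            continue
        bindings = {id_var: percept["id"]}
        for attr, var in attr_vars.items():
            bindings[var] = percept[attr]
        if all(TESTS[op](bindings[var], bound) for op, var, bound in concept["tests"]):
            matches.append(bindings)
    return matches


# Hypothetical percepts: one corner ahead and to the left, one to the right.
percepts = [
    {"type": "corner", "id": "corner1", "r": 12.0, "theta": -0.6},
    {"type": "corner", "id": "corner2", "r": 15.0, "theta": 0.9},
]
matched = match_concept(corner_ahead_left, percepts)
```

Only `corner1` satisfies both tests, since its `theta` lies between -1.571 and 0.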
10 What is the Domain?
- Our initial studies of ICARUS have focused on a simulated in-city driving environment that
  - requires integration of perception, action, and cognition
  - involves both reactive control and goal direction
  - supports many distinct tasks of varying complexity
  - provides clear opportunities for cumulative learning
- The environment lets us vary domain characteristics systematically and record statistics on agent behavior.
- However, ICARUS aims at broad generality and should support reasoning and learning in
  - both first-person and strategy games
  - crisis-response tasks involving physical response
  - intelligent assistants for office activities
11 How is Progress Being Measured?
- Dependent variables
  - Efficiency of task execution (e.g., driving time)
  - Quality of task execution (e.g., gas used, accidents)
- Higher-order metrics
  - Rate and asymptote of learning curves
  - Transfer to related tasks and altered environments
- Independent variables
  - Inclusion or omission of learning methods
  - Amount of background knowledge available
  - Task difficulty and environmental complexity
  - Amount of experience, task/environment similarity
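The higher-order metrics could be computed from a recorded learning curve in many ways. As a rough sketch, the code below takes the asymptote to be the mean of the last few trials and the rate to be the first trial that comes within a tolerance of that asymptote. Both definitions, the function names, and the driving-time numbers are made up for illustration and are not results from the project.

```python
def asymptote(curve, tail=3):
    """Estimate the asymptote as the mean of the last `tail` points."""
    last = curve[-tail:]
    return sum(last) / len(last)


def trials_to_criterion(curve, tolerance=0.1, tail=3):
    """Index of the first trial within a relative `tolerance` of the asymptote."""
    target = asymptote(curve, tail)
    for trial, score in enumerate(curve):
        if abs(score - target) <= tolerance * abs(target):
            return trial
    return None


# Made-up driving times (lower is better) over seven successive trials.
driving_time = [120.0, 90.0, 70.0, 62.0, 60.0, 61.0, 59.0]
level = asymptote(driving_time)
rate = trials_to_criterion(driving_time)
```

Comparing `level` and `rate` between runs with and without a learning method would correspond to varying the first independent variable on the slide.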
12 What are the Technical Milestones?
- Year 1
  - Learn estimates of driving skill duration and success
  - Learn higher-level driving skills via problem solving
  - Demonstrate improvement on multi-package delivery
- Year 2
  - Learn value functions for driving concepts
  - Learn trade-offs among many high-level tasks
  - Demonstrate transfer and scaling to complex tasks
- Year 3
  - Acquisition of place and route knowledge
  - Support episodic memory and perceptual attention
  - Demonstrate cumulative learning and change resilience
- We will also examine other domains to ensure generality.