Title: Learning Behavioral Parameterization Using Spatio-Temporal Case-Based Reasoning
1. Learning Behavioral Parameterization Using Spatio-Temporal Case-Based Reasoning
- Maxim Likhachev, Michael Kaess, and Ronald C. Arkin
- Mobile Robot Laboratory
- Georgia Tech
This research was funded under the DARPA MARS program.
2. Motivation
- Constant parameterization of robotic behavior results in inefficient robot performance
- Manual selection of the right parameters is difficult and tedious work
3. Motivation (contd.)
- Use of Case-Based Reasoning (CBR) methodology
  - automatic selection of optimal parameters at run-time (ICRA'01)
  - each case is a set of behavioral parameters indexed by environmental features
[Figure labels: front-obstructed case, clear-to-goal case]
4. Motivation for the Current Research
- The CBR module
  - improves robot performance (in simulations and on real robots)
  - avoids the manual configuration of behavioral parameters
- The CBR module still required the creation of a case library, which
  - is dependent on the robot architecture
  - needs extensive experimentation to optimize cases
  - requires a good understanding of how CBR works
- Solution: extend the CBR module to learn
  - new cases from scratch or optimize existing cases
  - in a separate training process or during missions
5. Related Work
- Use of Case-Based Reasoning in the selection of behavioral parameters
  - ACBARR [Georgia Tech '92], SINS [Georgia Tech '93]
  - KINS [Chagas and Hallam]
- Automatic optimization of behavioral parameters
  - genetic programming (e.g., GA-ROBOT [Ram et al.])
  - reinforcement learning (e.g., Learning Momentum [Lee et al.])
6. Behavioral Control and CBR Module
- CBR Module controls (case output parameters)
  - Weights for each behavior
  - BiasMove Vector
  - Noise Persistence
  - Obstacle Sphere
7. Case Indices: Environmental Features
- Spatial features: traversability vector
  - split environment into K = 4 angular regions
  - compute obstacle density within each region
  - transform the density into traversability
- Temporal features
  - Short-term velocity towards the goal
  - Long-term velocity towards the goal
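The spatial-feature steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not give the density measure or the density-to-traversability transform, so the function name `traversability_vector`, the sensing radius `max_range`, the `saturation` count, and the mapping "traversability = 1 - density" are all hypothetical choices.

```python
import math

def traversability_vector(obstacles, k=4, max_range=5.0, saturation=4):
    """Return k traversability values in [0, 1], one per angular region.

    obstacles: iterable of (x, y) obstacle points in the robot's frame.
    """
    counts = [0] * k
    width = 2 * math.pi / k  # angular width of each region
    for x, y in obstacles:
        if math.hypot(x, y) > max_range:
            continue  # ignore obstacles outside the sensing radius
        angle = math.atan2(y, x) % (2 * math.pi)  # map to [0, 2*pi)
        region = min(int(angle / width), k - 1)   # guard against rounding
        counts[region] += 1
    # Assumed mapping (not from the slides): density saturates at
    # `saturation` points per region, and traversability = 1 - density.
    return [1.0 - min(1.0, c / saturation) for c in counts]
```

With K = 4, a region containing no obstacle points gets traversability 1.0, and a region with `saturation` or more points gets 0.0.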
8. Overview of the Non-Learning CBR Module
9. Making the CBR Module Learn
[Diagram: structure of the learning CBR module]
- Components
  - Feature Identification
  - Spatial Features Vector Matching (1st stage of Case Selection)
  - Temporal Features Vector Matching (2nd stage of Case Selection)
  - Random Selection Biased by Case Success and Spatial and Temporal Similarities
  - Case Switching Decision Tree
  - Old Case Performance Evaluation
  - New Case Creation (if necessary)
  - Case Adaptation
  - Case Application
  - Case Library
- Data passed between components
  - current environment
  - spatial and temporal feature vectors
  - all the cases in the library
  - set of spatially matching cases
  - set of spatially and temporally matching cases
  - best matching case
  - best matching or currently used case
  - last K cases
  - last K cases with adjusted performance history
  - new or existing best matching case
  - case ready for application
  - case output parameters (behavioral assemblage parameters)
10. Extensive Exploration of Cases: Modified Case Selection Process
- Random selection of cases with the probability of selection proportional to
  - spatial similarity with the environment (1st step)
  - temporal similarity with the environment (2nd step)
  - weighted sum of the case's past performance and spatial and temporal similarities (3rd step)
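A minimal sketch of this three-step biased random selection follows. The slide does not say how the first two steps turn probabilities into candidate sets, so this example assumes each case is kept with probability equal to its similarity (falling back to the most similar case if none survives). The dictionary keys, the function name `select_case`, and the weights `w_success`, `w_spatial`, and `w_temporal` are hypothetical.

```python
import random

def select_case(cases, rng, w_success=0.5, w_spatial=0.25, w_temporal=0.25):
    """Pick one case from a non-empty list of dicts with the keys
    'spatial_sim', 'temporal_sim', and 'success', all in [0, 1]."""

    def keep(pool, key):
        # Assumption: keep each case with probability equal to its similarity.
        kept = [c for c in pool if rng.random() < c[key]]
        # Never return an empty set: fall back to the most similar case.
        return kept or [max(pool, key=lambda c: c[key])]

    pool = keep(cases, "spatial_sim")   # 1st step: spatial matching
    pool = keep(pool, "temporal_sim")   # 2nd step: temporal matching

    # 3rd step: roulette-wheel draw on a weighted sum of past performance
    # and both similarities.
    scores = [w_success * c["success"]
              + w_spatial * c["spatial_sim"]
              + w_temporal * c["temporal_sim"] for c in pool]
    if sum(scores) <= 0:
        return rng.choice(pool)
    return rng.choices(pool, weights=scores, k=1)[0]
```

Passing an explicit `random.Random(seed)` as `rng` keeps runs reproducible, which helps when comparing training runs.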
11. Positive and Negative Reinforcement: Case Performance Evaluation
- Criterion for the evaluation of case performance
  - the average velocity with which the robot approaches its goal during the application of the case
  - provides opportunities for intermediate case performance evaluations
- May not always be the right criterion
  - some cases exhibit no positive velocity towards the goal
  - the evaluation of the performance is delayed by K (= 2) cases
- case_success (represents case performance) is
  - increased if the average velocity is increased or sustained high
  - decreased otherwise
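The reinforcement rule above could be sketched as follows. The slides give neither the size of the increments nor the threshold for "sustained high", so `step` and `high_velocity` are hypothetical values. Applying one evaluation to each of the last K cases is an assumed reading of the delayed evaluation, not a confirmed detail of the method.

```python
from collections import deque

def updated_success(case_success, avg_velocity, prev_avg_velocity,
                    high_velocity=0.8, step=0.1):
    """Return the new case_success, clamped to [0, 1]."""
    improved = avg_velocity > prev_avg_velocity
    sustained_high = avg_velocity >= high_velocity
    if improved or sustained_high:
        return min(1.0, case_success + step)  # positive reinforcement
    return max(0.0, case_success - step)      # negative reinforcement

def evaluate_recent(recent_cases, avg_velocity, prev_avg_velocity):
    """Apply one evaluation to every case in the recent-case window."""
    for case in recent_cases:
        case["success"] = updated_success(
            case["success"], avg_velocity, prev_avg_velocity)

# Usage: a window holding the last K = 2 applied cases.
recent = deque(maxlen=2)
recent.append({"name": "c1", "success": 0.5})
recent.append({"name": "c2", "success": 0.5})
evaluate_recent(recent, avg_velocity=0.6, prev_avg_velocity=0.4)
```

Because the window has `maxlen=2`, appending a third case drops the oldest one, so only the two most recently applied cases share in each evaluation.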
12. Maximization of Reinforcement: Case Adaptation
- Maximize case_success as a noisy function of case output parameters (behavioral assemblage parameters)
  - maintain the adaptation vector A(C) for each case C
  - if the last series of adaptations resulted in an increase of case_success, then continue the adaptation
    - O(C) = O(C) + A(C)
  - otherwise switch the direction of the adaptation, add a random component, and scale proportionally to case_success
    - A(C) = -λ·A(C) + ν·R
    - O(C) = O(C) + A(C)
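As a rough illustration, the adaptation step above might look like this in Python. The names `lam` and `nu` stand for the two scaling coefficients in the direction-reversal rule. The slide states only that the scaling depends on case_success, so this sketch takes both as plain arguments and leaves that dependence to the caller. Drawing the random vector R uniformly from [-1, 1] per component is also an assumption.

```python
import random

def adapt_case(outputs, adaptation, success_increased, rng, lam=0.5, nu=0.1):
    """Return (new_outputs, new_adaptation) for one adaptation step.

    outputs:    list of output parameter values O(C)
    adaptation: list of the same length, the adaptation vector A(C)
    """
    if not success_increased:
        # Reverse the search direction and add a random component R.
        # Assumption: R is uniform in [-1, 1] for each parameter.
        adaptation = [-lam * a + nu * rng.uniform(-1.0, 1.0)
                      for a in adaptation]
    # In both branches the outputs move by the (possibly new) A(C).
    new_outputs = [o + a for o, a in zip(outputs, adaptation)]
    return new_outputs, adaptation
```

The search behaves like a noisy hill climb: it keeps moving in a direction while case_success rises and backs off with a random perturbation when it does not.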
13. Maximization of Reinforcement: Case Adaptation (contd.)
- Incorporate prior knowledge into the search
  - fixed adaptation of the Noise_Gain and Noise_Persistence parameters based on the short- and long-term velocities of the robot
- Constrain the search
  - limit Obstacle_Gain to be higher than the sum of the other schema gains (to avoid collisions)
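The Obstacle_Gain constraint can be enforced with a small projection step after each adaptation. This is a sketch under assumptions: the slides name `Obstacle_Gain` and `Noise_Gain`, while `MoveToGoal_Gain`, the `margin` value, and the function name are hypothetical.

```python
def constrain_obstacle_gain(gains, margin=0.05):
    """Return a copy of `gains` in which Obstacle_Gain exceeds the sum
    of all other schema gains by at least `margin`."""
    others = sum(value for name, value in gains.items()
                 if name != "Obstacle_Gain")
    constrained = dict(gains)  # leave the caller's dict untouched
    constrained["Obstacle_Gain"] = max(gains["Obstacle_Gain"],
                                       others + margin)
    return constrained

# Usage with illustrative values:
gains = {"Obstacle_Gain": 1.0, "MoveToGoal_Gain": 1.0, "Noise_Gain": 0.5}
safe = constrain_obstacle_gain(gains)
```

Raising Obstacle_Gain rather than shrinking the other gains is one possible design choice. It leaves the adapted parameters untouched and changes only the constrained one.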
14. The Growth of the Case Library: Case Creation Decision
- To avoid divergence, a new case is created whenever
  - case_success of the selected case is high and spatial and temporal similarities with the environment are low to moderate
  - case_success of the selected case is low to moderate and spatial and temporal similarities are low
- Limit the maximum size of the library (10 in this work)
- A new case is initialized with
  - the spatial and temporal features of the environment
  - the output parameter values of the selected case
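The creation rule above can be sketched as a pair of threshold tests. The slides give only qualitative levels ("high", "low to moderate", "low"), so the numeric thresholds, the use of the larger of the two similarities, the initial `success` value of a new case, and the choice to simply skip creation when the library is full are all assumptions made for illustration.

```python
MAX_LIBRARY_SIZE = 10  # library limit stated on the slide

def should_create_case(case_success, spatial_sim, temporal_sim,
                       high_success=0.7, low_sim=0.3, moderate_sim=0.6):
    """Decide whether the current environment deserves its own case."""
    # Assumption: both similarities must fall under the threshold,
    # so the larger of the two is compared against it.
    similarity = max(spatial_sim, temporal_sim)
    if case_success >= high_success:
        return similarity <= moderate_sim  # protect a successful case
    return similarity <= low_sim           # poor case, poor match

def maybe_create_case(library, selected, env_spatial, env_temporal,
                      spatial_sim, temporal_sim):
    """Append and return a new case, or return None if none is created."""
    if len(library) >= MAX_LIBRARY_SIZE:
        return None
    if not should_create_case(selected["success"], spatial_sim, temporal_sim):
        return None
    case = {
        "spatial": list(env_spatial),          # features of the environment
        "temporal": list(env_temporal),
        "outputs": dict(selected["outputs"]),  # copied from selected case
        "success": 0.5,                        # hypothetical initial value
    }
    library.append(case)
    return case
```

Copying the selected case's outputs gives the new case a reasonable starting point, and later adaptation of the new case no longer disturbs the original one.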
15. Experimental Analysis: Example
Learning CBR: first run (starting with an empty library)
16. Experimental Analysis: Example
- Learning CBR: a run after 54 training runs on various environments
  - library of ten cases was learned
  - 36 percent shorter travel distance
[Figure annotations]
- A case of a clear-to-goal strategy is learned for such environments
- A case of a squeezing strategy is learned for such environments
17. Experiments: Statistical Results
- Simulation results (after 250 training runs for the learning CBR system)
[Charts: mission completion rate and average number of steps in heterogeneous and homogeneous environments, comparing the non-adaptive, CBR, and learning CBR systems]
18. Real Robot Experiments: In Progress
- RWI ATRV-Jr
- Sensors
  - SICK laser scanners in front and back
  - Compass
  - Gyroscope
- Experiments in progress, no statistical results yet
19. Conclusions
- New and existing cases are learned and optimized during a training process or as part of mission executions
- Performance
  - substantially better than that of a non-adaptive system
  - comparable to that of a non-learning CBR system
- Neither manual selection of behavioral parameters nor careful creation and optimization of a case library is required from the user
- Future Work
  - real robot experiments
  - case forgetting component
  - integration with other adaptation and learning methods (e.g., Learning Momentum, RL for Behavioral Assemblage Selection)