Learning Behavioral Parameterization Using Spatio-Temporal Case-Based Reasoning


1
Learning Behavioral Parameterization Using
Spatio-Temporal Case-Based Reasoning

Maxim Likhachev, Michael Kaess, and Ronald C. Arkin
Mobile Robot Laboratory
Georgia Tech

This research was funded under the DARPA MARS program.
2
Motivation

Constant parameterization of robotic behavior results in inefficient robot performance
Manual selection of the right parameters is difficult and tedious work

3
Motivation (cont'd)

Use of Case-Based Reasoning (CBR) methodology:
  automatic selection of optimal parameters at run-time (ICRA'01)
  each case is a set of behavioral parameters indexed by environmental features

front-obstructed case
clear-to-goal case
4
Motivation for the Current Research

The CBR module:
  improves robot performance (in simulations and on real robots)
  avoids the manual configuration of behavioral parameters
The CBR module still required the creation of a case library, which:
  depends on the robot architecture
  needs extensive experimentation to optimize cases
  requires a good understanding of how CBR works
Solution: extend the CBR module to learn new cases from scratch or optimize existing cases, either in a separate training process or during missions

5
Related Work

Use of Case-Based Reasoning in the selection of behavioral parameters:
  ACBARR (Georgia Tech '92), SINS (Georgia Tech '93)
  KINS (Chagas and Hallam)
Automatic optimization of behavioral parameters:
  genetic programming (e.g., GA-ROBOT, Ram et al.)
  reinforcement learning (e.g., Learning Momentum, Lee et al.)

6
Behavioral Control and CBR Module

CBR Module controls (case output parameters):
  weights for each behavior
  BiasMove vector
  Noise Persistence
  Obstacle Sphere

7
Case Indices: Environmental Features

Spatial features (traversability vector):
  split the environment into K = 4 angular regions
  compute the obstacle density within each region
  transform the density into traversability

Temporal features:
  short-term velocity towards the goal
  long-term velocity towards the goal
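
The three spatial-feature steps listed on this slide can be sketched in Python as follows. This is only an illustration: the slide does not say how obstacle density is measured or how it is mapped to traversability, so the point-count density, the linear mapping, and the parameters `max_range` and `saturation_density` are assumptions made for this example.

```python
import math

def traversability_vector(obstacles, k=4, max_range=5.0, saturation_density=1.0):
    """Return one traversability value in [0, 1] per angular region.

    obstacles: (x, y) obstacle points in robot-centered coordinates.
    max_range and saturation_density are hypothetical tuning parameters.
    """
    region_area = math.pi * max_range ** 2 / k      # area of one angular sector
    counts = [0] * k
    for x, y in obstacles:
        if math.hypot(x, y) > max_range:
            continue                                  # ignore far-away obstacles
        angle = math.atan2(y, x) % (2 * math.pi)      # bearing in [0, 2*pi)
        region = min(int(angle / (2 * math.pi / k)), k - 1)
        counts[region] += 1
    densities = [c / region_area for c in counts]
    # Assumed mapping: traversability falls linearly as density rises.
    return [max(0.0, 1.0 - d / saturation_density) for d in densities]
```

A region with no obstacles in range gets traversability 1.0; each obstacle inside a region lowers that region's value only.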

8
Overview of non-learning CBR Module
9
Making the CBR Module Learn
[Figure: block diagram of the learning CBR module]
Processing stages:
  Feature Identification
  Spatial Features Vector Matching (1st stage of Case Selection)
  Temporal Features Vector Matching (2nd stage of Case Selection)
  Random Selection Biased by Case Success and Spatial and Temporal Similarities
  Case Switching Decision Tree
  Old Case Performance Evaluation
  New Case Creation (if necessary)
  Case Adaptation
  Case Application
  Case Library
Data passed between stages:
  current environment
  spatial and temporal feature vectors
  all the cases in the library
  set of spatially matching cases
  set of spatially and temporally matching cases
  best matching case
  best matching or currently used case
  last K cases
  last K cases with adjusted performance history
  new or existing best matching case
  case ready for application
  case output parameters (behavioral assemblage parameters)
10
Extensive Exploration of Cases: Modified Case Selection Process

Random selection of cases, with the probability of selection proportional to:
  spatial similarity with the environment (1st step)
  temporal similarity with the environment (2nd step)
  weighted sum of the case's past performance and its spatial and temporal similarities (3rd step)
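
As a rough illustration of the third selection step, the sketch below draws one case with probability proportional to a weighted sum of its past success and its two similarity scores. The dictionary keys, the weight values, and the fallback to a uniform draw when every score is zero are assumptions for this example; the slide gives neither the weights nor the similarity measures.

```python
import random

def select_case(cases, w_success=0.5, w_spatial=0.25, w_temporal=0.25, rng=random):
    """Pick one case at random, biased by success and similarity.

    cases: list of dicts with hypothetical keys 'success', 'spatial_sim'
    and 'temporal_sim', each assumed to lie in [0, 1].
    """
    scores = [
        w_success * c["success"]
        + w_spatial * c["spatial_sim"]
        + w_temporal * c["temporal_sim"]
        for c in cases
    ]
    if sum(scores) <= 0.0:
        return rng.choice(cases)          # assumed fallback: uniform draw
    # random.choices samples with probability proportional to the weights.
    return rng.choices(cases, weights=scores, k=1)[0]
```

Because the draw is random rather than greedy, low-scoring cases still get applied occasionally, which is what allows the exploration named in the slide title.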

11
Positive and Negative Reinforcement: Case Performance Evaluation

Criterion for evaluating case performance: the average velocity with which the robot approaches its goal while the case is applied
  provides opportunities for intermediate case performance evaluations
  may not always be the right criterion: some cases exhibit no positive velocity towards the goal
  the evaluation of the performance is delayed by K (= 2) cases
case_success (represents case performance) is:
  increased if the average velocity is increased or sustained high
  decreased otherwise
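
The reinforcement rule for case_success might be sketched like this. The step size, the threshold that counts as "high" velocity, and the clamping to [0, 1] are assumptions for illustration, and the K-case delay mentioned on the slide is not modeled here.

```python
def update_case_success(success, avg_velocity, prev_avg_velocity,
                        high_velocity=0.8, step=0.1):
    """Return the new case_success after one performance evaluation.

    high_velocity and step are hypothetical parameters; velocities are
    average velocities towards the goal in arbitrary consistent units.
    """
    improved = avg_velocity > prev_avg_velocity
    sustained_high = avg_velocity >= high_velocity
    # Positive reinforcement if velocity rose or stayed high, negative otherwise.
    delta = step if (improved or sustained_high) else -step
    return min(1.0, max(0.0, success + delta))
```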

12
Maximization of Reinforcement: Case Adaptation

Maximize case_success as a noisy function of the case output parameters (behavioral assemblage parameters):
  maintain the adaptation vector A(C) for each case C
  if the last series of adaptations resulted in an increase of case_success, then continue the adaptation:
    $O(C) \leftarrow O(C) + A(C)$
  otherwise switch the direction of the adaptation, add a random component R, and scale proportionally to case_success:
    $A(C) \leftarrow -\lambda A(C) + \nu R$
    $O(C) \leftarrow O(C) + A(C)$
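
A minimal sketch of this adaptation step, assuming the output parameters and the adaptation vector are plain lists of floats. The coefficient names `lam` and `nu` and the uniform random component are assumptions for this example; the slide says the scaling depends on case_success but does not give the formula, so the caller supplies the coefficients directly.

```python
import random

def adapt_case(output, adaptation, success_increased, lam=0.5, nu=0.1, rng=random):
    """Return (new_output, new_adaptation) after one adaptation step.

    lam scales the reversed adaptation vector and nu scales the random
    component; both are hypothetical and would be derived from case_success.
    """
    if not success_increased:
        # Reverse direction and add a random perturbation per parameter.
        adaptation = [-lam * a + nu * rng.uniform(-1.0, 1.0) for a in adaptation]
    # In both branches the outputs move along the (possibly new) vector.
    new_output = [o + a for o, a in zip(output, adaptation)]
    return new_output, adaptation
```

This is a stochastic hill-climbing scheme: the search keeps moving in a direction while case_success improves, and backs off with some randomness when it stops improving.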

13
Maximization of Reinforcement: Case Adaptation (cont'd)

Incorporate prior knowledge into the search:
  fixed adaptation of the Noise_Gain and Noise_Persistence parameters based on the short- and long-term velocities of the robot
Constrain the search:
  limit Obstacle_Gain to be higher than the sum of the other schema gains (to avoid collisions)
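
The Obstacle_Gain constraint could be enforced after each adaptation step roughly as follows. The dictionary representation, the `margin` parameter, and any gain names other than Obstacle_Gain and Noise_Gain are assumptions for this example.

```python
def constrain_obstacle_gain(gains, margin=0.05):
    """Return a copy of gains with Obstacle_Gain above the sum of the others.

    gains: dict mapping schema gain names to values; must contain
    'Obstacle_Gain'. margin is a hypothetical safety margin.
    """
    others = sum(v for name, v in gains.items() if name != "Obstacle_Gain")
    constrained = dict(gains)
    # Raise Obstacle_Gain only when it violates the constraint.
    constrained["Obstacle_Gain"] = max(gains["Obstacle_Gain"], others + margin)
    return constrained
```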

14
The Growth of the Case Library: Case Creation Decision

To avoid divergence, a new case is created whenever:
  case_success of the selected case is high and the spatial and temporal similarities with the environment are low to moderate
  case_success of the selected case is low to moderate and the spatial and temporal similarities are low
Limit the maximum size of the library (10 in this work)
A new case is initialized with:
  the spatial and temporal features of the environment
  the output parameter values of the selected case
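
The creation rule and the initialization of a new case can be sketched as below. The numeric thresholds for "high", "moderate", and "low" are assumptions, as is the choice to require both similarities to fall under a threshold; only the library limit of 10 comes from the slide.

```python
def should_create_case(case_success, spatial_sim, temporal_sim, library_size,
                       max_library_size=10, high_success=0.7,
                       low_sim=0.3, moderate_sim=0.6):
    """Decide whether to add a new case to the library.

    All thresholds except max_library_size are hypothetical values in [0, 1].
    """
    if library_size >= max_library_size:
        return False                        # library is full
    similarity = max(spatial_sim, temporal_sim)  # both must be under the bound
    if case_success >= high_success:
        return similarity <= moderate_sim   # good case, but a weak match
    return similarity <= low_sim            # mediocre case and a poor match

def create_case(selected_case, spatial_features, temporal_features):
    """Build a new case from the environment features and the selected
    case's output parameters (stored under the assumed key 'outputs')."""
    return {
        "spatial": list(spatial_features),
        "temporal": list(temporal_features),
        "outputs": dict(selected_case["outputs"]),
    }
```

Under this rule a successful case that matches the environment well is adapted in place rather than duplicated, which keeps the library from filling up with near-identical entries.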

15
Experimental Analysis: Example
Learning CBR: first run (starting with an empty library)
16
Experimental Analysis: Example

Learning CBR: a run after 54 training runs on various environments
  a library of ten cases was learned
  36 percent shorter travel distance

A case of a clear-to-goal strategy is learned for such environments
A case of a squeezing strategy is learned for such environments
17
Experiments: Statistical Results

Simulation results (after 250 training runs for the learning CBR system)

[Figures: mission completion rate and average number of steps for the non-adaptive, CBR, and learning CBR systems, in heterogeneous and homogeneous environments]
18
Real Robot Experiments: In Progress

RWI ATRV-Jr
Sensors:
  SICK laser scanners in front and back
  compass
  gyroscope
Experiments in progress; no statistical results yet

19
Conclusions

New and existing cases are learned and optimized during a training process or as part of mission execution
Performance:
  substantially better than that of a non-adaptive system
  comparable to that of a non-learning CBR system
Neither manual selection of behavioral parameters nor careful creation and optimization of a case library is required from the user
Future work:
  real robot experiments
  case forgetting component
  integration with other adaptation and learning methods (e.g., Learning Momentum, RL for Behavioral Assemblage Selection)