1. Behavior Hierarchy Learning in a Behavior-based System using Reinforcement Learning
- Amir-massoud Farahmand
- Majid Nili Ahmadabadi
- Babak Najar Araabi
- farahmand_at_ipm.ir, mnili, araabi_at_ut.ac.ir
Department of Electrical and Computer Engineering, University of Tehran, Iran
2. Paper Outline
- Challenges and Requirements of Robotic Systems
- Behavior-based Approach to AI
- How should we design a Behavior-based System (BBS)?
- Learning in BBS
- Structure Learning in BBS
- Value Function Decomposition
- Experiments: Multi-Robot Object Lifting
- Conclusions, Ongoing Research, and Future Work
3. Challenges and Requirements of Robotic Systems
- Challenges
- Sensor and Effector Uncertainty
- Partial Observability
- Non-Stationarity
- Requirements (among many others)
- Multi-goal
- Robustness
- Multiple Sensors
- Scalability
- Automatic design
- Learning
4. Behavior-based Approach to AI
- Behavior-based approach as a good candidate for low-level intelligence
- Behavioral (activity) decomposition
- against functional decomposition
- Behavior: Sensor -> Action (direct link between perception and action)
- Situatedness
- Situatedness motto: "The world is its own best model!"
- Embodiment
- Intelligence as Emergence
- (interaction of agent with environment)
5. Behavioral Decomposition
[Diagram: sensors feed a stack of parallel behaviors (avoid obstacles, manipulate the world, build maps, explore, locomote) that drive the actuators]
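The decomposition above runs behaviors in parallel, with a higher behavior suppressing the ones below it when it is activated. A minimal sketch in Python; the behavior names and the sensor dictionary are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of subsumption-style arbitration (an assumed mechanism):
# behaviors run in parallel; the highest activated behavior in the stack
# suppresses all behaviors below it.

def arbitrate(stack, sensors):
    """Return the action of the controlling behavior.

    `stack` is ordered from highest (most dominant) to lowest priority;
    each behavior maps sensor readings to an action, or None if inactive.
    """
    for behavior in stack:
        action = behavior(sensors)
        if action is not None:      # this behavior is activated ...
            return action           # ... and suppresses all lower ones
    return "idle"                   # no behavior activated

# Two toy behaviors over a dict of sensor readings (hypothetical names):
def avoid_obstacles(sensors):
    return "turn" if sensors.get("obstacle_near") else None

def explore(sensors):
    return "move_forward"           # always active, acts as a default layer

stack = [avoid_obstacles, explore]  # avoid obstacles sits on top
print(arbitrate(stack, {"obstacle_near": True}))   # -> turn
print(arbitrate(stack, {"obstacle_near": False}))  # -> move_forward
```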
6. Behavior-based System Design
- Hand Design
- Common almost everywhere (just ask some people at IROS'04)
- Complicated; may be infeasible in complex problems
- Even if it is possible to find a working system, it is probably not optimal.
- Evolution
- Time consuming
- Good solutions can be found
- Biologically feasible
- Learning
- Biologically feasible
- Learning is essential for the life-time survival of the agent.
- We focus on learning in this presentation.
7. The Importance of Learning
- Unknown environment/body
- Exact model of environment/body is not known
- Non-stationary environment/body
- Changing environments (offices, houses, streets, and almost everywhere)
- Aging
- Designer may not know how to benefit from every aspect of her agent/environment
- Let the agent learn it by itself (learning as optimization)
- etc.
8. Learning in Behavior-based Systems
- There are a few works on behavior-based learning
- Mataric, Mahadevan, Maes, and ...
- but there is no deep investigation of it (especially a mathematical formulation)!
9. Learning in Behavior-based Systems
- There are different methods of learning with different viewpoints, but we have concentrated on Reinforcement Learning.
- Agent: Did I perform it correctly?!
- Tutor: Yes/No!
10. Learning in Behavior-based Systems
- We have divided learning in BBS into these two parts
- Structure Learning
- How should we organize behaviors in the architecture, assuming a repertoire of working behaviors?
- Behavior Learning
- How should each behavior behave? (when we do not have the necessary toolbox of working behaviors)
11. Structure Learning: Assumptions
- Structure learning in the Subsumption Architecture as a good sample of BBS
- Purely parallel case
- We know B1, B2, ..., but we do not know how to arrange them in the architecture
- We know how to avoid obstacles, pick an object, stop, move forward, and turn, but we don't know which one is superior to the others.
12. Structure Learning
The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).
[Diagram: behavior toolbox containing build maps, explore, manipulate the world, locomote, avoid obstacles]
13. Structure Learning
[Diagram: behavior toolbox containing build maps, explore, manipulate the world, locomote, avoid obstacles]
14. Structure Learning
1. "explore" becomes the controlling behavior and suppresses "avoid obstacles".
2. The agent hits a wall!
[Diagram: behavior structure and toolbox]
15. Structure Learning
The tutor (environment) gives "explore" a punishment for being in that place of the structure.
[Diagram: behavior structure and toolbox]
16. Structure Learning
"explore" is not a very good behavior for the highest position of the structure, so it is replaced by "avoid obstacles".
[Diagram: behavior structure and toolbox]
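The walkthrough on the preceding slides can be caricatured in a few lines of Python. The demotion rule below is a deliberate simplification of the value-based replacement the paper actually uses; the behavior names follow the slides:

```python
# Toy sketch of the structure-learning walkthrough (assumed simplification,
# not the authors' algorithm): when the controlling behavior at the top of
# the structure is punished, it is demoted and the next behavior replaces it.

def demote(stack, behavior):
    """Swap a punished behavior with the one directly below it."""
    i = stack.index(behavior)
    if i + 1 < len(stack):
        stack[i], stack[i + 1] = stack[i + 1], stack[i]
    return stack

stack = ["explore", "avoid obstacles", "locomote"]  # explore controls
# The agent hits a wall while 'explore' controls -> the tutor punishes it:
demote(stack, "explore")
print(stack[0])  # -> avoid obstacles
```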
17. Structure Learning: Issues
- How should we represent the structure?
- Sufficient (the concept space should be covered by the hypothesis space)
- Tractable (small hypothesis space)
- Well-defined credit assignment
- How should we assign credit to the architecture?
- If the agent receives a reward/punishment, how should we reward/punish the structure of the architecture?
18. Value Function Decomposition and Structure Learning
- Each structure has a value regarding the reinforcement signal it receives.
- The objective is finding a structure T with a high value.
- We have decomposed the value function into simpler components that enable us to benefit from previous experiments.
19. Value Function Decomposition
- It is possible to decompose the total system's value into the value of each behavior in each layer.
- We call it the Zero-Order method.
20. Value Function Decomposition: Zero-Order Method
- It stores the value of a behavior being in a specific layer.

ZO value table in the agent's mind:
- Higher layer: avoid obstacles (0.8), explore (0.7), locomote (0.4)
- Lower layer: avoid obstacles (0.6), explore (0.9), locomote (0.4)
21. Credit Assignment for the Zero-Order Method
- The controlling behavior is the only behavior responsible for the current reinforcement signal.
- An appropriate ZO value-table updating method is available.
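A sketch of how such a zero-order table could be stored, read, and updated. The table values follow the slide above; the exponential-average update and the learning rate are assumptions for illustration, as the paper's actual updating rule is given in the proceedings:

```python
# Zero-order value table: one value per (behavior, layer) pair.
zo_value = {
    # layer 0 = higher layer, layer 1 = lower layer
    ("avoid obstacles", 0): 0.8, ("explore", 0): 0.7, ("locomote", 0): 0.4,
    ("avoid obstacles", 1): 0.6, ("explore", 1): 0.9, ("locomote", 1): 0.4,
}

def zo_update(controlling, layer, r, alpha=0.1):
    """Credit only the controlling behavior for the signal r
    (assumed exponential-average update)."""
    key = (controlling, layer)
    zo_value[key] += alpha * (r - zo_value[key])

def best_for_layer(layer, available):
    """Read an arrangement off the table: pick the highest-valued
    available behavior for the given layer."""
    return max(available, key=lambda b: zo_value[(b, layer)])

print(best_for_layer(0, ["avoid obstacles", "explore", "locomote"]))
# -> avoid obstacles (0.8 is the highest value for the higher layer)

zo_update("avoid obstacles", 0, r=-1.0, alpha=0.5)  # punished while controlling
# its value for the higher layer drops: 0.8 + 0.5 * (-1.0 - 0.8) = -0.1
```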
22. Value Function Decomposition: Another Method (First Order)
- It stores the value of the relative order of behaviors
- How good/bad is it if B1 is placed higher than B2?!
- V(avoid obstacles > explore) = 0.8
- V(explore > avoid obstacles) = -0.3
- Sorry! Not that easy (and informative) to show graphically!!
- Credits are assigned to all (controlling, activated) pairs of behaviors.
- If the agent receives a reward while B1 is controlling and B3 and B5 are activated, then (B1 > B3) and (B1 > B5) are credited.
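A sketch of first-order credit assignment over ordered behavior pairs. The pair values follow the slide; as before, the exponential-average update and the learning rate are illustrative assumptions rather than the paper's formulation:

```python
from collections import defaultdict

# First-order value table: one value per ordered pair (higher, lower).
fo_value = defaultdict(float)
fo_value[("avoid obstacles", "explore")] = 0.8   # values from the slide
fo_value[("explore", "avoid obstacles")] = -0.3

def fo_update(controlling, activated, r, alpha=0.1):
    """Credit every (controlling, activated) pair for the signal r
    (assumed exponential-average update)."""
    for b in activated:
        key = (controlling, b)
        fo_value[key] += alpha * (r - fo_value[key])

# A reward arrives while B1 is controlling and B3, B5 are activated:
fo_update("B1", ["B3", "B5"], r=1.0)
print(round(fo_value[("B1", "B3")], 2))  # -> 0.1
```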
23. Structure Representation
- Both of these methods come with probabilistic reasoning that shows how to
- decompose the total system value into simple components
- assign credits
- update the value tables
- Check the proceedings for the mathematical formulation!
24. Example: Multi-Robot Object Lifting
- A group of three robots wants to lift an object using only their own local sensors
- No central control
- No communication
- Local sensors
- Objectives
- Reaching a prescribed height
- Keeping the tilt angle small
25. Example: Multi-Robot Object Lifting
[Diagram: behavior toolbox — Push More, Hurry Up, Stop, Slow Down, Don't Go Fast — with "?!" marking the unknown arrangement]
26. Example: Multi-Robot Object Lifting
[Figure: sample shot of the tilt angle of the object after sufficient learning]
27. Example: Multi-Robot Object Lifting
[Figure: sample shot of the height of each robot after sufficient learning]
28. Example: Multi-Robot Object Lifting
[Figure: sample shot of the tilt angle of the object after sufficient learning]
29. Conclusions, Ongoing Research, and Future Work
- We have devised two different methods of structure learning for behavior-based systems.
- Good results in two different tasks
- Multi-robot Object Lifting
- An abstract problem (not reported yet)
30. Conclusions, Ongoing Research, and Future Work
- ...but where should we find the necessary behaviors?!
- Behavior Learning
- We have devised some methods for behavior learning which will be reported soon.
31. Conclusions, Ongoing Research, and Future Work
- However, many steps remain before fully automated agent design
- How should we generate new behaviors without even knowing which sensory information is necessary for the task (feature selection)?
- The problem of reinforcement signal design
- Designing a good reinforcement signal is not easy at all.
32. Questions?!