1. Behavior Hierarchy Learning in a Behavior-based System using Reinforcement Learning
- Amir-massoud Farahmand
- Majid Nili Ahmadabadi
- Babak Najar Araabi
- farahmand_at_ipm.ir, mnili, araabi_at_ut.ac.ir
Department of Electrical and Computer Engineering, University of Tehran, Iran
2. Paper Outline
- Challenges and Requirements of Robotic Systems
- Behavior-based Approach to AI
- How should we design a Behavior-based System (BBS)?
- Learning in BBS
- Structure Learning in BBS
- Value Function Decomposition
- Experiments: Multi-Robot Object Lifting
- Conclusions, Ongoing Research, and Future Work
3. Challenges and Requirements of Robotic Systems
- Challenges
- Sensor and Effector Uncertainty
- Partial Observability
- Non-Stationarity
- Requirements (among many others)
- Multi-goal
- Robustness
- Multiple Sensors
- Scalability
- Automatic design
- Learning
4. Behavior-based Approach to AI
- Behavior-based approach as a good candidate for low-level intelligence
- Behavioral (activity) decomposition
- against functional decomposition
- Behavior: Sensor -> Action (direct link between perception and action)
- Situatedness
- Situatedness motto: "The world is its own best model!"
- Embodiment
- Intelligence as Emergence
- (interaction of agent with environment)
5. Behavioral Decomposition
[Diagram: sensors feed a stack of parallel behaviors (avoid obstacles, manipulate the world, build maps, explore, locomote) that drive the actuators]
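The decomposition above runs behaviors in parallel, with a higher behavior suppressing the ones below it when it is activated. A minimal sketch in Python; the behavior names and the sensor dictionary are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of subsumption-style arbitration (an assumed mechanism):
# behaviors run in parallel; the highest activated behavior in the stack
# suppresses all behaviors below it.

def arbitrate(stack, sensors):
    """Return the action of the controlling behavior.

    `stack` is ordered from highest (most dominant) to lowest priority;
    each behavior maps sensor readings to an action, or None if inactive.
    """
    for behavior in stack:
        action = behavior(sensors)
        if action is not None:      # this behavior is activated ...
            return action           # ... and suppresses all lower ones
    return "idle"                   # no behavior activated

# Two toy behaviors over a dict of sensor readings (hypothetical names):
def avoid_obstacles(sensors):
    return "turn" if sensors.get("obstacle_near") else None

def explore(sensors):
    return "move_forward"           # always active, acts as a default layer

stack = [avoid_obstacles, explore]  # avoid obstacles sits on top
print(arbitrate(stack, {"obstacle_near": True}))   # -> turn
print(arbitrate(stack, {"obstacle_near": False}))  # -> move_forward
```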
6. Behavior-based System Design
- Hand Design
- Common almost everywhere (just ask some people at IROS'04)
- Complicated; may be infeasible in complex problems
- Even if it is possible to find a working system, it is probably not optimal.
- Evolution
- Time consuming
- Good solutions can be found
- Biologically feasible
- Learning
- Biologically feasible
- Learning is essential for the life-time survival of the agent.
- We focus on learning in this presentation.
7. The Importance of Learning
- Unknown environment/body
- Exact model of environment/body is not known
- Non-stationary environment/body
- Changing environments (offices, houses, streets, and almost everywhere)
- Aging
- Designer may not know how to benefit from every aspect of her agent/environment
- Let the agent learn it by itself (learning as optimization)
- etc.
8. Learning in Behavior-based Systems
- There are a few works on behavior-based learning
- Mataric, Mahadevan, Maes, and ...
- but there is no deep investigation of it (especially a mathematical formulation)!
9. Learning in Behavior-based Systems
- There are different methods of learning with different viewpoints, but we have concentrated on Reinforcement Learning.
- Agent: Did I perform it correctly?!
- Tutor: Yes/No!
10. Learning in Behavior-based Systems
- We have divided learning in BBS into these two parts
- Structure Learning
- How should we organize behaviors in the architecture, assuming a repertoire of working behaviors?
- Behavior Learning
- How should each behavior behave? (when we do not have the necessary toolbox of working behaviors)
11. Structure Learning: Assumptions
- Structure learning in the Subsumption Architecture as a good sample of BBS
- Purely parallel case
- We know B1, B2, ..., but we do not know how to arrange them in the architecture
- We know how to avoid obstacles, pick an object, stop, move forward, and turn, but we don't know which one is superior to the others.
12. Structure Learning
The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).
[Diagram: behavior toolbox containing build maps, explore, manipulate the world, locomote, avoid obstacles]
13. Structure Learning
[Diagram: behavior toolbox containing build maps, explore, manipulate the world, locomote, avoid obstacles]
14. Structure Learning
1. "explore" becomes the controlling behavior and suppresses "avoid obstacles".
2. The agent hits a wall!
[Diagram: behavior structure and toolbox]
15. Structure Learning
The tutor (environment) gives "explore" a punishment for being in that place of the structure.
[Diagram: behavior structure and toolbox]
16. Structure Learning
"explore" is not a very good behavior for the highest position of the structure, so it is replaced by "avoid obstacles".
[Diagram: behavior structure and toolbox]
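The walkthrough on the preceding slides can be caricatured in a few lines of Python. The demotion rule below is a deliberate simplification of the value-based replacement the paper actually uses; the behavior names follow the slides:

```python
# Toy sketch of the structure-learning walkthrough (assumed simplification,
# not the authors' algorithm): when the controlling behavior at the top of
# the structure is punished, it is demoted and the next behavior replaces it.

def demote(stack, behavior):
    """Swap a punished behavior with the one directly below it."""
    i = stack.index(behavior)
    if i + 1 < len(stack):
        stack[i], stack[i + 1] = stack[i + 1], stack[i]
    return stack

stack = ["explore", "avoid obstacles", "locomote"]  # explore controls
# The agent hits a wall while 'explore' controls -> the tutor punishes it:
demote(stack, "explore")
print(stack[0])  # -> avoid obstacles
```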
17. Structure Learning: Issues
- How should we represent the structure?
- Sufficient (the concept space should be covered by the hypothesis space)
- Tractable (small hypothesis space)
- Well-defined credit assignment
- How should we assign credit to the architecture?
- If the agent receives a reward/punishment, how should we reward/punish the structure of the architecture?
18. Value Function Decomposition and Structure Learning
- Each structure has a value regarding the reinforcement signal it receives.
- The objective is finding a structure T with a high value.
- We have decomposed the value function into simpler components that enable us to benefit from previous experiments.
19. Value Function Decomposition
- It is possible to decompose the total system's value into the value of each behavior in each layer.
- We call it the Zero-Order method.
20. Value Function Decomposition: Zero-Order Method
- It stores the value of a behavior being in a specific layer.

ZO value table in the agent's mind:
- Higher layer: avoid obstacles (0.8), explore (0.7), locomote (0.4)
- Lower layer: avoid obstacles (0.6), explore (0.9), locomote (0.4)
21. Credit Assignment for the Zero-Order Method
- The controlling behavior is the only behavior responsible for the current reinforcement signal.
- An appropriate ZO value-table updating method is available.
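A sketch of how such a zero-order table could be stored, read, and updated. The table values follow the slide above; the exponential-average update and the learning rate are assumptions for illustration, as the paper's actual updating rule is given in the proceedings:

```python
# Zero-order value table: one value per (behavior, layer) pair.
zo_value = {
    # layer 0 = higher layer, layer 1 = lower layer
    ("avoid obstacles", 0): 0.8, ("explore", 0): 0.7, ("locomote", 0): 0.4,
    ("avoid obstacles", 1): 0.6, ("explore", 1): 0.9, ("locomote", 1): 0.4,
}

def zo_update(controlling, layer, r, alpha=0.1):
    """Credit only the controlling behavior for the signal r
    (assumed exponential-average update)."""
    key = (controlling, layer)
    zo_value[key] += alpha * (r - zo_value[key])

def best_for_layer(layer, available):
    """Read an arrangement off the table: pick the highest-valued
    available behavior for the given layer."""
    return max(available, key=lambda b: zo_value[(b, layer)])

print(best_for_layer(0, ["avoid obstacles", "explore", "locomote"]))
# -> avoid obstacles (0.8 is the highest value for the higher layer)

zo_update("avoid obstacles", 0, r=-1.0, alpha=0.5)  # punished while controlling
# its value for the higher layer drops: 0.8 + 0.5 * (-1.0 - 0.8) = -0.1
```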
22. Value Function Decomposition: Another Method (First Order)
- It stores the value of the relative order of behaviors
- How good/bad is it if B1 is placed higher than B2?!
- V(avoid obstacles > explore) = 0.8
- V(explore > avoid obstacles) = -0.3
- Sorry! Not that easy (and informative) to show graphically!!
- Credits are assigned to all (controlling, activated) pairs of behaviors.
- If the agent receives a reward while B1 is controlling and B3 and B5 are activated, then (B1 > B3) and (B1 > B5) are credited.
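A sketch of first-order credit assignment over ordered behavior pairs. The pair values follow the slide; as before, the exponential-average update and the learning rate are illustrative assumptions rather than the paper's formulation:

```python
from collections import defaultdict

# First-order value table: one value per ordered pair (higher, lower).
fo_value = defaultdict(float)
fo_value[("avoid obstacles", "explore")] = 0.8   # values from the slide
fo_value[("explore", "avoid obstacles")] = -0.3

def fo_update(controlling, activated, r, alpha=0.1):
    """Credit every (controlling, activated) pair for the signal r
    (assumed exponential-average update)."""
    for b in activated:
        key = (controlling, b)
        fo_value[key] += alpha * (r - fo_value[key])

# A reward arrives while B1 is controlling and B3, B5 are activated:
fo_update("B1", ["B3", "B5"], r=1.0)
print(round(fo_value[("B1", "B3")], 2))  # -> 0.1
```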
23. Structure Representation
- Both of these methods come with probabilistic reasoning that shows how to
- decompose the total system value into simple components
- assign credits
- update the value tables
- Check the proceedings for the mathematical formulation!
24. Example: Multi-Robot Object Lifting
- A group of three robots wants to lift an object using only their own local sensors
- No central control
- No communication
- Local sensors
- Objectives
- Reaching a prescribed height
- Keeping the tilt angle small
25. Example: Multi-Robot Object Lifting
[Diagram: behavior toolbox — Push More, Hurry Up, Stop, Slow Down, Don't Go Fast — with "?!" marking the unknown arrangement]
26. Example: Multi-Robot Object Lifting
[Figure: sample shot of the tilt angle of the object after sufficient learning]
27. Example: Multi-Robot Object Lifting
[Figure: sample shot of the height of each robot after sufficient learning]
28. Example: Multi-Robot Object Lifting
[Figure: sample shot of the tilt angle of the object after sufficient learning]
29. Conclusions, Ongoing Research, and Future Work
- We have devised two different methods of structure learning for behavior-based systems.
- Good results in two different tasks
- Multi-robot Object Lifting
- An abstract problem (not reported yet)
30. Conclusions, Ongoing Research, and Future Work
- ...but where should we find the necessary behaviors?!
- Behavior Learning
- We have devised some methods for behavior learning which will be reported soon.
31. Conclusions, Ongoing Research, and Future Work
- However, many steps remain before fully automated agent design
- How should we generate new behaviors without even knowing which sensory information is necessary for the task (feature selection)?
- The problem of reinforcement signal design
- Designing a good reinforcement signal is not easy at all.
32. Questions?!