Amir massoud Farahmand - Majid Nili Ahmadabadi - PowerPoint PPT Presentation

1
Behavior Hierarchy Learning in a Behavior-based
System using Reinforcement Learning
  • Amir massoud Farahmand - Majid Nili Ahmadabadi
  • Babak Najar Araabi
  • farahmand@ipm.ir, {mnili, araabi}@ut.ac.ir

Department of Electrical and Computer
Engineering, University of Tehran, Iran
2
Paper Outline
  • Challenges and Requirements of Robotic Systems
  • Behavior-based Approach to AI
  • How should we design a Behavior-based System
    (BBS)?!
  • Learning in BBS
  • Structure Learning in BBS
  • Value Function Decomposition
  • Experiments: Multi-Robot Object Lifting
  • Conclusions, Ongoing Research, and Future Work

3
Challenges and Requirements of Robotic Systems
  • Challenges
  • Sensor and Effector Uncertainty
  • Partial Observability
  • Non-Stationarity
  • Requirements
  • (among many others)
  • Multi-goal
  • Robustness
  • Multiple Sensors
  • Scalability
  • Automatic design
  • Learning

4
Behavior-based Approach to AI
  • Behavior-based approach as a good candidate for
    low-level intelligence.
  • Behavioral (activity) decomposition
  • against functional decomposition
  • Behavior: Sensor -> Action (direct link between
    perception and action)
  • Situatedness
  • Situatedness motto: The world is its own best
    model!
  • Embodiment
  • Intelligence as Emergence
  • (interaction of agent with environment)

5
Behavioral decomposition
avoid obstacles
manipulate the world
sensors
actuators
build maps
explore
locomote
6
Behavior-based System Design
  • Hand Design
  • Common almost everywhere (just ask some people
    at IROS04)
  • Complicated; may be infeasible in complex
    problems
  • Even if it is possible to find a working system,
    it is probably not optimal.
  • Evolution
  • Time consuming
  • Good solutions can be found
  • Biologically feasible
  • Learning
  • Biologically feasible
  • Learning is essential for life-time survival of
    the agent.
  • We focus on learning in this presentation.

7
The Importance of Learning
  • Unknown environment/body
  • Exact model of environment/body is not known
  • Non-stationary environment/body
  • Changing environment (offices, houses, streets,
    and almost everywhere)
  • Aging
  • Designer may not know how to benefit from every
    aspect of her agent/environment
  • Let the agent learn it by itself (learning as
    optimization)
  • etc

8
Learning in Behavior-based Systems
  • There are a few works on behavior-based learning
  • Mataric, Mahadevan, Maes, and ...
  • but there is no deep investigation of it
    (especially a mathematical formulation)!

9
Learning in Behavior-based Systems
  • There are different methods of learning with
    different viewpoints, but we have concentrated on
    Reinforcement Learning.
  • Agent: Did I perform it correctly?!
  • Tutor: Yes/No!

10
Learning in Behavior-based Systems
  • We have divided learning in BBS into these two
    parts:
  • Structure Learning
  • How should we organize behaviors in the
    architecture, assuming we have a repertoire of
    working behaviors?
  • Behavior Learning
  • How should each behavior behave? (we do not have
    the necessary toolbox)

11
Structure Learning: Assumptions
  • Structure Learning in Subsumption Architecture as
    a good sample for BBS
  • Purely parallel case
  • We know B1, B2, and so on, but we do not know how
    to arrange them in the architecture
  • We know how to avoid obstacles, pick an object,
    stop, move forward, turn, but we don't know
    which one is superior to the others.
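
To make the arrangement question concrete, here is a minimal sketch of how a fixed ordering could decide which behavior is in control in a purely parallel, subsumption-style structure. The function name, the list-based representation, and the "highest first" convention are assumptions made for illustration; they are not taken from the slides.

```python
from typing import Optional, Sequence, Set


def controlling_behavior(structure: Sequence[str], activated: Set[str]) -> Optional[str]:
    """Return the highest-placed activated behavior, or None if none is active.

    `structure` lists behaviors from the highest layer to the lowest
    (an assumed convention for this sketch).
    """
    for behavior in structure:
        if behavior in activated:
            # This behavior suppresses every activated behavior below it.
            return behavior
    return None


# Hypothetical arrangement with explore placed above avoid obstacles.
structure = ["explore", "avoid obstacles", "locomote"]
activated = {"explore", "avoid obstacles"}
print(controlling_behavior(structure, activated))
```

With this arrangement explore takes control even though avoid obstacles is also activated; swapping the first two entries of `structure` hands control to avoid obstacles instead.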

12
Structure Learning
The agent wants to learn how to arrange these
behaviors in order to get maximum reward from its
environment (or tutor).
[Figure: Behavior Toolbox containing build maps,
explore, manipulate the world, locomote, and avoid
obstacles]
13
Structure Learning
[Figure: Behavior Toolbox containing build maps,
explore, manipulate the world, locomote, and avoid
obstacles]
14
Structure Learning
[Figure: build maps, manipulate the world, explore,
locomote, and avoid obstacles placed in a structure,
taken from the Behavior Toolbox]
1. explore becomes the controlling behavior and
suppresses avoid obstacles.
2. The agent hits a wall!
15
Structure Learning
[Figure: build maps, manipulate the world, explore,
locomote, and avoid obstacles placed in a structure,
taken from the Behavior Toolbox]
The tutor (environment) punishes explore for being
in that place in the structure.
16
Structure Learning
[Figure: build maps, manipulate the world, explore,
locomote, and avoid obstacles placed in a structure,
taken from the Behavior Toolbox]
explore is not a very good behavior for the
highest position of the structure, so it is
replaced by avoid obstacles.
17
Structure Learning Issues
  • How should we represent structure?
  • Sufficient (Concept space should be covered by
    Hypothesis space)
  • Tractable (small Hypothesis space)
  • Well-defined credit assignment
  • How should we assign credits to architecture?
  • If the agent receives a reward/punishment, how
    should we reward/punish structure of the
    architecture?

18
Value Function Decomposition and Structure
Learning
  • Each structure has a value determined by the
    reinforcement signal it receives.
  • The objective is to find a structure T with a
    high value.
  • We have decomposed the value function into simpler
    components that enable us to benefit from
    previous experiments.

19
Value Function Decomposition
  • It is possible to decompose the total system's
    value into the value of each behavior in each layer.
  • We call this the Zero-Order method.

20
Value Function Decomposition: Zero-Order Method
  • It stores the value of a behavior being in a
    specific layer.

ZO value table in the agent's mind

Behavior           Higher layer   Lower layer
avoid obstacles    0.8            0.6
explore            0.7            0.9
locomote           0.4            0.4
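
As a rough illustration of how such a table could be used, the sketch below stores the values shown above and picks the assignment of distinct behaviors to layers with the largest total. Scoring a structure by summing its per-layer entries is an assumption made here for illustration; the exact decomposition is given in the proceedings, not on the slide. The names `zo` and `best_structure` are hypothetical.

```python
from itertools import permutations

# Values copied from the slide's ZO table, indexed as zo[layer][behavior].
zo = {
    "higher": {"avoid obstacles": 0.8, "explore": 0.7, "locomote": 0.4},
    "lower": {"avoid obstacles": 0.6, "explore": 0.9, "locomote": 0.4},
}


def best_structure(zo_table):
    """Pick one distinct behavior per layer so that the summed ZO value is maximal.

    Summing the entries is an assumed scoring rule for this sketch.
    """
    layers = list(zo_table)
    behaviors = list(next(iter(zo_table.values())))

    def score(assignment):
        return sum(zo_table[layer][b] for layer, b in zip(layers, assignment))

    best = max(permutations(behaviors, len(layers)), key=score)
    return dict(zip(layers, best))


print(best_structure(zo))
```

For the values on the slide, this selects avoid obstacles for the higher layer and explore for the lower layer.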
21
Credit Assignment for the Zero-Order Method
  • The controlling behavior is the only behavior
    responsible for the current reinforcement signal.
  • An appropriate update method for the ZO value
    table is available.
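
The slide does not give the update rule, so the following is only a sketch of what a zero-order credit assignment could look like: the table entry of the controlling behavior, in the layer it occupies, is moved toward the received reinforcement, and nothing else changes. The incremental-average form and the step size `alpha` are assumptions, not the authors' formulation.

```python
def update_zo(zo_table, layer, controlling, reward, alpha=0.1):
    """Move the (layer, controlling behavior) entry toward the received reward.

    Assumed rule: V <- V + alpha * (reward - V). Only the controlling
    behavior is credited; every other entry stays untouched.
    """
    old = zo_table[layer][controlling]
    zo_table[layer][controlling] = old + alpha * (reward - old)
    return zo_table[layer][controlling]


# Hypothetical episode: explore controls from the higher layer and the agent hits a wall.
table = {"higher": {"avoid obstacles": 0.8, "explore": 0.7}}
update_zo(table, "higher", "explore", reward=-1.0, alpha=0.5)
print(table)
```

After this punishment the value of explore in the higher layer drops below that of avoid obstacles, which matches the replacement shown in the earlier structure-learning slides.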

22
Value Function Decomposition: Another Method
(First-Order)
  • It stores the value of the relative order of
    behaviors
  • How good/bad is it if B1 is placed higher
    than B2?!
  • V(avoid obstacles > explore) = 0.8
  • V(explore > avoid obstacles) = -0.3
  • Sorry! Not that easy (and informative) to show
    graphically!!
  • Credits are assigned to all (controlling,
    activated) pairs of behaviors.
  • The agent receives a reward while B1 is controlling
    and B3 and B5 are activated
  • (B1 > B3)
  • (B1 > B5)
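
The pairwise credit assignment described above can be sketched as follows. Each (controlling, activated) pair gets its value nudged toward the received reinforcement. The incremental update and the step size `alpha` are assumptions for illustration; the slides only state which pairs receive credit.

```python
from collections import defaultdict


def update_first_order(values, controlling, activated, reward, alpha=0.1):
    """Credit every (controlling, activated) pair with the received reward.

    values[(a, b)] stands for V(a > b), the value of placing a above b.
    The update V <- V + alpha * (reward - V) is an assumed rule.
    """
    for other in activated:
        if other == controlling:
            continue  # a behavior is never ordered against itself
        pair = (controlling, other)
        values[pair] += alpha * (reward - values[pair])
    return values


# The slide's example: reward arrives while B1 controls and B3, B5 are activated.
values = defaultdict(float)
update_first_order(values, "B1", {"B3", "B5"}, reward=1.0)
print(sorted(values.items()))
```

Only the entries for (B1 > B3) and (B1 > B5) change; pairs that do not involve the controlling behavior are left alone.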

23
Structure Representation
  • Both of these methods are backed by
    probabilistic reasoning that shows how to
  • decompose the total system value into simple components
  • assign credits
  • update the value table
  • Check the proceedings for the mathematical formulation!

24
Example: Multi-Robot Object Lifting
  • A group of three robots wants to lift an object
    using their own local sensors
  • No central control
  • No communication
  • Local sensors
  • Objectives
  • Reaching prescribed height
  • Keeping tilt angle small

25
Example: Multi-Robot Object Lifting
[Figure: Behavior Toolbox containing Push More,
Hurry Up, Stop, Slow Down, and Don't Go Fast, with
their arrangement still unknown (?!)]
26
Example: Multi-Robot Object Lifting
Sample shot of tilt angle of the object after
sufficient learning
27
Example: Multi-Robot Object Lifting
Sample shot of height of each robot after
sufficient learning
28
Example: Multi-Robot Object Lifting
Sample shot of tilt angle of the object after
sufficient learning
29
Conclusions, Ongoing Research, and Future Work
  • We have devised two different methods for
    structure learning in behavior-based systems.
  • Good results in two different tasks
  • Multi-robot Object Lifting
  • An Abstract Problem (not reported yet)

30
Conclusions, Ongoing Research, and Future Work
  • But where should we find the necessary
    behaviors?!
  • Behavior Learning
  • We have devised some methods for behavior
    learning which will be reported soon.

31
Conclusions, Ongoing Research, and Future Work
  • However, many steps remain before fully
    automated agent design
  • How should we generate new behaviors without even
    knowing which sensory information is necessary
    for the task (feature selection)?
  • Problem of Reinforcement Signal Design
  • Designing a good reinforcement signal is not easy
    at all.

32
Questions?!