Title: Hierarchical POMDP Solutions
1. Hierarchical POMDP Solutions
2. Sequential Decision Making Under Uncertainty
What is the optimal policy?
3. Manufacturing Processes (Mahadevan, Theocharous, FLAIRS 98)
- Reward
  - Reward for consuming
  - Penalize for filling buffers
  - Penalize for machine breakdown
- Actions
  - Produce
  - Maintenance
- What is the optimal policy?
4. Foveated Active Vision (Minut)
- Observations
  - Local features
- Reward
  - Reward for finding the object
- Actions
  - Where to saccade next
  - What features to use
- What is the optimal policy?
5. Many More Partially Observable Problems
- Assistive technologies
  - Web searching, preference elicitation
- Sophisticated computing
  - Distributed file access, network troubleshooting
- Industrial
  - Machine maintenance, manufacturing processes
- Social
  - Education, medical diagnosis, health care policymaking
- Corporate
  - Marketing, corporate policy
- ...
6. Overview
- Learning models of partially observable problems is far from a solved problem
- Computing policies for partially observable domains is intractable
- We propose hierarchical solutions
  - Learn models using less space and time
  - Compute robust policies that cannot be computed by previous approaches
7. How? Spatial and Temporal Abstractions Reduce Uncertainty
[Figure: spatial abstraction (MIT map) and temporal abstraction]
8. Outline
- Sequential decision-making under uncertainty
- A hierarchical POMDP model for robot navigation
- Heuristic macro-action selection in H-POMDPs
- Near-optimal macro-action selection for arbitrary POMDPs
- Representing H-POMDPs as DBNs
- Current and future directions
9. A Real System: Robot Navigation
10. Belief States (Probability Distributions over States)
[Figure: true state and the corresponding belief state]
11. Belief States (Probability Distributions over States)
[Figure: true state and the corresponding belief state]
12. Belief States (Probability Distributions over States)
[Figure: true state and the corresponding belief state; the update rule behind these figures is sketched below]
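The belief states shown in these slides are maintained with the standard Bayes filter. Using the transition and observation notation introduced on the next slide, the update after taking action a and observing z is (with η a normalizing constant):

$$ b'(s') \;=\; \eta\, O(z \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s) $$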
13. Learning POMDPs
- Given action and observation sequences (As and Zs), compute the transition and observation models (Ts and Os)
  - Estimate the probability distribution over hidden states
  - Count the number of times each state was visited
  - Update T and O and repeat
- This is an Expectation-Maximization (EM) algorithm (a minimal sketch follows the diagram below)
  - An iterative procedure for maximum-likelihood parameter estimation over hidden state variables
  - Converges to a local maximum
[Diagram: POMDP unrolled over three time steps, with hidden states S1, S2, S3, actions A1, A2, and observations Z1, Z2, Z3; transition parameters T(S1=i, A1=a, S2=j) and observation parameters O(O2=z, S2=i, A1=a)]
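As a rough illustration of the procedure described above (not the thesis implementation), here is a minimal one-iteration Baum-Welch sketch for a flat POMDP in numpy. The array layout and the time-indexing convention, with actions treated as known inputs, are assumptions:

```python
# Minimal sketch of one EM / Baum-Welch iteration for a flat POMDP.
# Assumed layout: T[a, i, j] = P(s'=j | s=i, a), O[a, j, z] = P(z | s'=j, a),
# where a is the action taken just before observing z (conventions vary).
import numpy as np

def em_step(T, O, actions, obs, prior):
    """One E-step + M-step on a single trajectory; actions[0] is a dummy
    initial action preceding the first observation."""
    S, N = T.shape[1], len(obs)

    # E-step: scaled forward-backward to estimate hidden-state posteriors.
    alpha = np.zeros((N, S))
    alpha[0] = prior * O[actions[0], :, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, N):
        alpha[t] = (alpha[t - 1] @ T[actions[t]]) * O[actions[t], :, obs[t]]
        alpha[t] /= alpha[t].sum()

    beta = np.ones((N, S))
    for t in range(N - 2, -1, -1):
        beta[t] = T[actions[t + 1]] @ (O[actions[t + 1], :, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()

    gamma = alpha * beta                       # expected state-visitation counts
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Expected transition counts for each action.
    xi = np.zeros_like(T)
    for t in range(N - 1):
        a = actions[t + 1]
        joint = alpha[t][:, None] * T[a] * (O[a, :, obs[t + 1]] * beta[t + 1])[None, :]
        xi[a] += joint / joint.sum()

    # M-step: renormalize expected counts into new T and O, then repeat.
    T_new = xi / np.maximum(xi.sum(axis=2, keepdims=True), 1e-12)
    O_new = np.zeros_like(O)
    for t in range(N):
        O_new[actions[t], :, obs[t]] += gamma[t]
    O_new /= np.maximum(O_new.sum(axis=2, keepdims=True), 1e-12)
    return T_new, O_new
```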
14. Planning in POMDPs
- Belief states constitute a sufficient statistic for making decisions (the Markov property holds; Astrom 1965)
- Bellman equation (a standard form is given below)
- Since the belief space is continuous (an infinite state space), the problem is computationally intractable: PSPACE-hard for the finite horizon and undecidable for the infinite horizon
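The Bellman equation referred to on this slide is the standard value recursion over belief states; written in one common form, with b^a_z denoting the belief obtained from b after taking action a and observing z:

$$ V(b) \;=\; \max_{a \in A}\Big[\, \sum_{s \in S} b(s)\,R(s,a) \;+\; \gamma \sum_{z \in Z} \Pr(z \mid b, a)\, V\!\big(b^{a}_{z}\big) \Big] $$

Because b ranges over a continuous simplex, exact dynamic programming over this equation is what makes the problem intractable.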
15. Our Solution: Spatial and Temporal Abstraction
- Learning
  - A hierarchical Baum-Welch algorithm, derived from the Baum-Welch algorithm for training HHMMs (with Rohanimanesh and Mahadevan, ICRA 2001)
  - Structure learning from weak priors (with Mahadevan, IROS 2002)
  - Inference can be done in linear time by representing H-POMDPs as Dynamic Bayesian Networks (DBNs) (with Murphy and Kaelbling, ICRA 2004)
- Planning
  - Heuristic macro-action selection (with Mahadevan, ICRA 2002)
  - Near-optimal macro-action selection (with Kaelbling, NIPS 2003)
- Structure learning and planning combined
  - Dynamic POMDP abstractions (with Mannor and Kaelbling)
16. Outline
- Sequential decision-making under uncertainty
- A hierarchical POMDP model for robot navigation
- Heuristic macro-action selection in H-POMDPs
- Near-optimal macro-action selection for arbitrary POMDPs
- Representing H-POMDPs as DBNs
- Current and future directions
17. Hierarchical POMDPs
18. Hierarchical POMDPs
[Diagram: the H-POMDP state hierarchy with actions added, following the HHMM of Fine, Singer, Tishby, MLJ 98]
19. Experimental Environments
600 states
1200 states
20. The Robot Navigation Domain
- The robot Pavlov in the real MSU environment
- The Nomad 200 simulator
21. Learning Feature Detectors (Mahadevan, Theocharous, Khaleeli, MLJ 98)
- 736 hand-labeled grids
- 8-fold cross-validation
- Classification error (μ = 7.33, σ = 3.7)
22. Learning and Planning in H-POMDPs for Robot Navigation
[Flowchart: a hand-coded topological map is compiled into an initial H-POMDP; EM training against the environment yields the trained H-POMDP, which the navigation system uses for planning and execution]
23. Outline
- Sequential decision-making under uncertainty
- A hierarchical POMDP model for robot navigation
- Heuristic macro-action selection in H-POMDPs
- Near-optimal macro-action selection for arbitrary POMDPs
- Representing H-POMDPs as DBNs
- Current and future directions
24. Planning in H-POMDPs (Theocharous, Mahadevan, ICRA 2002)
- Hierarchical MDP solutions (using the options framework; Sutton, Precup, Singh, AIJ)
- Heuristic POMDP solutions: MLS (sketched after the figure below)
[Figure: abstract and primitive actions over a belief state b(s); the macro-action values v(go-west) and v(go-east) determine which macro to execute from belief b]
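The MLS heuristic named above, and the QMDP heuristic it is compared against in the results slides, can each be stated in a few lines. This is a generic sketch, not code from the thesis; Q is assumed to be a table of MDP action values for the underlying fully observable problem:

```python
# Generic sketch of two flat POMDP heuristics used in the comparisons:
# MLS (Most Likely State) and QMDP. Q[s, a] is an assumed table of MDP
# Q-values computed for the underlying fully observable problem.
import numpy as np

def mls_action(belief, Q):
    """Act as if the single most probable state were the true state."""
    s_star = int(np.argmax(belief))
    return int(np.argmax(Q[s_star]))

def qmdp_action(belief, Q):
    """Weight each state's Q-values by its belief probability."""
    return int(np.argmax(belief @ Q))
```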
25. Plan Execution
26. Plan Execution
27. Plan Execution
28. Plan Execution
29. Intuition
- The probability distribution at the higher level evolves more slowly
- The agent does not have to decide on the best macro-action at every time step
- Long-term actions result in robot localization
30. F-MLS Demo
31. H-MLS Demo
32. Hierarchical is More Successful
[Table: success rate with unknown initial position, by environment, for the hierarchical and flat versions of the MLS and QMDP algorithms]
33. Hierarchical Takes Less Time to Reach the Goal
[Table: average steps to goal with unknown initial position, by environment, for the hierarchical and flat versions of the QMDP and MLS algorithms]
34. Hierarchical Plans are Computed Faster
[Table: planning time by environment for Goal 1 and Goal 2, for each algorithm]
35. Outline
- Sequential decision-making under uncertainty
- A hierarchical POMDP model for robot navigation
- Heuristic macro-action selection in H-POMDPs
- Near-optimal macro-action selection for arbitrary POMDPs
- Representing H-POMDPs as DBNs
- Current and future directions
36. Near-Optimal Macro-Action Selection (Theocharous, Kaelbling, NIPS 2003)
- Agents usually do not require the entire belief space
- Macro-actions can reduce the reachable belief space even further
- Tested in large-scale robot navigation
  - Only a small part of the belief space is required
  - Learns approximate POMDP policies fast
  - High success rate
  - Better policies
  - Performs information gathering
37. Dynamic Grids
38. The Algorithm
[Figure: the true belief state b along the true trajectory is mapped to its nearest grid point g; simulation trajectories of each macro-action from g estimate the value at g; the value of b is interpolated from its neighbors; executing the chosen macro produces the resulting next true belief state. A sketch of this loop follows.]
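A rough sketch of the selection loop the figure describes: map the current belief to its nearest grid point, simulate each macro-action from there, score the resulting beliefs against the value estimates stored on the grid, and execute the best macro. All helper names and parameters here (simulate_macro, the nearest-neighbor value lookup, n_sims, gamma) are illustrative assumptions, not the NIPS 2003 implementation:

```python
# Hypothetical sketch of grid-based macro-action selection.
# simulate_macro(belief, macro) -> (discounted_reward, resulting_belief, n_steps)
# is an assumed callable supplied by the caller.
import numpy as np

def select_macro(belief, macros, simulate_macro, grid_points, grid_values,
                 n_sims=20, gamma=0.95):
    """Estimate each macro's value from the grid point nearest the current belief."""
    def value_of(b):
        # Nearest-neighbor lookup as a simple stand-in for interpolating the
        # value of b from its neighboring grid points.
        dists = [np.linalg.norm(b - p) for p in grid_points]
        return grid_values[int(np.argmin(dists))]

    # Map the true belief to its nearest grid point g.
    g = grid_points[int(np.argmin([np.linalg.norm(belief - p) for p in grid_points]))]

    best_macro, best_value = None, -np.inf
    for macro in macros:
        # Average over simulation trajectories of this macro started from g.
        returns = []
        for _ in range(n_sims):
            reward, b_next, steps = simulate_macro(g, macro)
            returns.append(reward + (gamma ** steps) * value_of(b_next))
        value = float(np.mean(returns))
        if value > best_value:
            best_macro, best_value = macro, value
    return best_macro
```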
39. Experimental Setup
40. Fewer States
41. Fewer Steps to Goal
42. More Successful
43. Information Gathering
44. Information Gathering (Scaling Up)
45. Dynamic POMDP Abstractions (Theocharous, Mannor, Kaelbling)
[Figure: a start-to-goal run in which entropy thresholds on the belief trigger localization macros]
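A minimal sketch of the entropy-threshold trigger shown in the figure: when the belief becomes too diffuse, a localization macro is preferred over progress toward the goal. The threshold value and macro names are illustrative assumptions:

```python
import numpy as np

def belief_entropy(belief):
    """Shannon entropy of the belief state (in nats); high entropy means the robot is lost."""
    p = belief[belief > 0]
    return float(-(p * np.log(p)).sum())

def choose_macro(belief, goal_macro, localization_macro, threshold=1.0):
    # Run a localization macro while the belief entropy exceeds the (assumed)
    # threshold; otherwise continue with the macro that heads for the goal.
    if belief_entropy(belief) > threshold:
        return localization_macro
    return goal_macro
```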
46. Fewer Steps to Goal
47. Outline
- Sequential decision-making under uncertainty
- A hierarchical POMDP model for robot navigation
- Heuristic macro-action selection in H-POMDPs
- Near-optimal macro-action selection for arbitrary POMDPs
- Representing H-POMDPs as DBNs
- Current and future directions
48. Dynamic Bayesian Networks
[Diagram: a flat state POMDP vs. a factored DBN POMDP, comparing the number of parameters each representation requires]
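As a rough illustration of the comparison drawn on this slide (the exact figures from the slide are not recoverable): a flat transition model over |S| states and |A| actions needs on the order of |A||S|² parameters, whereas a factored DBN whose state splits into n variables, the i-th taking m_i values with parents Pa(i) in the previous time slice, needs only

$$ |A| \sum_{i=1}^{n} m_i \prod_{j \in \mathrm{Pa}(i)} m_j , $$

which is far smaller whenever each variable has only a few parents.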
49. DBN Inference
50. Representing H-POMDPs as Dynamic Bayesian Networks (Theocharous, Murphy, Kaelbling, ICRA 2004)
[Diagram: the state H-POMDP and its factored DBN representation]
51-54. Representing H-POMDPs as Dynamic Bayesian Networks (continued; the same state H-POMDP vs. factored DBN H-POMDP diagram repeats across these slides)
55. Complexity of Inference
[Comparison of inference cost across the state POMDP, DBN H-POMDP, state H-POMDP, and factored DBN H-POMDP]
56. Hierarchical Localizes Better
[Plot: localization performance of the original model, the factored DBN tied H-POMDP, the factored DBN H-POMDP, the DBN H-POMDP, and the state POMDP, compared against the models before training]
57. Hierarchical Fits Data Better
[Plot: data fit of the same models (original, factored DBN tied H-POMDP, factored DBN H-POMDP, DBN H-POMDP, state POMDP), before and after training]
58. Directions for Future Research
- In the future we will explore structure learning
  - Bayesian model selection approaches
  - Methods for learning compositional hierarchies (recurrent nets, hierarchical sparse n-grams)
  - Natural language acquisition methods
  - Identifying isomorphic processes
- Online learning
- Interactive learning
- Application to real-world problems
59. Major Contributions
- The H-POMDP model
  - Requires less training data
  - Provides better state estimation
  - Fast planning
- Macro-actions in POMDPs reduce uncertainty
  - Information gathering
- Application of the algorithms to large-scale robot navigation
  - Map learning
  - Planning and execution