11/15: Planning in Belief Space contd..

Provided by: min63

Learn more at: https://rakaposhi.eas.asu.edu


1
11/15: Planning in Belief Space contd..
Agenda:
Long post-mortem on Kanna Rajan talk
Progression/Regression in Belief Space

Homework 3 returned; Homework 4 assigned

Avg. 61.66667
Std. Dev. 19.16551
Median 57
2
Discussion on Kanna Rajan Talk

Qn: What did you mean by the comment to KR that
the difficulty of modeling search control may be
more acute because of hand-coded search control?
HSTS/RAX (the planner underlying MAPGEN) depends
on hand-coded search control rules to tell the
planner how to deal with the choice points during
its search. To write these, you need expertise
with both the HSTS planner and the domain. And
the rules change if the domain changes. I was
wondering if the latter difficulty may be
alleviated if you were to use domain-independent
(or even domain-specific but declarative, a la
TLPLAN) search control. Kanna thinks the latter
techniques may not scale.
Domain-specific search control rules are also
used in ASPEN, JPL's own temporal planner that
uses local search. In ASPEN's case, the control
rules tell it which of the many possible plan
repairs should be picked first. The experience
with ASPEN was that it takes NASA folks time to
(a) encode the new domain AND (b) write the
domain-specific rules. The first cannot be
avoided. The second could be, if we use
domain-independent local search heuristics (e.g.
the LPG planner).

3
Explaining Plans

Qn: KR mentioned that it was very important for
the users to get explanations for the decisions
the planner made. How do we get them?
There are two types of explanations:
Explanations of correctness, e.g. why is this
action in the plan?
Can be given through causal links: the action is
in the plan because of the causal links it is
giving. Following them will tell us whether or
not the action is (in)directly supporting some
top-level goal.
Can be computed after the fact (it doesn't matter
who made the plan; if I have the domain theory, I
can compute its explanation of correctness).
Rationale for a decision, e.g. why was this action
chosen as against some other action giving
similar effects?
This information needs to be captured during the
search.
See Kambhampati's 1990 paper on Information
Requirements for Modification.

4
From Kambhampati, 1990
5
Soft Goals

Qn: KR mentioned that MAPGEN needs to handle soft
goals. What are these and how are they handled?
Soft goals are those that don't have to be
achieved for the plan to be considered valid.
However, achieving them can improve the value
of the plan. Soft goals give planning a more
distinct optimization flavor. (If all goals are
soft, then any executable sequence is a valid
plan, and what we are looking for are valid plans
with high quality.)
The way MAPGEN handles these seems to be sort of
ad hoc: the goals are given priorities. All tier 1
goals are handled first, then tier 2 goals are
handled, etc.
A related qn: How do we handle soft goals in a
more principled and automated way?
See the papers in AAAI-2004 (by Rao et al.) and
ICAPS-2004 (by Smith)
and a summary of the issues in the next slide.

6
Handling Soft Goals: an approach

Consider the variant of the classical planning
problem called PSP Net Benefit, defined as
follows:
There are n goals to be achieved. Each goal g,
when achieved, gives a reward Rg.
Each action a in the domain has a cost Ca
(expressed in the same units as the reward).
The objective is to find a plan that has the highest
net benefit (the difference between the
cumulative reward of all the goals achieved by
the plan and the cumulative cost of all the
actions used in the plan).
How do we solve the PSP Net Benefit problem?
Naïve (but guaranteed optimal) idea: Consider all
possible subsets of the n goals. For each, find the
least costly plan (using cost-based planning
graphs). Among all these, pick the one with the
best net benefit.
You die because there are 2^n different calls to
the planning algorithm.
Cheap (but pretty suboptimal) idea: Rank the n
goals in terms of the expected net benefit of the
plans for achieving them. Work just on the subset
of goals with +ve net benefit.
You can do this by finding a (cost-sensitive)
relaxed plan for each of the goals. The net
benefit is the reward of the goal minus the cost
of the relaxed plan.
Problem: The cost of achieving a goal depends on
what other goals we are planning to achieve in
conjunction. We need to consider the residual cost of
achieving a goal g(k+1) in the context of goals
g1..gk that have already been selected.
A less greedy idea: Generalize the relaxed plan
extraction procedure such that it takes the
relaxed plan P for achieving g1..gk and attempts
to re-use as many of the actions in P as possible
while finding a relaxed plan for g(k+1).
See the AltAltps system (in AAAI 2004) for details.
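
The three ideas on this slide can be compared on a toy instance. The sketch below is only illustrative: the goals, rewards, action costs and per-goal relaxed plans are made-up assumptions, and the cost of a goal set is approximated as the cost of the union of the goals' relaxed plans (so shared actions are paid for once). It is not the AltAltps procedure.

```python
from itertools import combinations

# Hypothetical toy data (not from the slides): goal rewards, action costs,
# and one relaxed plan (a set of action names) per goal.
REWARD = {"g1": 10, "g2": 6, "g3": 3}
ACTION_COST = {"a": 4, "b": 3, "c": 5, "d": 4}
RELAXED_PLAN = {"g1": {"a", "b"}, "g2": {"b", "c"}, "g3": {"d"}}


def plan_cost(goals):
    """Cost of the union of the goals' relaxed plans (shared actions counted once)."""
    actions = set().union(*(RELAXED_PLAN[g] for g in goals))
    return sum(ACTION_COST[a] for a in actions)


def net_benefit(goals):
    """Cumulative reward of the goals minus the cost of the actions used."""
    return sum(REWARD[g] for g in goals) - plan_cost(goals)


def exhaustive(goals):
    """Naive idea: evaluate all 2^n subsets and keep the best one."""
    names = sorted(goals)
    subsets = (set(c) for k in range(len(names) + 1)
               for c in combinations(names, k))
    return max(subsets, key=net_benefit)


def greedy_independent(goals):
    """Cheap idea: keep every goal whose stand-alone net benefit is positive."""
    return {g for g in goals if net_benefit({g}) > 0}


def greedy_residual(goals):
    """Less greedy idea: repeatedly add the goal with the best residual benefit,
    re-using actions already paid for by the goals selected so far."""
    chosen = set()
    while True:
        best, best_gain = None, 0
        for g in sorted(set(goals) - chosen):
            gain = net_benefit(chosen | {g}) - net_benefit(chosen)
            if gain > best_gain:
                best, best_gain = g, gain
        if best is None:
            return chosen
        chosen.add(best)


print(exhaustive(REWARD), greedy_independent(REWARD), greedy_residual(REWARD))
```

On this instance g2 looks unprofitable on its own (reward 6, relaxed plan cost 8), so the independent ranking drops it, while the residual-cost version keeps it because action b is already paid for by g1.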

7
Belief State Search: An Example Problem
Actions: A1: M & P => K; A2: M & Q => K; A3: M & R => L; A4: K => G; A5: L => G
Plan: ??

Initial state: M is true and exactly one of P, Q, R
is true
Goal: Need G

Init state formula: [(P & ~Q & ~R) V (~P & Q & ~R) V (~P & ~Q & R)] & M
DNF: (M & P & ~Q & ~R) V (M & ~P & Q & ~R) V (M & ~P & ~Q & R)
DNF is good for progression (clauses are partial
states)
CNF: (P V Q V R) & (~P V ~Q) & (~P V ~R) & (~Q V ~R) & M
CNF is good for regression
8
Progression & Regression

Progression with DNF
The constituents (DNF clauses) already look like
partial states. Think of applying the action
to each of these constituents and unioning the
results
Action application converts each constituent to a
set of new constituents
Termination: when each constituent entails the
goal formula
Regression with CNF
Very little difference from classical planning
(since we already had partial states in classical
planning).
THE main difference is that we cannot split the
disjunction into the search space
Termination: when each (CNF) clause is entailed by
the initial state
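
Progression over DNF constituents can be sketched on the example problem from the earlier slide. This is a minimal sketch under simplifying assumptions: each constituent is a complete world state (a set of true propositions), M is treated as an enabling precondition of A1-A3, every action has only a single conditional add effect, and actions are deterministic, so each constituent maps to exactly one new constituent. The function names are illustrative.

```python
# A belief state is a set of world states; a world state is a frozenset of the
# propositions that are true in it.
# Action encoding (an assumption of this sketch):
#   name -> (enabling preconditions, [(effect condition, added proposition)])
ACTIONS = {
    "A1": ({"M"}, [({"P"}, "K")]),
    "A2": ({"M"}, [({"Q"}, "K")]),
    "A3": ({"M"}, [({"R"}, "L")]),
    "A4": (set(), [({"K"}, "G")]),
    "A5": (set(), [({"L"}, "G")]),
}

# M is true and exactly one of P, Q, R is true: three constituents.
INIT = {frozenset({"M", "P"}), frozenset({"M", "Q"}), frozenset({"M", "R"})}


def apply_to_state(state, action):
    """Apply one action to one constituent."""
    pre, effects = ACTIONS[action]
    if not pre <= state:
        raise ValueError(f"{action} is not applicable in {set(state)}")
    added = {eff for cond, eff in effects if cond <= state}
    return frozenset(state | added)


def progress(belief, action):
    """Apply the action to every constituent and union the results."""
    return {apply_to_state(s, action) for s in belief}


def entails(belief, goal):
    """Termination test: every constituent entails the goal."""
    return all(goal <= s for s in belief)


def run(plan, belief=INIT):
    for action in plan:
        belief = progress(belief, action)
    return belief


print(entails(run(["A1", "A2", "A3", "A4", "A5"]), {"G"}))  # True
print(entails(run(["A1", "A4"]), {"G"}))                    # False
```

The second plan fails because it only reaches G in the world where P holds; a conformant plan has to work for all three constituents.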

9
Progression Example
10
Regression Search Example
Actions: A1: M & P => K; A2: M & Q => K; A3: M & R => L; A4: K => G; A5: L => G
Goal state: G
Regress over A4: (G V K)
G or K must be true before A4 for G to be true
after A4
Regress over A5: (G V K V L)
Regress over A1: (G V K V L V P) & M
M is the enabling precondition: it must be true before A1 was
applied
Regress over A2: (G V K V L V P V Q) & M
Regress over A3: (G V K V L V P V Q V R) & M
Initially: (P V Q V R) & (~P V ~Q) & (~P V ~R) & (~Q V ~R) & M
Each clause is satisfied by a clause in the
initial clausal state -- Done! (5 actions)
Clausal states compactly represent disjunction over
sets of uncertain literals. Yet, we still need
heuristics for the search
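
The regression trace above can be reproduced with a few lines of code. This is a simplified sketch, not a general regression procedure: it assumes the regressed clauses contain only positive literals, that each action has exactly one conditional add effect and no delete effects, and it uses clause subsumption as a sufficient (not necessary) test for entailment by the initial state. All names are illustrative.

```python
# Action encoding (an assumption of this sketch):
#   name -> (enabling preconditions, (effect, effect condition))
ACTIONS = {
    "A1": ({"M"}, ("K", "P")),
    "A2": ({"M"}, ("K", "Q")),
    "A3": ({"M"}, ("L", "R")),
    "A4": (set(), ("G", "K")),
    "A5": (set(), ("G", "L")),
}

# Initial clausal state; "~X" denotes the negation of X.
INIT_CNF = {
    frozenset({"P", "Q", "R"}),
    frozenset({"~P", "~Q"}),
    frozenset({"~P", "~R"}),
    frozenset({"~Q", "~R"}),
    frozenset({"M"}),
}


def regress(clauses, action):
    """Regress a set of positive clauses over one action."""
    pre, (effect, condition) = ACTIONS[action]
    regressed = set()
    for clause in clauses:
        # The effect holds afterwards iff it held before or its condition held before.
        extra = {condition} if effect in clause else set()
        regressed.add(frozenset(clause | extra))
    # Enabling preconditions must hold before the action: add them as unit clauses.
    regressed |= {frozenset({p}) for p in pre}
    return regressed


def entailed_by_initial(clauses):
    """Sufficient test: every clause is subsumed by some initial clause."""
    return all(any(init <= clause for init in INIT_CNF) for clause in clauses)


state = {frozenset({"G"})}
for action in ["A4", "A5", "A1", "A2", "A3"]:
    state = regress(state, action)
    print(action, sorted(sorted(c) for c in state))

print(entailed_by_initial(state))  # True
```

The final clausal state is (G V K V L V P V Q V R) & M, whose clauses are subsumed by the initial clauses (P V Q V R) and M respectively.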
11
What happens if we restrict uncertainty?

If the initial state contains only known
variables (each either known to be true or known to be
false),
the DNF formula has a single constituent
the CNF clauses are all singletons
So you can see how we go from 2^(2^n) to 3^n

12
11/17
after all the money we spend on wardrobe and
cosmetic surgeries ?
13
Conformant Planning in the Real World: 2 examples
No. 42: HOW NOT TO BE SEEN (aka Monty Python on
Conformant Planning). Video shown in class.
14
Heuristics for Conformant Planning

First idea: Notice that classical planning
(which assumes full observability) is a
relaxation of conformant planning
So, the length of the classical planning solution
is a lower bound (admissible heuristic) for
conformant planning
Further, the heuristics for classical planning
are also heuristics for conformant planning
(albeit probably not very informed)
Next idea: Let us get a feel for how estimating
distances between belief states differs from
estimating those between states

15
Three issues: How many states are there?
How far is each of the states from the goal? How
much interaction is there between states?
For example, if the length of the plan for
taking S1 to the goal is 10, and that for S2 to the
goal is 10, the length of the plan for taking
both to the goal could be anywhere between
10 and infinity, depending on
the interactions.
Notice that we talk about
state interactions here just
as we talked about goal interactions in
classical planning.
Need to estimate the length of the combined plan
for taking all states to the goal
16
Belief-state cardinality alone won't be enough

Early work on conformant planning concentrated
exclusively on heuristics that look at the
cardinality of the belief state
The larger the cardinality of the belief state,
the higher its uncertainty, and the worse it is
(for progression)
Notice that in regression, we have the opposite
heuristic: the larger the cardinality, the higher
the flexibility (we are satisfied with any one of
a larger set of states) and so the better it is
From our example in the previous slide,
cardinality is only one of the three components
that go into actual distance estimation.
For example, there may be an action that reduces
the cardinality (e.g. bomb the place) but the
new belief state with low uncertainty will be
at infinite distance from the goal.
We will look at planning graph-based heuristics
for considering all three components
(actually, unless we look at cross-world mutexes,
we won't be considering the interaction part)

17
Planning Graph Heuristic Computation

Heuristics:
BFS
Cardinality
Max, Sum, Level, Relaxed Plans
Planning Graph Structures:
Single, unioned planning graph (SG)
Multiple, independent planning graphs (MG)
Single, labeled planning graph (LUG)
Bryce et al., 2004, AAAI MDP workshop

18
Using a Single, Unioned Graph
Union literals from all initial states into a
conjunctive initial graph level
Minimal implementation
Not effective: loses world-specific support information
Heuristic estimate: 2
[Figure: single unioned planning graph; the initial level holds P, Q, R and M; actions A1, A2, A3 and then A4, A5 add K, L and G]
19
Using Multiple Graphs
Same-world mutexes
Memory intensive
Heuristic computation can be costly
[Figure: three independent planning graphs, one per initial world: P and M reach G through A1 and A4; Q and M reach G through A2 and A4; R and M reach G through A3 and A5]
20
What about mutexes?

In the previous slide, we considered only relaxed
plans (thus ignoring any mutexes)
We could have considered mutexes in the
individual world graphs to get better estimates
of the plans in the individual worlds (call these
same-world mutexes)
We could also have considered the impact of
having an action in one world on the other world.
Consider a patient who may or may not be
suffering from disease D. There is a medicine M,
which, if given in the world where he has D, will
cure the patient. But if it is given in the world
where the patient doesn't have disease D, it will
kill him. Since giving the medicine M will have
an impact in both worlds, we now have a mutex
between being alive in world 1 and being
cured in world 2!
Notice that cross-world mutexes will take into
account the state interactions that we mentioned
as one of the three components making up the
distance estimate.
We could compute a subset of same-world and
cross-world mutexes to improve the accuracy of the
heuristics
but it is not clear whether or not the accuracy
comes at too much additional cost to have a
reasonable impact on efficiency. See Bryce et
al., JAIR submission

21
Connection to CGP

CGP, the conformant Graphplan, builds multiple
planning graphs, but also does backward search
directly on the graphs to find a solution (as
against using these to give heuristic estimates)
It has to mark same-world and cross-world mutexes
to ensure soundness.

22
Using a Single, Labeled Graph (joint work with
David E. Smith)
Labels signify possible worlds under which a
literal holds
Action labels: conjunction of labels of
supporting literals
Literal labels: disjunction of labels of
supporting actions
Memory efficient
Cheap heuristics
Scalable
Extensible
Benefits from BDDs
Heuristic value: 5
[Figure: single labeled planning graph over literals P, Q, R, M, K, L, G and actions A1-A5. Label key: True; (P R) V (Q R); Q R; P R; (P R) V (Q R) V (P Q); P Q]
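
Label propagation in a single labeled graph can be sketched by representing each label as the set of initial worlds in which a literal is reachable, so that conjunction becomes set intersection and disjunction becomes set union. This is an assumption made for readability: the slide mentions BDDs for labels, and the world names wP, wQ, wR are hypothetical. The sketch also computes only a level-style estimate (the first level at which the goal's label covers every world), not the relaxed-plan value quoted on the slide.

```python
# Three possible initial worlds, named after the one uncertain literal that holds.
WORLDS = frozenset({"wP", "wQ", "wR"})

# Level-0 labels: M holds in every world, P/Q/R in one world each.
INIT_LABELS = {
    "M": WORLDS,
    "P": frozenset({"wP"}),
    "Q": frozenset({"wQ"}),
    "R": frozenset({"wR"}),
}

# Action encoding (an assumption of this sketch): name -> (supporting literals, effect)
ACTIONS = {
    "A1": ({"M", "P"}, "K"),
    "A2": ({"M", "Q"}, "K"),
    "A3": ({"M", "R"}, "L"),
    "A4": ({"K"}, "G"),
    "A5": ({"L"}, "G"),
}


def next_level(labels):
    """Propagate labels one level forward."""
    new = dict(labels)  # literals persist with their current labels
    for supports, effect in ACTIONS.values():
        if all(s in labels for s in supports):
            # Action label: conjunction (intersection) of supporting literal labels.
            action_label = WORLDS.intersection(*(labels[s] for s in supports))
            if action_label:
                # Literal label: disjunction (union) of supporting action labels.
                new[effect] = new.get(effect, frozenset()) | action_label
    return new


def first_level_reaching(goal, max_levels=10):
    """First level at which the goal is reachable from every initial world."""
    labels = INIT_LABELS
    for level in range(max_levels + 1):
        if labels.get(goal, frozenset()) == WORLDS:
            return level
        labels = next_level(labels)
    return None


print(first_level_reaching("G"))  # 2
```

At level 1, K is labeled with the worlds wP and wQ and L with wR; at level 2 the label of G covers all three worlds, even though no single action supports it in all of them.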
23
Sensing Actions

Sensing actions in essence partition a belief
state
Sensing a formula f splits a belief state B into
B & f and B & ~f
Both partitions need to be taken to the goal
state now
Tree plan
AO* search
Heuristics will have to compare two generalized
AND branches
In the figure, the lower branch has an expected
cost of 11,000
The upper branch has a fixed sensing cost of 300 plus,
based on the outcome, a cost of 7 or 12,000
If we consider worst-case cost, we assume the
cost is 12,300
If we consider both to be equally likely, we
assume a cost of 6,303.5 units
If we know the actual probabilities that the sensing
action returns one result as against the other, we
can use that to get the expected cost
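
The arithmetic behind the two cost models can be checked directly. The helper names below are illustrative; the numbers are the ones on the slide.

```python
def worst_case_cost(sensing_cost, branch_costs):
    """Sensing cost plus the most expensive outcome branch."""
    return sensing_cost + max(branch_costs)


def expected_cost(sensing_cost, branch_costs, probabilities):
    """Sensing cost plus the probability-weighted cost of the outcome branches."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    weighted = sum(p * c for p, c in zip(probabilities, branch_costs))
    return sensing_cost + weighted


SENSING = 300
OUTCOMES = [7, 12_000]
NO_SENSING = 11_000  # cost of the lower branch in the figure

print(worst_case_cost(SENSING, OUTCOMES))            # 12300
print(expected_cost(SENSING, OUTCOMES, [0.5, 0.5]))  # 6303.5
```

Under the worst-case model the branch without sensing looks better (11,000 < 12,300), while under equal likelihood the sensing branch looks better (6,303.5 < 11,000), so the choice of cost model changes which AND branch the search prefers.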

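[Figure: upper branch with sensing action As (cost 300) followed by outcomes costing 7 or 12,000; lower branch with action A (cost 11,000)]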
25
Cost models of conditional plans

The execution cost of a conditional plan is
Cost of O5 + Prob(p=T) * cost of A1;A3 + Prob(p=F) * cost of A2;A3
Can instead take max(cost of A1;A3, cost of A2;A3)
The planning cost of a conditional plan, however,
is proportional to the total size of the
plan (number of actions)
Need to estimate cost of leaf belief states
[Figure: O5 observes p and branches Y to A1 and N to A2; shown once with A3 following the branches and once without]
26
Slides beyond this point not covered
27
System Architecture: CAltAlt
[Figure: architecture diagram. Components: IPC PDDL Parser; A* Search Engine (HSP-r); Planning Graph(s) (IPP); Heuristics; Labels (CUDD); Model Checker (NuSMV); Belief States. Edge labels: Input for; Extracted From; Condense; Searches; Guided By; Validates. Legend: Off-the-shelf; Custom]
28
Sum and relaxed plan are best for a single graph
Relaxed plan is best for multiple or label graphs
Label graph using mutexes with relaxed plan is
best overall
29
Relaxed plan is best for a single graph
Sum is best for multiple graphs
Label graph using mutexes with relaxed plan is
best overall
30
Cardinality does well
Multiple graph union relaxed plan scales
Label graph relaxed plan does best
31
Relaxed plan approaches scale better, with time
close to that of cardinality and quality comparable
to optimal
Optimal approaches scale poorly
Cardinality approaches are faster, but quality
suffers
32
Relaxed plan approaches scale better, with time
close to that of cardinality and quality comparable
to optimal
Optimal approaches scale poorly
Cardinality approaches are faster, but quality
suffers
33
Contingent Planning

Progression planner: PBSP
LAO*-type search -- non-deterministic, partially
observable
Build planning graph to compute heuristic for
each belief state
No mutexes computed
Added observational actions to domains

34
Relaxed plan approaches scale better than
optimal approaches and have comparable quality
Optimal approaches scale poorly
Cardinality approaches are faster and scale
better, but quality suffers by two orders of
magnitude
35
Conclusions & Future Work

Conclusion
Distance estimations using overlap are more
informed than cardinality and max state-to-state
heuristics
Multiple planning graphs give good heuristics,
but are costly
Labeled planning graphs reduce cost
Planning graph heuristics help control plan
length while scaling to difficult problems
More details in
TR at http://rakaposhi.eas.asu.edu/belief-search
Conformant, contingent, all planning graph types
AAAI-04 MDP workshop
Labeled planning graph for conformant planning
Future Work
Stochastic Planning

36
Stochastic Planning
[Figure: flowchart contrasting Buridan with a new approach. Boxes: Stochastic Planning Problem; Relaxation of Instance; Deterministic Planner (UCPOP); Non-Deterministic Planner (PBSP or CAltAlt); Deterministic Plan; Non-Deterministic Plan; Convert Solution to Stochastic Plan; Seed Stochastic Plan; Local Search to Improve Probability of Satisfaction; Stochastic Plan]
Can use relaxed plans that are greedy on
probability by using probability in the planning
graph (similar to PGraphPlan)
A seed non-deterministic plan is likely to
reflect the physics of a stochastic planning problem
better than a seed deterministic plan.
37
Distance Estimates
Cardinality
Max State to State
State to State Overlap
Belief state to Belief state
[Figure: state-to-state distances between two belief states, combined using min, max and union]
38
Cardinality does well
Multiple graph union relaxed plan scales
Label graph relaxed plan does best; mutexes do
help
39
Relaxed plan approaches scale better than
optimal approaches, but have quality comparable
to optimal
Optimal approaches scale poorly
Cardinality approaches are faster and scale
better, but quality suffers by an order of
magnitude
