Title: 11/15: Planning in Belief Space contd..
1 11/15: Planning in Belief Space contd.
Agenda
- Long post-mortem on Kanna Rajan Talk
- Progression/Regression in Belief Space
- Homework 3 returned; Homework 4 assigned
  Avg. 61.66667, Std. Dev. 19.16551, Median 57
2 Discussion on Kanna Rajan Talk
- Qn: What did you mean by the comment to KR that the difficulty of
  modeling search control may be more acute because of hand-coded
  search control?
- HSTS/RAX (the planner underlying MAPGEN) depends on hand-coded search
  control rules to tell the planner how to deal with the choice points
  during its search. To write these, you need expertise with both the
  HSTS planner and the domain. And the rules change if the domain
  changes. I was wondering if the latter difficulty may be alleviated
  if you were to use domain-independent search control (or even
  domain-specific but declarative search control, a la TLPlan). Kanna
  thinks the latter techniques may not scale.
- The domain-specific search control rules are also used in ASPEN,
  JPL's own temporal planner that uses local search. In ASPEN's case,
  the control rules tell it which of the many possible plan repairs
  should be picked first. The experience with ASPEN was that it takes
  NASA folks time to (a) encode the new domain AND (b) write the
  domain-specific rules. The first cannot be avoided. The second could
  be, if we use domain-independent local search heuristics (e.g. the
  LPG planner).
3 Explaining Plans
- Qn: KR mentioned that it was very important for the users to get
  explanations for the decisions the planner made. How do we get them?
- There are two types of explanations
  - Explanations of correctness, e.g. why is this action in the plan?
    - Can be given through causal links: the action is in the plan
      because of the causal links it is giving. Following them will
      tell us whether or not the action is (in)directly supporting some
      top-level goal
    - Can be computed after the fact (doesn't matter who made the plan;
      if I have the domain theory, I can compute its explanation of
      correctness)
  - Rationale for decision, e.g. why was this action chosen as against
    some other action giving similar effects?
    - This information needs to be captured during the search
    - See Kambhampati's 1990 paper on Information Requirements for
      Modification
4 From Kambhampati, 1990
5 Soft Goals
- Qn: KR mentioned that MAPGEN needs to handle soft goals. What are
  these and how are they handled?
  - Soft goals are those that don't have to be achieved for the plan to
    be considered valid. However, achieving them can improve the value
    of the plan. Soft goals give planning a more distinct optimization
    flavor (if all goals are soft, then any executable sequence is a
    valid plan, and what we are looking for are valid plans with high
    quality)
  - The way MAPGEN handles these seems to be sort of ad hoc: the goals
    are given priorities. All tier 1 goals are handled first, then
    tier 2 goals, etc.
- A related qn: How do we handle soft goals in a more principled and
  automated way?
  - See the papers in AAAI-2004 (by Rao et al) and ICAPS-2004 (by Smith)
  - and a summary of the issues in the next slide
6 Handling Soft Goals: an approach
- Consider the variant of the classical planning problem called PSP Net
  Benefit, defined as follows
  - There are n goals to be achieved. Each goal g, when achieved, gives
    a reward R_g
  - Each action a in the domain has a cost C_a (expressed in the same
    units as the reward)
  - Objective is to find a plan that has the highest net benefit (the
    difference between the cumulative reward of all the goals achieved
    by the plan and the cumulative cost of all the actions used in the
    plan)
- How do we solve the PSP Net Benefit problem?
  - Naïve (but guaranteed optimal) idea: Consider all possible subsets
    of the n goals. For each, find the least costly plan (using
    cost-based planning graphs). Among all these, pick the one with the
    best net benefit.
    - You die because there are 2^n different calls to the planning
      algorithm
  - Cheap (but pretty suboptimal) idea: Rank the n goals in terms of the
    expected net benefit of the plans for achieving them. Work just on
    the subset of goals with positive net benefit
    - You can do this by finding a (cost-sensitive) relaxed plan for
      each of the goals. The net benefit is the reward of the goal minus
      the cost of the relaxed plan.
    - Problem: The cost of achieving a goal depends on what other goals
      we are planning to achieve in conjunction. We need to consider the
      residual cost of achieving a goal g_{k+1} in the context of goals
      g_1..g_k that have already been selected
  - A less greedy idea: Generalize the relaxed plan extraction procedure
    such that it takes the relaxed plan P for achieving g_1..g_k and
    attempts to re-use as many of the actions in P as possible while
    finding a relaxed plan for g_{k+1}
    - See the AltAlt^ps system (in AAAI 2004) for details
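The three ideas on this slide can be compared on a toy instance. The sketch below is only an illustration: the goals, rewards, action costs and the `NEEDS` table (which stands in for a cost-sensitive relaxed plan per goal) are all made-up assumptions, and `plan_cost` simply charges each shared action once.

```python
from itertools import combinations

# Hypothetical toy instance (not from the lecture).
REWARD = {"g1": 10, "g2": 6, "g3": 4}          # R_g
ACTION_COST = {"a": 7, "b": 5, "c": 3}         # C_a
# Stand-in for a relaxed plan: the actions each goal needs.
NEEDS = {"g1": {"a"}, "g2": {"a", "b"}, "g3": {"b", "c"}}


def plan_cost(goals):
    """Cost of the union of needed actions (shared actions are paid once)."""
    actions = set().union(*(NEEDS[g] for g in goals))
    return sum(ACTION_COST[a] for a in actions)


def net_benefit(goals):
    return sum(REWARD[g] for g in goals) - plan_cost(goals)


def naive_optimal():
    """Try all 2^n goal subsets."""
    best = ()
    for k in range(len(REWARD) + 1):
        for subset in combinations(sorted(REWARD), k):
            if net_benefit(subset) > net_benefit(best):
                best = subset
    return set(best)


def independent_ranking():
    """Cheap idea: keep goals whose stand-alone net benefit is positive."""
    return {g for g in REWARD if net_benefit([g]) > 0}


def residual_greedy():
    """Less greedy idea: add the goal with the best residual net benefit."""
    chosen = set()
    while True:
        gains = {g: net_benefit(chosen | {g}) - net_benefit(chosen)
                 for g in REWARD if g not in chosen}
        if not gains:
            return chosen
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            return chosen
        chosen.add(best)
```

On this instance the stand-alone ranking keeps only g1, because g2 and g3 look unprofitable in isolation, while the residual-cost version keeps adding goals once their shared actions are already paid for.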
7 Belief State Search: An Example Problem
Actions: A1: M & P => K   A2: M & Q => K   A3: M & R => L
         A4: K => G       A5: L => G
Plan: ??
- Initial state: M is true and exactly one of P, Q, R is true
- Goal: Need G
(Notation: ~ is negation, & is conjunction, V is disjunction)
Init State Formula: [(p & ~q & ~r) V (~p & q & ~r) V (~p & ~q & r)] & M
DNF: [M & p & ~q & ~r] V [M & ~p & q & ~r] V [M & ~p & ~q & r]
CNF: (P V Q V R) & (~P V ~Q) & (~P V ~R) & (~Q V ~R) & M
DNF: good for progression (clauses are partial states)
CNF: good for regression
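A quick truth-table check that the DNF and CNF forms describe the same initial belief state (M true, exactly one of P, Q, R true). This is just a sanity-check sketch; the function names are made up for illustration.

```python
from itertools import product


def init_formula(M, P, Q, R):
    """M is true and exactly one of P, Q, R is true."""
    exactly_one = ((P and not Q and not R) or
                   (not P and Q and not R) or
                   (not P and not Q and R))
    return bool(exactly_one and M)


def dnf(M, P, Q, R):
    """Three constituents, each a complete assignment to M, P, Q, R."""
    return bool((M and P and not Q and not R) or
                (M and not P and Q and not R) or
                (M and not P and not Q and R))


def cnf(M, P, Q, R):
    """At least one of P, Q, R; no two together; and M."""
    return bool((P or Q or R) and
                (not P or not Q) and
                (not P or not R) and
                (not Q or not R) and
                M)


ASSIGNMENTS = list(product([False, True], repeat=4))
# The states (models) that make up the initial belief state.
MODELS = [v for v in ASSIGNMENTS if init_formula(*v)]
```

The belief state has three member states, one per DNF constituent, and each CNF clause rules out the remaining assignments.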
8 Progression & Regression
- Progression with DNF
  - The constituents (DNF clauses) look like partial states already.
    Think of applying the action to each of these constituents and
    unioning the result
  - Action application converts each constituent to a set of new
    constituents
  - Termination when each constituent entails the goal formula
- Regression with CNF
  - Very little difference from classical planning (since we already
    had partial states in classical planning).
  - THE main difference is that we cannot split the disjunction into
    the search space
  - Termination when each (CNF) clause is entailed by the initial state
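Progression with DNF can be sketched on the example problem from the earlier slide. The encoding below is an assumption made for illustration: each action is written as (enabling precondition, condition of its effect, literal added), constituents are frozensets of true literals, and delete effects do not occur in this example.

```python
# Hypothetical encoding of the lecture's actions:
# name -> (enabling precondition or None, effect condition, added literal)
ACTIONS = {
    "A1": ("M", "P", "K"),
    "A2": ("M", "Q", "K"),
    "A3": ("M", "R", "L"),
    "A4": (None, "K", "G"),
    "A5": (None, "L", "G"),
}


def progress(belief, name):
    """Apply one action to every constituent and union the results."""
    pre, cond, add = ACTIONS[name]
    if pre is not None and any(pre not in s for s in belief):
        return None  # not applicable in every possible world
    return frozenset(s | {add} if cond in s else s for s in belief)


def run(belief, plan):
    """Progress a belief state through a sequence of actions."""
    for name in plan:
        belief = progress(belief, name)
        if belief is None:
            raise ValueError(f"{name} is not applicable")
    return belief


def entails_goal(belief, goal):
    """Termination test: every constituent entails the goal."""
    return all(goal in s for s in belief)


INIT = frozenset([frozenset({"M", "P"}),
                  frozenset({"M", "Q"}),
                  frozenset({"M", "R"})])
FINAL = run(INIT, ["A1", "A2", "A3", "A4", "A5"])
```

After the five actions every constituent contains G, whereas a plan such as A1, A4 only reaches G in the world where P holds.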
9 Progression Example
10 Regression Search Example
Actions: A1: M & P => K   A2: M & Q => K   A3: M & R => L
         A4: K => G       A5: L => G
Goal State: G
Regression trace (each step regresses the clausal state over one action):
- Start: G
- A4: (G V K)
  G or K must be true before A4 for G to be true after A4
- A5: (G V K V L)
- A1: (G V K V L V P) & M
  M is an enabling precondition: it must be true before A1 was applied
- A2: (G V K V L V P V Q) & M
- A3: (G V K V L V P V Q V R) & M
Initially: (P V Q V R) & (~P V ~Q) & (~P V ~R) & (~Q V ~R) & M
Each clause is satisfied by a clause in the initial clausal state -- Done! (5 actions)
- Clausal states compactly represent disjunction over sets of uncertain
  literals
- Yet, we still need heuristics for the search
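The regression trace can be reproduced with a small sketch. It assumes the same hypothetical action encoding (enabling precondition, effect condition, added literal), handles only positive add effects, and uses clause subsumption as the termination test, matching the slide's "each clause is satisfied by a clause in the initial clausal state". It is not a general regression procedure.

```python
# name -> (enabling precondition or None, effect condition, added literal)
ACTIONS = {
    "A1": ("M", "P", "K"),
    "A2": ("M", "Q", "K"),
    "A3": ("M", "R", "L"),
    "A4": (None, "K", "G"),
    "A5": (None, "L", "G"),
}

# Initial clausal state; "~X" is a negative literal.
INIT_CNF = [frozenset({"P", "Q", "R"}), frozenset({"~P", "~Q"}),
            frozenset({"~P", "~R"}), frozenset({"~Q", "~R"}),
            frozenset({"M"})]


def regress(clauses, name):
    """Regress a set of clauses over one action."""
    pre, cond, add = ACTIONS[name]
    out = set()
    for clause in clauses:
        # If the action can make a literal of the clause true, the clause
        # may also be satisfied by the effect's condition holding before.
        out.add(clause | {cond} if add in clause else clause)
    if pre is not None:
        out.add(frozenset({pre}))  # enabling precondition must hold before
    return frozenset(out)


def satisfied_initially(clauses):
    """Each clause must contain (be subsumed by) some initial clause."""
    return all(any(ic <= c for ic in INIT_CNF) for c in clauses)


TRACE = [frozenset({frozenset({"G"})})]
for action in ["A4", "A5", "A1", "A2", "A3"]:
    TRACE.append(regress(TRACE[-1], action))
```

The regression order A4, A5, A1, A2, A3 is the reverse of the execution order, so the plan it stands for executes A3 first and A4 last.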
11 What happens if we restrict uncertainty?
- If the initial state contains only the known variables (either known
  to be true or known to be false),
  - The DNF formula has one single constituent
  - The CNF clauses are all singletons
- So you can see how we go from 2^(2^n) to 3^n
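A small counting sketch for these two numbers, under one reading of the slide (an assumption): with n boolean variables there are 2^n complete states, a belief state is any set of states, and a restricted "partial state" gives each variable one of three values (true, false, unspecified).

```python
def num_states(n):
    """Complete assignments to n boolean variables."""
    return 2 ** n


def num_belief_states(n):
    """Arbitrary sets of states (counting the empty set too)."""
    return 2 ** num_states(n)


def num_partial_states(n):
    """Each variable is true, false, or unspecified."""
    return 3 ** n
```

Even for n = 3 the gap is already visible: 256 belief states against 27 partial states.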
12 11/17
after all the money we spend on wardrobe and
cosmetic surgeries ?
13 Conformant Planning in Real World: 2 examples
No. 42: HOW NOT TO BE SEEN (aka Monty Python on Conformant Planning).
Video shown in class.
14 Heuristics for Conformant Planning
- First idea: Notice that classical planning (which assumes full
  observability) is a relaxation of conformant planning
  - So, the length of the classical planning solution is a lower bound
    (admissible heuristic) for conformant planning
  - Further, the heuristics for classical planning are also heuristics
    for conformant planning (albeit probably not very informed)
- Next idea: Let us get a feel for how estimating distances between
  belief states differs from estimating those between states
15 Three issues
- How many states are there?
- How far is each of the states from the goal?
- How much interaction is there between states?
  - For example, if the length of the plan for taking S1 to the goal is
    10 and S2 to the goal is 10, the length of the plan for taking both
    to the goal could be anywhere between 10 and infinity, depending on
    the interactions
  - Notice that we talk about state interactions here just as we talked
    about goal interactions in classical planning
Need to estimate the length of the combined plan for taking all states
to the goal
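The gap between per-state distances and the combined distance shows up on the running example (M true, exactly one of P, Q, R true, goal G). The sketch below is an illustration only: the action encoding (enabling precondition, effect condition, added literal) is an assumption, and breadth-first search in belief space is used just to get exact plan lengths.

```python
from collections import deque

# name -> (enabling precondition or None, effect condition, added literal)
ACTIONS = {
    "A1": ("M", "P", "K"),
    "A2": ("M", "Q", "K"),
    "A3": ("M", "R", "L"),
    "A4": (None, "K", "G"),
    "A5": (None, "L", "G"),
}


def progress(belief, name):
    pre, cond, add = ACTIONS[name]
    if pre is not None and any(pre not in s for s in belief):
        return None
    return frozenset(s | {add} if cond in s else s for s in belief)


def plan_length(states, goal="G"):
    """Length of a shortest plan taking every given state to the goal."""
    start = frozenset(states)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        belief, depth = queue.popleft()
        if all(goal in s for s in belief):
            return depth
        for name in ACTIONS:
            nxt = progress(belief, name)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return float("inf")  # no plan works for all states


WORLDS = [frozenset({"M", "P"}), frozenset({"M", "Q"}), frozenset({"M", "R"})]
PER_STATE = [plan_length([w]) for w in WORLDS]
COMBINED = plan_length(WORLDS)
```

Each state alone is two steps from the goal, but the combined plan needs five actions: more than the maximum of the individual distances and less than their sum, because the two worlds that go through K can share A4.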
16 Belief-state cardinality alone won't be enough
- Early work on conformant planning concentrated exclusively on
  heuristics that look at the cardinality of the belief state
  - The larger the cardinality of the belief state, the higher its
    uncertainty, and the worse it is (for progression)
  - Notice that in regression, we have the opposite heuristic: the
    larger the cardinality, the higher the flexibility (we are satisfied
    with any one of a larger set of states) and so the better it is
- From our example in the previous slide, cardinality is only one of the
  three components that go into actual distance estimation.
  - For example, there may be an action that reduces the cardinality
    (e.g. bomb the place) but the new belief state with low uncertainty
    will be at infinite distance from the goal.
- We will look at planning graph-based heuristics for considering all
  three components
  - (actually, unless we look at cross-world mutexes, we won't be
    considering the interaction part)
17 Planning Graph Heuristic Computation
- Heuristics
  - BFS
  - Cardinality
  - Max, Sum, Level, Relaxed Plans
- Planning Graph Structures
  - Single, unioned planning graph (SG)
  - Multiple, independent planning graphs (MG)
  - Single, labeled planning graph (LUG)
    - Bryce et al., 2004, AAAI MDP workshop
18 Using a Single, Unioned Graph
- Union literals from all initial states into a conjunctive initial
  graph level
- Not effective
- Lose world-specific support information
[Figure: one planning graph grown from the unioned initial level
{M, P, Q, R}; A1, A2, A3 add K and L, then A4 and A5 add G.
Heuristic estimate: 2]
19 Using Multiple Graphs
- Memory intensive
- Heuristic computation can be costly
[Figure: three independent planning graphs, one per initial world.
{M, P} reaches K via A1 and G via A4; {M, Q} reaches K via A2 and G via
A4; {M, R} reaches L via A3 and G via A5]
20 What about mutexes?
- In the previous slide, we considered only relaxed plans (thus ignoring
  any mutexes)
- We could have considered mutexes in the individual world graphs to get
  better estimates of the plans in the individual worlds (call these
  same-world mutexes)
- We could also have considered the impact of having an action in one
  world on the other world.
  - Consider a patient who may or may not be suffering from disease D.
    There is a medicine M, which if given in the world where he has D,
    will cure the patient. But if it is given in the world where the
    patient doesn't have disease D, it will kill him. Since giving the
    medicine M will have impact in both worlds, we now have a mutex
    between being alive in world 1 and being cured in world 2!
  - Notice that cross-world mutexes will take into account the state
    interactions that we mentioned as one of the three components making
    up the distance estimate.
- We could compute a subset of same-world and cross-world mutexes to
  improve the accuracy of the heuristics
  - but it is not clear whether or not the accuracy comes at too much
    additional cost to have reasonable impact on efficiency; see Bryce
    et al., JAIR submission
21 Connection to CGP
- CGP, the Conformant Graphplan, builds multiple planning graphs, but
  also does backward search directly on the graphs to find a solution
  (as against using these to give heuristic estimates)
  - It has to mark same-world and cross-world mutexes to ensure
    soundness
22 Using a Single, Labeled Graph (joint work with David E. Smith)
- Labels signify possible worlds under which a literal holds
- Action labels: conjunction of labels of supporting literals
- Literal labels: disjunction of labels of supporting actions
- Memory efficient
- Cheap heuristics
- Scalable
- Extensible
- Benefits from BDDs
[Figure: single labeled planning graph over literals M, P, Q, R, K, L, G
and actions A1-A5.
Label key: True; (~P & ~R) V (~Q & ~R); ~Q & ~R; ~P & ~R;
(~P & ~R) V (~Q & ~R) V (~P & ~Q); ~P & ~Q.
Heuristic value: 5]
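Label propagation and a labeled relaxed plan can be sketched on the running example. This is a simplified illustration, not the authors' implementation: labels are plain sets of world names rather than BDD formulas, actions are (conditions, added literal) pairs, mutexes are ignored, and the world names wP, wQ, wR are made up.

```python
ACTIONS = {
    "A1": ({"M", "P"}, "K"),
    "A2": ({"M", "Q"}, "K"),
    "A3": ({"M", "R"}, "L"),
    "A4": ({"K"}, "G"),
    "A5": ({"L"}, "G"),
}
WORLDS = {"wP": {"M", "P"}, "wQ": {"M", "Q"}, "wR": {"M", "R"}}


def build_lug(worlds, actions, goal):
    """Propagate labels until the goal is labeled with every world."""
    everything = frozenset(worlds)
    level0 = {}
    for w, lits in worlds.items():
        for lit in lits:
            level0[lit] = level0.get(lit, frozenset()) | {w}
    lit_levels, act_levels = [level0], []
    while lit_levels[-1].get(goal, frozenset()) != everything:
        cur = lit_levels[-1]
        acts, nxt = {}, dict(cur)  # copying cur plays the role of no-ops
        for name, (conds, add) in actions.items():
            label = everything
            for c in conds:  # action label: conjunction (intersection)
                label = label & cur.get(c, frozenset())
            if label:
                acts[name] = label
                # literal label: disjunction (union) over supporters
                nxt[add] = nxt.get(add, frozenset()) | label
        if nxt == cur:
            return None  # fixpoint: goal unreachable in some world
        lit_levels.append(nxt)
        act_levels.append(acts)
    return lit_levels, act_levels


def relaxed_plan(lit_levels, act_levels, actions, goal):
    """Pick (action, level) pairs that support the goal in every world."""
    top = len(lit_levels) - 1
    needs = {t: {} for t in range(top + 1)}
    needs[top][goal] = set(lit_levels[top][goal])
    chosen = set()
    for t in range(top, 0, -1):
        for lit, worlds_needed in needs[t].items():
            remaining = set(worlds_needed)
            # Worlds where the literal already held one level earlier persist.
            held = remaining & lit_levels[t - 1].get(lit, frozenset())
            if held:
                needs[t - 1].setdefault(lit, set()).update(held)
                remaining -= held
            for name, label in act_levels[t - 1].items():
                conds, add = actions[name]
                cover = remaining & label
                if add == lit and cover:
                    chosen.add((name, t - 1))
                    remaining -= cover
                    for c in conds:
                        needs[t - 1].setdefault(c, set()).update(cover)
    return chosen


LIT_LEVELS, ACT_LEVELS = build_lug(WORLDS, ACTIONS, "G")
PLAN = relaxed_plan(LIT_LEVELS, ACT_LEVELS, ACTIONS, "G")
```

Because the labels remember which worlds each action covers, the extracted relaxed plan needs both A4 and A5 for G, and A1, A2, A3 beneath them, which gives the heuristic value of 5 quoted on the slide.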
23 Sensing Actions
- Sensing actions in essence partition a belief state
  - Sensing a formula f splits a belief state B into B & f and B & ~f
  - Both partitions need to be taken to the goal state now
    - Tree plan
    - AO* search
- Heuristics will have to compare two generalized AND branches
  - In the figure, the lower branch has an expected cost of 11,000
  - The upper branch has a fixed sensing cost of 300 plus, based on the
    outcome, a cost of 7 or 12,000
    - If we consider worst case cost, we assume the cost is 12,300
    - If we consider both to be equally likely, we assume 6303.5 units
      cost
    - If we know the actual probabilities that the sensing action
      returns one result as against the other, we can use them to get
      the expected cost
[Figure: upper branch is a sensing action As (cost 300) whose two
outcomes lead to costs of 7 and 12,000; lower branch is an action A with
cost 11,000]
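The arithmetic behind the branch comparison, as a small sketch. The function name and the skewed probabilities in the last case are assumptions for illustration; the costs 300, 7, 12,000 and 11,000 are the ones on the slide.

```python
def branch_cost(sensing_cost, outcomes, mode="expected"):
    """Cost of a sensing branch.

    outcomes: list of (probability, cost) pairs, one per sensing result.
    mode: "expected" weighs costs by probability, "worst" takes the max.
    """
    if mode == "worst":
        return sensing_cost + max(cost for _, cost in outcomes)
    return sensing_cost + sum(p * cost for p, cost in outcomes)


LOWER = 11000  # expected cost of the non-sensing branch

UPPER_WORST = branch_cost(300, [(0.5, 7), (0.5, 12000)], mode="worst")
UPPER_EVEN = branch_cost(300, [(0.5, 7), (0.5, 12000)])
# Hypothetical skewed probabilities: the cheap outcome is rare.
UPPER_SKEWED = branch_cost(300, [(0.05, 7), (0.95, 12000)])
```

Under the worst-case model the lower branch wins (11,000 against 12,300); with equally likely outcomes the sensing branch wins (6303.5); with the made-up skewed probabilities the lower branch wins again.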
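25 Cost models of conditional plans
- The execution cost of a conditional plan is
  - Cost of O5 + Prob(p=T) * cost of (A1, A3) + Prob(p=F) * cost of
    (A2, A3)
  - Can take max(cost of (A1, A3), cost of (A2, A3)) in place of the
    probability-weighted sum
- The planning cost of a conditional plan, however, is proportional to
  the total size of the plan (number of actions)
- Need to estimate cost of leaf belief states
[Figure: two conditional plans. Each senses p with O5 (O5: p?) and
branches Y to A1 and N to A2. In the first plan the branches continue
with A3; the second plan stops after A1 and A2]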
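The cost models can be written as short recursions over a tree-shaped plan. The tuple representation, the helper names and all the numeric costs below are assumptions made for illustration; only the plan shape (O5 sensing p, then A1 or A2, then A3) comes from the slide.

```python
def act(name, cost, rest=None):
    """An ordinary action followed by the rest of the plan."""
    return ("act", name, cost, rest)


def sense(name, cost, p_true, if_true, if_false):
    """A sensing action with one sub-plan per outcome."""
    return ("sense", name, cost, p_true, if_true, if_false)


def expected_cost(plan):
    if plan is None:
        return 0.0
    if plan[0] == "act":
        _, _, cost, rest = plan
        return cost + expected_cost(rest)
    _, _, cost, p, yes, no = plan
    return cost + p * expected_cost(yes) + (1 - p) * expected_cost(no)


def worst_case_cost(plan):
    if plan is None:
        return 0.0
    if plan[0] == "act":
        _, _, cost, rest = plan
        return cost + worst_case_cost(rest)
    _, _, cost, _, yes, no = plan
    return cost + max(worst_case_cost(yes), worst_case_cost(no))


def plan_size(plan):
    """Number of action nodes in the tree (a proxy for planning cost)."""
    if plan is None:
        return 0
    if plan[0] == "act":
        return 1 + plan_size(plan[3])
    return 1 + plan_size(plan[4]) + plan_size(plan[5])


# Hypothetical costs: O5 = 1, A1 = 2, A2 = 5, A3 = 1, Prob(p=T) = 0.5.
PLAN = sense("O5", 1.0, 0.5,
             act("A1", 2.0, act("A3", 1.0)),
             act("A2", 5.0, act("A3", 1.0)))
```

Execution only ever follows one branch, so its cost is an expectation or a max over branches, whereas the size counts every node in the tree, including A3 once per branch in this tree-shaped encoding.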
26 Slides beyond this point not covered
27 System Architecture
[Figure: CAltAlt system architecture. Components: IPC PDDL Parser;
A* Search Engine (HSP-r); Heuristics; Planning Graph(s) (IPP);
Labels (CUDD); Model Checker (NuSMV); Belief States.
Arrow labels: Input for; Extracted From; Condense; Searches; Guided By;
Validates.
Legend: Off-the-Shelf vs. Custom components]
28 Results
- Sum and Relaxed Plan are best for a single graph
- Relaxed Plan is best for multiple or label graphs
- Label graph using mutexes with relaxed plan is best overall
29 Results
- Relaxed Plan is best for a single graph
- Sum is best for multiple graphs
- Label graph using mutexes with relaxed plan is best overall
30 Results
- Cardinality does well
- Multiple graph union relaxed plan scales
- Label graph relaxed plan does best
31 Results
- Relaxed plan approaches scale better, with time close to cardinality
  and quality comparable to optimal
- Optimal approaches scale poorly
- Cardinality approaches are faster, but quality suffers
32 Results
- Relaxed plan approaches scale better, with time close to cardinality
  and quality comparable to optimal
- Optimal approaches scale poorly
- Cardinality approaches are faster, but quality suffers
33 Contingent Planning
- Progression Planner PBSP
  - LAO* type search -- Non-Deterministic, Partially Observable
  - Build planning graph to compute heuristic for each belief state
    - No mutexes computed
  - Added observational actions to domains
34 Results
- Relaxed plan approaches scale better than optimal approaches and have
  comparable quality
- Optimal approaches scale poorly
- Cardinality approaches are faster and scale better, but quality
  suffers by two orders of magnitude
35 Conclusions & Future Work
- Conclusion
  - Distance estimations using overlap are more informed than
    cardinality and max state-to-state heuristics
  - Multiple planning graphs give good heuristics, but are costly
  - Labeled planning graphs reduce cost
  - Planning graph heuristics help control plan length while scaling to
    difficult problems
- More details in
  - TR at http://rakaposhi.eas.asu.edu/belief-search
    - Conformant, Contingent, all planning graph types
  - AAAI-04 MDP workshop
    - Labeled Planning Graph for conformant planning
- Future Work
  - Stochastic Planning
36 Stochastic Planning
[Figure: flowchart with two paths labeled "Buridan" and "New Approach".
Boxes: Stochastic Planning Problem; Relaxation of Instance;
Deterministic Planner (UCPOP); Non-Deterministic Planner (PBSP or
CAltAlt); Deterministic Plan; Non-Deterministic Plan; Convert Solution
to Stochastic Plan; Seed Stochastic Plan; Local Search to Improve
Probability of Satisfaction; Stochastic Plan]
- Can use relaxed plans that are greedy on probability by using
  probability in the planning graph (similar to PGraphPlan)
- A seed non-deterministic plan is likely to reflect the physics of a
  stochastic planning problem better than a seed deterministic plan.
37 Distance Estimates
[Figure: comparison of distance estimates between belief states:
Cardinality; Max State to State; State to State Overlap; Belief State to
Belief State. Aggregations shown in the figure: min, max, union]
38 Results
- Cardinality does well
- Multiple graph union relaxed plan scales
- Label graph relaxed plan does best; mutexes do help
39 Results
- Relaxed plan approaches scale better than optimal approaches, but have
  quality comparable to optimal
- Optimal approaches scale poorly
- Cardinality approaches are faster and scale better, but quality
  suffers by an order of magnitude