11/15: Planning in Belief Space contd..

Slides: 27
Provided by: min63
1
11/15 Planning in Belief Space contd..
Agenda: Long post-mortem on Kanna Rajan's talk;
Progression/Regression in Belief Space
  • Homework 3 returned; Homework 4 assigned

Avg. 61.66667
Std. Dev. 19.16551
Median 57
2
Discussion on Kanna Rajan Talk
  • Qn: What did you mean by the comment to KR that
    the difficulty of modeling search control may be
    more acute because of hand-coded search control?
  • HSTS/RAX (the planner underlying MAPGEN) depends
    on hand-coded search control rules to tell the
    planner how to deal with the choice points during
    its search. To write these, you need expertise
    with both the HSTS planner and the domain. And
    the rules change if the domain changes. I was
    wondering if the latter difficulty may be
    alleviated if you were to use domain-independent
    search control (or even domain-specific but
    declarative search control, a la TLPLAN). Kanna
    thinks the latter techniques may not scale.
  • Domain-specific search control rules are also
    used in ASPEN, JPL's own temporal planner that
    uses local search. In ASPEN's case, the control
    rules tell it which of the many possible plan
    repairs should be picked first. The experience
    with ASPEN was that it takes NASA folks time to
    (a) encode the new domain AND (b) write the
    domain-specific rules. The first cannot be
    avoided. The second could be, if we use
    domain-independent local search heuristics (e.g.
    the LPG planner).

3
Explaining Plans
  • Qn: KR mentioned that it was very important for
    the users to get explanations for the decisions
    the planner made. How do we get them?
  • There are two types of explanations:
  • Explanations of correctness, e.g. why is this
    action in the plan?
  • These can be given through causal links: the
    action is in the plan because of the causal links
    it is giving. Following them will tell us whether
    or not the action is (in)directly supporting some
    top-level goal.
  • They can be computed after the fact (it doesn't
    matter who made the plan; if I have the domain
    theory, I can compute its explanation of
    correctness).
  • Rationale for a decision, e.g. why was this
    action chosen as against some other action giving
    similar effects?
  • This information needs to be captured during the
    search.
  • See Kambhampati's 1990 paper on Information
    Requirements for Modification

4
From Kambhampati, 1990
5
Soft Goals
  • Qn: KR mentioned that MAPGEN needs to handle soft
    goals. What are these and how are they handled?
  • Soft goals are those that don't have to be
    achieved for the plan to be considered valid.
    However, achieving them can improve the value
    of the plan. Soft goals give planning a more
    distinct optimization flavor. (If all goals are
    soft, then any executable sequence is a valid
    plan, and what we are looking for are valid plans
    with high quality.)
  • The way MAPGEN handles these seems to be sort of
    ad hoc: the goals are given priorities. All tier 1
    goals are handled first, then tier 2 goals are
    handled, etc.
  • A related qn: How do we handle soft goals in a
    more principled and automated way?
  • See the papers in AAAI-2004 (by Rao et al.) and
    ICAPS-2004 (by Smith)
  • and a summary of the issues in the next slide

6
Handling Soft Goals: an approach
  • Consider the variant of the classical planning
    problem called PSP Net Benefit, defined as
    follows:
  • There are n goals to be achieved. Each goal g,
    when achieved, gives a reward Rg
  • Each action a in the domain has a cost Ca
    (expressed in the same units as the reward)
  • The objective is to find a plan that has the
    highest net benefit (the difference between the
    cumulative reward of all the goals achieved by
    the plan and the cumulative cost of all the
    actions used in the plan)
  • How do we solve the PSP Net Benefit problem?
  • Naïve (but guaranteed optimal) idea: Consider all
    possible subsets of the n goals. For each, find
    the least costly plan (using cost-based planning
    graphs). Among all these, pick the one with the
    best net benefit.
  • You die because there are 2^n different calls to
    the planning algorithm
  • Cheap (but pretty suboptimal) idea: Rank the n
    goals in terms of the expected net benefit of the
    plans for achieving them. Work just on the subset
    of goals with positive net benefit
  • You can do this by finding a (cost-sensitive)
    relaxed plan for each of the goals. The net
    benefit is the reward of the goal minus the cost
    of the relaxed plan.
  • Problem: The cost of achieving a goal depends on
    what other goals we are planning to achieve in
    conjunction. We need to consider the residual
    cost of achieving a goal g(k+1) in the context of
    goals g1..gk that have already been selected
  • A less greedy idea: Generalize the relaxed plan
    extraction procedure such that it takes the
    relaxed plan P for achieving g1..gk and attempts
    to re-use as many of the actions in P as possible
    while finding a relaxed plan for g(k+1)
  • See the AltAlt-ps system (in AAAI 2004) for details
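
The three ideas on this slide can be compared on a toy instance. The sketch below is only an illustration: the goals, rewards, and action costs are made up, and `plan_cost` is a stand-in for a real (relaxed) planner call that simply charges once for every action needed by the chosen goals.

```python
from itertools import combinations

# Hypothetical toy domain (not from the lecture): each goal needs a set of
# actions, and an action shared between goals is paid for only once.
ACTION_COST = {"a": 4, "b": 3, "c": 3}
GOAL_REWARD = {"g1": 6, "g2": 6, "g3": 10}
GOAL_NEEDS = {"g1": {"a", "b"}, "g2": {"a", "c"}, "g3": {"a"}}


def plan_cost(goals):
    """Stand-in for a planner call: cost of the union of needed actions."""
    actions = set().union(*(GOAL_NEEDS[g] for g in goals))
    return sum(ACTION_COST[a] for a in actions)


def net_benefit(goals):
    """Cumulative reward of the goals minus the cost of the plan."""
    return sum(GOAL_REWARD[g] for g in goals) - plan_cost(goals)


def exhaustive():
    """Naive idea: evaluate all 2^n goal subsets and keep the best."""
    names = sorted(GOAL_REWARD)
    subsets = (set(c) for k in range(len(names) + 1)
               for c in combinations(names, k))
    return max(subsets, key=net_benefit)


def independent_ranking():
    """Cheap idea: keep goals whose stand-alone net benefit is positive."""
    return {g for g in GOAL_REWARD if net_benefit([g]) > 0}


def residual_greedy():
    """Less greedy idea: repeatedly add the goal with the best residual gain."""
    chosen = set()
    while True:
        gains = {g: net_benefit(chosen | {g}) - net_benefit(chosen)
                 for g in sorted(GOAL_REWARD) if g not in chosen}
        if not gains:
            return chosen
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            return chosen
        chosen.add(best)


for solver in (exhaustive, independent_ranking, residual_greedy):
    picked = solver()
    print(solver.__name__, sorted(picked), net_benefit(picked))
```

In this instance g1 and g2 look unprofitable on their own, so the independent ranking keeps only g3, while the residual-cost version notices that action a is already paid for and picks up all three goals.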

7
Belief State Search: An Example Problem
Actions: A1: M P => K; A2: M Q => K; A3: M R => L;
A4: K => G; A5: L => G
Plan: ??
  • Initial state: M is true and exactly one of P, Q, R
    is true
  • Goal: Need G

Init State Formula:
[(p & ~q & ~r) V (~p & q & ~r) V (~p & ~q & r)] & M
DNF: [M & p & ~q & ~r] V [M & ~p & q & ~r] V [M & ~p & ~q & r]
DNF is good for progression (clauses are partial
states)
CNF: (P V Q V R) & (~P V ~Q) & (~P V ~R) & (~Q V ~R) & M
CNF is good for regression
8
Progression & Regression
  • Progression with DNF
  • The constituents (DNF clauses) look like
    partial states already. Think of applying the
    action to each of these constituents and unioning
    the results
  • Action application converts each constituent to a
    set of new constituents
  • Termination: when each constituent entails the
    goal formula
  • Regression with CNF
  • Very little difference from classical planning
    (since we already had partial states in classical
    planning).
  • The main difference is that we cannot split the
    disjunction into the search space
  • Termination: when each (CNF) clause is entailed by
    the initial state
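
A minimal sketch of progression over the example problem, under some assumptions that are not on the slides: each DNF constituent is expanded into a complete state (a frozenset of true literals), each action is read as an enabling precondition plus one conditional effect, and the search is a plain breadth-first search with no heuristic. The names `progress` and `conformant_plan` are illustrative.

```python
from collections import deque

# name: (enabling precondition, effect condition, added literal).
# Assumed reading of the slide's "A1: M P => K": M must hold everywhere,
# and K is added only in the states where P holds.
ACTIONS = {
    "A1": ({"M"}, {"P"}, "K"),
    "A2": ({"M"}, {"Q"}, "K"),
    "A3": ({"M"}, {"R"}, "L"),
    "A4": (set(), {"K"}, "G"),
    "A5": (set(), {"L"}, "G"),
}


def progress(belief, name):
    """Apply an action to every state of the belief and union the results."""
    pre, cond, add = ACTIONS[name]
    if not all(pre <= s for s in belief):
        return None  # not executable in some possible world
    return frozenset(s | {add} if cond <= s else s for s in belief)


def conformant_plan(init_states, goal):
    """Breadth-first search over belief states; stops when every state has the goal."""
    start = frozenset(init_states)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        belief, plan = frontier.popleft()
        if all(goal <= s for s in belief):
            return plan
        for name in sorted(ACTIONS):
            nxt = progress(belief, name)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None


# M is true and exactly one of P, Q, R is true.
INIT = [frozenset({"M", "P"}), frozenset({"M", "Q"}), frozenset({"M", "R"})]
plan = conformant_plan(INIT, {"G"})
print(plan)
```

Every possible world needs its own enabling action (A1, A2, or A3) followed by A4 or A5, so the plan found uses all five actions.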

9
Progression Example
10
Regression Search Example
Actions: A1: M P => K; A2: M Q => K; A3: M R => L;
A4: K => G; A5: L => G
Goal State: G
Regress over A4: (G V K)
  G or K must be true before A4 for G to be true
  after A4
Regress over A5: (G V K V L)
Regress over A1: (G V K V L V P) & M
  M is the enabling precondition: it must be true
  before A1 was applied
Regress over A2: (G V K V L V P V Q) & M
Regress over A3: (G V K V L V P V Q V R) & M
Initially: (P V Q V R) & (~P V ~Q) & (~P V ~R) &
(~Q V ~R) & M
Each clause is satisfied by a clause in the
initial clausal state. Done! (5 actions)
Clausal states compactly represent disjunction
over sets of uncertain literals. Yet, we still need
heuristics for the search
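
The regression trace can be replayed with a small amount of code. This is a sketch under simplifying assumptions: a clause is a frozenset of literal strings, each action has at most one enabling precondition and one single-literal conditional effect, and clauses contain only positive literals, so the only regression rule needed is "a clause containing the added literal is weakened by the effect's condition". The entailment test is the clause-subsumption check named on the slide, which is sound but not complete in general.

```python
# name: (enabling precondition or None, effect condition, added literal).
# Assumed reading of the slide's actions, e.g. A1: M enables, P => K.
ACTIONS = {
    "A1": ("M", "P", "K"),
    "A2": ("M", "Q", "K"),
    "A3": ("M", "R", "L"),
    "A4": (None, "K", "G"),
    "A5": (None, "L", "G"),
}

INIT_CNF = {
    frozenset({"P", "Q", "R"}),
    frozenset({"~P", "~Q"}),
    frozenset({"~P", "~R"}),
    frozenset({"~Q", "~R"}),
    frozenset({"M"}),
}


def regress(clauses, name):
    """Regress a clausal state over one action (positive clauses only)."""
    pre, cond, add = ACTIONS[name]
    # A clause mentioning the added literal holds afterwards if it held
    # before OR the effect's condition held before.
    out = {c | {cond} if add in c else c for c in clauses}
    if pre is not None:
        out.add(frozenset({pre}))  # enabling precondition as a unit clause
    return frozenset(out)


def satisfied_by_init(clauses, init_clauses):
    """Each clause must be subsumed by (contain) some initial clause."""
    return all(any(i <= c for i in init_clauses) for c in clauses)


state = frozenset({frozenset({"G"})})
trace = [state]
for name in ["A4", "A5", "A1", "A2", "A3"]:  # regression order from the slide
    state = regress(state, name)
    trace.append(state)
    print(name, sorted(sorted(c) for c in state))

print("done:", satisfied_by_init(state, INIT_CNF))
```

Since regression runs backwards, the plan executes the actions in the reverse of the order in which they were regressed over.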
11
What happens if we restrict uncertainty?
  • If initial state contains only the known
    variables (either known to be true or known to be
    false),
  • DNF formula has one single constituent
  • CNF clauses are all singletons
  • So you can see how we go from 2^(2^n) to 3^n
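
To get a feel for the gap, the two counts can be tabulated. The sketch assumes n boolean variables: a general belief state is any subset of the 2^n states, while with the restriction each variable is simply known true, known false, or unknown.

```python
def num_general_belief_states(n):
    """Any subset of the 2^n complete states is a belief state."""
    return 2 ** (2 ** n)


def num_restricted_belief_states(n):
    """Each variable is known true, known false, or unknown."""
    return 3 ** n


for n in range(1, 6):
    print(n, num_general_belief_states(n), num_restricted_belief_states(n))
```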

12
11/17
after all the money we spend on wardrobe and
cosmetic surgeries ?
13
Conformant Planning in the Real World: 2 examples
No. 42: HOW NOT TO BE SEEN (aka Monty Python on
Conformant Planning). Video shown in class
14
Heuristics for Conformant Planning
  • First idea: Notice that classical planning
    (which assumes full observability) is a
    relaxation of conformant planning
  • So, the length of the classical planning solution
    is a lower bound (admissible heuristic) for
    conformant planning
  • Further, the heuristics for classical planning
    are also heuristics for conformant planning
    (albeit probably not very informed)
  • Next idea: Let us get a feel for how estimating
    distances between belief states differs from
    estimating those between states

15
Three issues: How many states are there?
How far is each of the states from the goal? How
much interaction is there between states?
For example, if the length of the plan for
taking S1 to the goal is 10, and for taking S2 to
the goal is 10, the length of the plan for taking
both to the goal could be anywhere between
10 and infinity, depending on the interactions.
Notice that we talk about state interactions here
just as we talked about goal interactions in
classical planning.
Need to estimate the length of the combined plan
for taking all states to the goal
16
Belief-state cardinality alone won't be enough
  • Early work on conformant planning concentrated
    exclusively on heuristics that look at the
    cardinality of the belief state
  • The larger the cardinality of the belief state,
    the higher its uncertainty, and the worse it is
    (for progression)
  • Notice that in regression, we have the opposite
    heuristic: the larger the cardinality, the higher
    the flexibility (we are satisfied with any one of
    a larger set of states) and so the better it is
  • From our example in the previous slide,
    cardinality is only one of the three components
    that go into actual distance estimation.
  • For example, there may be an action that reduces
    the cardinality (e.g. bomb the place) but the
    new belief state with low uncertainty will be
    infinite distance away from the goal.
  • We will look at planning graph-based heuristics
    for considering all three components
  • (actually, unless we look at cross-world mutexes,
    we won't be considering the interaction part)

17
Planning Graph Heuristic Computation
  • Heuristics
  • BFS
  • Cardinality
  • Max, Sum, Level, Relaxed Plans
  • Planning Graph Structures
  • Single, unioned planning graph (SG)
  • Multiple, independent planning graphs (MG)
  • Single, labeled planning graph (LUG)
  • Bryce et al., 2004, AAAI MDP workshop

18
Using a Single, Unioned Graph
Union literals from all initial states into a
conjunctive initial graph level
[Figure: one planning graph over the literals P, Q,
R, M, K, L, G and the actions A1-A5]
  • Minimal implementation
  • Not effective
  • Lose world specific support information
Heuristic Estimate: 2
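
A sketch of the unioned-graph estimate on the running example, assuming a relaxed (delete-free) graph in which each action's enabling precondition and effect condition are lumped together as one antecedent. The function name `level_of` is illustrative, not taken from any planner.

```python
# name: (antecedent literals, added literal); assumed relaxed reading of
# the example actions, with preconditions and effect conditions merged.
ACTIONS = {
    "A1": ({"M", "P"}, "K"),
    "A2": ({"M", "Q"}, "K"),
    "A3": ({"M", "R"}, "L"),
    "A4": ({"K"}, "G"),
    "A5": ({"L"}, "G"),
}

WORLDS = [{"M", "P"}, {"M", "Q"}, {"M", "R"}]


def level_of(goal, init_literals):
    """First relaxed planning graph level at which the goal literal appears."""
    layer = set(init_literals)
    level = 0
    while goal not in layer:
        grown = layer | {add for pre, add in ACTIONS.values() if pre <= layer}
        if grown == layer:
            return None  # graph levelled off without reaching the goal
        layer = grown
        level += 1
    return level


# Union the literals of all possible initial states into one level.
unioned = set().union(*WORLDS)
print(sorted(unioned), level_of("G", unioned))
```

The unioned level pretends P, Q, and R all hold at once, so G shows up at level 2 even though the shortest conformant plan needs five actions; that is the lost world-specific support information.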
19
Using Multiple Graphs
[Figure: three independent planning graphs, one per
possible initial world: from {P, M}, A1 gives K and
A4 gives G; from {Q, M}, A2 gives K and A4 gives G;
from {R, M}, A3 gives L and A5 gives G]
  • Same-world Mutexes
  • Memory Intensive
  • Heuristic Computation Can be costly
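
One way to aggregate the per-world graphs is to extract a relaxed plan in each world and then take the max, sum, or union over worlds. The sketch below assumes delete-free actions with merged antecedents and a simple first-achiever extraction; it is not the procedure of any particular planner.

```python
# name: (antecedent literals, added literal); assumed relaxed reading of
# the example actions, with preconditions and effect conditions merged.
ACTIONS = {
    "A1": ({"M", "P"}, "K"),
    "A2": ({"M", "Q"}, "K"),
    "A3": ({"M", "R"}, "L"),
    "A4": ({"K"}, "G"),
    "A5": ({"L"}, "G"),
}

WORLDS = [{"M", "P"}, {"M", "Q"}, {"M", "R"}]


def relaxed_plan(world, goal):
    """Set of actions in a relaxed plan for one world, or None if unreachable."""
    reached = set(world)
    achiever = {}  # literal -> first action that added it
    changed = True
    while changed and goal not in reached:
        changed = False
        layer = set(reached)  # freeze the current graph level
        for name in sorted(ACTIONS):
            pre, add = ACTIONS[name]
            if pre <= layer and add not in reached:
                reached.add(add)
                achiever[add] = name
                changed = True
    if goal not in reached:
        return None
    plan, open_literals = set(), [goal]
    while open_literals:
        lit = open_literals.pop()
        if lit in world:
            continue  # already true initially in this world
        name = achiever[lit]
        if name not in plan:
            plan.add(name)
            open_literals.extend(ACTIONS[name][0])
    return plan


plans = [relaxed_plan(w, "G") for w in WORLDS]
h_max = max(len(p) for p in plans)
h_sum = sum(len(p) for p in plans)
h_union = len(set().union(*plans))
print(plans, h_max, h_sum, h_union)
```

Max ignores the other worlds, sum counts the shared A4 twice, and the union counts each action once, which here gives 5.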
20
What about mutexes?
  • In the previous slide, we considered only relaxed
    plans (thus ignoring any mutexes)
  • We could have considered mutexes in the
    individual world graphs to get better estimates
    of the plans in the individual worlds (call these
    same-world mutexes)
  • We could also have considered the impact of
    having an action in one world on the other world.
  • Consider a patient who may or may not be
    suffering from disease D. There is a medicine M,
    which, if given in the world where he has D, will
    cure the patient. But if it is given in the world
    where the patient doesn't have disease D, it will
    kill him. Since giving the medicine M will have
    impact in both worlds, we now have a mutex
    between being alive in world 1 and being
    cured in world 2!
  • Notice that cross-world mutexes will take into
    account the state interactions that we mentioned
    as one of the three components making up the
    distance estimate.
  • We could compute a subset of same-world and
    cross-world mutexes to improve the accuracy of
    the heuristics
  • but it is not clear whether or not the accuracy
    comes at too much additional cost to have
    reasonable impact on efficiency. See the Bryce et
    al. JAIR submission

21
Connection to CGP
  • CGP (the conformant Graphplan) does multiple
    planning graphs, but also does backward search
    directly on the graphs to find a solution (as
    against using these to give heuristic estimates)
  • It has to mark same-world and cross-world mutexes
    to ensure soundness.

22
Using a Single, Labeled Graph (joint work with
David E. Smith)
Labels signify possible worlds under which a
literal holds
Action Labels: Conjunction of Labels of
Supporting Literals
Literal Labels: Disjunction of Labels of
Supporting Actions
[Figure: one labeled planning graph over the
literals P, Q, R, M, K, L, G and the actions A1-A5]
  • Memory Efficient
  • Cheap Heuristics
  • Scalable
  • Extensible
Benefits from BDDs
Label Key: True; (~P & ~R) V (~Q & ~R); ~Q & ~R;
~P & ~R; (~P & ~R) V (~Q & ~R) V (~P & ~Q); ~P & ~Q
Heuristic Value: 5
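
Label propagation can be sketched with plain sets of world indices standing in for the label formulas (the slides point to BDDs for this; sets are used here only to keep the example short). The sketch assumes delete-free actions with merged antecedents and computes labels and the level at which G is supported in every world; it does not extract the relaxed plan behind the slide's heuristic value.

```python
# name: (antecedent literals, added literal); assumed relaxed reading of
# the example actions, with preconditions and effect conditions merged.
ACTIONS = {
    "A1": ({"M", "P"}, "K"),
    "A2": ({"M", "Q"}, "K"),
    "A3": ({"M", "R"}, "L"),
    "A4": ({"K"}, "G"),
    "A5": ({"L"}, "G"),
}

WORLDS = [{"M", "P"}, {"M", "Q"}, {"M", "R"}]
ALL_WORLDS = frozenset(range(len(WORLDS)))


def propagate_labels(goal):
    """Return per-level literal labels until the goal holds in all worlds."""
    labels = {}
    for i, world in enumerate(WORLDS):
        for lit in world:
            labels[lit] = labels.get(lit, frozenset()) | {i}
    levels = [dict(labels)]
    while labels.get(goal, frozenset()) != ALL_WORLDS:
        grown = dict(labels)  # literals persist with their old labels
        for pre, add in ACTIONS.values():
            # Action label: conjunction (intersection) of antecedent labels.
            act_label = ALL_WORLDS
            for p in pre:
                act_label &= labels.get(p, frozenset())
            # Literal label: disjunction (union) over supporting actions.
            if act_label:
                grown[add] = grown.get(add, frozenset()) | act_label
        if grown == labels:
            return None  # levelled off before covering every world
        labels = grown
        levels.append(dict(labels))
    return levels


levels = propagate_labels("G")
for depth, lvl in enumerate(levels):
    print(depth, {lit: sorted(ws) for lit, ws in sorted(lvl.items())})
```

World 0 is the one where P holds, world 1 where Q holds, and world 2 where R holds, so K's label {0, 1} corresponds to "P or Q is the true one".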
23
Sensing Actions
  • Sensing actions in essence partition a belief
    state
  • Sensing a formula f splits a belief state B into
    B & f and B & ~f
  • Both partitions need to be taken to the goal
    state now
  • Tree plan
  • AO* search
  • Heuristics will have to compare two generalized
    AND branches
  • In the figure, the lower branch has an expected
    cost of 11,000
  • The upper branch has a fixed sensing cost of 300
    plus, based on the outcome, a cost of 7 or 12,000
  • If we consider worst case cost, we assume the
    cost is 12,300
  • If we consider both to be equally likely, we
    assume 6303.5 units cost
  • If we know actual probabilities that the sensing
    action returns one result as against the other,
    we can use that to get the expected cost
[Figure: upper branch As with cost 300, followed by
outcomes costing 7 or 12,000; lower branch A with
cost 11,000]
24
(No Transcript)
25
Cost models of conditional plans
  • The execution cost of a conditional plan is
  • Cost of O5 +
  • Prob(p=T) * cost of [A1; A3] + Prob(p=F) * cost
    of [A2; A3]
  • Can instead take max(cost of [A1; A3], cost of
    [A2; A3])
  • The planning cost of a conditional plan, however,
    is proportional to the total size of the
    plan (num actions)
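
A sketch of the three cost models on the O5 / A1 / A2 / A3 plan. The action costs and the probability of p are made-up numbers, and the nested-tuple plan encoding is just one convenient representation; the size count treats the plan as a tree, so A3 is counted once per branch.

```python
# Hypothetical action costs and observation probability (not from the slides).
COST = {"O5": 1.0, "A1": 2.0, "A2": 5.0, "A3": 1.0}

# A plan is a list of steps. A step is an action name, or a tuple
# ("sense", observation_action, prob_true, plan_if_true, plan_if_false).
PLAN = [("sense", "O5", 0.5, ["A1", "A3"], ["A2", "A3"])]


def expected_cost(plan):
    """Execution cost weighted by the probability of each branch."""
    total = 0.0
    for step in plan:
        if isinstance(step, tuple):
            _, obs, p, yes, no = step
            total += COST[obs] + p * expected_cost(yes) + (1 - p) * expected_cost(no)
        else:
            total += COST[step]
    return total


def worst_case_cost(plan):
    """Execution cost assuming the more expensive branch is always taken."""
    total = 0.0
    for step in plan:
        if isinstance(step, tuple):
            _, obs, _, yes, no = step
            total += COST[obs] + max(worst_case_cost(yes), worst_case_cost(no))
        else:
            total += COST[step]
    return total


def plan_size(plan):
    """Number of actions in the whole tree, a proxy for planning cost."""
    size = 0
    for step in plan:
        if isinstance(step, tuple):
            _, _, _, yes, no = step
            size += 1 + plan_size(yes) + plan_size(no)
        else:
            size += 1
    return size


print(expected_cost(PLAN), worst_case_cost(PLAN), plan_size(PLAN))
```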

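[Figure: two conditional plans that observe p with
O5 and branch on Y/N to A1 or A2; the first
continues to A3, the second stops after A1 and A2]
Need to estimate cost of leaf belief states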
26
Slides beyond this point not covered
27
System Architecture: CAltAlt
[Figure: architecture diagram. Components: IPC PDDL
Parser; A* Search Engine (HSP-r); Planning Graph(s)
(IPP); Heuristics; Labels (CUDD); Belief States;
Model Checker (NuSMV). Edge labels: Input for,
Extracted From, Condense, Searches, Guided By,
Validates. Legend: Off-The-Shelf, Custom]
28
Sum and Relaxed Plan are best for a single graph
Relaxed Plan is best for multiple or label graphs
Label graph using mutexes with relaxed plan is
best overall
29
Relaxed Plan is Best for a single Graph
Sum is Best for Multiple Graphs
Label Graph using mutexes With relaxed plan is
best overall
30
Cardinality does well
Multiple Graph Union Relaxed Plan scales
Label Graph Relaxed Plan Does best
31
Relaxed Plan approaches scale better, with time
close to cardinality and quality comparable
to optimal
Optimal approaches scale poorly
Cardinality approaches are faster, but quality
suffers
32
Relaxed Plan approaches scale better, with time
close to cardinality and quality comparable
to optimal
Optimal approaches scale poorly
Cardinality approaches are faster, but quality
suffers
33
Contingent Planning
  • Progression Planner: PBSP
  • LAO*-type search: Non-Deterministic, Partially
    Observable
  • Build a planning graph to compute the heuristic
    for each belief state
  • No mutexes computed
  • Added observational actions to domains

34
Relaxed Plan approaches scale better than
optimal approaches and have comparable quality
Optimal approaches scale poorly
Cardinality approaches are faster and scale
better, but quality suffers by two orders of
magnitude
35
Conclusions & Future Work
  • Conclusion
  • Distance estimations using overlap are more
    informed than cardinality and max state-to-state
    heuristics
  • Multiple planning graphs give good heuristics,
    but are costly
  • Labeled planning graphs reduce cost
  • Planning graph heuristics help control plan
    length while scaling to difficult problems
  • More details in
  • TR at http://rakaposhi.eas.asu.edu/belief-search
  • Conformant, Contingent: all planning graph types
  • AAAI-04 MDP workshop
  • Labeled Planning Graph for conformant planning
  • Future Work
  • Stochastic Planning

36
Stochastic Planning
[Figure: flow diagram comparing Buridan with a new
approach. Boxes: Stochastic Planning Problem;
Relaxation Of Instance; Deterministic Planner
(UCPOP); Non-Deterministic Planner (PBSP or
CAltAlt); Deterministic Plan; Non-Deterministic
Plan; Convert Solution to Stochastic Plan; Seed
Stochastic Plan; Local Search To Improve Probability
of Satisfaction; Stochastic Plan]
Can use relaxed plans that are greedy on
probability by using probability in the planning
graph (similar to PGraphPlan)
A seed non-deterministic plan is likely to
reflect the physics of a stochastic planning problem
better than a seed deterministic plan.
37
Distance Estimates
[Figure: distance estimates compared: Cardinality;
Max State to State; State to State Overlap; Belief
state to Belief state; per-state distances combined
using max, min, and union]
38
Cardinality does well
Multiple Graph Union Relaxed Plan scales
Label Graph Relaxed Plan Does best, mutexes do
help
39
Relaxed Plan approaches scale better than
optimal approaches, but have quality comparable
to optimal
Optimal approaches scale poorly
Cardinality approaches are faster and scale
better, but quality suffers by an order of
magnitude