Title: CSE 574 Planning
1CSE 574: Planning & Learning (which is actually
more of the former and less of the latter)
- Subbarao Kambhampati
- http://rakaposhi.eas.asu.edu/cse574
2The Indian Standard Time
- Right now, it is 3:45 AM in India
- Where I was for the whole break
- And only got back yesterday
- And my body thinks it is still in India
- I could never stay awake after 3AM
- And the greedy Marriott closed the only half-decent coffee shop around here.
- So, wake me up if you see me dozing off
3Most everything will be on the homepage
4Logistics
- Office hours: after class, 4:30-5:30, and by appointment
- No official TA
- Romeo Sanchez (rsanchez_at_asu.edu)
- And Binh Minh Do (binhminh_at_asu.edu)
- Will kindly provide unofficial TA support
- Caveats
- Graduate-level class. No textbook; you will read papers. Participation required and essential
- Evaluation (subject to change)
- Participation (20%)
- Do readings before classes. Attend classes. Take part in discussions. Be scribes for class discussions.
- Projects/Homeworks (35%)
- May involve using existing planners, writing new domains
- Semester project (20%)
- Either a term paper or a code-based project
- Mid-term and Final (25%)
5Your introductions
- Name
- Standing
- Area(s) of interest
- Reasons if any for taking the course
- Do you prefer
- Homeworks/class projects OR
- Semester long individual project?
6Planning: The big picture
- Synthesizing goal-directed behavior
- Planning involves
- Action selection: handling causal dependencies
- Action sequencing and handling resource allocation (typically called SCHEDULING)
- Depending on the problem, plans can be
- action sequences
- or policies (action trees, state-action
mappings etc.)
7The Many Complexities of Planning
[Figure: agent-environment loop annotated with the dimensions along which planning problems vary]
- Environment: static vs. dynamic; observable vs. partially observable
- Perception: perfect vs. imperfect
- Action: deterministic vs. stochastic; instantaneous vs. durative
- Goals: full vs. partial satisfaction
- The question: What action next?
8Planning (Classical Planning)
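[Figure: agent-environment loop under the classical planning assumptions]
- Environment: static, observable
- Perception: perfect
- Action: deterministic
- Goals
- The question: What action next?
I = initial state; G = goal state
Each operator Oi has preconditions (prec) and effects
[Figure: operators Oi, Oj, Ok, Om connecting I to G]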
10Class of 23rd January
- I am less jet-lagged
- (waking up only at 3AM)
- I discovered Side-bar café
- (near law-library)
- Even started sadism
- (homework assignments)
- In short: general sweetness and light all around
11Applications (Current & Potential)
- Scheduling problems with action choices as well as resource handling requirements
- Problems in supply chain management
- HSTS (Hubble Space Telescope scheduler)
- Workflow management
- Autonomous agents
- RAX/PS (The NASA Deep Space planning agent)
- Software module integrators
- VICAR (JPL image enhancing system); CELWARE (CELCorp)
- Test case generation (Pittsburgh)
- Interactive decision support
- Monitoring subgoal interactions
- Optimum AIV system
- Plan-based interfaces
- E.g. NLP to database interfaces
- Plan recognition
- Web-service composition
12Lots of activity...
New people. Conferences. Workshops. Competitions. Inter-planetary explorations.
So, why the increased interest?
- Significant scale-up in the last 4-5 years
- Before, we could synthesize about 5-6 action plans in minutes
- Now, we can synthesize 100-action plans in minutes
- Further scale-up with domain-specific control
- Significant strides in our understanding
- Rich connections between planning and CSP (SAT), OR (ILP)
- Vanishing separation between planning & scheduling
- New ideas for heuristic control of planners
- Wide array of approaches for customizing planners with domain-specific knowledge
13Broad Aims & Biases of the First Part
- AIM: We will concentrate on planning in deterministic, quasi-static and fully observable worlds
- Will start with classical domains but
- discuss handling durative actions and
- numeric constraints, as well as replanning
BIAS: To the extent possible, we shall shun brand-names and concentrate on unifying themes
- Better understanding of existing planners
- Normalized comparisons between planners
- Evaluation of trade-offs provided by various design choices
- Better understanding of inter-connections
- Hybrid planners using multiple refinements
- Explication of the connections between planning, CSP, SAT and ILP
14Overview for the first part
- The Planning problem
- Our focus
- Modeling, Proving correctness
- Refinement Planning: Formal Framework
- Conjunctive refinement planners
- Disjunctive refinement planners
- Refinement of disjunctive plans
- Solution extraction from disjunctive plans
- Direct, Compiled (SAT, CSP, ILP, BDD)
- Heuristics/Optimizations
- Customizing Planners
- User-assisted Customization
- Automated customization
- Support for non-classical worlds
15Why Care about classical Planning?
- Most of the recent advances occurred in neo-classical planning
- Many stabilized environments satisfy neo-classical assumptions
- It is possible to handle minor assumption violations through replanning and execution monitoring
- "This form of solution has the advantage of relying on widely-used (and often very efficient) classical planning technology" [Boutilier, 2000]
- Techniques developed for neo-classical planning often shed light on effective ways of handling non-classical planning worlds
- Currently, most of the efficient techniques for handling non-classical scenarios are still based on ideas/advances in classical planning
16...As such, the classical model can be viewed as a
way of approximating the solution of the
underlying POMDP. This form of solution has
the advantage of relying on widely-used (and
often very efficient) classical planning
technology.
Also put some of the classification stuff?
17The (too) many brands of classical planners
Planning as Theorem Proving (Green's planner)
Planning as Search
Search in the space of States (progression, regression, MEA) (STRIPS, PRODIGY, TOPI, HSP, HSP-R, UNPOP, FF)
Planning as Model Checking
Search in the space of Plans (total order, partial order, protections, MTC) (Interplan, SNLP, TOCL, UCPOP, TWEAK)
Search in the space of Task networks (reduction of non-primitive tasks) (NOAH, NONLIN, O-Plan, SIPE)
Planning as CSP/ILP/SAT/BDD (Graphplan, IPP, STAN, SATPLAN, Blackbox, GP-CSP, BDDPlan)
18A Unifying View
19Modeling Planning Problems: Actions, States, Correctness
PART I.0
20Transition System Perspective
- We can think of the agent-environment dynamics in terms of transition systems
- A transition system is a 2-tuple <S, A> where
- S is a set of states
- A is a set of actions, with each action a being a subset of S × S
- Transition systems can be seen as graphs with states corresponding to nodes, and actions corresponding to edges
- If transitions are not deterministic, then the edges will be hyper-edges, i.e. they will connect sets of states to sets of states
- The agent may know that its initial state is some subset S' of S
- If the environment is not fully observable, then |S'| > 1
- |S'| can be > 1 even in fully-observable domains (if we want to find policies rather than plans)
- It may consider some subset Sg of S as desirable states
- Finding a plan is equivalent to finding (shortest) paths in the graph corresponding to the transition system
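
The "plan = path in the graph" view can be sketched in a few lines of Python. This is only an illustrative sketch, not something from the slides: the four states, the three actions and the helper name shortest_plan are all made up, and the actions are assumed to be deterministic.

```python
from collections import deque

# Hypothetical example; state and action names are illustrative only.
# Each action is a set of (state, next_state) pairs, i.e. a subset of S x S.
ACTIONS = {
    "a": {("s0", "s1"), ("s1", "s2")},
    "b": {("s0", "s2"), ("s2", "s3")},
    "c": {("s1", "s3")},
}

def shortest_plan(actions, initial, goals):
    """Breadth-first search for a shortest action sequence to any goal state.

    Assumes deterministic actions (each action is a partial function).
    Returns a list of action names, or None if no goal state is reachable.
    """
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, plan = frontier.popleft()
        if state in goals:
            return plan
        for name in sorted(actions):  # sorted for reproducible tie-breaking
            for src, dst in sorted(actions[name]):
                if src == state and dst not in visited:
                    visited.add(dst)
                    frontier.append((dst, plan + [name]))
    return None

plan = shortest_plan(ACTIONS, "s0", {"s3"})
print(plan)
```

Breadth-first search is used here because it returns a path with the fewest edges; any other shortest-path method over the explicit graph would serve the same purpose.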
21Transition System Models
A transition system is a two-tuple <S, A>, where
S is a set of states
A is a set of transitions; each transition a is a subset of S × S
--If a is a (partial) function, then it is a deterministic transition
--Otherwise, it is a non-deterministic transition
--It is a stochastic transition if there are probabilities associated with each state a takes s to
--Finding plans is equivalent to finding paths in the transition system
Each action in this model can be represented by incidence matrices (e.g. below).
The set of all possible transitions will then simply be the SUM of the individual incidence matrices.
Transition system models are called explicit state-space models.
In general, we would like to represent the transition systems more compactly, e.g. state variable representation of states.
These latter are called factored models.
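
As a rough illustration of the incidence-matrix view, here is a small Python sketch. The 3-state system, the action names MOVE and RESET, and the helper functions are assumptions made up for this example.

```python
# Hypothetical 3-state system; states are indexed 0, 1, 2.
# M[i][j] == 1 means the action can take state i to state j.
MOVE = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
RESET = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
]

def matrix_sum(*matrices):
    """Element-wise sum of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]

def is_deterministic(matrix):
    """An action is a partial function iff every row has at most one successor."""
    return all(sum(row) <= 1 for row in matrix)

# Summing the per-action matrices gives the edges of the whole transition graph.
ALL_TRANSITIONS = matrix_sum(MOVE, RESET)
print(ALL_TRANSITIONS)
print(is_deterministic(MOVE), is_deterministic(RESET))
```

Each matrix has one row and one column per state, so the representation grows quadratically with the number of states; that is the cost the factored models are meant to avoid.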
22Manipulating Transition Systems
23MDPs as general cases of transition systems
- An MDP (Markov Decision Process) is a general (deterministic or non-deterministic) transition system where the states have Rewards
- In the special case, only a certain set of goal states will have high rewards, and everything else will have no rewards
- In the general case, all states can have varying amounts of reward
- Planning, in the context of MDPs, will be to find a policy (a mapping from states to actions) that has the maximal expected reward
- We will talk about MDPs later in the semester
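
To make "a policy that has the maximal expected reward" concrete, here is a minimal value-iteration sketch. Everything in it is an assumption for illustration: the three states, the transition probabilities, the rewards (collected on entering a state), the discount factor GAMMA, and the choice of value iteration itself, which the slides do not name.

```python
# Hypothetical 3-state MDP; all numbers below are illustrative assumptions.
GAMMA = 0.9  # discount factor (assumed; needed to keep the total reward finite)
REWARD = {"s0": 0.0, "s1": 0.0, "goal": 1.0}  # reward for entering a state
# TRANSITIONS[state][action] = list of (probability, next_state)
TRANSITIONS = {
    "s0": {"safe": [(1.0, "s1")], "risky": [(0.5, "goal"), (0.5, "s0")]},
    "s1": {"safe": [(1.0, "goal")]},
    "goal": {"stay": [(1.0, "goal")]},
}

def q_value(state, action, values):
    """Expected discounted reward of taking `action` in `state`."""
    return sum(p * (REWARD[nxt] + GAMMA * values[nxt])
               for p, nxt in TRANSITIONS[state][action])

def value_iteration(sweeps=200):
    """Repeatedly back up state values; returns a dict state -> value."""
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(sweeps):
        values = {s: max(q_value(s, a, values) for a in TRANSITIONS[s])
                  for s in TRANSITIONS}
    return values

def greedy_policy(values):
    """Policy: a mapping from each state to its best action under `values`."""
    return {s: max(sorted(TRANSITIONS[s]), key=lambda a: q_value(s, a, values))
            for s in TRANSITIONS}

values = value_iteration()
policy = greedy_policy(values)
print(policy)
```

Note that the result is a mapping from every state to an action rather than a single action sequence, which is the distinction between policies and plans drawn earlier.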
24Problems with transition systems
- Transition systems are a great conceptual tool to understand the differences between the various planning problems
- However, direct manipulation of transition systems tends to be too cumbersome
- The size of the explicit graph corresponding to a transition system is often very large (see Homework 1 problem 1)
- The remedy is to provide compact representations for transition systems
- Start by explicating the structure of the states
- e.g. states specified in terms of state variables
- Represent actions not as incidence matrices but rather as functions specified directly in terms of the state variables
- An action will work in any state where some state variables have certain values. When it works, it will change the values of certain (other) state variables
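
A factored action of this kind might look as follows in Python. This is a sketch under assumptions: the Action class, the state variables robot_at and door_open, and the two example actions are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """An action defined over state variables rather than explicit states."""
    name: str
    preconditions: dict  # state variable -> value required for the action to work
    effects: dict        # state variable -> value it has after the action

    def applicable(self, state):
        return all(state.get(var) == val for var, val in self.preconditions.items())

    def apply(self, state):
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable in {state}")
        new_state = dict(state)         # untouched variables keep their values
        new_state.update(self.effects)  # only the listed variables change
        return new_state

# Hypothetical two-variable domain.
open_door = Action("open_door",
                   {"robot_at": "room1", "door_open": False},
                   {"door_open": True})
move = Action("move_to_room2",
              {"robot_at": "room1", "door_open": True},
              {"robot_at": "room2"})

start = {"robot_at": "room1", "door_open": False}
after_open = open_door.apply(start)
after_move = move.apply(after_open)
print(after_move)
```

With n Boolean state variables the explicit graph has 2**n nodes, whereas each action here is described only by the handful of variables it tests and changes.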