Title: Policy Generation for Continuous-time Stochastic Domains with Concurrency

1. Policy Generation for Continuous-time Stochastic Domains with Concurrency
Håkan L. S. Younes and Reid G. Simmons
Carnegie Mellon University
2. Introduction
- Policy generation for asynchronous stochastic systems
- Rich goal formalism
- Policy generation and repair:
  - Solve relaxed problem using deterministic temporal planner
  - Decision tree learning to generalize plan
  - Sample path analysis to guide repair
3. Motivating Example
- Deliver package from CMU to Honeywell
[Map: CMU in Pittsburgh, PIT airport; MSP airport, Honeywell in Minneapolis]
4. Elements of Uncertainty
- Uncertain duration of flight and taxi ride
- Plane can get full without reservation
- Taxi might not be at airport when arriving in Minneapolis
- Package can get lost at airports

Asynchronous events ⇒ not semi-Markov
5. Asynchronous Events
- While the taxi is on its way to the airport, the plane may become full
[Timeline: fill-plane event at t0; taxi driving while plane not full, then while plane full. Arrival time distribution changes from F(t) to F(t | t > t0)]
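The conditioning above can be illustrated with a small simulation. This is a hedged sketch: the uniform trigger-time distribution, the numbers, and the function names are illustrative assumptions, not from the talk. The point is that for a non-memoryless distribution, the time t0 already spent while other events ran shifts the remaining trigger-time distribution to F(t | t > t0), so the current state alone no longer determines the future.

```python
import random

def sample_conditional(sample_t, t0, tries=10_000):
    """Rejection-sample a trigger time given that it exceeds t0.

    sample_t draws from the unconditioned distribution F; we keep only
    draws with t > t0, i.e. samples from F(t | t > t0).
    """
    for _ in range(tries):
        t = sample_t()
        if t > t0:
            return t
    raise RuntimeError("event almost surely triggered before t0")

random.seed(0)
# Illustrative: flight fills up at a uniform time on [60, 120] minutes.
draw = lambda: random.uniform(60.0, 120.0)
# 90 minutes have already elapsed while the taxi was driving.
later = [sample_conditional(draw, 90.0) for _ in range(1000)]
print(min(later) > 90.0)  # True: all remaining trigger times exceed t0
```

For an exponential distribution the conditioned remainder would look like the original (memorylessness); for any other distribution it does not, which is why asynchronous events push the model outside the semi-Markov class.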
6. Rich Goal Formalism
- Goals specified as CSL formulae
- φ ::= true | a | φ ∧ φ | ¬φ | P≥θ(φ U≤T φ)
- Goal example:
  - Probability is at least 0.9 that the package reaches Honeywell within 300 minutes without getting lost on the way
  - P≥0.9(¬lost(pkg) U≤300 at(pkg, honeywell))
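The path formula inside the probabilistic operator can be checked on a single simulated trace. A minimal sketch, assuming a trace is a list of timestamped states and predicates are plain callables; all names here are illustrative, not the paper's implementation:

```python
def until_holds(path, phi1, phi2, T):
    """Check the time-bounded until formula phi1 U<=T phi2 on one path.

    True iff phi2 becomes true at some time <= T, with phi1 holding in
    every state visited strictly before that point.
    """
    for time, state in path:
        if time > T:
            return False        # deadline passed before phi2 held
        if phi2(state):
            return True         # goal reached in time
        if not phi1(state):
            return False        # invariant violated on the way
    return False

# Example mirroring the slide's goal: reach honeywell within 300 minutes
# without the package getting lost.
not_lost = lambda s: not s.get("lost-pkg", False)
delivered = lambda s: s.get("at-pkg") == "honeywell"

path = [(0, {"at-pkg": "cmu"}),
        (140, {"at-pkg": "mpls-airport"}),
        (270, {"at-pkg": "honeywell"})]
print(until_holds(path, not_lost, delivered, 300))  # True
```

The outer P≥0.9(...) is then a statement about the fraction of sampled paths on which this check succeeds, which is what the test step estimates.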
7. Problem Specification
- Given:
  - Complex domain model (stochastic discrete event system)
  - Initial state
  - Probabilistic temporally extended goal (CSL formula)
- Wanted:
  - Policy satisfying goal formula in initial state
8. Generate, Test and Debug [Simmons, AAAI-88]
[Loop: Generate initial policy → Test if policy is good → if good, done; if bad, Debug and repair policy, then repeat test]
9. Generate
- Ways of generating initial policy:
  - Generate policy for relaxed problem
  - Use existing policy for similar problem
  - Start with null policy
  - Start with random policy
10. Test [Younes et al., ICAPS-03]
- Use discrete event simulation to generate sample execution paths
- Use acceptance sampling to verify probabilistic CSL goal conditions
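The test step never computes the satisfaction probability exactly; it decides between hypotheses from simulated paths. A hedged sketch of acceptance sampling via Wald's sequential probability ratio test, deciding between H0: p ≥ p0 and H1: p ≤ p1 with an indifference region (p1, p0) around the threshold θ; the parameter names and error bounds are illustrative:

```python
import math

def sprt(sample, p0, p1, alpha=0.05, beta=0.05):
    """Sequential test of H0: p >= p0 vs H1: p <= p1 (with p0 > p1).

    sample() simulates one execution path and returns True iff the CSL
    path formula held on it. alpha/beta bound the error probabilities.
    """
    a = math.log(beta / (1 - alpha))    # accept-H0 boundary (log LR)
    b = math.log((1 - beta) / alpha)    # accept-H1 boundary (log LR)
    llr = 0.0                           # log likelihood ratio H1 vs H0
    while True:
        x = sample()
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr <= a:
            return "accept"             # evidence supports P>=theta
        if llr >= b:
            return "reject"             # evidence refutes P>=theta

# Degenerate but deterministic illustrations:
print(sprt(lambda: True, 0.9, 0.8))    # accept
print(sprt(lambda: False, 0.9, 0.8))   # reject
```

The sequential form matters here: clearly good or clearly bad policies are settled after few simulations, and effort concentrates on borderline cases.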
11. Debug
- Analyze sample paths generated in test step to find reasons for failure
- Change policy to reflect outcome of failure analysis
12. Closer Look at Generate Step
[Loop with Generate highlighted: Generate initial policy → Test if policy is good → Debug and repair policy]
13. Policy Generation
Probabilistic planning problem
  → eliminate uncertainty →
Deterministic planning problem
  → solve using temporal planner (e.g. VHPOP [Younes & Simmons, JAIR 20]) →
Temporal plan
  → generate training data by simulating plan →
State-action pairs
  → decision tree learning →
Policy (decision tree)
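The last step of the pipeline can be sketched in miniature: simulated state-action pairs are generalized into a policy by a decision tree over boolean state atoms. This toy ID3-style learner (the atoms, actions, and splitting rule are illustrative; a real implementation would pick splits by information gain) shows the shape of the idea:

```python
def learn(examples, atoms):
    """Grow a decision tree mapping states to actions.

    examples: list of (state, action) pairs, state a dict of booleans.
    Returns an action (leaf) or {atom: (yes_subtree, no_subtree)}.
    """
    actions = {a for _, a in examples}
    if len(actions) == 1:
        return actions.pop()                 # pure leaf: one action
    for atom in atoms:
        pos = [(s, a) for s, a in examples if s[atom]]
        neg = [(s, a) for s, a in examples if not s[atom]]
        if pos and neg:                      # first atom that splits the data
            rest = [x for x in atoms if x != atom]
            return {atom: (learn(pos, rest), learn(neg, rest))}
    return max(actions)                      # fallback: arbitrary tie-break

def act(tree, state):
    """Execute the policy: walk the tree to a leaf action."""
    while isinstance(tree, dict):
        atom, (yes, no) = next(iter(tree.items()))
        tree = yes if state[atom] else no
    return tree

# Illustrative training data in the spirit of the running example:
pairs = [({"at-me-cmu": True,  "at-taxi-cmu": True},  "enter-taxi"),
         ({"at-me-cmu": True,  "at-taxi-cmu": False}, "idle"),
         ({"at-me-cmu": False, "at-taxi-cmu": False}, "check-in")]
policy = learn(pairs, ["at-me-cmu", "at-taxi-cmu"])
print(act(policy, {"at-me-cmu": True, "at-taxi-cmu": True}))  # enter-taxi
```

Because the tree branches on state atoms rather than memorizing states, the learned policy also prescribes actions in states the simulated plan never visited.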
14. Conversion to Deterministic Planning Problem
- Assume we can control nature:
  - Exogenous events are treated as actions
  - Actions with probabilistic effects are split into multiple deterministic actions
  - Trigger time distributions are turned into interval duration constraints
- Objective: find some execution trace satisfying the path formula φ1 U≤T φ2 of the probabilistic goal P≥θ(φ1 U≤T φ2)
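One piece of the relaxation, turning a stochastic exogenous event into a controllable durative action, can be sketched as a data transformation. The record layout, the `support` field, and the choice of using the distribution's support as the duration interval are all illustrative assumptions, not the paper's exact encoding:

```python
def relax_event(event):
    """Relax one exogenous event into a deterministic durative action.

    The trigger-time distribution is replaced by an interval duration
    constraint, and the event becomes controllable by the planner.
    """
    lo, hi = event["support"]          # interval covering trigger times
    return {
        "name": event["name"],
        "kind": "action",              # planner now chooses when it fires
        "duration": (lo, hi),          # interval constraint, no distribution
        "effects": event["effects"],
    }

# Illustrative event from the running example: the plane fills up at
# some uncertain time between 60 and 120 minutes.
fill_plane = {"name": "fill-plane", "support": (60.0, 120.0),
              "effects": ["plane-full"]}
print(relax_event(fill_plane)["duration"])  # (60.0, 120.0)
```

The relaxed problem is optimistic (the planner gets to schedule nature), which is exactly why the resulting policy must then be tested and debugged.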
15Generating Training Data
enter-taxime,pgh-taxi,cmu
s0 enter-taxime,pgh-taxi,cmu
s1 depart-taxime,pgh-taxi,cmu,pgh-airport
depart-planeplane,pgh-airport,mpls-airport
s2 idle
depart-taxime,pgh-taxi,cmu,pgh-airport
s3 leave-taxime,pgh-taxi,pgh-airport
arrive-taxipgh-taxi,cmu,pgh-airport
s4 check-inme,plane,pgh-airport
leave-taxime,pgh-taxi,pgh-airport
s5 idle
check-inme,plane,pgh-airport
s0
s3
s6
s1
s4
s2
s5
16. Policy Tree
[Decision tree branching on state atoms: at(pgh-taxi, cmu), at(me, cmu), at(plane, mpls-airport), at(mpls-taxi, mpls-airport), at(me, pgh-airport), in(me, plane), moving(mpls-taxi, mpls-airport, honeywell), at(me, mpls-airport), moving(pgh-taxi, cmu, pgh-airport); leaf actions: enter-taxi, depart-taxi, check-in, leave-taxi, idle]
17. Closer Look at Debug Step
[Loop with Debug highlighted: Generate initial policy → Test if policy is good → Debug and repair policy]
18. Policy Debugging
Sample execution paths
  → sample path analysis →
Failure scenarios
  → solve deterministic planning problem taking failure scenario into account →
Temporal plan
  → generate training data by simulating plan →
State-action pairs
  → incremental decision tree learning [Utgoff et al., MLJ 29] →
Revised policy
19. Sample Path Analysis
- Construct Markov chain from paths
- Assign values to states:
  - Failure: −1; Success: +1
  - All other states: 0
- Assign values to events:
  - V(s′) − V(s) for transition s → s′ caused by e
- Generate failure scenarios
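The scoring above can be sketched directly on the slide's example paths. This is a simplified version (failure states score −1, success +1, everything else 0; the full analysis works on the Markov chain built from the paths), with each event credited the value change of the transitions it caused:

```python
# Terminal values: s2 is a failure state, s3 a success state; all other
# states implicitly score 0.
V = {"s2": -1.0, "s3": +1.0}

# The three sample paths from the example, as (state, event, next_state):
paths = [
    [("s0", "e1", "s1"), ("s1", "e2", "s2")],
    [("s0", "e1", "s1"), ("s1", "e4", "s4"), ("s4", "e2", "s2")],
    [("s0", "e3", "s3")],
]

# Credit each event with V(s') - V(s) summed over its transitions.
event_value = {}
for path in paths:
    for s, e, nxt in path:
        event_value[e] = event_value.get(e, 0.0) + V.get(nxt, 0.0) - V.get(s, 0.0)

print(sorted(event_value.items()))
# e2 ends up most negative: it drives both failure paths into s2.
```

Events with the most negative totals are the ones implicated in failure, and those are the ones the failure scenarios are built around.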
20. Sample Path Analysis Example
Sample paths:
- s0 →e1→ s1 →e2→ s2
- s0 →e1→ s1 →e4→ s4 →e2→ s2
- s0 →e3→ s3
21. Failure Scenarios
Failure paths:
- s0 →e1→ s1 →e2→ s2
- s0 →e1→ s1 →e4→ s4 →e2→ s2

Failure path 1   Failure path 2   Failure scenario
e1 @ 1.2         e1 @ 1.6         e1 @ 1.4
e2 @ 4.4         e4 @ 4.5         e2 @ 4.6
-                e2 @ 4.8         -
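The scenario column follows a simple rule: events occurring on every failure path are kept, each at the mean of its trigger times (e4 appears on only one path and is dropped). A small sketch reproducing the table's numbers; the dict layout is illustrative:

```python
# Trigger times of events on each failure path, from the table above.
failure_paths = [
    {"e1": 1.2, "e2": 4.4},
    {"e1": 1.6, "e4": 4.5, "e2": 4.8},
]

# Keep only events that occur on every failure path...
common = set(failure_paths[0]) & set(failure_paths[1])
# ...and average their trigger times to form the failure scenario.
scenario = {e: round(sum(p[e] for p in failure_paths) / len(failure_paths), 3)
            for e in sorted(common)}
print(scenario)  # {'e1': 1.4, 'e2': 4.6}
```

The scenario then fixes nature's choices, event times included, so the deterministic planner in the debug step can plan against that concrete failure.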
22Additional Training Data
leave-taxime,pgh-taxi,cmu
s0 leave-taxime,pgh-taxi,cmu
s1 make-reservationme,plane,cmu
depart-planeplane,pgh-airport,mpls-airport
s2 enter-taxime,pgh-taxi,cmu
fill-planeplane,pgh-airport
s3 depart-taxime,pgh-taxi,cmu,pgh-airport
make-reservationme,plane,cmu
s4 idle
enter-taxime,pgh-taxi,cmu
s5 idle
depart-taxime,pgh-taxi,cmu,pgh-airport
arrive-taxipgh-taxi,cmu,pgh-airport
s0
s6
s5
s4
s1
s3
s2
23. Revised Policy Tree
[Decision tree now also branching on has-reservation(me, plane), in addition to at(pgh-taxi, cmu) and at(me, cmu); leaf actions: enter-taxi, depart-taxi, make-reservation, leave-taxi]
24. Summary
- Planning with stochastic asynchronous events using a deterministic planner
- Decision tree learning to generalize deterministic plan
- Sample path analysis for generating failure scenarios to guide plan repair
25. Coming Attractions
- Decision theoretic planning with asynchronous events:
  - "A formalism for stochastic decision processes with asynchronous events", MDP Workshop at AAAI-04
  - "Solving GSMDPs using continuous phase-type distributions", AAAI-04