Title: Policy Generation for Continuous-time Stochastic Domains with Concurrency

1. Policy Generation for Continuous-time Stochastic Domains with Concurrency
Håkan L. S. Younes and Reid G. Simmons
Carnegie Mellon University
2. Introduction
- Policy generation for asynchronous stochastic systems
- Rich goal formalism
- Policy generation and repair:
  - Solve relaxed problem using deterministic temporal planner
  - Decision tree learning to generalize plan
  - Sample path analysis to guide repair
3. Motivating Example
- Deliver package from CMU to Honeywell
[Map: CMU in Pittsburgh, PIT airport; MSP airport, Honeywell in Minneapolis]
4. Elements of Uncertainty
- Uncertain duration of flight and taxi ride
- Plane can get full without reservation
- Taxi might not be at airport when arriving in Minneapolis
- Package can get lost at airports

Asynchronous events ⇒ not semi-Markov
5. Asynchronous Events
- While the taxi is on its way to the airport, the plane may become full
[Timeline: fill-plane event at t0; taxi driving while plane not full, then while plane full. Arrival time distribution changes from F(t) to F(t | t > t0)]
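The conditioning above can be illustrated with a small simulation. This is a hedged sketch: the uniform trigger-time distribution, the numbers, and the function names are illustrative assumptions, not from the talk. The point is that for a non-memoryless distribution, the time t0 already spent while other events ran shifts the remaining trigger-time distribution to F(t | t > t0), so the current state alone no longer determines the future.

```python
import random

def sample_conditional(sample_t, t0, tries=10_000):
    """Rejection-sample a trigger time given that it exceeds t0.

    sample_t draws from the unconditioned distribution F; we keep only
    draws with t > t0, i.e. samples from F(t | t > t0).
    """
    for _ in range(tries):
        t = sample_t()
        if t > t0:
            return t
    raise RuntimeError("event almost surely triggered before t0")

random.seed(0)
# Illustrative: flight fills up at a uniform time on [60, 120] minutes.
draw = lambda: random.uniform(60.0, 120.0)
# 90 minutes have already elapsed while the taxi was driving.
later = [sample_conditional(draw, 90.0) for _ in range(1000)]
print(min(later) > 90.0)  # True: all remaining trigger times exceed t0
```

For an exponential distribution the conditioned remainder would look like the original (memorylessness); for any other distribution it does not, which is why asynchronous events push the model outside the semi-Markov class.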
6. Rich Goal Formalism
- Goals specified as CSL formulae
- φ ::= true | a | φ ∧ φ | ¬φ | P≥θ(φ U≤T φ)
- Goal example:
  - Probability is at least 0.9 that the package reaches Honeywell within 300 minutes without getting lost on the way
  - P≥0.9(¬lost(pkg) U≤300 at(pkg, honeywell))
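The path formula inside the probabilistic operator can be checked on a single simulated trace. A minimal sketch, assuming a trace is a list of timestamped states and predicates are plain callables; all names here are illustrative, not the paper's implementation:

```python
def until_holds(path, phi1, phi2, T):
    """Check the time-bounded until formula phi1 U<=T phi2 on one path.

    True iff phi2 becomes true at some time <= T, with phi1 holding in
    every state visited strictly before that point.
    """
    for time, state in path:
        if time > T:
            return False        # deadline passed before phi2 held
        if phi2(state):
            return True         # goal reached in time
        if not phi1(state):
            return False        # invariant violated on the way
    return False

# Example mirroring the slide's goal: reach honeywell within 300 minutes
# without the package getting lost.
not_lost = lambda s: not s.get("lost-pkg", False)
delivered = lambda s: s.get("at-pkg") == "honeywell"

path = [(0, {"at-pkg": "cmu"}),
        (140, {"at-pkg": "mpls-airport"}),
        (270, {"at-pkg": "honeywell"})]
print(until_holds(path, not_lost, delivered, 300))  # True
```

The outer P≥0.9(...) is then a statement about the fraction of sampled paths on which this check succeeds, which is what the test step estimates.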
7. Problem Specification
- Given:
  - Complex domain model (stochastic discrete event system)
  - Initial state
  - Probabilistic temporally extended goal (CSL formula)
- Wanted:
  - Policy satisfying goal formula in initial state
8. Generate, Test and Debug [Simmons, AAAI-88]
[Loop: Generate initial policy → Test if policy is good → if good, done; if bad, Debug and repair policy, then repeat test]
9. Generate
- Ways of generating initial policy:
  - Generate policy for relaxed problem
  - Use existing policy for similar problem
  - Start with null policy
  - Start with random policy
10. Test [Younes et al., ICAPS-03]
- Use discrete event simulation to generate sample execution paths
- Use acceptance sampling to verify probabilistic CSL goal conditions
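The test step never computes the satisfaction probability exactly; it decides between hypotheses from simulated paths. A hedged sketch of acceptance sampling via Wald's sequential probability ratio test, deciding between H0: p ≥ p0 and H1: p ≤ p1 with an indifference region (p1, p0) around the threshold θ; the parameter names and error bounds are illustrative:

```python
import math

def sprt(sample, p0, p1, alpha=0.05, beta=0.05):
    """Sequential test of H0: p >= p0 vs H1: p <= p1 (with p0 > p1).

    sample() simulates one execution path and returns True iff the CSL
    path formula held on it. alpha/beta bound the error probabilities.
    """
    a = math.log(beta / (1 - alpha))    # accept-H0 boundary (log LR)
    b = math.log((1 - beta) / alpha)    # accept-H1 boundary (log LR)
    llr = 0.0                           # log likelihood ratio H1 vs H0
    while True:
        x = sample()
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr <= a:
            return "accept"             # evidence supports P>=theta
        if llr >= b:
            return "reject"             # evidence refutes P>=theta

# Degenerate but deterministic illustrations:
print(sprt(lambda: True, 0.9, 0.8))    # accept
print(sprt(lambda: False, 0.9, 0.8))   # reject
```

The sequential form matters here: clearly good or clearly bad policies are settled after few simulations, and effort concentrates on borderline cases.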
11. Debug
- Analyze sample paths generated in test step to find reasons for failure
- Change policy to reflect outcome of failure analysis
12. Closer Look at Generate Step
[Loop with Generate highlighted: Generate initial policy → Test if policy is good → Debug and repair policy]
13. Policy Generation
Probabilistic planning problem
  → eliminate uncertainty →
Deterministic planning problem
  → solve using temporal planner (e.g. VHPOP [Younes & Simmons, JAIR 20]) →
Temporal plan
  → generate training data by simulating plan →
State-action pairs
  → decision tree learning →
Policy (decision tree)
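The last step of the pipeline can be sketched in miniature: simulated state-action pairs are generalized into a policy by a decision tree over boolean state atoms. This toy ID3-style learner (the atoms, actions, and splitting rule are illustrative; a real implementation would pick splits by information gain) shows the shape of the idea:

```python
def learn(examples, atoms):
    """Grow a decision tree mapping states to actions.

    examples: list of (state, action) pairs, state a dict of booleans.
    Returns an action (leaf) or {atom: (yes_subtree, no_subtree)}.
    """
    actions = {a for _, a in examples}
    if len(actions) == 1:
        return actions.pop()                 # pure leaf: one action
    for atom in atoms:
        pos = [(s, a) for s, a in examples if s[atom]]
        neg = [(s, a) for s, a in examples if not s[atom]]
        if pos and neg:                      # first atom that splits the data
            rest = [x for x in atoms if x != atom]
            return {atom: (learn(pos, rest), learn(neg, rest))}
    return max(actions)                      # fallback: arbitrary tie-break

def act(tree, state):
    """Execute the policy: walk the tree to a leaf action."""
    while isinstance(tree, dict):
        atom, (yes, no) = next(iter(tree.items()))
        tree = yes if state[atom] else no
    return tree

# Illustrative training data in the spirit of the running example:
pairs = [({"at-me-cmu": True,  "at-taxi-cmu": True},  "enter-taxi"),
         ({"at-me-cmu": True,  "at-taxi-cmu": False}, "idle"),
         ({"at-me-cmu": False, "at-taxi-cmu": False}, "check-in")]
policy = learn(pairs, ["at-me-cmu", "at-taxi-cmu"])
print(act(policy, {"at-me-cmu": True, "at-taxi-cmu": True}))  # enter-taxi
```

Because the tree branches on state atoms rather than memorizing states, the learned policy also prescribes actions in states the simulated plan never visited.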
14. Conversion to Deterministic Planning Problem
- Assume we can control nature:
  - Exogenous events are treated as actions
  - Actions with probabilistic effects are split into multiple deterministic actions
  - Trigger time distributions are turned into interval duration constraints
- Objective: find some execution trace satisfying the path formula φ1 U≤T φ2 of the probabilistic goal P≥θ(φ1 U≤T φ2)
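One piece of the relaxation, turning a stochastic exogenous event into a controllable durative action, can be sketched as a data transformation. The record layout, the `support` field, and the choice of using the distribution's support as the duration interval are all illustrative assumptions, not the paper's exact encoding:

```python
def relax_event(event):
    """Relax one exogenous event into a deterministic durative action.

    The trigger-time distribution is replaced by an interval duration
    constraint, and the event becomes controllable by the planner.
    """
    lo, hi = event["support"]          # interval covering trigger times
    return {
        "name": event["name"],
        "kind": "action",              # planner now chooses when it fires
        "duration": (lo, hi),          # interval constraint, no distribution
        "effects": event["effects"],
    }

# Illustrative event from the running example: the plane fills up at
# some uncertain time between 60 and 120 minutes.
fill_plane = {"name": "fill-plane", "support": (60.0, 120.0),
              "effects": ["plane-full"]}
print(relax_event(fill_plane)["duration"])  # (60.0, 120.0)
```

The relaxed problem is optimistic (the planner gets to schedule nature), which is exactly why the resulting policy must then be tested and debugged.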
15Generating Training Data
enter-taxime,pgh-taxi,cmu
s0 enter-taxime,pgh-taxi,cmu
s1 depart-taxime,pgh-taxi,cmu,pgh-airport
depart-planeplane,pgh-airport,mpls-airport
s2 idle
depart-taxime,pgh-taxi,cmu,pgh-airport
s3 leave-taxime,pgh-taxi,pgh-airport
arrive-taxipgh-taxi,cmu,pgh-airport
s4 check-inme,plane,pgh-airport
leave-taxime,pgh-taxi,pgh-airport
s5 idle
check-inme,plane,pgh-airport
s0
s3
s6
s1
s4
s2
s5
16. Policy Tree
[Decision tree branching on state atoms: at(pgh-taxi, cmu), at(me, cmu), at(plane, mpls-airport), at(mpls-taxi, mpls-airport), at(me, pgh-airport), in(me, plane), moving(mpls-taxi, mpls-airport, honeywell), at(me, mpls-airport), moving(pgh-taxi, cmu, pgh-airport); leaf actions: enter-taxi, depart-taxi, check-in, leave-taxi, idle]
17. Closer Look at Debug Step
[Loop with Debug highlighted: Generate initial policy → Test if policy is good → Debug and repair policy]
18. Policy Debugging
Sample execution paths
  → sample path analysis →
Failure scenarios
  → solve deterministic planning problem taking failure scenario into account →
Temporal plan
  → generate training data by simulating plan →
State-action pairs
  → incremental decision tree learning [Utgoff et al., MLJ 29] →
Revised policy
19. Sample Path Analysis
- Construct Markov chain from paths
- Assign values to states:
  - Failure: −1; Success: +1
  - All other states: 0
- Assign values to events:
  - V(s′) − V(s) for transition s → s′ caused by e
- Generate failure scenarios
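The scoring above can be sketched directly on the slide's example paths. This is a simplified version (failure states score −1, success +1, everything else 0; the full analysis works on the Markov chain built from the paths), with each event credited the value change of the transitions it caused:

```python
# Terminal values: s2 is a failure state, s3 a success state; all other
# states implicitly score 0.
V = {"s2": -1.0, "s3": +1.0}

# The three sample paths from the example, as (state, event, next_state):
paths = [
    [("s0", "e1", "s1"), ("s1", "e2", "s2")],
    [("s0", "e1", "s1"), ("s1", "e4", "s4"), ("s4", "e2", "s2")],
    [("s0", "e3", "s3")],
]

# Credit each event with V(s') - V(s) summed over its transitions.
event_value = {}
for path in paths:
    for s, e, nxt in path:
        event_value[e] = event_value.get(e, 0.0) + V.get(nxt, 0.0) - V.get(s, 0.0)

print(sorted(event_value.items()))
# e2 ends up most negative: it drives both failure paths into s2.
```

Events with the most negative totals are the ones implicated in failure, and those are the ones the failure scenarios are built around.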
20. Sample Path Analysis Example
Sample paths:
- s0 →e1→ s1 →e2→ s2
- s0 →e1→ s1 →e4→ s4 →e2→ s2
- s0 →e3→ s3
21. Failure Scenarios
Failure paths:
- s0 →e1→ s1 →e2→ s2
- s0 →e1→ s1 →e4→ s4 →e2→ s2

Failure path 1   Failure path 2   Failure scenario
e1 @ 1.2         e1 @ 1.6         e1 @ 1.4
e2 @ 4.4         e4 @ 4.5         e2 @ 4.6
-                e2 @ 4.8         -
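The scenario column follows a simple rule: events occurring on every failure path are kept, each at the mean of its trigger times (e4 appears on only one path and is dropped). A small sketch reproducing the table's numbers; the dict layout is illustrative:

```python
# Trigger times of events on each failure path, from the table above.
failure_paths = [
    {"e1": 1.2, "e2": 4.4},
    {"e1": 1.6, "e4": 4.5, "e2": 4.8},
]

# Keep only events that occur on every failure path...
common = set(failure_paths[0]) & set(failure_paths[1])
# ...and average their trigger times to form the failure scenario.
scenario = {e: round(sum(p[e] for p in failure_paths) / len(failure_paths), 3)
            for e in sorted(common)}
print(scenario)  # {'e1': 1.4, 'e2': 4.6}
```

The scenario then fixes nature's choices, event times included, so the deterministic planner in the debug step can plan against that concrete failure.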
22Additional Training Data
leave-taxime,pgh-taxi,cmu
s0 leave-taxime,pgh-taxi,cmu
s1 make-reservationme,plane,cmu
depart-planeplane,pgh-airport,mpls-airport
s2 enter-taxime,pgh-taxi,cmu
fill-planeplane,pgh-airport
s3 depart-taxime,pgh-taxi,cmu,pgh-airport
make-reservationme,plane,cmu
s4 idle
enter-taxime,pgh-taxi,cmu
s5 idle
depart-taxime,pgh-taxi,cmu,pgh-airport
arrive-taxipgh-taxi,cmu,pgh-airport
s0
s6
s5
s4
s1
s3
s2
23. Revised Policy Tree
[Decision tree now also branching on has-reservation(me, plane), in addition to at(pgh-taxi, cmu) and at(me, cmu); leaf actions: enter-taxi, depart-taxi, make-reservation, leave-taxi]
24. Summary
- Planning with stochastic asynchronous events using a deterministic planner
- Decision tree learning to generalize deterministic plan
- Sample path analysis for generating failure scenarios to guide plan repair
25. Coming Attractions
- Decision theoretic planning with asynchronous events:
  - "A formalism for stochastic decision processes with asynchronous events", MDP Workshop at AAAI-04
  - "Solving GSMDPs using continuous phase-type distributions", AAAI-04