Dynamic Programming as Sequential Decision Making

Provided by: seanwa6

1
Dynamic Programming as Sequential Decision Making

Lecture 20: Dynamic Programming as Sequential Decision Making
Sequential Decision Making
Example: Inventory Control
Value of Information
Dynamic Programming Algorithm
Principle of Optimality
Deterministic Finite State Systems
Shortest Paths

Lecture 18: Introduction to Dynamic Programming
Structure of Dynamic Programs
Calculating Binomial Coefficient
Pascal's Triangle
Coins Revisited

Lecture 19: DNA Sequence Alignment
Review Primer of Genome Science Handout
Needleman/Wunsch example
Review Assignment 5

Reference: Dynamic Programming and Optimal Control, D. Bertsekas
2
Sequential Decision Making

Given a discrete-time system (difference equation) $x_{k+1} = f_k(x_k, u_k, w_k)$, $k = 0, 1, \dots, N-1$:
$w_k$ are disturbances over which we have no control.
Model the impact of external influences by independent, identically distributed random variables (think of a roll of the dice).
$u_k$ are the control variables, which we can choose from an admissible set.

Is this sequence well defined?
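
Once an initial state, a rule for picking the controls, and a realization of the disturbances are fixed, the recursion determines every later state. The following is a minimal Python sketch of that idea, assuming a generic update $x_{k+1} = f_k(x_k, u_k, w_k)$ in the notation of the cited Bertsekas reference. The names `simulate`, `toy_f`, and `constant_policy` are hypothetical, and the toy system is made up purely for illustration.

```python
import random


def simulate(f, policy, x0, n_steps, rng):
    """Roll the system forward: x_{k+1} = f(k, x_k, u_k, w_k)."""
    xs = [x0]
    for k in range(n_steps):
        u = policy(k, xs[-1])      # control chosen from the admissible set
        w = rng.randint(1, 6)      # disturbance: one roll of a fair die
        xs.append(f(k, xs[-1], u, w))
    return xs


def toy_f(k, x, u, w):
    """Hypothetical toy system: the state gains the control and loses the roll."""
    return x + u - w


def constant_policy(k, x):
    """Hypothetical control rule that ignores the state and always picks 3."""
    return 3


trajectory = simulate(toy_f, constant_policy, x0=0, n_steps=5, rng=random.Random(0))
print(trajectory)
```

Fixing the seed makes the disturbance sequence reproducible, so the same call always yields the same trajectory.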
3
Sequential Decision Making

Does this make sense?
4
Sequential Decision Making

Since $w_k$ is random, what does it mean to choose $u_k$ to optimize the value of the cost function?

Does this make sense?
5
Sequential Decision Making

Choose $u_k$ to optimize an additive cost function.
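
For orientation, an additive cost in the notation of the cited Bertsekas reference has roughly the following shape. The per-stage cost $g_k$ and the terminal cost $g_N$ are assumed names here, and the expectation is taken over the disturbances $w_0, \dots, w_{N-1}$, which is one way to give "optimizing" a meaning when the cost is random.

```latex
\[
  E\Big\{\, g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \Big\}
\]
```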

6
Example: Inventory Management

Inventory management is one primary function of Enterprise Resource Planning (ERP) software.

Model the system:
$x_k$ is the stock available at the beginning of period $k$.
$u_k$ is the stock ordered (and immediately delivered) at the beginning of period $k$, with $u_k \ge 0$.
$w_k$ is the demand during period $k$, with known probability distribution.
Excess demand (resulting in negative values of $x_k$) is backlogged and filled as soon as inventory is available.
The stock therefore evolves as $x_{k+1} = x_k + u_k - w_k$.
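
A minimal Python sketch of this model, assuming the stock update $x_{k+1} = x_k + u_k - w_k$ that the later backward recursion uses. The function name `simulate_inventory` and the demand values drawn here are illustrative assumptions, not part of the lecture.

```python
import random


def simulate_inventory(x0, orders, demands):
    """Apply x_{k+1} = x_k + u_k - w_k; negative stock means backlogged demand."""
    stock = [x0]
    for u, w in zip(orders, demands):
        if u < 0:
            raise ValueError("orders must be nonnegative")
        stock.append(stock[-1] + u - w)
    return stock


rng = random.Random(1)
# Assumed demand model: each period's demand is 0, 1, or 2 units.
demands = [rng.choice([0, 1, 2]) for _ in range(4)]
print(simulate_inventory(x0=1, orders=[1, 0, 2, 1], demands=demands))
```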

7
Example: Inventory Management

Define the cost function:
Pay $r(x_k)$ for either storing excess inventory ($x_k > 0$) or for shortage ($x_k < 0$).
Pay the purchasing cost $c u_k$, where $c$ is the cost per unit ordered.
Pay an end-of-season cost $R(x_N)$ for inventory left after $N$ periods.
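
Adding these pieces up gives a total expected cost of the form $E\{R(x_N) + \sum_{k=0}^{N-1} (r(x_k) + c u_k)\}$. The Python sketch below evaluates that expectation exactly for a fixed order sequence by enumerating every demand outcome. The demand distribution `DEMAND`, the unit cost `C`, and the particular shapes of `r` and `R` are assumptions chosen only to make the example concrete.

```python
from itertools import product

DEMAND = {0: 0.1, 1: 0.7, 2: 0.2}   # assumed distribution P(w_k = w)
C = 1.0                             # assumed cost per unit ordered


def r(x):
    """Assumed holding/shortage cost: 1 per unit held, 3 per unit short."""
    return 1.0 * x if x >= 0 else 3.0 * (-x)


def R(x):
    """Assumed end-of-season cost: leftover stock and backlog are both penalized."""
    return 2.0 * abs(x)


def expected_cost(x0, orders):
    """E{ R(x_N) + sum_k (r(x_k) + c*u_k) } for a fixed order sequence."""
    total = 0.0
    for demands in product(DEMAND, repeat=len(orders)):
        prob, x, cost = 1.0, x0, 0.0
        for u, w in zip(orders, demands):
            prob *= DEMAND[w]          # demands are independent across periods
            cost += r(x) + C * u       # stage cost paid at the start of the period
            x = x + u - w              # stock update
        total += prob * (cost + R(x))
    return total


print(expected_cost(0, [1]))
```

Exhaustive enumeration grows exponentially with the number of periods, so this is only practical for very short horizons.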

8
Example: Inventory Management

Want to choose $(u_0, u_1, \dots, u_{N-1})$ to minimize the total expected cost. Two important ways of doing this:
Open-loop control: decide $(u_0, u_1, \dots, u_{N-1})$ all at once.
Closed-loop control: use $x_k$ to improve each decision. Need to find a function at each time $k$ that maps $x_k$ to $u_k$, $u_k = \mu_k(x_k)$. Decide $(\mu_0, \mu_1, \dots, \mu_{N-1})$ all at once, not $u$.
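
The difference between the two modes can be made concrete by evaluating one fixed order sequence and one feedback rule on the same toy problem. This is a sketch under assumed numbers: the demand distribution, the costs `r`, `R`, and `C`, and the "order up to a target level" rule are all illustrative choices, and neither candidate is claimed to be optimal.

```python
from itertools import product

DEMAND = {0: 0.1, 1: 0.7, 2: 0.2}   # assumed distribution P(w_k = w)
C = 1.0                             # assumed cost per unit ordered


def r(x):
    """Assumed holding/shortage cost: 1 per unit held, 3 per unit short."""
    return 1.0 * x if x >= 0 else 3.0 * (-x)


def R(x):
    """Assumed end-of-season cost."""
    return 2.0 * abs(x)


def expected_cost(x0, policy, n_periods):
    """Exact expected total cost when u_k = policy(k, x_k)."""
    total = 0.0
    for demands in product(DEMAND, repeat=n_periods):
        prob, x, cost = 1.0, x0, 0.0
        for k, w in enumerate(demands):
            u = policy(k, x)
            prob *= DEMAND[w]
            cost += r(x) + C * u
            x = x + u - w
        total += prob * (cost + R(x))
    return total


def open_loop(orders):
    """Open loop: the order for period k is fixed in advance and ignores x_k."""
    return lambda k, x: orders[k]


def order_up_to(level):
    """Closed loop: look at x_k and order just enough to reach `level`."""
    return lambda k, x: max(0, level - x)


print(expected_cost(0, open_loop([1, 1]), 2))
print(expected_cost(0, order_up_to(1), 2))
```

Both candidates share the same evaluator because an open-loop sequence is just a policy that ignores the state, which is why closed-loop control can never do worse than the best open-loop sequence.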

10
Value of Information
11
Dynamic Programming Algorithm

Principle of Optimality (why DP works)
Let $\pi^* = \{\mu_0^*, \mu_1^*, \dots, \mu_{N-1}^*\}$ be an optimal policy for the basic problem.
Suppose that when using $\pi^*$, a state $x_i$ occurs at time $i$ with positive probability.
Consider the subproblem starting from $x_i$ at time $i$ and minimizing the cost-to-go from time $i$ to time $N$.
Then the truncated policy $\{\mu_i^*, \mu_{i+1}^*, \dots, \mu_{N-1}^*\}$ is optimal for the subproblem.

Why?
12
Optimality in Driving
Shortest route from AF to Provo passes through
Orem, so...
13
Optimality in Driving
Shortest route from AF to Orem follows the
shortest route from AF to Provo.
14
Dynamic Programming Algorithm

Principle of Optimality (why DP works)

Prove it!
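
One way to argue the Principle of Optimality is by contradiction. The sketch below is not taken from the slides: it assumes an optimal policy $\pi^* = \{\mu_0^*, \dots, \mu_{N-1}^*\}$, a state $x_i$ reached at time $i$ with positive probability, independent disturbances, and the assumed names $g_k$ (stage cost), $g_N$ (terminal cost), and $J_\pi(x_0)$ (expected total cost of policy $\pi$ from $x_0$).

```latex
\textbf{Sketch (by contradiction).}
Suppose $\{\mu_i^*, \dots, \mu_{N-1}^*\}$ is not optimal for the subproblem, so that some
$\{\tilde\mu_i, \dots, \tilde\mu_{N-1}\}$ has a strictly smaller expected cost-to-go from $x_i$:
\[
  E\Big\{ g_N(x_N) + \sum_{k=i}^{N-1} g_k\big(x_k, \tilde\mu_k(x_k), w_k\big) \,\Big|\, x_i \Big\}
  \;<\;
  E\Big\{ g_N(x_N) + \sum_{k=i}^{N-1} g_k\big(x_k, \mu_k^*(x_k), w_k\big) \,\Big|\, x_i \Big\}.
\]
Define $\tilde\pi$ to follow $\pi^*$ up to time $i$, to switch to
$\tilde\mu_i, \dots, \tilde\mu_{N-1}$ if the state at time $i$ equals $x_i$,
and to keep following $\pi^*$ otherwise.
The two policies incur the same cost on every path that misses $x_i$, while $\tilde\pi$ is
strictly cheaper in expectation on the paths through $x_i$, which have positive probability. Hence
\[
  J_{\tilde\pi}(x_0) < J_{\pi^*}(x_0),
\]
contradicting the optimality of $\pi^*$.
```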
15
Dynamic Programming Algorithm

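Reconsider the Inventory Management example.
Use the Principle of Optimality, working backwards in time.
Period $N-1$: assume $x_{N-1}$ is given.
For a given order $u_{N-1}$, the expected remaining cost is $J(x_{N-1}) = E\{R(x_N)\} + r(x_{N-1}) + c u_{N-1}$.
$r(x_{N-1})$ is fixed, regardless of the choice of $u_{N-1}$.
Choose $u_{N-1} \ge 0$ to minimize:
$J_{N-1}(x_{N-1}) = r(x_{N-1}) + \min_{u_{N-1} \ge 0} \left[ c u_{N-1} + E\{R(x_N)\} \right]$
$= r(x_{N-1}) + \min_{u_{N-1} \ge 0} \left[ c u_{N-1} + E\{R(x_{N-1} + u_{N-1} - w_{N-1})\} \right]$
Need to compute $J_{N-1}$ for all values of $x_{N-1}$; the minimizing order gives $\mu_{N-1}^*(x_{N-1})$.
Period $N-2$: assume $x_{N-2}$ is given.
Choose $u_{N-2} \ge 0$ to minimize:
$J_{N-2}(x_{N-2}) = r(x_{N-2}) + \min_{u_{N-2} \ge 0} \left[ c u_{N-2} + E\{J_{N-1}(x_{N-2} + u_{N-2} - w_{N-2})\} \right]$, which gives $\mu_{N-2}^*(x_{N-2})$.
Period $k$:
$J_k(x_k) = r(x_k) + \min_{u_k \ge 0} \left[ c u_k + E\{J_{k+1}(x_k + u_k - w_k)\} \right]$, which gives $\mu_k^*(x_k)$.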
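
The backward recursion for the inventory example can be sketched in a few lines of Python. Everything numeric here is an assumption made for illustration: the horizon `N`, the demand distribution, the cost shapes `r` and `R`, the unit cost `C`, and the cap `MAX_ORDER` that keeps the search over orders finite.

```python
from functools import lru_cache

DEMAND = {0: 0.1, 1: 0.7, 2: 0.2}   # assumed distribution P(w_k = w)
C = 1.0                             # assumed cost per unit ordered
N = 3                               # assumed number of periods
MAX_ORDER = 3                       # assumed cap on a single order


def r(x):
    """Assumed holding/shortage cost: 1 per unit held, 3 per unit short."""
    return 1.0 * x if x >= 0 else 3.0 * (-x)


def R(x):
    """Assumed end-of-season cost."""
    return 2.0 * abs(x)


@lru_cache(maxsize=None)
def J(k, x):
    """Return (cost-to-go J_k(x), minimizing order mu_k(x))."""
    if k == N:
        return R(x), None           # terminal stage: only the end-of-season cost
    best_u, best = None, float("inf")
    for u in range(MAX_ORDER + 1):
        # c*u + E{ J_{k+1}(x + u - w) }, with the expectation over demand w
        tail = sum(p * J(k + 1, x + u - w)[0] for w, p in DEMAND.items())
        if C * u + tail < best:
            best_u, best = u, C * u + tail
    return r(x) + best, best_u


cost, first_order = J(0, 0)
print(cost, first_order)
```

Memoizing on `(k, x)` means each cost-to-go value is computed once, which is what separates this from enumerating every demand path.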

16
Dynamic Programming Algorithm

Theorem: For every initial state $x_0$, the optimal cost $J^*(x_0)$ of the basic problem is equal to $J_0(x_0)$, where the function $J_0$ is given by the last step of the following algorithm, which proceeds backward in time from period $N-1$ to period 0.
The $u_k$ that minimizes the right-hand side, given $x_k$, defines a function $\mu_k^*(x_k)$ for each $k$, and the policy $\pi^* = \{\mu_0^*, \mu_1^*, \dots, \mu_{N-1}^*\}$ is optimal.
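
A sketch of the backward recursion the theorem refers to, written in the notation of the cited Bertsekas reference. The terminal cost $g_N$ and the admissible control set $U_k(x_k)$ are assumed names that the slides do not define.

```latex
\[
  J_N(x_N) = g_N(x_N),
\]
\[
  J_k(x_k) = \min_{u_k \in U_k(x_k)}
  E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\},
  \qquad k = N-1, \dots, 1, 0.
\]
```

Under these assumptions the inventory example is the special case $g_k = r(x_k) + c u_k$, $g_N = R$, and $f_k(x_k, u_k, w_k) = x_k + u_k - w_k$.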

17
Special Case: Deterministic Finite-State Systems

Suppose $w_k$ is fixed to take on only one value.
There is no value of information: with $u_k = \mu_k(x_k)$, $x_k$ is determined from $x_0$ and the previous controls.
No need for feedback.
Choose $u_0, u_1, \dots, u_{N-1}$ directly, instead of $\mu_0, \mu_1, \dots, \mu_{N-1}$.
Suppose the state space is finite, i.e. $x_k$ is chosen from a finite set for every $k$.
Given a state $x_k$, a control $u_k$ is associated with the transition $f_k(x_k, u_k)$ and a cost $g_k(x_k, u_k)$.
Equivalently represented as a graph:
Nodes are states
Edges are transitions
Every edge has a cost associated with it
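
The graph view can be sketched as a backward pass over a staged graph. The data layout (`stages` as a list of dictionaries of edges), the function names, and the tiny two-stage graph are hypothetical choices made for this illustration.

```python
def shortest_path(stages, terminal_cost):
    """Backward DP over a staged graph.

    stages[k][state] is a list of (control, next_state, cost) edges.
    terminal_cost[state] is the cost of ending in that state.
    Returns the cost-to-go tables J[k][state] and the minimizing controls.
    """
    n = len(stages)
    J = [dict() for _ in range(n)] + [dict(terminal_cost)]
    best = [dict() for _ in range(n)]
    for k in reversed(range(n)):
        for state, edges in stages[k].items():
            # Pick the edge minimizing edge cost plus cost-to-go of its endpoint.
            control, nxt, cost = min(edges, key=lambda e: e[2] + J[k + 1][e[1]])
            J[k][state] = cost + J[k + 1][nxt]
            best[k][state] = control
    return J, best


def recover_controls(stages, best, start):
    """Read off the control sequence u_0, ..., u_{N-1}; no feedback is needed."""
    state, controls = start, []
    for k, layer in enumerate(stages):
        u = best[k][state]
        controls.append(u)
        state = next(nxt for c, nxt, _ in layer[state] if c == u)
    return controls


# Hypothetical two-stage graph: A -> {B, C} -> T
stages = [
    {"A": [("up", "B", 2), ("down", "C", 5)]},
    {"B": [("go", "T", 4)], "C": [("go", "T", 0)]},
]
J, best = shortest_path(stages, {"T": 0})
print(J[0]["A"], recover_controls(stages, best, "A"))
```

Because the system is deterministic, the whole control sequence can be read off in advance from the start state, which is the "no need for feedback" point above.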

18
Special Case: Deterministic Finite-State Systems
DP = Shortest Path!
19
Special Case: Deterministic Finite-State Systems
[Figure: staged graph (Stage 0, Stage 1, Stage 2, ..., Stage N-1, Stage N) for making change with the coins Senine, Seon, Shum, and Limnah, running from the initial state to an artificial terminal node.]
Amount to make change: N = 7
Edge weights are 1, except when transitioning from zero to zero, in which case they're zero.
Shortest path is optimal!
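
The change-making graph can be sketched as a staged backward pass in which the state is the amount still owed. The coin values in `COINS` are an assumption (the slide names the coins but this transcript does not give their values), and `min_coins_staged` is a hypothetical name.

```python
INF = float("inf")

# Assumed coin values; only the coin names come from the slide.
COINS = {"senine": 1, "seon": 2, "shum": 4, "limnah": 7}


def min_coins_staged(amount, coins=COINS):
    """Shortest path length from `amount` owed at stage 0 to 0 owed at stage N."""
    n_stages = amount   # enough stages, since every coin is worth at least 1
    # Stage N: only state 0 reaches the artificial terminal node, at zero cost.
    J = [0 if x == 0 else INF for x in range(amount + 1)]
    for _ in range(n_stages):
        nxt, J = J, []
        for x in range(amount + 1):
            # Paying with a coin of value v is an edge of weight 1 to state x - v.
            options = [1 + nxt[x - v] for v in coins.values() if v <= x]
            if x == 0:
                options.append(0 + nxt[0])   # zero-to-zero edge has weight 0
            J.append(min(options, default=INF))
    return J[amount]


print(min_coins_staged(7))
```

The zero-weight edge from zero to zero lets a path that finishes early idle until the last stage, so every path has the same number of stages.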
20
Dynamic Programming as Sequential Decision Making