Title: Chapter 11 Dynamic Programming
1Chapter 11 Dynamic Programming
- by Dr. Peitsang Wu
- Department of Industrial Engineering and Management
- I-Shou University
2An Example
3Network Representation
4A Solution
- If we greedily select the cheapest run offered at each successive stage, we obtain the route A → B → F → I → J, with total cost 13.
- Replacing A → B → F with A → D → F gives another route with total cost only 11, so the greedy route is not optimal.
- Another possible approach is to enumerate all the possible routes, of which there are 18. This is the so-called exhaustive enumeration method.
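The greedy and exhaustive approaches above can be compared in a few lines of Python. This is only a sketch: the cost table for the network is not reproduced in this transcript, so the `COST` values below are assumed. They were chosen to be consistent with the route costs 13 and 11 and the 18 routes quoted above. The function names are illustrative.

```python
# Assumed run costs between states (not in this transcript); they reproduce
# the figures quoted on the slide: greedy cost 13, better route 11, 18 routes.
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}


def route_cost(route):
    """Sum the run costs along consecutive pairs of states."""
    return sum(COST[a][b] for a, b in zip(route, route[1:]))


def greedy_route(start="A", goal="J"):
    """Always take the cheapest run out of the current state."""
    route = [start]
    while route[-1] != goal:
        options = COST[route[-1]]
        route.append(min(options, key=options.get))
    return route


def all_routes(start="A", goal="J"):
    """Exhaustive enumeration of every route from start to goal."""
    if start == goal:
        return [[goal]]
    return [[start] + rest
            for nxt in COST[start]
            for rest in all_routes(nxt, goal)]


greedy = greedy_route()
routes = all_routes()
best = min(routes, key=route_cost)
print("greedy:", greedy, route_cost(greedy))
print("routes:", len(routes), "best:", best, route_cost(best))
```

Enumeration is fine for 18 routes, but the number of routes grows multiplicatively with the number of stages, which is the motivation for the stage-by-stage approach that follows.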
5Dynamic Programming
- Stage
- State
- Decision variable
- Optimal policy (Optimal solution)
6Dynamic Programming
- There is no standard mathematical formulation of "the" dynamic programming problem. Rather, dynamic programming is a general type of approach to problem solving, and the particular equations used must be developed to fit each situation.
7Dynamic Programming
- Dynamic programming starts with a small portion of
the original problem and finds the optimal
solution for this smaller problem. It then
gradually enlarges the problem, finding the
current optimal solution from the preceding one,
until the original problem is solved in its
entirety.
8Formulation
- Let the decision variable $x_n$ ($n = 1, 2, 3, 4$) be the immediate destination on stage $n$. The route selected is $A \to x_1 \to x_2 \to x_3 \to x_4$, where $x_4 = J$.
- Let $f_n(s, x_n)$ be the total cost of the best overall policy for the remaining stages, given that you are in state $s$, ready to start stage $n$, and select $x_n$ as the immediate destination.
- Given $s$ and $n$, let $x_n^*$ denote any value of $x_n$ (not necessarily unique) that minimizes $f_n(s, x_n)$, and let $f_n^*(s)$ be the corresponding minimum value of $f_n(s, x_n)$.
9Formulation
- Thus
  $f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)$,
- where
  $f_n(s, x_n)$ = immediate cost (at stage $n$) + minimum future cost (stages $n + 1$ onward) $= c_{s,x_n} + f_{n+1}^*(x_n)$.
- The value of $c_{s,x_n}$ is given by the preceding tables for $c_{ij}$ by setting $i = s$ (the current state) and $j = x_n$ (the immediate destination). Here $f_5^*(J) = 0$.
- The objective is to find $f_1^*(A)$ and the corresponding route.
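The recursion $f_n^*(s) = \min_{x_n} \{ c_{s,x_n} + f_{n+1}^*(x_n) \}$ can be sketched as a backward pass over the stages. The cost tables are not part of this transcript, so the `COST` values and the `STAGES` layout below are assumptions, chosen to agree with the route costs quoted earlier. `solve_stagecoach` and `one_route` are illustrative names.

```python
# Assumed run costs c[s][x] (not in this transcript).
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}
# Possible states at the start of stages 1, 2, 3, 4.
STAGES = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I"]]


def solve_stagecoach():
    f_star = {"J": 0}          # boundary condition f*_5(J) = 0
    x_star = {}                # all minimizing destinations for each state
    for states in reversed(STAGES):            # n = 4, 3, 2, 1
        for s in states:
            totals = {x: c + f_star[x] for x, c in COST[s].items()}
            f_star[s] = min(totals.values())
            x_star[s] = [x for x, v in totals.items() if v == f_star[s]]
    return f_star, x_star


def one_route(x_star, start="A"):
    """Follow the first optimal decision from each state."""
    route = [start]
    while route[-1] in x_star:
        route.append(x_star[route[-1]][0])
    return route


f_star, x_star = solve_stagecoach()
print("f*_1(A) =", f_star["A"])
print("optimal first moves:", x_star["A"])
print("one optimal route:", one_route(x_star))
```

Keeping every minimizer in `x_star`, rather than a single one, reflects the remark that $x_n^*$ is not necessarily unique.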
10Solution
11Solution
12Solution
- Table for stage n = 3 (rows: state s; columns: decision x3): (No Transcript)
13Solution
14Solution
- Table for stage n = 2 (rows: state s; columns: decision x2): (No Transcript)
15Solution
16Solution
- Table for stage n = 1 (rows: state s; columns: decision x1): (No Transcript)
17Optimal Solution
18Characteristic of DP Problems
- The problem can be divided into stages, with a policy decision required at each stage.
- Each stage has a number of states associated with the beginning of that stage.
- The effect of the policy decision at each stage is to transform the current state to a state associated with the beginning of the next stage (possibly according to a probability distribution).
- The solution procedure is designed to find an optimal policy for the overall problem, i.e., a prescription of the optimal policy decision at each stage for each of the possible states.
19Characteristic of DP Problems
- Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages. Therefore, the optimal immediate decision depends only on the current state and not on how you got there. This is the principle of optimality for DP.
- The solution procedure begins by finding the optimal policy for the last stage.
- A recursive relationship that identifies the optimal policy for stage $n$, given the optimal policy for stage $n + 1$, is available.
20Characteristic of DP Problems
- For the stagecoach problem, this recursive relationship was
  $f_n^*(s) = \min_{x_n} \{ c_{s,x_n} + f_{n+1}^*(x_n) \}$.
- Therefore, finding the optimal policy decision when you start in state $s$ at stage $n$ requires finding the minimizing value of $x_n$.
- For this particular problem, the corresponding minimum cost is achieved by using this value of $x_n$ and then following the optimal policy when you start in state $x_n$ at stage $n + 1$.
21Characteristic of DP Problems
- The precise form of the recursive relationship differs somewhat among DP problems. However, notation analogous to that introduced in the preceding section will continue to be used here, as summarized below.
- $N$ = number of stages.
- $n$ = label for current stage ($n = 1, 2, \ldots, N$).
- $s_n$ = current state for stage $n$.
- $x_n$ = decision variable for stage $n$.
- $x_n^*$ = optimal value of $x_n$ (given $s_n$).
22Characteristic of DP Problems
- $f_n(s_n, x_n)$ = contribution of stages $n, n + 1, \ldots, N$ to the objective function if the system starts in state $s_n$ at stage $n$, the immediate decision is $x_n$, and optimal decisions are made thereafter.
- $f_n^*(s_n) = f_n(s_n, x_n^*)$.
- The recursive relationship will always be of the form
  $f_n^*(s_n) = \max_{x_n} \{ f_n(s_n, x_n) \}$ or $f_n^*(s_n) = \min_{x_n} \{ f_n(s_n, x_n) \}$,
- where $f_n(s_n, x_n)$ would be written in terms of $s_n$, $x_n$, $f_{n+1}^*(s_{n+1})$, and probably some measure of the immediate contribution of $x_n$ to the objective function.
23Characteristic of DP Problems
- It is the inclusion of $f_{n+1}^*(s_{n+1})$ on the right-hand side, so that $f_n^*(s_n)$ is defined in terms of $f_{n+1}^*(s_{n+1})$, that makes the expression for $f_n^*(s_n)$ a recursive relationship.
- The recursive relationship keeps recurring as we move backward stage by stage. When the current stage number $n$ is decreased by 1, the new $f_n^*(s_n)$ function is derived by using the $f_{n+1}^*(s_{n+1})$ function that was just derived during the preceding iteration, and then this process keeps repeating.
24Characteristic of DP Problems
- When we use this recursive relationship, the solution procedure starts at the end and moves backward stage by stage, each time finding the optimal policy for that stage, until it finds the optimal policy starting at the initial stage. This optimal policy immediately yields an optimal solution for the entire problem, namely, $x_1^*$ for the initial state $s_1$, then $x_2^*$ for the resulting state $s_2$, and so forth to $x_N^*$ for the resulting state $s_N$.
25Characteristic of DP Problems
- A table such as the following would be obtained for each stage ($n = N, N - 1, \ldots, 1$).
- Table layout (rows: state $s_n$; columns: decision $x_n$): (No Transcript)
26Deterministic DP
- Deterministic dynamic programming can be
described diagrammatically as shown in Fig. 11.3.
Thus, at stage n the process will be in some
state sn.
27Dynamic Programming
- Making policy decision $x_n$ then moves the process to some state $s_{n+1}$ at stage $n + 1$.
- The contribution thereafter to the objective function under an optimal policy has been previously calculated to be $f_{n+1}^*(s_{n+1})$.
- Optimizing with respect to $x_n$ then gives $f_n^*(s_n) = f_n(s_n, x_n^*)$. After $x_n^*$ and $f_n^*(s_n)$ are found for each possible value of $s_n$, the solution procedure is ready to move back one stage.
28Ex 2 Distributing Medical Teams to Countries
29Formulation
- This problem requires making three interrelated decisions, namely, how many medical teams to allocate to each of the three countries.
- The decision variables $x_n$ ($n = 1, 2, 3$) are the number of teams to allocate to stage (country) $n$.
- $s_n$ = number of medical teams still available for allocation to the remaining countries ($n, \ldots, 3$).
- To state the overall problem mathematically, let $p_i(x_i)$ be the measure of performance from allocating $x_i$ medical teams to country $i$, as given in Table 11.1.
30(No Transcript)
31Basic Structure
32Formulation
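- Maximize $\sum_{i=1}^{3} p_i(x_i)$, subject to $\sum_{i=1}^{3} x_i = 5$, and $x_i$ are nonnegative integers.
- Using the notation presented in Sec. 11.2, we see that $f_n(s_n, x_n)$ is
  $f_n(s_n, x_n) = p_n(x_n) + \max \sum_{i=n+1}^{3} p_i(x_i)$,
- where the maximum is taken over $x_{n+1}, \ldots, x_3$ such that $\sum_{i=n}^{3} x_i = s_n$ and the $x_i$ are nonnegative integers. In addition,
  $f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} f_n(s_n, x_n)$.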
33Formulation
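- Therefore,
  $f_n(s_n, x_n) = p_n(x_n) + f_{n+1}^*(s_n - x_n)$ (with $f_4^*$ defined to be zero).
- Consequently, the recursive relationship relating functions $f_1^*$, $f_2^*$, and $f_3^*$ for this problem is
  $f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \}$, for $n = 1, 2$.
- For the last stage ($n = 3$),
  $f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3)$.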
34Solution Procedure
35Solution Procedure
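- Stage $n = 2$
- Formula: $f_2(s_2, x_2) = p_2(x_2) + f_3^*(s_2 - x_2)$. For example, with $s_2 = 2$:
- $x_2 = 0$: $f_2(2, 0) = p_2(0) + f_3^*(2)$
- $x_2 = 1$: $f_2(2, 1) = p_2(1) + f_3^*(1)$
- $x_2 = 2$: $f_2(2, 2) = p_2(2) + f_3^*(0)$
- Because the objective is maximization, $x_2^*$ is the value of $x_2$ that gives the largest of these three quantities, and $f_2^*(2)$ is that largest value.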
36Solution Procedure
37Solution Procedure
- Stage $n = 1$
- Formula: $f_1(5, x_1) = p_1(x_1) + f_2^*(5 - x_1)$
- The similar calculations for $x_1 = 2, 3, 4$ (try it) verify that $x_1^* = 1$ with $f_1^*(5) = 170$, as shown in the following table.
38Solution Procedure
39Solution Procedure
- Thus, the optimal solution has $x_1^* = 1$, which makes $s_2 = 5 - 1 = 4$, so $x_2^* = 3$, which makes $s_3 = 4 - 3 = 1$, so $x_3^* = 1$. Since $f_1^*(5) = 170$, this (1, 3, 1) allocation of medical teams to the three countries will yield an estimated total of 170,000 additional person-years of life, which is at least 5,000 more than for any other allocation.
- These results of the dynamic programming analysis also are summarized in Fig. 11.6.
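A minimal sketch of this backward recursion in Python follows. Table 11.1 is not reproduced in this transcript, so the performance values in `P` (thousands of additional person-years of life for 0 to 5 teams) are assumed. They are consistent with the (1, 3, 1) allocation and the total of 170 stated above. `distribute` is an illustrative name.

```python
# Assumed Table 11.1: P[n][x] = performance from giving x teams to country n.
P = {
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}


def distribute(p, teams):
    """f*_n(s) = max over x in 0..s of p_n(x) + f*_{n+1}(s - x)."""
    stages = sorted(p)
    f_next = [0] * (teams + 1)        # f*_{N+1}(s) = 0 for every s
    x_star = {}
    for n in reversed(stages):        # n = 3, 2, 1
        f_cur, x_cur = [], []
        for s in range(teams + 1):
            x_best = max(range(s + 1),
                         key=lambda x: p[n][x] + f_next[s - x])
            x_cur.append(x_best)
            f_cur.append(p[n][x_best] + f_next[s - x_best])
        x_star[n], f_next = x_cur, f_cur
    # Forward pass: read off the optimal decisions from s_1 = teams.
    plan, s = [], teams
    for n in stages:
        plan.append(x_star[n][s])
        s -= plan[-1]
    return f_next[teams], plan


total, plan = distribute(P, 5)
print(total, plan)
```

The sketch maximizes over $x_3 = 0, \ldots, s_3$ at the last stage instead of forcing $x_3 = s_3$. With a nondecreasing $p_3$, as assumed here, both choices give the same result.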
40(No Transcript)
41Ex3 Wyndor Glass Company Problem
42Formulation
- This problem requires making two interrelated decisions, namely, the level of activity 1, denoted by $x_1$, and the level of activity 2, denoted by $x_2$.
- Let stage $n$ = activity $n$ ($n = 1, 2$). Thus, $x_n$ is the decision variable at stage $n$.
- Interpret the right-hand sides of the constraints (4, 12, and 18) as the total available amounts of resources 1, 2, and 3.
- State $s_n$ = amounts of the respective resources still available for allocation to the remaining activities.
43Formulation
- $s_n = (R_1, R_2, R_3)$, where $R_i$ is the amount of resource $i$ remaining to be allocated ($i = 1, 2, 3$). Therefore,
- $s_1 = (4, 12, 18)$,
- $s_2 = (4 - x_1, 12, 18 - 3x_1)$.
- However, when we begin by solving for stage 2, we do not yet know the value of $x_1$, and so we use $s_2 = (R_1, R_2, R_3)$ at that point.
44Formulation
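- $f_2(R_1, R_2, R_3, x_2)$ = contribution of activity 2 to $Z$ if the system starts in state $(R_1, R_2, R_3)$ at stage 2 and the decision is $x_2$
  $= 5x_2$ (for the Wyndor objective $Z = 3x_1 + 5x_2$).
- $f_1(4, 12, 18, x_1)$ = contribution of activities 1 and 2 to $Z$ if the system starts in state (4, 12, 18) at stage 1, the immediate decision is $x_1$, and then the optimal decision is made at stage 2
  $= 3x_1 + \max \{ 5x_2 \}$, where the maximum is taken over $x_2 \ge 0$ with $2x_2 \le 12$ and $2x_2 \le 18 - 3x_1$.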
45Formulation
46Basic structure
47Solution Procedure
- Stage 2: To solve at the last stage ($n = 2$), Eq. (1) indicates that $x_2^*$ must be the largest value of $x_2$ that simultaneously satisfies $2x_2 \le R_2$, $2x_2 \le R_3$, and $x_2 \ge 0$, that is, $x_2^* = \min\{R_2/2, R_3/2\}$.
- $n = 2$
48Solution Procedure
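- Stage 1: $(R_1, R_2, R_3) = (4, 12, 18)$, and for $0 \le x_1 \le 4$
  $f_1(4, 12, 18, x_1) = 3x_1 + 5 \min\{12/2, (18 - 3x_1)/2\}$,
- so that
  $f_1(4, 12, 18, x_1) = 3x_1 + 30$ if $0 \le x_1 \le 2$, and $f_1(4, 12, 18, x_1) = 45 - (9/2)x_1$ if $2 \le x_1 \le 4$.
49Solution Procedure
- Because both $3x_1 + 30$ (on $0 \le x_1 \le 2$) and $45 - (9/2)x_1$ (on $2 \le x_1 \le 4$) achieve their maximum at $x_1 = 2$, it follows that $x_1^* = 2$ and that this maximum is 36.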
50Solution Procedure
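- $n = 1$
- Because $x_1^* = 2$ leads to $R_1 = 4 - 2 = 2$, $R_2 = 12$, $R_3 = 18 - 3(2) = 12$ for stage 2, the $n = 2$ table yields $x_2^* = 6$. Consequently, $x_1^* = 2$, $x_2^* = 6$ is the optimal solution for this problem.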
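The two-stage procedure can be sketched numerically. The Wyndor model itself is not written out in this transcript, so the sketch assumes the usual statement of that problem: maximize $Z = 3x_1 + 5x_2$ subject to $x_1 \le 4$, $2x_2 \le 12$, $3x_1 + 2x_2 \le 18$, $x_1, x_2 \ge 0$. Stage 2 uses the closed form for $x_2^*$. Stage 1 is searched on a grid, which is a simplification standing in for the piecewise-linear analysis above. `stage2` and `stage1` are illustrative names.

```python
def stage2(r1, r2, r3):
    """Largest x2 with 2*x2 <= R2, 2*x2 <= R3, x2 >= 0; returns (f2*, x2*)."""
    x2 = max(0.0, min(r2 / 2, r3 / 2))
    return 5 * x2, x2                      # assumed contribution 5*x2


def stage1(r1=4, r2=12, r3=18, steps=400):
    """Grid search over feasible x1; returns (f1*, x1*, x2*)."""
    upper = min(r1, r3 / 3)                # x1 <= 4 and 3*x1 <= 18
    best = None
    for k in range(steps + 1):
        x1 = upper * k / steps
        f2, x2 = stage2(r1 - x1, r2, r3 - 3 * x1)
        value = 3 * x1 + f2                # assumed contribution 3*x1
        if best is None or value > best[0]:
            best = (value, x1, x2)
    return best


z, x1, x2 = stage1()
print(z, x1, x2)
```

Because the state here is continuous, a table over all states is not possible. The closed-form stage 2 solution is what makes the stage 1 maximization a one-variable problem.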
51Inventory Problem
- A company knows that the demand for its product during each of the next four months will be as follows: month 1, 1 unit; month 2, 3 units; month 3, 2 units; month 4, 4 units.
- During a month in which any units are produced, a setup cost of 3 is incurred.
- In addition, there is a variable cost of 1 for every unit produced.
- At the end of each month, a holding cost of 0.5 per unit on hand is incurred.
52Inventory Problem
- Capacity limitations allow a maximum of 5 units to be produced during each month.
- The size of the company's warehouse restricts the ending inventory for each month to at most 4 units.
- Assume that 0 units are on hand at the beginning of the first month.
- The company wants to determine a production schedule that will meet all demands on time and will minimize the sum of production and holding costs during the four months.
53Definition
- Stage: time (month $t$).
- State: the beginning inventory level $i$.
- Decision variable: define $x_t(i)$ to be a production level during month $t$ that minimizes the total cost during months $t, t + 1, \ldots, 4$ if $i$ units are on hand at the beginning of month $t$.
- Define $f_t(i)$ to be the minimum cost of meeting demands for months $t, t + 1, \ldots, 4$ if $i$ units are on hand at the beginning of month $t$.
- Define $c(x)$ to be the cost of producing $x$ units during a period: $c(0) = 0$, and $c(x) = 3 + x$ for $x > 0$.
54Solution Procedure
- Stage 4: During month 4, the firm will produce just enough units to ensure that the month 4 demand of 4 units is met. This yields
- $f_4(0) = c(4) = 7$ and $x_4(0) = 4$
- $f_4(1) = c(3) = 6$ and $x_4(1) = 3$
- $f_4(2) = c(2) = 5$ and $x_4(2) = 2$
- $f_4(3) = c(1) = 4$ and $x_4(3) = 1$
- $f_4(4) = c(0) = 0$ and $x_4(4) = 0$
55Solution Procedure
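- Stage 3: The cost $f_3(i)$ is the minimum cost incurred during months 3 and 4 if the inventory at the beginning of month 3 is $i$.
- For each possible production level $x$ during month 3, the total cost during months 3 and 4 is
  $(1/2)(i + x - 2) + c(x) + f_4(i + x - 2)$.
- Therefore,
  $f_3(i) = \min_x \{ (1/2)(i + x - 2) + c(x) + f_4(i + x - 2) \}$.
- $x$ must be a member of $\{0, 1, 2, 3, 4, 5\}$, and $x$ must satisfy $0 \le i + x - 2 \le 4$.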
56Solution Procedure
57Solution Procedure
58Solution Procedure
59Solution Procedure
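- Stage 2: The cost $f_2(i)$ is the minimum cost incurred during months 2, 3, and 4 if the inventory at the beginning of month 2 is $i$.
- For each possible production level $x$ during month 2, the total cost during months 2, 3, and 4 is
  $(1/2)(i + x - 3) + c(x) + f_3(i + x - 3)$.
- Therefore,
  $f_2(i) = \min_x \{ (1/2)(i + x - 3) + c(x) + f_3(i + x - 3) \}$.
- $x$ must be a member of $\{0, 1, 2, 3, 4, 5\}$, and $x$ must satisfy $0 \le i + x - 3 \le 4$.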
60Solution Procedure
61Solution Procedure
62Solution Procedure
63Solution Procedure
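- Stage 1: The cost $f_1(i)$ is the minimum cost incurred during months 1 through 4 if the inventory at the beginning of month 1 is $i$.
- For each possible production level $x$ during month 1, the total cost during months 1 through 4 is
  $(1/2)(i + x - 1) + c(x) + f_2(i + x - 1)$.
- Therefore,
  $f_1(i) = \min_x \{ (1/2)(i + x - 1) + c(x) + f_2(i + x - 1) \}$.
- $x$ must be a member of $\{0, 1, 2, 3, 4, 5\}$, and $x$ must satisfy $0 \le i + x - 1 \le 4$.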
64Solution Procedure
65Solution Procedure
66Optimal Schedule
- Since our initial inventory is 0 units, the minimum cost for the four months will be $f_1(0) = 20$.
- To attain $f_1(0)$, we must produce $x_1(0) = 1$ unit during month 1.
- The inventory at the beginning of month 2 will be $0 + 1 - 1 = 0$. Thus we should produce $x_2(0) = 5$ units.
- At the beginning of month 3, our inventory will be $0 + 5 - 3 = 2$. Hence, during month 3, we need to produce $x_3(2) = 0$ units.
- Month 4 will begin with $2 + 0 - 2 = 0$ units on hand. Thus, $x_4(0) = 4$ units should be produced during month 4.
67Optimal Schedule
- In summary, the optimal production schedule incurs a total cost of 20 and produces 1 unit during month 1, 5 units during month 2, 0 units during month 3, and 4 units during month 4.
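The stage-by-stage tables for this example can be reproduced with a short backward recursion. The data below (demands, setup cost 3, unit cost 1, holding cost 0.5, capacity 5, warehouse limit 4) come from the problem statement. The function and variable names are illustrative, and treating leftover inventory after month 4 as having zero value is an assumption of the sketch.

```python
DEMAND = [1, 3, 2, 4]              # months 1..4
SETUP, UNIT, HOLD = 3, 1, 0.5
MAX_PROD, MAX_INV = 5, 4


def c(x):
    """Production cost: setup cost is paid only if something is produced."""
    return 0 if x == 0 else SETUP + UNIT * x


def solve_inventory():
    T = len(DEMAND)
    # f[t][i] = minimum cost for months t..T with i units on hand (t is 1-based).
    f = {T + 1: {i: 0.0 for i in range(MAX_INV + 1)}}   # assumed terminal value
    x_opt = {}
    for t in range(T, 0, -1):
        d = DEMAND[t - 1]
        f[t], x_opt[t] = {}, {}
        for i in range(MAX_INV + 1):
            for x in range(MAX_PROD + 1):
                end = i + x - d                          # ending inventory
                if 0 <= end <= MAX_INV and end in f[t + 1]:
                    total = c(x) + HOLD * end + f[t + 1][end]
                    if i not in f[t] or total < f[t][i]:
                        f[t][i], x_opt[t][i] = total, x
    return f, x_opt


def schedule(x_opt, start=0):
    """Forward pass from the initial inventory."""
    plan, i = [], start
    for t, d in enumerate(DEMAND, start=1):
        plan.append(x_opt[t][i])
        i += plan[-1] - d
    return plan


f, x_opt = solve_inventory()
plan = schedule(x_opt)
print(f[1][0], plan)
```

Running it gives the same $f_4$ values as the Stage 4 slide and the schedule summarized above.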
68General Resource Allocation Problem
- Suppose we have $w$ units of a resource, and $T$ activities to which the resource can be allocated.
- If activity $t$ is implemented at a level $x_t$, then $g_t(x_t)$ units of the resource are used by activity $t$, and a benefit $r_t(x_t)$ is obtained.
69General Resource Allocation Problem
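- Define $f_t(d)$ to be the maximum benefit that can be obtained from activities $t, t + 1, \ldots, T$ if $d$ units of the resource can be allocated to activities $t, t + 1, \ldots, T$. Then
  $f_t(d) = \max_{x_t} \{ r_t(x_t) + f_{t+1}(d - g_t(x_t)) \}$, with $f_{T+1}(d) = 0$,
- where $x_t$ must be a nonnegative integer satisfying $g_t(x_t) \le d$.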
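A generic version of this recursion might look as follows. It is a sketch under stated assumptions: each $g_t$ is integer-valued and strictly increasing with $g_t(0) = 0$, and activities are indexed from 0 in the code. The example data (item weights 4, 3, 5 and benefits 11, 7, 12 for a 10-unit resource) are hypothetical values used only to exercise the function.

```python
def allocate(w, g, r):
    """Backward recursion f_t(d) = max_x { r_t(x) + f_{t+1}(d - g_t(x)) }.

    g and r are lists of functions, one pair per activity. Assumes each g_t
    is integer-valued, strictly increasing, and g_t(0) = 0, so g_t(x) >= x.
    Returns (maximum benefit, list of activity levels).
    """
    T = len(g)
    f = [[0] * (w + 1) for _ in range(T + 1)]     # f[T][d] = 0: nothing left
    best_x = [[0] * (w + 1) for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for d in range(w + 1):
            best = None
            for x in range(d + 1):
                used = g[t](x)
                if used > d:                      # g_t(x) <= d violated
                    break
                value = r[t](x) + f[t + 1][d - used]
                if best is None or value > best:
                    best, best_x[t][d] = value, x
            f[t][d] = best
    # Forward pass: recover the levels starting from d = w.
    plan, d = [], w
    for t in range(T):
        plan.append(best_x[t][d])
        d -= g[t](plan[-1])
    return f[0][w], plan


# Hypothetical data: linear resource use and linear benefit per activity.
weights, benefits = [4, 3, 5], [11, 7, 12]
g = [lambda x, wt=wt: wt * x for wt in weights]
r = [lambda x, b=b: b * x for b in benefits]
value, plan = allocate(10, g, r)
print(value, plan)
```

Passing $g_t$ and $r_t$ as functions keeps the sketch close to the notation above and allows nonlinear resource use or benefit without changing the recursion.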
70Knapsack Problem
- Consider the following knapsack problem.
- Three different types of item can be used to fill a 10-lb knapsack.
- We want to use dynamic programming to solve it.
- The procedure is the same as for the resource allocation problem.
71Solution Procedure
- Stage 3
- where $5x_3 \le d$ and $x_3$ is a nonnegative integer.
- f3(10) and x3(10)
- $f_3(9) = f_3(8) = f_3(7) = f_3(6) = f_3(5)$
- with $x_3(9) = x_3(8) = x_3(7) = x_3(6) = x_3(5) = 1$
- $f_3(4) = f_3(3) = f_3(2) = f_3(1) = f_3(0) = 0$
- with $x_3(4) = x_3(3) = x_3(2) = x_3(1) = x_3(0) = 0$
72Solution Procedure
- Stage 2
- where $3x_2 \le d$ and $x_2$ is a nonnegative integer.
73Solution Procedure
74Solution Procedure
- Stage 2
- ? f1( ) and x1( )
- f2( ) and x2( )
- f3( ) and x3( )
75Alternative Procedure for KP
- Suppose $g(w)$ is the maximum benefit that can be gained from a $w$-lb knapsack.
- Let $b_j$ be the benefit earned from a single type $j$ item, and let $w_j$ be the weight of a single type $j$ item.
76Alternative Procedure for KP
- To fill a $w$-lb knapsack optimally, we must begin by putting some type of item into the knapsack.
- If we put a type $j$ item into a $w$-lb knapsack, the best we can do is earn $b_j$ + (best we can do from a $(w - w_j)$-lb knapsack), that is, $b_j + g(w - w_j)$.
- A type $j$ item can be placed into a $w$-lb knapsack only if $w_j \le w$. Hence $g(w) = \max_{j : w_j \le w} \{ b_j + g(w - w_j) \}$.
- Define $x(w)$ to be any type of item that attains the maximum benefit, and let $x(w) = 0$ mean that no item can fit into a $w$-lb knapsack.
77Alternative Procedure for KP
- $g(0) = g(1) = g(2) = 0$, and $x(0) = x(1) = x(2) = 0$.
- g(3) and x(3) .
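The recursion on knapsack size, $g(w) = \max_{j : w_j \le w} \{ b_j + g(w - w_j) \}$, can be sketched as below. The item data are not given in this transcript. The weights 3 and 5 for item types 2 and 3 follow from the constraints $3x_2 \le d$ and $5x_3 \le d$ above, while the weight of type 1 and all three benefits are assumed values for illustration. `knapsack_by_weight` and `unpack` are illustrative names.

```python
def knapsack_by_weight(capacity, weights, benefits):
    """g[w] = best benefit for a w-lb knapsack; x[w] = an item type to put in
    first (1-based), or 0 if no item fits."""
    g = [0] * (capacity + 1)
    x = [0] * (capacity + 1)
    for w in range(1, capacity + 1):
        for j, (wj, bj) in enumerate(zip(weights, benefits), start=1):
            if wj <= w and bj + g[w - wj] > g[w]:
                g[w], x[w] = bj + g[w - wj], j
    return g, x


def unpack(x, weights, capacity):
    """Follow x(w) downward to list the item types in one optimal filling."""
    items, w = [], capacity
    while x[w] != 0:
        items.append(x[w])
        w -= weights[x[w] - 1]
    return items


# Assumed data: type 1 weight and all benefits are hypothetical.
WEIGHTS, BENEFITS = [4, 3, 5], [11, 7, 12]
g, x = knapsack_by_weight(10, WEIGHTS, BENEFITS)
print(g[10], unpack(x, WEIGHTS, 10))
```

Compared with the stage-by-stage procedure, this version needs only one table indexed by knapsack size, at the cost of recovering the item counts by following $x(w)$ afterwards.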
78Equipment Replacement Problem
- An auto repair shop always needs to have an engine analyzer available.
- A new engine analyzer costs 1000.
- The cost of maintaining an analyzer during its $i$-th year of operation is $m_1 = 60$, $m_2 = 80$, $m_3 = 120$.
- An analyzer may be kept for 1, 2, or 3 years, and after that it may be traded in for a new one.
- The trade-in price (salvage value $s_i$) for an $i$-year-old analyzer is $s_1 = 800$, $s_2 = 600$, $s_3 = 500$.
- The shop wants to determine an optimal replacement policy for the next 5 years.
79Solution Procedure
- Define $g(t)$ to be the minimum net cost incurred from time $t$ until time 5, given that a new machine has just been purchased at time $t$.
- Define $x$ to be the time at which the next replacement occurs.
- Define $c_{tx}$ to be the net cost of purchasing a machine at time $t$ and operating it until time $x$. Then
  $g(t) = \min_x \{ c_{tx} + g(x) \}$,
- where $t + 1 \le x \le t + 3$ and $x \le 5$, with $g(5) = 0$.
80Solution Procedure
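- The net cost = maintenance costs + replacement (purchase) cost - salvage value. For example, $c_{01} = 60 + 1000 - 800 = 260$, $c_{02} = 60 + 80 + 1000 - 600 = 540$, and $c_{03} = 60 + 80 + 120 + 1000 - 500 = 760$.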
81Solution Procedure
82Optimal Schedule
- The machine purchased at time 0 should be replaced at time 1 or time 3.
- If it is replaced at time 1, the new time-1 machine may be traded in at time 2 or time 4.
- If the time-1 machine is traded in at time 2, the new machine should be kept until time 5.
- And so on.
- The optimal replacement policies are
- (1) trade in at times 1, 2, and 5.
- (2) trade in at times 1, 4, and 5.
- (3) trade in at times 3, 4, and 5.
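The recursion $g(t) = \min_x \{ c_{tx} + g(x) \}$ with $g(5) = 0$ can be checked with a short sketch using the purchase price, maintenance costs, and salvage values stated in the problem. The function names are illustrative. The sketch keeps all minimizing replacement times so that every optimal policy can be listed.

```python
PRICE = 1000
MAINT = [60, 80, 120]        # m_1, m_2, m_3
SALVAGE = [800, 600, 500]    # s_1, s_2, s_3
HORIZON = 5


def net_cost(age):
    """c_tx for a machine kept `age` = x - t years: purchase plus
    maintenance minus salvage."""
    return PRICE + sum(MAINT[:age]) - SALVAGE[age - 1]


def solve_replacement():
    g = {HORIZON: 0}
    choices = {}                             # all optimal next trade-in times
    for t in range(HORIZON - 1, -1, -1):
        options = {x: net_cost(x - t) + g[x]
                   for x in range(t + 1, min(t + 3, HORIZON) + 1)}
        g[t] = min(options.values())
        choices[t] = [x for x, v in options.items() if v == g[t]]
    return g, choices


def policies(choices, t=0):
    """Enumerate every optimal sequence of trade-in times starting at t."""
    if t == HORIZON:
        return [[]]
    return [[x] + rest for x in choices[t] for rest in policies(choices, x)]


g, choices = solve_replacement()
print(g[0])
for p in policies(choices):
    print(p)
```

The enumerated sequences are the three policies listed above, each with the same minimum net cost $g(0)$.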