Title: CPS Scheduling Policy Design with Utility and Stochastic Execution*
1. CPS Scheduling Policy Design with Utility and Stochastic Execution
- Chris Gill
- Associate Professor
- Department of Computer Science and Engineering
- Washington University, St. Louis, MO, USA
- cdgill_at_cse.wustl.edu
Georgia Tech CPS Summer School, Atlanta, GA, June 23-25, 2010
Research supported in part by NSF grants CNS-0716764 (Cybertrust) and CCF-0448562 (CAREER), and driven by numerous contributions from post-doctoral student Robert Glaubius; doctoral student Terry Tidwell; undergraduate students Braden Sidoti, David Pilla, Justin Meden, Carter Bass, Eli Lasker, Micah Wylde, and Cameron Cross; and Prof. William D. Smart
2. Washington University in St. Louis
3. Dept. of Computer Science and Engineering
- 24 faculty members and 70 Ph.D. students working in:
  - real-time and embedded systems, robotics, graphics, computer vision, HCI, AI, bioinformatics, networking, high-performance architectures, chip multi-processors, mobile computing, sensor networks, optimization
- PhD students are fully funded, and we emphasize individual mentorship and interdisciplinary work
- Recent graduates are on faculty at U. Mass, UT-Austin, Rochester, RIT, CMU, Michigan St., and UNC-Charlotte
- Graduate study application deadline for Fall 2011 is January 15
http://www.cse.wustl.edu
4. Why Pursue CPS Research?
- Systems are increasingly being designed to interact with the physical world
- This trend offers compelling new research challenges that motivate our work
- Consider, for example, the domain of mobile robotics
[Figure: "my name is Lewis" (Media and Machines Laboratory, Washington University in St. Louis)]
5. Why is This Work CPS Research?
- As in many other systems, resources must be shared among competing tasks
- Fail-safe modes may reduce consequences of resource-induced timing failures, but precise scheduling matters
- The physical properties of some resources motivate new models and techniques
6. Which Problem Features are Interesting?
- Sharing, e.g., a camera between navigation and image capture tasks
  - (1) in general doesn't allow efficient preemption
  - (2) involves stochastically distributed durations
- Also important in general
  - (3) scalability (many tasks sharing such a resource)
  - (4) task utility/availability
7. System Model Assumptions
- We model time as being discrete
  - E.g., based on some multiple of the Linux jiffy
  - States and scheduling decisions align with those quanta
- Separate tasks require a shared resource
  - Access is mutually exclusive (a task binds the resource)
  - Binding durations are independent and non-preemptive
  - Tasks' duration distributions are known (or learned [1])
  - Each task is always available to run (relaxed in Part III)
- Goal: precise resource allocation among tasks [5]
  - E.g., 2:1 utilization share targets for tasks A vs. B
  - Need a deterministic scheduling policy (decides which task gets the resource when) that best fits that goal
8. Part I: Utilization State Spaces and Markov Decision Processes
9. Towards Optimal Policies
- A Markov decision process (MDP) is a 4-tuple (X, A, C, T) that matches our system model well
  - X: a finite set of states (e.g., utilizations of 8 vs. 17 quanta)
  - A: the set of actions (giving the resource to a particular task)
  - C: cost function for taking an action in a state
  - T: transition function (probability of moving from one state to another state based on the action chosen)
- Solving the MDP gives a policy that maps each state to an action to minimize long-term expected costs
- However, to do that we need a finite set of states
10. Share Aware Scheduling
- A system state: the cumulative resource usage of each task
- Dispatching a task moves the system stochastically through the state space according to that task's duration
[Figure: state space with example state (8,17)]
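A minimal sketch of the transition structure described on this slide, assuming each task's duration distribution is given as a dictionary from duration (in quanta) to probability. The function name `successors` and the example distribution are hypothetical, not taken from the talk.

```python
from collections import defaultdict

def successors(state, task, duration_dist):
    """Return {next_state: probability} for dispatching `task` in `state`.

    state: tuple of cumulative quanta used by each task, e.g. (8, 17).
    duration_dist: dict mapping a duration in quanta to its probability.
    """
    result = defaultdict(float)
    for duration, prob in duration_dist.items():
        nxt = list(state)
        nxt[task] += duration          # only the dispatched task accrues usage
        result[tuple(nxt)] += prob
    return dict(result)

# Hypothetical distribution: task 0 runs for 1 or 2 quanta with equal probability.
print(successors((8, 17), 0, {1: 0.5, 2: 0.5}))
```

Because the resource is non-preemptive, the only thing that changes in a transition is the dispatched task's cumulative usage.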
11. Share Aware Scheduling
- Utilization target induces a ray $\{\alpha u : \alpha \ge 0\}$ through the state space
- Encode each state's goodness (relative to the share) as a cost
- Require that costs grow with distance from the utilization ray
[Figure: utilization ray $\alpha u$ for $u = (1/3, 2/3)$]
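One cost function that satisfies the requirement above can be sketched as follows. The slides do not fix a particular formula, so the choice here (L1 distance from the state to the point on the utilization ray with the same total elapsed time) is an assumption, and the name `cost` is hypothetical.

```python
from fractions import Fraction

def cost(state, share):
    """Assumed cost: L1 distance from `state` to the point on the
    utilization ray that has the same total elapsed time."""
    total = sum(state)
    return sum(abs(x - total * u) for x, u in zip(state, share))

share = (Fraction(1, 3), Fraction(2, 3))
print(cost((1, 2), share))    # a state on the ray
print(cost((8, 17), share))   # a state slightly off the ray
print(cost((9, 19), share))   # (8, 17) shifted along the ray by (1, 2)
```

Exact fractions are used so that states displaced from one another along the ray compare as exactly equal in cost.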
12. Transition Structure
- Transitions are state-independent
  - I.e., the relative distribution over successor states is the same in each state
13. Cost Structure
- States along the same line parallel to the utilization ray have equal cost
14. Equivalence Classes
- Transition and cost structure thus induce equivalence classes
- Equivalent states have the same optimal long-term cost and policy!
15. Periodicity
- Periodic structure allows us to represent each equivalence class with a single exemplar [4]
16. Wrapping the State Model
- Remove all but one exemplar from each equivalence class
- Actions and costs remain unchanged
- Remap any dangling transitions (to removed states) to the corresponding exemplar
[Figure: wrapped state model anchored at (0,0)]
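A sketch of the remapping step, under two assumptions that the slide does not state: the utilization target is rational, so the ray passes through a smallest integer vector (called `period` here), and the exemplar of each class is the member closest to the origin. The function name `wrap` is hypothetical.

```python
def wrap(state, period):
    """Map `state` to its exemplar by removing as many whole copies of
    `period` as possible while keeping every component non-negative."""
    k = min(x // p for x, p in zip(state, period))
    return tuple(x - k * p for x, p in zip(state, period))

# For u = (1/3, 2/3), the smallest integer vector along the ray is (1, 2).
print(wrap((8, 17), (1, 2)))
print(wrap((9, 19), (1, 2)))   # same equivalence class as (8, 17)
```

A transition that would land on a removed state is redirected by applying `wrap` to its target.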
17. Truncating the State Model
- Inexpensive states are nearer the utilization target
- Good policies should keep costs small
- Can truncate the state space by bounding the sizes of costs considered
18. Bounding the State Model
- Map any dangling transitions produced by truncation to a high-cost absorbing state
- This guarantees that we will be able to find bounded-cost policies if they exist
- Bounded costs also guarantee bounded deviation from the resource share (precision)
19. A Scheduling Policy Design Approach
- Iteratively increase the bounds and re-solve the resulting MDP
- As the bounds increase, the bounded model solution converges towards the optimal wrapped model policy
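The bounded-model approach can be sketched end to end for a two-task problem. Everything concrete here is an assumption made for illustration: the duration distributions, the discount factor `GAMMA`, the L1 cost, the choice of `bound + 1` as the absorbing state's per-step cost, and the use of plain value iteration as the MDP solver.

```python
from fractions import Fraction

SHARE = (Fraction(1, 3), Fraction(2, 3))          # 1:2 utilization target
PERIOD = (1, 2)                                   # smallest integer point on the ray
DURATIONS = [{1: 0.5, 2: 0.5}, {1: 0.5, 3: 0.5}]  # hypothetical duration distributions
GAMMA = 0.9                                       # assumed discount factor

def cost(x):
    """Assumed cost: L1 distance to the ray point with the same total time."""
    t = sum(x)
    return sum(abs(xi - t * u) for xi, u in zip(x, SHARE))

def step(x, task, duration):
    """Successor of x in the wrapped model."""
    y = [xi + (duration if i == task else 0) for i, xi in enumerate(x)]
    k = min(yi // p for yi, p in zip(y, PERIOD))
    return tuple(yi - k * p for yi, p in zip(y, PERIOD))

def solve_bounded(bound, sweeps=300):
    """Value iteration on the wrapped model truncated at cost <= bound."""
    tasks = range(len(DURATIONS))
    # Enumerate wrapped states reachable from (0, 0) within the cost bound.
    states, frontier = {(0, 0)}, [(0, 0)]
    while frontier:
        x = frontier.pop()
        for a in tasks:
            for d in DURATIONS[a]:
                y = step(x, a, d)
                if y not in states and cost(y) <= bound:
                    states.add(y)
                    frontier.append(y)
    # Absorbing state: pays bound + 1 at every step, forever.
    absorbing = (bound + 1) / (1 - GAMMA)
    value = {x: 0.0 for x in states}

    def q(x, a):
        total = 0.0
        for d, p in DURATIONS[a].items():
            y = step(x, a, d)
            if y in value:
                total += p * (float(cost(y)) + GAMMA * value[y])
            else:                      # dangling transition
                total += p * absorbing
        return total

    for _ in range(sweeps):
        value = {x: min(q(x, a) for a in tasks) for x in states}
    return {x: min(tasks, key=lambda a: q(x, a)) for x in states}

policy = solve_bounded(bound=4)
print(policy[(3, 0)], policy[(0, 6)])
```

Calling `solve_bounded` with successively larger bounds and comparing the resulting policies is one way to observe the convergence described above.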
20. Automating Model Discovery
- ESPI: Expanding State Policy Iteration [3]
  1. Start with a policy that only reaches finitely many states from (0,...,0).
     - E.g., always run the most underutilized task.
  2. Enumerate enough states to evaluate and improve that policy
  3. If the policy cannot be improved, stop
  4. Otherwise, repeat from (2) with the newly improved policy
21. Policy Evaluation Envelope
- Enumerate states reachable from the initial state
- Explore the state space breadth-first under the current policy, starting from the initial state
[Figure: evaluation envelope grown from the initial state (0,0)]
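A sketch of the breadth-first enumeration for a two-task wrapped model, starting from the "most underutilized task" heuristic. The helper names, the duration supports, and the tie-breaking rule (lowest task index) are assumptions for illustration.

```python
from collections import deque
from fractions import Fraction

def wrap(state, period):
    """Exemplar of `state` in the wrapped model (assumed convention)."""
    k = min(x // p for x, p in zip(state, period))
    return tuple(x - k * p for x, p in zip(state, period))

def underused(state, share):
    """Heuristic policy: run the task furthest below its target share."""
    t = sum(state)
    return max(range(len(state)), key=lambda i: t * share[i] - state[i])

def evaluation_envelope(start, policy, durations, period):
    """Breadth-first closure of `start` under `policy` in the wrapped model."""
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        task = policy(x)
        for d in durations[task]:      # every duration with nonzero probability
            y = list(x)
            y[task] += d
            y = wrap(y, period)
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

share = (Fraction(1, 3), Fraction(2, 3))
durations = [{1: 0.5, 2: 0.5}, {1: 0.5, 3: 0.5}]   # hypothetical
envelope = evaluation_envelope((0, 0), lambda x: underused(x, share),
                               durations, (1, 2))
print(sorted(envelope))
```

The loop terminates only when the policy has finite closure, which holds for this heuristic on this example.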
22. Policy Improvement Envelope
- Consider alternative actions
- Close under the current policy using breadth-first expansion
- Evaluate and improve the policy within this envelope
23. ESPI Termination
- As long as the initial policy has finite closure, each ESPI iteration terminates (this is satisfied by starting with the heuristic policy that always runs the most underutilized task)
- Policy strictly improves at each iteration
- Anecdotally, ESPI terminates on all of the task scheduling MDPs to which we have applied it
24. Comparing Design Methods
- Policy performance is shown normalized and centered on the ESPI solution data
- Larger bounded state models yield the ESPI solution
25. Part II: Scalability and Approximation Techniques
26. What About Scalability?
- MDP representation allows consistent approximation of the optimal scheduling policy
- Empirically, bounded model and ESPI solutions appear to be near-optimal
- However, the approach scales exponentially in the number of tasks, so while it may be good for (e.g.) sharing an actuator, it won't apply directly to larger task sets
27. What our Policies Say about Scalability
- To overcome limitations of the MDP-based approach, we focus attention on a restricted class of appropriate scheduling policies
- Examining the policies produced by the MDP-based approach gives insights into choosing (and into parameterizing) appropriate policies [2]
28. Two-task MDP Policy
- Scheduling policies induce a partition on a 2-D state space with boundary parallel to the share target
- Establish a decision offset d to identify the partition boundary
- Sufficient in 2-D, but what about in higher dimensions?
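29. Time Horizons Suggest a Generalization
$H_t = \{x : x_1 + x_2 + \dots + x_n = t\}$
[Figure: time horizons $H_0, H_1, H_2, \dots$ crossed by the utilization ray $\alpha u$, shown for a 2-D state space starting at (0,0) and a 3-D state space with corner points (2,0,0), (0,2,0), and (0,0,2)]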
30. Three-task MDP Policy
[Figure: three policy plots labeled t = 10, t = 20, and t = 30]
- Action partitions meet along a decision ray that is parallel to the utilization ray
- Action partitions are roughly cone-shaped
31. Parameterizing a Partition
- Specify a decision offset at the intersection of partitions
- Anchor action vectors at the decision offset to approximate partitions
- A conic policy selects the action vector best aligned with the displacement between the query state and the decision offset
[Figure: action vectors a1, a2, a3 anchored at the decision offset]
32. Conic Policy Parameters
- Decision offset d
- Action vectors a1, a2, ..., an
- Sufficient to partition each time horizon into n regions
- Allows good policy parameters to be found through local search
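A sketch of how a conic policy might pick an action from these parameters. The slides do not give the alignment measure, so cosine similarity is an assumption here, as is the convention that the offset's components sum to zero (so that the decision ray crosses every time horizon at `t * u + d`). The function name and the example vectors are hypothetical.

```python
import math

def conic_policy(state, share, offset, action_vectors):
    """Pick the task whose action vector is best aligned (largest cosine)
    with the displacement from the decision ray to `state`."""
    t = sum(state)
    # Point where the decision ray crosses the state's time horizon.
    anchor = [t * u + d for u, d in zip(share, offset)]
    disp = [x - a for x, a in zip(state, anchor)]

    def alignment(vec):
        norm = (math.sqrt(sum(v * v for v in vec))
                * math.sqrt(sum(w * w for w in disp)))
        if norm == 0:
            return 0.0                 # state sits exactly on the decision ray
        return sum(v * w for v, w in zip(vec, disp)) / norm

    return max(range(len(action_vectors)),
               key=lambda i: alignment(action_vectors[i]))

share = (1 / 3, 2 / 3)
offset = (0.0, 0.0)
vectors = [(-1.0, 1.0), (1.0, -1.0)]   # hypothetical: a_i points toward states where task i lags
print(conic_policy((8, 17), share, offset, vectors))
print(conic_policy((10, 15), share, offset, vectors))
```

With only an offset and n vectors to tune, a local search can perturb these parameters and keep changes that lower simulated cost.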
33. Comparing Policies
- Policy found by ESPI (for small numbers of tasks)
  - $\pi_{ESPI}(x)$ chooses the action at state x per the solved MDP
- Simple heuristics (for all numbers of tasks)
  - $\pi_{underused}(x)$ runs the most underutilized task
  - $\pi_{greedy}(x)$ minimizes immediate cost from state x
- Conic approach (for all numbers of tasks)
  - $\pi_{conic}(x)$ selects the action with the best aligned action vector
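The two simple heuristics can be sketched as follows. The cost function (L1 distance to the utilization ray at the same total time) and the reading of "immediate cost" as the expected cost of the successor state are assumptions; the duration distributions are hypothetical.

```python
from fractions import Fraction

def cost(state, share):
    """Assumed cost: L1 distance to the ray point with the same total time."""
    t = sum(state)
    return sum(abs(x - t * u) for x, u in zip(state, share))

def underused(state, share):
    """Run the task that is furthest below its target share."""
    t = sum(state)
    return max(range(len(state)), key=lambda i: t * share[i] - state[i])

def greedy(state, share, durations):
    """Run the task whose expected next-state cost is smallest."""
    def expected_cost(task):
        total = 0
        for d, p in durations[task].items():
            nxt = list(state)
            nxt[task] += d
            total += p * cost(nxt, share)
        return total
    return min(range(len(state)), key=expected_cost)

share = (Fraction(1, 3), Fraction(2, 3))
half = Fraction(1, 2)
durations = [{1: half, 2: half}, {1: half, 3: half}]   # hypothetical
print(underused((8, 17), share), greedy((8, 17), share, durations))
print(underused((10, 15), share), greedy((10, 15), share, durations))
```

Neither heuristic looks more than one decision ahead, which is the main difference from the MDP-based policies.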
34. Policy Comparison on a 4 Task Problem
- Task durations: random histograms over [2,32]
- 100 iterations of Monte Carlo conic parameter search
- ESPI outperforms; conic eventually approximates well
35. Policy Comparison on a Ten Task Problem
- Repeated the same experiment for 10 tasks
- ESPI is omitted (intractable here)
- Conic outperforms the greedy and underutilized heuristics
36. Comparison with Varying #s of Tasks
- 100 independent problems for each (avg, 95% conf)
- ESPI only tractable through all 2 and 3 task cases
- Conic approximates ESPI, then outperforms others
37. Part III: Expanding our Notions of Utility and Availability
38. Time-Utility Functions
Previously, utility was proximity to the utilization target; now we let each task's utility and job availability vary.
[Figure: time-utility function (TUF) plotted against time, with period boundaries and termination times marked]
Availability variable qi is defined over 0,1
0, tmi/pi or 0,1 tmi/pi
39. Utility and Execution → Utility Density
A task's time-utility function and its execution time distribution (e.g., Di(1) = Di(2) = 50%) give a distribution of utility for scheduling the task
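A sketch of how a time-utility function and a duration distribution combine into a distribution of utility. The step-shaped TUF, its parameters, and the function names are hypothetical; the duration distribution uses two equally likely durations of 1 and 2 quanta.

```python
from collections import defaultdict

def utility_distribution(tuf, start, duration_dist):
    """Distribution of utility earned if a job is dispatched at `start`
    (time since release): it completes at start + d with probability
    duration_dist[d] and earns tuf(completion time)."""
    dist = defaultdict(float)
    for d, p in duration_dist.items():
        dist[tuf(start + d)] += p
    return dict(dist)

def step_tuf(t, utility=10, termination=3):
    """Hypothetical downward-step TUF: full utility up to the termination time."""
    return utility if t <= termination else 0

D = {1: 0.5, 2: 0.5}
dist = utility_distribution(step_tuf, 2, D)
expected = sum(u * p for u, p in dist.items())
print(dist, expected)
```

Dispatching later shifts probability mass past the termination time, which lowers the expected utility of running the job.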
40. Actions and State Space Structure
- State space can be more compact here than in Parts I and II: dimensions are task availability (e.g., over (q1, q2)) vs. time
- Can wrap the state space over the hyper-period of all tasks (e.g., D1(1) = D2(1) = 1; tm1 = p1 = 4; tm2 = p2 = 2)
- Scheduling actions induce a transition structure over states (e.g., idle action = do nothing; action i = run task i)
[Figure: transitions over time for action 2, action 1, and the idle action]
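Wrapping over the hyper-period can be sketched as below, assuming the hyper-period is the least common multiple of the task periods. The function names and the example periods of 4 and 2 quanta are hypothetical.

```python
from functools import reduce
from math import gcd

def hyper_period(periods):
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

def wrapped_time(t, periods):
    """Position of absolute time `t` within the hyper-period."""
    return t % hyper_period(periods)

periods = (4, 2)
print(hyper_period(periods))
print(wrapped_time(9, periods))
```

Two absolute times that map to the same wrapped position see the same pattern of upcoming period boundaries, so they can share one state.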
41. Reachable States, Successors, Rewards
States with the same task availability and the same relative position within the hyper-period have the same successor state and reward distributions
[Figure: reachable states]
42. Evaluation
Different TUF shapes are useful to characterize tasks' utilities (e.g., deadline-driven, work-ahead, jitter-sensitive cases). We chose three representative shapes, and randomized their key parameters ui, tmi, cpi (we also randomized 80/20 task load parameters li, thi, wi).
[Figure: three TUF shapes (downward step, linear drop, target sensitive), annotated with termination times, utility bounds, and critical points]
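The three shape names can be made concrete with a small sketch. The slide does not give the formulas, so the forms below are assumptions chosen only to match the names: `u` is the utility bound, `tm` the termination time, and `cp` the critical point. In particular, the rising segment of the target-sensitive shape is a guess.

```python
def downward_step(t, u, tm):
    """Full utility until the termination time, then nothing."""
    return u if t <= tm else 0

def linear_drop(t, u, cp, tm):
    """Full utility until the critical point, then a linear fall to 0 at tm."""
    if t <= cp:
        return u
    if t >= tm:
        return 0
    return u * (tm - t) / (tm - cp)

def target_sensitive(t, u, cp, tm):
    """Assumed shape: utility rises linearly to its peak at the critical
    point, then falls linearly to 0 at the termination time."""
    if t < cp:
        return u * t / cp
    return linear_drop(t, u, cp, tm)

for t in range(6):
    print(t, downward_step(t, 10, 3), linear_drop(t, 10, 2, 4),
          target_sensitive(t, 10, 2, 4))
```

Under these assumed forms, only the target-sensitive shape penalizes finishing early, which is why a work-conserving scheduler can hurt it.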
43. How Much Better is Optimal Scheduling?
Greedy (Generic Benefit) vs. Optimal (MDP) Utility Accrual
[Figure: utility accrual comparison in four panels: 2 tasks, 3 tasks, 4 tasks, 5 tasks]
TUF nuances matter: e.g., a work conserving approach degrades the target sensitive policy
P. Li, PhD Dissertation, VA Tech, 2004
44. Divergence Increases with # of Tasks
Note: we can solve 5 task MDPs for periodic task sets (smaller state spaces); scalability is an ongoing issue
45. Conclusions
- We have developed new techniques for designing non-preemptive scheduling policies for tasks with stochastic resource usage durations
- MDP-based methods are effective for 2 or 3 task utilization share problems (e.g., for an actuator)
- Conic policy performance is competitive with ESPI for smaller problems, and for larger problems improves on the underutilized and greedy policies
- Ongoing work is focused on identifying and evaluating important categories of time-utility functions and tailoring our approach to address their nuances
46. Publications
- [1] R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, "Real-Time Scheduling via Reinforcement Learning", UAI 2010
- [2] R. Glaubius, T. Tidwell, B. Sidoti, D. Pilla, J. Meden, C. Gill, and W.D. Smart, "Scalable Scheduling Policy Design for Open Soft Real-Time Systems", RTAS 2010 (received Best Student Paper award)
- [3] R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, "Scheduling Policy Design for Autonomic Systems", International Journal on Autonomous and Adaptive Communications Systems, 2(3):276-296, 2009
- [4] R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, "Scheduling Design and Verification for Open Soft Real-Time Systems", RTSS 2008
- [5] T. Tidwell, R. Glaubius, C. Gill, and W.D. Smart, "Scheduling for Reliable Execution in Autonomic Systems", ATC 2008
47. Thanks, and hope to see you at CPSWeek 2011!
- Chris Gill, Associate Professor of Computer Science and Engineering