Title: Scheduling for Reliable Execution in Autonomic Systems
1. Scheduling for Reliable Execution in Autonomic Systems
Terry Tidwell, Robert Glaubius, Christopher Gill, and William D. Smart
{ttidwell, rlg1, cdgill, wds}@cse.wustl.edu
Department of Computer Science and Engineering
Washington University, St. Louis, MO, USA
5th International Conference on Autonomic and Trusted Computing (ATC-08), June 23-25, 2008, Oslo, Norway
Research supported in part by NSF awards CNS-0716764 (Cybertrust) and CCF-0448562 (CAREER)
2. Motivation: Autonomic Systems
- Example autonomic systems: (1) self-maintaining clusters, (2) Lewis (WUSTL MM Lab)
- Interact with a variable environment
- Varying degrees of autonomy
- Performance is deadline sensitive
- Many activities must run at once
  - Device interrupt handling, computation
  - Communication with other systems/operators
- Need reliable activity execution
  - Scheduling with shared resources and competing, variable execution times
  - How to guarantee utilizations?
[Figure: robot platform with wireless communication to a remote operator station (used for all but full autonomy)]
3. System Model
- Threads of execution run on a shared resource
  - Require mutually exclusive access (e.g., to a CPU) to run
  - Each thread binds the resource when it runs
  - A thread binds the resource for a duration, then releases it
  - Modeled using discrete variables that count time quanta
- Variable execution times with known distributions
  - We assume that each thread's run-time distribution is known, bounded, and independent of the others
- Non-preemptive scheduler (repeats perpetually, sketched below)
  - Scheduler chooses which thread to run (based on policy)
  - Scheduler dispatches the thread, which runs until it yields
  - Scheduler waits until the thread releases the resource
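As a concrete illustration of this system model, the following sketch simulates a non-preemptive scheduler over threads with discrete, bounded run-time distributions. The distributions, thread count, and naive policy are assumptions chosen for illustration, not values from the paper.

```python
import random

# Minimal sketch of the system model above; distributions and the naive
# policy are illustrative assumptions, not the paper's values.

# Each thread's run time is a bounded, discrete distribution over time quanta:
# {run_time_in_quanta: probability}.
RUN_TIME_DISTS = [
    {1: 0.7, 2: 0.3},   # thread 0: usually short
    {2: 0.5, 3: 0.5},   # thread 1: longer, more variable
]

def sample_run_time(dist):
    """Sample a run time (in quanta) from a discrete distribution."""
    r, acc = random.random(), 0.0
    for quanta, p in dist.items():
        acc += p
        if r < acc:
            return quanta
    return max(dist)  # guard against floating-point round-off

def schedule(policy, steps=10):
    """Non-preemptive loop: the chosen thread binds the resource until it
    yields; the state is the vector of per-thread utilizations (in quanta)."""
    state = [0] * len(RUN_TIME_DISTS)
    for _ in range(steps):
        thread = policy(state)                                # choose per policy
        state[thread] += sample_run_time(RUN_TIME_DISTS[thread])
    return state

# Example: a naive policy that always runs the currently least-utilized thread.
print(schedule(lambda s: s.index(min(s))))
```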
4. Scheduling Policy Design Considerations
- We summarize system state as a vector of integers
  - Representing thread utilizations
- Threads' run times come from known, bounded distributions
- Scheduling a thread changes the system state
  - Utilization changes after the thread runs, based on its run time
  - State transition probabilities are based on the run-time distributions
- This forms a basis for policy design and optimization (see the sketch below)
[Figure: example run-time probability distributions (probability vs. time) for two threads]
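To make the state-transition idea concrete, here is a small sketch (illustrative names and numbers, not the paper's code) that enumerates the successor utilization states and their probabilities when one thread is scheduled.

```python
# Illustrative sketch: scheduling thread i in utilization state x moves the
# system to x + (run time) * e_i with the probability given by thread i's
# run-time distribution.

def transitions(state, thread, run_time_dist):
    """Enumerate successor utilization states and their probabilities when
    `thread` is scheduled in `state`."""
    successors = {}
    for quanta, p in run_time_dist.items():
        nxt = list(state)
        nxt[thread] += quanta        # only the scheduled thread's utilization grows
        successors[tuple(nxt)] = successors.get(tuple(nxt), 0.0) + p
    return successors

# Example: scheduling thread 0 in state (0, 0) with run-time distribution
# {1: 0.7, 2: 0.3} yields {(1, 0): 0.7, (2, 0): 0.3}.
print(transitions((0, 0), 0, {1: 0.7, 2: 0.3}))
```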
5. From Run Times to a Scheduling MDP
- We model thread scheduling decisions as a Markov Decision Process (MDP) based on thread run times
- The MDP is given by the 4-tuple (X, A, R, T)
  - X: the set of process states
    - These correspond to thread utilization states
  - A: the set of actions
    - I.e., scheduling a particular thread
  - R: the reward function for taking an action in a state
    - Expected utility of taking that action
    - Based on the distance of the next state(s) from a desired utilization (vector)
  - T: the transition function
    - Encodes the probability of moving from one state to another state for each action
- Solve the MDP for an optimal (per accumulated reward) policy (sketched below)
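The following is a minimal value-iteration sketch of such a scheduling MDP. The run-time distributions, target utilization, state bound, and discount factor are illustrative assumptions; the paper's actual construction and solver may differ.

```python
from itertools import product

# Illustrative scheduling MDP (X, A, R, T); all constants are assumptions.
DISTS = [{1: 0.7, 2: 0.3},     # action 0: schedule thread 0
         {2: 0.5, 3: 0.5}]     # action 1: schedule thread 1
TARGET = (0.5, 0.5)            # desired utilization vector
BOUND = 8                      # crude cap on each thread's utilization
GAMMA = 0.9                    # discount factor

def reward(state):
    """Negative distance from the target utilization ray (higher is better)."""
    total = sum(state)
    if total == 0:
        return 0.0
    return -sum(abs(u / total - t) for u, t in zip(state, TARGET))

def step(state, action):
    """Successor states and probabilities for scheduling `action` in `state`."""
    out = []
    for quanta, p in DISTS[action].items():
        nxt = list(state)
        nxt[action] = min(nxt[action] + quanta, BOUND)   # clamp to bound the space
        out.append((tuple(nxt), p))
    return out

states = list(product(range(BOUND + 1), repeat=len(DISTS)))
V = {s: 0.0 for s in states}
for _ in range(200):                                     # value iteration
    V = {s: max(sum(p * (reward(n) + GAMMA * V[n]) for n, p in step(s, a))
                for a in range(len(DISTS)))
         for s in states}

def policy(state):
    """Greedy policy with respect to the converged value function."""
    return max(range(len(DISTS)),
               key=lambda a: sum(p * (reward(n) + GAMMA * V[n])
                                 for n, p in step(state, a)))

print(policy((0, 0)), policy((4, 1)))
```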
6. Bounding the Utilization State Space
- To bound the state space, our initial approach used a notion of system termination
  - Produces absorbing states where utilization stays the same
  - E.g., (0,3), (1,2), etc. (see the sketch below)
- Drawbacks
  - Artifacts, limited horizon
- Recent advances now allow us to remove this restriction
  - See the project web site (URL given at the end of this talk)
- The optimal policy differed from our naïve expectations (next 2 slides)
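A tiny sketch of the termination-based bound, under the assumption that terminating after a fixed total number of quanta makes every state on that boundary absorbing (the termination time of 3 is chosen only to match the example states above):

```python
# Assumed termination after 3 total quanta: every two-thread utilization state
# whose components sum to 3 is absorbing (utilization no longer changes).
TERMINATION = 3
absorbing = [(i, TERMINATION - i) for i in range(TERMINATION + 1)]
print(absorbing)   # [(0, 3), (1, 2), (2, 1), (3, 0)]
```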
7. Generated Scheduling Policy Example
- The policy (top figure) is obtained from the distributions of thread run times and the target utilization vector
- Horizontal axis: thread 1 utilization
- Vertical axis: thread 2 utilization
- Dark gray: schedule thread 1
  - Moves the state horizontally, to the right
- Light gray: schedule thread 2
  - Moves the state vertically, upward
- Notice the darker ray in the middle
  - Shows the desired target utilization
  - A vector whose dimension is the number of threads
- At a high level this looks like what we'd expect
8. Scheduling Policy Artifacts
- Near the origin (top left)
  - The dark gray region overlaps the target utilization ray
  - Due to thread 1's distribution being shifted more toward 0
  - More likely to land nearer the target utilization if thread 1 is scheduled
- Near the horizon (top right)
  - The dark gray region first bulges out
    - Due to low variance for thread 1
  - ...and then converges back to the ray
    - Due to absorbing state proximity
9. Verification State Space
- We can also generate a verification state space
- A verification state (box) combines subsets of utilization states (circles) reachable on a scheduling action (sketched below)
- Transitions are condensed from the utilization state space
- Note that verification states also often overlap
  - E.g., utilization state (2,1) is in two verification states
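A minimal sketch of how such verification states could be constructed, grouping the utilization states reachable by each scheduling action; the distributions and grouping rule here are assumptions for illustration, not the paper's exact construction.

```python
# Illustrative verification-state construction: each verification state groups
# the utilization states reachable from a predecessor state by one action.
DISTS = [{1: 0.7, 2: 0.3},   # thread 0 run-time distribution (assumed)
         {2: 0.5, 3: 0.5}]   # thread 1 run-time distribution (assumed)

def successors(state, action):
    """Utilization states reachable by scheduling `action` in `state`."""
    return frozenset(
        tuple(u + (q if i == action else 0) for i, u in enumerate(state))
        for q in DISTS[action])

def verification_states(state):
    """One verification state per action from a given utilization state."""
    return {action: successors(state, action) for action in range(len(DISTS))}

print(verification_states((0, 0)))  # action 0 -> {(1,0),(2,0)}; action 1 -> {(0,2),(0,3)}
print(verification_states((1, 0)))  # action 0 -> {(2,0),(3,0)}
# Utilization state (2, 0) appears in two of the verification states above,
# illustrating the overlap noted on this slide.
```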
10. Verification State Space Size and Cost
- The state space is exponential in the number of threads and in the history size (from the termination time)
- Scheduling reduces the cost but only delays the explosion
- Work to reduce it is ongoing
11. Related Work
- Reference monitor approaches
- Interposition architectures
- E.g., Ostia user/kernel-level (Garfinkel et al.)
- Separation kernels
- E.g., ARINC-653, MILS (Vanfleet et al.)
- Scheduling policy design
- Hierarchical scheduling
- E.g., HLS and its extensions (Regehr et al.)
- E.g., Group scheduling (Niehaus et al.)
- Quasi-cyclic state space reduction
- E.g., Bogor (Robby et al.)
12. Concluding Remarks
- The MDP approach maintains rational scheduling control
  - Even when thread run times vary stochastically
  - Encodes rather than presupposes utilizations
  - Allows policy verification over utilization states
- Ongoing and Future Work
  - State space reduction via quasi-cyclic structure
  - Verification over continuous/discrete states
  - Kernel-level non-bypassable policy enforcement
  - Automated learning to discover scheduling MDPs
- Project web page
  - Supported by NSF grant CNS-0716764
  - http://www.cse.wustl.edu/cdgill/Cybertrust