Title: Coordinated Scheduling of Jobs in Distributed Systems
1. Coordinated Scheduling of Jobs in Distributed Systems
- Dr. Dimitrios S. Nikolopoulos
- CSL/UIUC
2. Outline
- Motivation for coordinated scheduling
- Gang scheduling
- Coscheduling with implicitly communicated information
- Dynamic coscheduling
- Dynamic coscheduling with tight memory resources
3. Motivation
- Processes in distributed systems need to communicate
  - Parallel jobs
  - Client/server
  - Peer-to-peer
- What happens if we send a message to a process that is not scheduled to receive it?
4. Uncoordinated scheduling
[Timeline diagram: P1, running, sends to P2 and blocks; P2 is not running, so the time until P2 is scheduled is unutilized]
5. Cascaded effect
[Timeline diagram: P1 sends to P2 and blocks while P2 is not running; later P2 runs while P1 is not running, so the blocking cascades back and forth]
6. Purpose of coscheduling
- Try to schedule communicating processes simultaneously
- If for any reason the peer is not scheduled, release the processors/resources for other jobs
7. Ideal coscheduling
[Timeline diagram: P1 sends to P2 and blocks; whenever P1 or P2 is not running, another job runs in its place, so no processor time is wasted]
8. Forms of coscheduling
- Explicit coscheduling, a.k.a. gang scheduling
- Implicit coscheduling
- Dynamic coscheduling
9. Gang scheduling
[Diagram: time slices T01–T015, with job1, job2, and job3 scheduled in strict rotation, each job occupying all processors during its slice]
10. Gang scheduling
- Good for tightly synchronized programs
- Inappropriate for client/server or peer-to-peer workloads
- Hard to implement even on tightly coupled multiprocessors
- Requires a centralized controller
- Even distributed or hierarchical control schemes for gang scheduling don't scale well enough for large systems
- Requires time quanta on the order of tens of seconds
11. Coscheduling with implicit information
- Try to coschedule processes controlled by different controllers (i.e. operating systems), using implicit information available locally
- Types of implicit information:
  - The turnaround time of a message
  - The frequency of message arrivals
12. Basic idea
- Try to infer whether your peer is scheduled by checking:
  - The time it takes your peer to respond to a message
  - The number of messages sent by/to your peer recently
13. Implicit vs. dynamic coscheduling
- Implicit coscheduling is based on the responsiveness of the peer
  - The difficulty of implicit coscheduling algorithms is determining how much time a job should wait for its peer to respond
- Dynamic coscheduling is based on message send/receive frequencies
  - The difficulty of dynamic coscheduling is how intrusively it interferes with the local scheduler
14. Basic implicit coscheduling algorithm
- Wait for a predefined amount of time
- This time may vary according to measurements obtained at runtime, but the way we compute it is predefined
- If a message does not arrive within the predefined time interval, release the processor
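The algorithm above amounts to a spin-then-block loop. A minimal sketch follows; `poll_message`, `yield_processor`, and the interval value are hypothetical names for illustration, not from the slides:

```python
import time

def implicit_coschedule_wait(poll_message, yield_processor, wait_interval):
    """Spin for a predefined interval; if no message arrives, release the CPU.

    poll_message:    callable returning a message or None (hypothetical).
    yield_processor: callable invoked to give up the processor (hypothetical).
    wait_interval:   the predefined spin time in seconds.
    """
    deadline = time.monotonic() + wait_interval
    while time.monotonic() < deadline:
        msg = poll_message()
        if msg is not None:
            return msg          # peer responded while we spun: stay scheduled
    yield_processor()           # peer appears descheduled: release the CPU
    return None

# Example: a peer that responds on the third poll, well within the interval.
polls = iter([None, None, "reply"])
result = implicit_coschedule_wait(lambda: next(polls), lambda: None, 0.05)
```

The key design point is that only `wait_interval` is tuned; the decision rule itself (spin, then yield) is fixed.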
15. Waiting interval
- Simple scenario: ping-pong, or protocols with handshaking
- Wait at least 2l + 2c + o (the turnaround time for a message, including overhead)
- Competitive solution: wait for another 2l + 2c + o
- This limits the cost we pay for waiting needlessly to 4l + 4c + 2o
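As a quick sanity check of the bound: doubling the baseline wait of 2l + 2c + o yields exactly the stated worst case of 4l + 4c + 2o. The numbers below are illustrative only, not measurements from the slides:

```python
def turnaround(l, c, o):
    # Baseline waiting interval: 2l + 2c + o (turnaround including overhead).
    return 2 * l + 2 * c + o

def competitive_worst_case(l, c, o):
    # Waiting a second interval of 2l + 2c + o bounds the useless waiting
    # cost at twice the baseline, i.e. 4l + 4c + 2o.
    return 2 * turnaround(l, c, o)

# Illustrative values (think microseconds): l=10, c=5, o=2.
cost = competitive_worst_case(l=10, c=5, o=2)   # 4*10 + 4*5 + 2*2 = 64
```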
16. Competitive waiting
[Timing diagram: P1 sends to P2 and blocks; latency l and overheads c, o accumulate in each direction while P1 spins for the waiting interval]
17. Problems with competitive waiting
- Difficult to compute analytically even for simple communication patterns
  - Example: it is hard to find an analytical solution for barriers (Arpaci-Dusseau, ACM TOCS)
- Impossible to compute analytically for unstructured or complex communication patterns
  - Personalized exchanges, broadcasts, shuffles, etc.
- Requires modifications to the communication libraries and the operating system
18. Dynamic coscheduling
- Proposed to cope with the difficulties of implicit coscheduling
  - Implementation
  - Computing the waiting interval
- Main idea
  - If a process is not scheduled when a message arrives for it, schedule the process as soon as possible!
19. Intrusive dynamic coscheduling
- Upon message arrival, schedule the receiving process
- Intrusive, because it might disrupt the scheduling queue
- Frequently communicating jobs are treated favorably compared to other, non-communicating jobs
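The intrusive policy can be sketched with a toy run queue; the class and method names are assumptions for illustration, not an actual kernel interface:

```python
class Process:
    def __init__(self, pid):
        self.pid = pid

class Scheduler:
    """Toy run queue; on_message_arrival implements the intrusive policy."""
    def __init__(self):
        self.run_queue = []
        self.current = None

    def on_message_arrival(self, receiver):
        # Intrusive policy: if the receiver is not running, schedule it
        # immediately, pushing the current process back into the queue.
        if self.current is not receiver:
            if self.current is not None:
                self.run_queue.append(self.current)
            if receiver in self.run_queue:
                self.run_queue.remove(receiver)
            self.current = receiver

# Example: p2 preempts p1 the moment a message arrives for it.
p1, p2 = Process(1), Process(2)
sched = Scheduler()
sched.current = p1
sched.run_queue = [p2]
sched.on_message_arrival(p2)
```

This makes the intrusiveness visible: p1 loses its slice regardless of how long it has run, which is why communicating jobs end up favored.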
20. Less intrusive dynamic coscheduling
- Periodic priority boost
- In each time slice, boost the priority of communicating processes according to the recent message send/receive history
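The periodic boost amounts to a per-time-slice pass over the run queue. A minimal sketch, where the field names and threshold are assumptions rather than the actual kernel variables:

```python
def periodic_boost(processes, boost, msg_threshold=1):
    """At each time slice, boost processes with recent message traffic.

    Each process dict carries 'priority' (higher = runs sooner) and
    'recent_msgs', the send/receive count over the last slice
    (both hypothetical fields).
    """
    for p in processes:
        if p["recent_msgs"] >= msg_threshold:
            p["priority"] += boost      # favor communicating processes
        p["recent_msgs"] = 0            # start a fresh counting window

procs = [
    {"pid": 1, "priority": 10, "recent_msgs": 3},   # communicating
    {"pid": 2, "priority": 10, "recent_msgs": 0},   # compute-bound
]
periodic_boost(procs, boost=5)
```

Unlike the intrusive variant, nothing is preempted mid-slice; the boost only influences the next scheduling decision, which is why this version perturbs the queue less.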
21. Benefits of dynamic coscheduling
- Effective coscheduling when the communication pattern is unstructured, unknown, or hard to analyze for determining the right waiting interval
- Simple implementation
  - You still have to modify the operating system
  - But the information/variables you have to access are already there (message buffers, priorities)
22. Coscheduling and thrashing prevention
- Adaptive scheduling under memory pressure on multiprogrammed clusters (Nikolopoulos & Polychronopoulos, IEEE/ACM CCGrid '02), partly based on:
- Adaptive scheduling under memory pressure on multiprogrammed SMPs (Nikolopoulos & Polychronopoulos, IEEE IPDPS '02)
23. Thrashing prevention
- Thrashing: the situation in which a computer pages data to/from disk without doing useful work
- Reason: the running programs exhaust the memory of the system
- Impact: severe (slowdowns of a factor of 100x), because I/O from disk is hundreds of times slower than accessing memory
24. Adaptive algorithm for thrashing prevention
- Programs prevent thrashing by:
  - Dynamically detecting paging at memory allocation points
  - Backing off until paging stops
- Memory-hungry jobs make room for memory-resident jobs
- The algorithm works for dynamic workloads (random arrivals and departures of jobs)
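The detect-and-back-off step could look like the sketch below; `page_fault_rate`, the threshold, and the retry bound are hypothetical stand-ins for the paging detection the slides describe:

```python
import time

def allocate_with_backoff(request_bytes, allocate, page_fault_rate,
                          threshold, max_retries=5, delay=0.01):
    """At a memory allocation point, back off while the system is paging.

    allocate:        callable performing the real allocation (hypothetical).
    page_fault_rate: callable sampling current paging activity (hypothetical).
    threshold:       fault rate above which we treat the system as thrashing.
    """
    for _ in range(max_retries):
        if page_fault_rate() <= threshold:
            return allocate(request_bytes)   # safe: no paging detected
        time.sleep(delay)                    # back off until paging stops
    return None                              # still paging: caller stays stalled

# Example: paging subsides after two samples, then the allocation proceeds.
rates = iter([100, 80, 5])
mem = allocate_with_backoff(1024, lambda n: bytearray(n),
                            lambda: next(rates), threshold=10, delay=0.001)
```

Because the check happens at allocation points, a memory-hungry job stalls before it grows, which is how it makes room for jobs that are already memory-resident.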
25. Thrashing prevention on an SMP
26. Coscheduling under memory constraints
- If one node of a distributed system thrashes, the effect may be felt throughout the system (busy-waiting jobs)
- Coscheduling solves the problem of needless waiting, but not the problem of thrashing: we either need a large number of jobs to sustain utilization, or we must prevent thrashing (the proposed solution)
27. Combining coscheduling with thrashing prevention
- Scenarios
  - Non-communicating jobs that fit in memory -> nothing to do
  - Communicating jobs that fit in memory -> coscheduling takes priority
  - Non-communicating jobs that do not fit in memory -> thrashing prevention takes priority
  - Communicating jobs that do not fit in memory -> coscheduling or thrashing prevention first?
28. Combining coscheduling with thrashing prevention
- If coscheduling takes priority
  - The communicating job will consume the message, but...
  - The system will page
- If thrashing prevention takes priority
  - The communicating job will be delayed
  - Paging will be avoided
  - The priority of the delayed job is boosted, so as soon as the job fits in memory it consumes the message
- We choose the second option, due to the unpredictable latency of paging
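The chosen policy, thrashing prevention first with a priority boost for the delayed job, can be sketched as a single decision function; the field names and boost value are assumptions for illustration:

```python
def handle_message(job, fits_in_memory, boost=5):
    """Policy from the slides: thrashing prevention takes priority.

    'job' is a dict with 'priority' and 'pending_msgs' (assumed fields).
    Returns the action taken, for illustration.
    """
    if fits_in_memory:
        return "run"                 # coscheduling: consume the message now
    # Thrashing prevention wins: delay the job to avoid paging, but boost
    # its priority so it consumes the message as soon as it fits in memory.
    job["priority"] += boost
    job["pending_msgs"] += 1
    return "delay"

# Example: a message arrives for a job that currently does not fit in memory.
job = {"priority": 10, "pending_msgs": 0}
action = handle_message(job, fits_in_memory=False)
```

The boost is what keeps the delay bounded: the moment the job fits, its raised priority gets it scheduled ahead of its peers, avoiding the unpredictable latency of paging.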
29. Details
- The implementation of paging prevention in Linux uses a shared-memory interface to communicate memory utilization information
- Memory requirements of jobs are estimated dynamically by intercepting system calls
- Dynamic coscheduling requires no more than 10 lines of code added to the Linux scheduler
30. Details
- Starvation is avoided by running jobs that have been suspended for more than 10 times the time needed to load their resident set (just a heuristic)
- Large jobs are stalled at memory allocation points, before the beginning of computation (i.e. once a job establishes its working set it runs, and only jobs submitted later may stall due to memory pressure)
31. Results

32. Results