Title: Cilk Presentation CSCI585 Cluster Programming
Outline
- Overview
- Cilk Programming Language
  - Program Structure
  - Memory Usage
    - Stack and Heap Memory
    - Shared Memory
  - Cilk Features
- Cilk Runtime System
  - Multithreaded Computational Model
  - Work Stealing
  - Cilk Scheduling Technique
Overview
- A programming language for multithreaded parallel programming, based on ANSI C
- Designed for shared-memory multiprocessors (SMPs)
- It is algorithmic in that the runtime system's scheduler guarantees efficient and predictable performance
- It allows the programmer to concentrate on the program logic, while its runtime system takes care of other details such as load balancing and communication protocols
- It doesn't use message-passing techniques
Cilk Programming Language: Program Structure
- A procedure is the building block of a Cilk program.
- It is like an ordinary C function with some new keywords.
- The Cilk keywords are used only within a Cilk procedure.

    #include <stdlib.h>
    #include <stdio.h>
    #include <cilk.h>

    cilk int fib(int n)
    {
        if (n < 2)
            return n;
        else {
            int x, y;
            x = spawn fib(n - 1);
            y = spawn fib(n - 2);
            sync;
            return (x + y);
        }
    }
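Cilk extends ANSI C. Deleting the Cilk keywords from fib therefore gives an ordinary C program that computes the same result on one processor, often called the serial elision. The sketch below is that C version, written by hand for illustration. It is not output of the Cilk compiler.

```c
/* Serial elision of the Cilk fib procedure: the keywords cilk, spawn and
   sync are removed, leaving plain C with the same result on one processor. */
int fib(int n)
{
    if (n < 2)
        return n;
    else {
        int x, y;
        x = fib(n - 1);   /* was: x = spawn fib(n - 1); */
        y = fib(n - 2);   /* was: y = spawn fib(n - 2); */
        /* was: sync; in serial code both children have already returned */
        return x + y;
    }
}
```

For example, fib(30) returns 832040 in this serial version, which is the same value the Cilk program computes with any number of processors.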
Cilk Programming Language: Program Structure (cont.)
- A Cilk program execution can be viewed as a directed acyclic graph (DAG).
- A procedure is considered a sequence of nonblocking threads, represented horizontally in the graph.
- Each thread can spawn a new child procedure (a new thread), as represented by the vertical lines.
- A procedure must wait for its children to return their results before it can itself return.
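The hypothetical C helpers below give a feel for the size and shape of this graph for fib. They mirror fib's recursion to count the procedure instances that fib(n) creates and the depth of its spawn tree. They assume each call of fib is one node of the spawn tree. The thread-level DAG has more nodes than this, because each non-leaf procedure consists of several threads.

```c
/* Number of fib procedure instances in the spawn tree of fib(n). */
long fib_procedures(int n)
{
    if (n < 2)
        return 1;   /* a leaf procedure spawns no children */
    return 1 + fib_procedures(n - 1) + fib_procedures(n - 2);
}

/* Depth of the spawn tree: the longest chain of nested spawns,
   counted in procedures (the root counts as 1). */
int fib_spawn_depth(int n)
{
    if (n < 2)
        return 1;
    int a = fib_spawn_depth(n - 1);
    int b = fib_spawn_depth(n - 2);
    return 1 + (a > b ? a : b);
}
```

For fib(30), the tree has 2,692,537 procedures but a depth of only 30. This is why there is plenty of independent work to spread across processors.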
Cilk Programming Language: Program Structure (cont.)
- The command cilk -O2 fib.cilk -o fib compiles the program and outputs an executable ready to run.
- The command fib --nproc 4 30 runs the program on 4 processors, giving it the argument 30.
- Using different flags when you run the program can generate important statistics; for example, --stats reports the time taken for execution by each processor.
Cilk Programming Language: Stack and Heap Memory
- As in C, Cilk has two types of memory: heap memory and stack memory.
- In Cilk the stack is called a cactus stack, because more than one procedure accesses it in parallel.
- Each procedure sees a different view of the stack: its own frame plus the frames of its ancestors.
- It has the same limitations as the stack in C, i.e., a child procedure can't pass a pointer to a variable that it has allocated on its stack back to its parent.
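The following plain-C sketch models these views. It assumes each frame simply points to its parent's frame, and the struct frame and the functions visible_frames and can_see are hypothetical names. A procedure's view is the chain from its own frame up to the root.

```c
#include <stddef.h>

/* Hypothetical model of a cactus stack: every frame points to its parent's
   frame, so parallel children share the ancestor frames above them. */
typedef struct frame {
    const char *name;
    const struct frame *parent;   /* NULL for the root procedure */
} frame;

/* Number of frames visible to a procedure: its own frame plus all ancestors. */
int visible_frames(const frame *f)
{
    int count = 0;
    for (; f != NULL; f = f->parent)
        count++;
    return count;
}

/* Returns 1 if 'target' is in the view of 'from' (from itself or one of
   its ancestors), and 0 otherwise. */
int can_see(const frame *from, const frame *target)
{
    for (; from != NULL; from = from->parent)
        if (from == target)
            return 1;
    return 0;
}
```

In the test scenario, A and B are siblings spawned by main, and C is spawned by A. C sees A and main but not B, and B cannot see A's frame at all.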
Cilk Programming Language: Shared Memory
- Cilk supports shared memory.
- Sharing can occur when parallel procedures access global objects, or indirectly when we pass a pointer to a spawned procedure.
- Depending on the logic of the code, the use of shared memory may lead to race conditions.
- See the examples on the next slide.
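Cilk Programming Language: Shared Memory (cont.)
Example 1: foo and bar both update x in parallel, a race condition

    cilk void bar(int *px);

    cilk int foo(void)
    {
        int x = 0, y;
        spawn bar(&x);
        x = x + 1;    /* races with the update of *px in bar */
        sync;
        return (x);
    }

    cilk void bar(int *px)
    {
        *px = *px + 1;
        return;
    }

Example 2: both procedures only read x, so there is no race

    #include <stdio.h>

    cilk void bar(int *px);

    cilk int foo(void)
    {
        int x = 0, y;
        spawn bar(&x);
        y = x + 1;    /* only reads x */
        sync;
        return (y);
    }

    cilk void bar(int *px)
    {
        printf("%d", *px + 1);
        return;
    }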
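The race in the first example can be made concrete without threads. The plain-C sketch below contrasts two runs of foo. In the serial elision, bar finishes before foo's own update, so foo returns 2. In one possible parallel interleaving, replayed step by step by hand, both procedures read x before either writes it. The names foo_serial and foo_lost_update are hypothetical.

```c
/* Serial elision of example 1's bar: increments the shared variable. */
static void bar(int *px)
{
    *px = *px + 1;
}

/* Serial elision of example 1's foo: spawn becomes a call, so bar runs to
   completion before foo's own update, and the result is always 2. */
int foo_serial(void)
{
    int x = 0;
    bar(&x);
    x = x + 1;
    return x;
}

/* One possible parallel interleaving of example 1, replayed by hand:
   both procedures read x before either one writes it back. */
int foo_lost_update(void)
{
    int x = 0;
    int child_read = x;      /* bar reads *px, sees 0 */
    int parent_read = x;     /* foo reads x, also sees 0 */
    x = child_read + 1;      /* bar writes 1 */
    x = parent_read + 1;     /* foo writes 1, overwriting bar's update */
    return x;                /* bar's increment has been lost */
}
```

A parallel run of example 1 can therefore return either 2 or 1, depending on the schedule. Example 2 has no such problem, because neither procedure writes x after the spawn.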
Cilk Programming Language: Cilk Features
- Locking: Cilk provides mutual-exclusion locks for protecting critical sections of code.
- Inlets: this feature allows you to use the value returned from a child thread in an expression.
- Aborting: a procedure can cancel spawned work that is no longer needed.
- Interacting with the Cilk scheduler: this is important for reusing memory that has already been allocated instead of allocating new memory (using SYNCHED).
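The interplay of inlets and aborting can be sketched serially in plain C. In the hypothetical search below, each "child" checks one array element. An inlet-like function receives each child's result in the parent's context. On the first hit, it sets an abort flag so that the remaining children are never run.

In real Cilk the children would be spawned in parallel, and abort would cancel those already running. This serial model only shows the control flow, and all names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parent state shared by the parent and its inlet. */
typedef struct {
    int found_index;   /* -1 until some child reports a hit */
    bool aborted;      /* set by the inlet, playing the role of abort */
    int children_run;  /* how many children actually executed */
} search_state;

/* Child: checks a single element. */
static bool child_check(const int *a, int i, int key)
{
    return a[i] == key;
}

/* Inlet: consumes a child's result in the parent's context. */
static void inlet_result(search_state *s, int i, bool hit)
{
    if (hit && s->found_index < 0) {
        s->found_index = i;
        s->aborted = true;   /* the remaining work is no longer needed */
    }
}

/* Returns the index of the first element equal to key, or -1. Stores the
   number of children that ran in *children_run if it is not NULL. */
int search(const int *a, int n, int key, int *children_run)
{
    search_state s = { -1, false, 0 };
    for (int i = 0; i < n && !s.aborted; i++) {
        bool hit = child_check(a, i, key);   /* in Cilk: a spawned child */
        s.children_run++;
        inlet_result(&s, i, hit);            /* in Cilk: an inlet */
    }
    if (children_run != NULL)
        *children_run = s.children_run;
    return s.found_index;
}
```

For the array {5, 3, 9, 3, 7} and key 9, only three children run before the abort. In a parallel version, the reported hit could be any matching element rather than the first one.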
Cilk Runtime System: Multithreaded Computational Model
- A Cilk computation can be viewed as a DAG.
- This means that there are dependencies between the different threads.
- These dependencies form a partial order, permitting many ways to schedule the threads in the DAG.
- The efficiency of the computation depends on the way the DAG is mapped onto the underlying processors.
- Each Cilk program produces a well-structured DAG that can be scheduled efficiently.
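The partial order can be illustrated with a tiny hand-built DAG. The sketch below assumes a simplified thread DAG for fib(2); the node numbering and edges are our own illustration, not output of the Cilk runtime. From this DAG it derives two different execution orders. One takes the most recently readied thread first, and the other takes the oldest first. Both respect every dependency.

```c
#include <stdbool.h>

#define NTHREADS 5
/* Hypothetical thread DAG for fib(2): 0 = fib(2) up to its first spawn,
   1 = fib(1), 2 = fib(2) between the spawns, 3 = fib(0), 4 = fib(2) after
   the sync. Each pair {u, v} means thread u must finish before v starts. */
static const int edges[][2] = { {0, 1}, {0, 2}, {2, 3}, {1, 4}, {2, 4}, {3, 4} };
#define NEDGES ((int)(sizeof edges / sizeof edges[0]))

/* Builds one valid execution order. With lifo = true the most recently
   readied thread runs next (stack-like); otherwise the oldest (queue-like). */
void schedule(bool lifo, int order[NTHREADS])
{
    int pending[NTHREADS] = {0};   /* unfinished predecessors per thread */
    int ready[NTHREADS];
    int head = 0, tail = 0;

    for (int e = 0; e < NEDGES; e++)
        pending[edges[e][1]]++;
    for (int v = 0; v < NTHREADS; v++)
        if (pending[v] == 0)
            ready[tail++] = v;

    for (int k = 0; k < NTHREADS; k++) {
        int v = lifo ? ready[--tail] : ready[head++];
        order[k] = v;
        /* finishing v may make its successors ready */
        for (int e = 0; e < NEDGES; e++)
            if (edges[e][0] == v && --pending[edges[e][1]] == 0)
                ready[tail++] = edges[e][1];
    }
}

/* True if the order respects every dependency edge of the DAG. */
bool is_valid(const int order[NTHREADS])
{
    int pos[NTHREADS];
    for (int k = 0; k < NTHREADS; k++)
        pos[order[k]] = k;
    for (int e = 0; e < NEDGES; e++)
        if (pos[edges[e][0]] > pos[edges[e][1]])
            return false;
    return true;
}
```

The stack-like policy yields 0, 2, 3, 1, 4 and the queue-like policy yields 0, 1, 2, 3, 4. Both are legal, which is exactly the freedom a scheduler exploits when mapping the DAG onto processors.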
Cilk Runtime System: Work Stealing
- The Cilk runtime system implements a scheduling policy based on work stealing.
- When a processor runs out of work, it picks another processor at random and takes work from it.
- To understand how this works, let's recall what happens during the execution of a C program.
- When a call to a child is made, the state of the parent is pushed onto the stack.
- Then, when the child returns, the state of the parent is popped and computation continues at the parent level.
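The call-stack behavior described above can be made explicit. The plain-C sketch below computes fib with a hand-managed stack; the struct call_state and the function fib_explicit are hypothetical names. Before a child runs, the parent's resume point is recorded in its stack entry. When the child returns, that entry is on top again, and computation continues at the parent level.

```c
/* Hypothetical saved state of one call to fib, as a C stack would hold it. */
typedef struct {
    int n;      /* argument of this call */
    int stage;  /* 0: call fib(n-1) next, 1: call fib(n-2) next, 2: add */
    int x;      /* result of fib(n-1), kept while fib(n-2) runs */
} call_state;

/* fib with the call stack made explicit. Requires 0 <= n < 64. */
int fib_explicit(int n)
{
    call_state stack[64];
    int top = 0, ret = 0;   /* ret carries a child's return value upward */

    stack[top++] = (call_state){ n, 0, 0 };
    while (top > 0) {
        call_state *f = &stack[top - 1];   /* the call currently running */
        if (f->n < 2) {                    /* base case: return n and pop */
            ret = f->n;
            top--;
        } else if (f->stage == 0) {        /* record resume point, push child */
            f->stage = 1;
            stack[top++] = (call_state){ f->n - 1, 0, 0 };
        } else if (f->stage == 1) {        /* first child returned: save x */
            f->x = ret;
            f->stage = 2;
            stack[top++] = (call_state){ f->n - 2, 0, 0 };
        } else {                           /* second child returned: pop */
            ret = f->x + ret;
            top--;
        }
    }
    return ret;
}
```

fib_explicit(10) returns 55, the same as the recursive version, because each push and pop corresponds to one call and return.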
Cilk Runtime System: Work Stealing (cont.)
- Work stealing is done using the following algorithm:
  - A Cilk procedure executes serially on one processor, just like a C program.
  - Locally, a processor executes procedures in serial order, exploring the spawn tree depth first.
  - When a child is spawned, the state of the parent is pushed onto the stack and the processor starts working on the child.
  - When another processor requests work, it steals work from the other end of the stack, which is far from where the victim processor is working.
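Below is a minimal single-threaded sketch of the data structure behind this algorithm. It assumes the "stack" on the slide is a double-ended queue of frame ids. The owner pushes and pops at one end in stack order, and a thief removes the oldest entry from the other end. The type work_deque and its functions are hypothetical, and a real runtime must also synchronize the owner and thieves, which this sketch omits.

```c
#include <stdbool.h>

#define DEQUE_CAP 64

/* Hypothetical single-threaded model of one processor's work deque.
   Entries are frame ids; lower indexes hold older frames. */
typedef struct {
    int items[DEQUE_CAP];
    int head;   /* thieves take from here: the oldest frame */
    int tail;   /* the owner pushes and pops here: the newest frame */
} work_deque;

void deque_init(work_deque *d)
{
    d->head = d->tail = 0;
}

static void reset_if_empty(work_deque *d)
{
    if (d->head == d->tail)
        d->head = d->tail = 0;   /* reuse the array once it drains */
}

/* Owner: on a spawn, push the parent's frame and go work on the child. */
bool push_bottom(work_deque *d, int frame)
{
    if (d->tail == DEQUE_CAP)
        return false;
    d->items[d->tail++] = frame;
    return true;
}

/* Owner: resume the most recently pushed frame (stack order). */
bool pop_bottom(work_deque *d, int *frame)
{
    if (d->head == d->tail)
        return false;
    *frame = d->items[--d->tail];
    reset_if_empty(d);
    return true;
}

/* Thief: take the oldest frame, far from where the owner is working. */
bool steal_top(work_deque *d, int *frame)
{
    if (d->head == d->tail)
        return false;
    *frame = d->items[d->head++];
    reset_if_empty(d);
    return true;
}
```

After pushing frames 1, 2 and 3, a thief gets frame 1 while the owner keeps popping 3 and then 2. The two ends only meet when the deque is nearly empty.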
Cilk Runtime System: Cilk Scheduling Technique
- The Cilk runtime system has two types of schedulers:
  - Nanoscheduler: its job is to schedule procedures within a single processor.
  - Microscheduler: its job is to schedule procedures across multiple processors.
Cilk Runtime System: Nanoscheduler
- The nanoscheduler has to make decisions very quickly, which is why it is compiled into the code by the cilk2c compiler.
- It is very easy to implement, as it converts each spawn into the equivalent procedure call and each sync into a no-op.
- To work together with the microscheduler, it must keep track of the current scheduling state.
- The nanoscheduler uses a deque of frames for this purpose.
Cilk Runtime System: Nanoscheduler (cont.)
- A Cilk frame is a data structure that holds the state of a procedure.
- A deque is a double-ended queue.
- From the nanoscheduler's perspective, the deque is treated as a stack.
- The deque is tightly coupled to the C stack, with a one-to-one correspondence between the frames in the deque and the activation frames on the C stack.
- When a procedure is first called, it initializes a frame and pushes it onto the deque.
- When a spawn occurs, the state of the current procedure is saved in its frame.
- When the procedure completes, its frame is popped.
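The steps above can be sketched in plain C for fib. The frame layout (fib_frame with an entry number, the argument, and the live local x) and the names frame_deque and fib_nano are assumptions for illustration, not the code cilk2c actually generates. Each call pushes a pointer to its frame, records its state before each spawn, and pops the frame on completion. Spawns are ordinary calls and sync does nothing.

```c
/* Hypothetical Cilk-style frame for one activation of fib. */
typedef struct {
    int entry;  /* which spawn the procedure has reached (resume point) */
    int n;      /* argument, saved so another processor could resume it */
    int x;      /* local variable that is live across the second spawn */
} fib_frame;

#define MAX_DEPTH 64
static fib_frame *frame_deque[MAX_DEPTH];  /* used as a stack here */
static int frame_depth = 0;
static int max_depth_seen = 0;

static void push_frame(fib_frame *f)
{
    frame_deque[frame_depth++] = f;
    if (frame_depth > max_depth_seen)
        max_depth_seen = frame_depth;
}

static void pop_frame(void)
{
    frame_depth--;
}

/* spawn is an ordinary call and sync does nothing, but the frame is kept up
   to date so that a microscheduler could steal it. Requires n < MAX_DEPTH. */
int fib_nano(int n)
{
    fib_frame f = { 0, n, 0 };   /* frame lives in this C activation record */
    int x, y;

    push_frame(&f);              /* one deque entry per C activation frame */
    if (n < 2) {
        pop_frame();
        return n;
    }
    f.entry = 1;                 /* save state before the first spawn */
    x = fib_nano(n - 1);
    f.entry = 2;                 /* save state before the second spawn */
    f.x = x;
    y = fib_nano(n - 2);
    /* sync: a no-op, because nothing is ever stolen in this model */
    pop_frame();
    return x + y;
}
```

After fib_nano(10) returns 55, the deque is empty again. At its deepest point it held 10 frames, one for each active C activation record from fib(10) down to fib(1).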
Cilk Runtime System: Microscheduler
- The microscheduler's job is to schedule procedures across a fixed set of processors.
- It is implemented as a randomized work-stealing scheduler.
- When a processor runs out of work, it picks a processor at random and checks whether its deque has frames to be computed.
- If so, it takes the oldest frame and places the corresponding procedure on its own nanoscheduler.
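The randomized stealing loop can be simulated in plain C without real threads. In the hypothetical sketch below, NPROC simulated processors run in lock-step rounds, and all unit tasks start on processor 0's deque. A busy processor runs its newest task. An idle processor picks a victim at random and, if the victim has work, steals its oldest task.

The deque layout, the round structure, and the small random number generator are assumptions made for illustration, not the Cilk microscheduler's actual code.

```c
#include <stdbool.h>

#define NPROC 4      /* hypothetical number of processors */
#define CAP 64       /* capacity of each processor's deque */

typedef struct { int items[CAP]; int head, tail; } task_deque;

static bool is_empty(const task_deque *d) { return d->head == d->tail; }

/* The owner takes the newest task (from_top = false);
   a thief takes the oldest task (from_top = true). */
static int take(task_deque *d, bool from_top)
{
    int v = from_top ? d->items[d->head++] : d->items[--d->tail];
    if (d->head == d->tail)
        d->head = d->tail = 0;   /* reuse the array once it drains */
    return v;
}

/* Small linear congruential generator so that runs are reproducible. */
static unsigned next_rand(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Runs ntasks unit tasks (ntasks <= CAP), all starting on processor 0.
   ran_on[t] records which processor executed task t.
   Returns the number of successful steals. */
int simulate(int ntasks, int ran_on[], unsigned seed)
{
    task_deque dq[NPROC];
    int steals = 0, done = 0;

    for (int p = 0; p < NPROC; p++)
        dq[p].head = dq[p].tail = 0;
    for (int t = 0; t < ntasks; t++)
        dq[0].items[dq[0].tail++] = t;

    while (done < ntasks) {
        for (int p = 0; p < NPROC; p++) {
            if (!is_empty(&dq[p])) {          /* has work: run newest task */
                ran_on[take(&dq[p], false)] = p;
                done++;
                continue;
            }
            /* out of work: pick a victim at random and steal its oldest task */
            int victim = (int)(next_rand(&seed) % NPROC);
            if (victim != p && !is_empty(&dq[victim])) {
                dq[p].items[dq[p].tail++] = take(&dq[victim], true);
                steals++;
            }
        }
    }
    return steals;
}
```

With the same seed the simulation is deterministic. This makes it easy to compare how the number of steals and the distribution of tasks change with NPROC or ntasks.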
Cilk Runtime System: Microscheduler (cont.)
- A stolen procedure runs as the slow version of the procedure.
- Because it is invoked from a generic scheduling routine, it has a standard interface.
- It has extra overhead because:
  - It takes a pointer to a frame as its input instead of receiving the data directly, so it has to extract the input data from the frame.
  - It has to pack its results instead of returning them directly.
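The two sources of overhead can be seen side by side in a plain-C sketch. The frame layout fib_frame, the names fib_fast, fib_slow and run_stolen, and the function-pointer interface are hypothetical, chosen only to mirror the slide. The normal version takes its argument and returns its result directly. The slow version receives only a generic frame pointer, so it must unpack its input and pack its output. In this sketch, the children of the stolen call use the normal version.

```c
/* Hypothetical frame used to hand a stolen fib call to the scheduler. */
typedef struct {
    int n;        /* input argument, stored in the frame */
    int result;   /* output slot filled in by the slow version */
} fib_frame;

/* Normal version: ordinary argument and return value. */
int fib_fast(int n)
{
    return n < 2 ? n : fib_fast(n - 1) + fib_fast(n - 2);
}

/* Slow version: the standard interface is one generic frame pointer. */
void fib_slow(void *frame_ptr)
{
    fib_frame *f = (fib_frame *)frame_ptr;
    int n = f->n;                           /* overhead: extract the input */
    int r = n < 2 ? n : fib_fast(n - 1) + fib_fast(n - 2);
    f->result = r;                          /* overhead: pack the result */
}

/* Generic scheduling routine: it can run any slow version without knowing
   the procedure's argument or result types. */
typedef void (*slow_version)(void *frame_ptr);

void run_stolen(slow_version fn, void *frame_ptr)
{
    fn(frame_ptr);
}
```

The scheduler only ever sees run_stolen(fib_slow, &frame). The price of this uniform interface is the extra memory traffic through the frame on every stolen call.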