Parallelizing C Programs Using Cilk

1
Parallelizing C Programs Using Cilk
  • Mahdi Javadi

2
Cilk Language
  • Cilk is a language for multithreaded parallel
    programming based on C.
  • The programmer does not need to worry about scheduling
    the computation to run efficiently.
  • There are three additional keywords: cilk, spawn,
    and sync.

3
Example: Fibonacci
  • Sequential C version:

      int fib (int n)
      {
        int x, y;
        if (n < 2) return n;
        x = fib (n-1);
        y = fib (n-2);
        return x + y;
      }

  • Cilk version:

      cilk int fib (int n)
      {
        int x, y;
        if (n < 2) return n;
        x = spawn fib (n-1);
        y = spawn fib (n-2);
        sync;
        return x + y;
      }

4
Performance Measures
  • T_P: execution time on P processors.
  • T_1 is called the work.
  • T_∞ is called the span.
  • Obvious lower bounds:
  •   T_P ≥ T_1/P
  •   T_P ≥ T_∞
  • p = T_1/T_∞ is called the parallelism. Using more than p
    processors makes little sense.
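
The measures above can be made concrete for the Fibonacci example. The following is a minimal sketch in plain C, assuming a cost model in which every call to fib costs one unit; the function names fib_work, fib_span and lower_bound are hypothetical and not part of Cilk.

```c
#include <assert.h>
#include <stdint.h>

/* Work T_1: every call costs one unit and both recursive calls are paid for. */
uint64_t fib_work(int n)
{
    assert(n >= 0);
    if (n < 2) return 1;
    return fib_work(n - 1) + fib_work(n - 2) + 1;
}

/* Span T_inf: the two spawned calls run in parallel, so only the longer counts. */
uint64_t fib_span(int n)
{
    assert(n >= 0);
    if (n < 2) return 1;
    uint64_t a = fib_span(n - 1);
    uint64_t b = fib_span(n - 2);
    return (a > b ? a : b) + 1;
}

/* Lower bound on T_P: the larger of T_1 / P and T_inf. */
double lower_bound(uint64_t work, uint64_t span, int procs)
{
    assert(procs > 0);
    double by_work = (double)work / procs;
    return by_work > (double)span ? by_work : (double)span;
}
```

Under this cost model fib(30) has a work of 2692537 units and a span of 30, so its parallelism is roughly 90,000, far more than any small multiprocessor can use.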

5
Cilk Compiler
  • The file extension should be .cilk.
  • Example:
  •   > cilkc -O3 fib.cilk -o fib
  • To find the 30th Fibonacci number using 4 CPUs:
  •   > fib --nproc 4 30
  • To collect timings of each processor and compute
    the span (not efficient):
  •   > cilkc -cilk-profile -cilk-span -O3 fib.cilk -o fib

6
Example: Matrix Multiplication
  • Suppose we want to multiply two n by n matrices.
  • We can formulate the problem recursively,
  • i.e. one n by n matrix multiplication reduces to
    8 multiplications and 4 additions of (n/2) by
    (n/2) submatrices.
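
The recursive formulation can be written out as a block product. This is the standard 2 by 2 block decomposition, using the same submatrix names (A11, B21, and so on) as the Mult procedure; each entry of the result is one addition of two half-size products.

```latex
\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}
=
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
=
\begin{pmatrix}
A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\
A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22}
\end{pmatrix}
```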

7
Multiplication Procedure
  • Mult(C, A, B, n)
  •   if (n = 1) C[1,1] = A[1,1] · B[1,1]
  •   else
  •     spawn Mult(C11, A11, B11, n/2)
  •     spawn Mult(C22, A21, B12, n/2)
  •     spawn Mult(T11, A12, B21, n/2)
  •     spawn Mult(T22, A22, B22, n/2)
  •     (and likewise for C12, C21, T12, T21: 8 spawns in all)
  •     sync
  •     Add(C, T, n)

8
Addition Procedure
  • Add(C, T, n)
  •   if (n = 1) C[1,1] = C[1,1] + T[1,1]
  •   else
  •     spawn Add(C11, T11, n/2)
  •     spawn Add(C22, T22, n/2)
  •     (and likewise for C12 and C21: 4 spawns in all)
  •     sync
  • T_1 (work) for addition: O(n^2).
  • T_∞ (span) for addition: O(log(n)).
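
The Mult and Add procedures can be sketched in plain C by leaving out the Cilk keywords, which gives a sequential program with the same recursion. This is only an illustration under stated assumptions: matrices are row-major arrays of double, n is a power of two, submatrices are addressed through a row stride (the ldc, lda, ldb, ldt parameters are hypothetical, they do not appear in the slides), and the temporary T is allocated with malloc at each level.

```c
#include <assert.h>
#include <stdlib.h>

/* C += T on n-by-n blocks stored row-major; ldc and ldt are the row strides. */
void add(double *C, const double *T, int n, int ldc, int ldt)
{
    if (n == 1) { C[0] += T[0]; return; }
    int h = n / 2;
    /* Cilk would spawn these four independent calls and then sync. */
    add(C,               T,               h, ldc, ldt);
    add(C + h,           T + h,           h, ldc, ldt);
    add(C + h * ldc,     T + h * ldt,     h, ldc, ldt);
    add(C + h * ldc + h, T + h * ldt + h, h, ldc, ldt);
}

/* C = A * B on n-by-n blocks, n a power of two. */
void mult(double *C, const double *A, const double *B, int n,
          int ldc, int lda, int ldb)
{
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) { C[0] = A[0] * B[0]; return; }
    int h = n / 2;
    double *T = malloc((size_t)n * (size_t)n * sizeof *T);  /* row stride n */
    assert(T != NULL);

    double *C11 = C, *C12 = C + h, *C21 = C + h * ldc, *C22 = C + h * ldc + h;
    double *T11 = T, *T12 = T + h, *T21 = T + h * n,   *T22 = T + h * n + h;
    const double *A11 = A, *A12 = A + h, *A21 = A + h * lda, *A22 = A + h * lda + h;
    const double *B11 = B, *B12 = B + h, *B21 = B + h * ldb, *B22 = B + h * ldb + h;

    /* Eight independent products; Cilk would spawn each one. */
    mult(C11, A11, B11, h, ldc, lda, ldb);
    mult(C12, A11, B12, h, ldc, lda, ldb);
    mult(C21, A21, B11, h, ldc, lda, ldb);
    mult(C22, A21, B12, h, ldc, lda, ldb);
    mult(T11, A12, B21, h, n, lda, ldb);
    mult(T12, A12, B22, h, n, lda, ldb);
    mult(T21, A22, B21, h, n, lda, ldb);
    mult(T22, A22, B22, h, n, lda, ldb);
    /* The sync would go here, before the partial results are combined. */
    add(C, T, n, ldc, n);
    free(T);
}
```

For full n by n matrices the call is mult(C, A, B, n, n, n, n). Recursing all the way down to n = 1 keeps the sketch close to the slides; a practical version would probably switch to an ordinary triple loop below some block size.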

9
Complexity of Multiplication
  • We know that matrix multiplication is O(n^3), hence
    T_1 (work) for multiplication is O(n^3).
  • T_∞ (span): M_∞(n) = M_∞(n/2) + O(log(n)) = O(log^2(n)).
  • p = T_1 / T_∞ = O(n^3) / O(log^2(n)).
  • To multiply 1000 by 1000 matrices: p ≈ 10^7 (a lot of CPUs!)
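
The order of magnitude of that estimate can be checked by unrolling the span recurrence numerically. The sketch below assumes unit constants (one unit for the base case and exactly lg(n) for the Add span), which the O-notation on the slide leaves open; the function names are hypothetical.

```c
#include <assert.h>

/* floor(log2(n)) for n >= 1, computed by repeated halving. */
int ilog2(unsigned long n)
{
    assert(n >= 1);
    int lg = 0;
    while (n > 1) { n /= 2; lg++; }
    return lg;
}

/* Span recurrence M(n) = M(n/2) + lg(n) with M(1) = 1, for n a power of two. */
unsigned long mult_span(unsigned long n)
{
    assert(n >= 1);
    if (n == 1) return 1;
    return mult_span(n / 2) + (unsigned long)ilog2(n);
}

/* Parallelism estimate: n^3 units of work divided by the span. */
double mult_parallelism(unsigned long n)
{
    double work = (double)n * (double)n * (double)n;
    return work / (double)mult_span(n);
}
```

With n = 1024 as the nearest power of two to 1000, mult_span gives 56 and the estimate lands between 10^7 and 10^8, the same order of magnitude as the figure quoted on the slide.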

10
Discrete Fourier Transform
  • Sequential version:

      DFT(n, w, p, ...)
        ...
        t = w^2 mod p
        DFT(n/2, t, p, ...)
        DFT(n/2, t, p, ...)
        w1 = 1
        for (i = 0; i < n/2; i++)
          a[i] ...
          w1 = w1 · w mod p

  • Cilk version:

      cilk DFT(n, w, p, ...)
        ...
        t = w^2 mod p
        spawn DFT(n/2, t, p, ...)
        spawn DFT(n/2, t, p, ...)
        sync
        spawn ParCom(n, a, p, 1, ...)

      cilk ParCom(n, a, p, m, ...)
        if (n < 512) ...
        spawn ParCom(n/2, a, p, 1, ...)
        m = m · w^(n/2) mod p
        spawn ParCom(n/2, a+n/2, p, m, ...)
        sync

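
The idea behind ParCom is that the second half of the combining loop can start without waiting for the first half, provided its starting power of w is computed directly. The sketch below illustrates only that splitting, not the elided combining step itself: it fills an array with m · w^i mod p, once with the sequential loop and once with the recursive split. Everything here is an assumption for illustration: the function names, passing w explicitly, the small CUTOFF (the slides use 512), and passing m rather than 1 to the first half so that the split stays correct at every level of recursion.

```c
#include <assert.h>
#include <stdint.h>

#define CUTOFF 4   /* hypothetical; chosen small so the recursion is exercised */

/* (a * b) mod p, assuming p < 2^32 so the product fits in 64 bits. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    return (a % p) * (b % p) % p;
}

/* w^e mod p by repeated squaring: O(log e) multiplications. */
uint64_t powmod(uint64_t w, uint64_t e, uint64_t p)
{
    uint64_t r = 1 % p;
    while (e > 0) {
        if (e & 1) r = mulmod(r, w, p);
        w = mulmod(w, w, p);
        e >>= 1;
    }
    return r;
}

/* Sequential loop: tw[i] = m * w^i mod p, one multiplication per step. */
void twiddles_seq(uint64_t *tw, int n, uint64_t w, uint64_t p, uint64_t m)
{
    for (int i = 0; i < n; i++) {
        tw[i] = m % p;
        m = mulmod(m, w, p);
    }
}

/* Recursive split in the style of ParCom; n must be a power of two. */
void twiddles_rec(uint64_t *tw, int n, uint64_t w, uint64_t p, uint64_t m)
{
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n <= CUTOFF) { twiddles_seq(tw, n, w, p, m); return; }
    /* First half starts at m; Cilk would spawn this call. */
    twiddles_rec(tw, n / 2, w, p, m);
    /* Second half starts at m * w^(n/2), found in O(log n) multiplications. */
    uint64_t m2 = mulmod(m, powmod(w, (uint64_t)(n / 2), p), p);
    twiddles_rec(tw + n / 2, n / 2, w, p, m2);
    /* A sync would go here. */
}
```

The O(log n) cost of powmod at each level is where the O(log(n)) term in the span recurrence for ParCom comes from.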
11
Complexity of ParCom
  • The sequential combining does n/2 multiplications.
  • T_∞ (span) for ParCom:
  •   T_∞(n) = T_∞(n/2) + O(log(n)), so T_∞(n) = O(log^2(n)).
  • p = O(n / log^2(n)).
  • We run the FFT on stan, which has 4 CPUs.
  • Thus p > 4 does not make sense, so we cut off
    the parallelism at some level of recursion to
    speed up the program.

12
Timings
  • Sequential FFT: 123789 ms

  Processors   Par. time (ms)   Speedup
      4             32837        3.77
      3             44315        2.79
      2             66262        1.87
      1            124006        0.998
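
The speedup column is the sequential time divided by the parallel time. A minimal sketch of that arithmetic follows; the efficiency function (speedup per processor) is an addition for illustration and does not appear in the slides.

```c
#include <assert.h>

/* Speedup: sequential time divided by parallel time (same units). */
double speedup(double t_seq_ms, double t_par_ms)
{
    assert(t_par_ms > 0.0);
    return t_seq_ms / t_par_ms;
}

/* Efficiency: speedup per processor, 1.0 being ideal. */
double efficiency(double t_seq_ms, double t_par_ms, int procs)
{
    assert(procs > 0);
    return speedup(t_seq_ms, t_par_ms) / procs;
}
```

For the 4-processor row, speedup(123789, 32837) is about 3.77, an efficiency of about 0.94. The 1-processor row gives slightly less than 1, which suggests the Cilk version carries only a small overhead compared with the sequential program.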