Parallel programming paradigms - PowerPoint PPT Presentation

Provided by: xyu
Slides: 39

Transcript and Presenter's Notes
1
Parallel programming paradigms
  • Express coarse-grain parallelism in applications
    in order to utilize multiple processors/cores in
    a parallel and distributed system.
  • Task level parallelism/thread level parallelism
    versus instruction level parallelism.
  • Mechanism to express the parallelism (execution
    model)
  • Shared address space / distributed address space
    (memory model)
  • Implicit communication / explicit communication
  • General purpose / special purpose

2
Parallel Programming models
  • What is a programming model?
  • An abstract virtual machine.
  • A view of data and execution
  • The logical interface between architecture and
    applications.

3
Parallel Programming Models
  • Why programming model?
  • Decouple applications and architectures
  • Write applications that run effectively across
    architectures.
  • Design new architectures that effectively support
    legacy applications.
  • Programming model design considerations
  • Expose the underlying architecture's features
  • Maintain ease of use

4
Language features for supporting parallel
programming models
  • Explicitly concurrent languages: languages with
    parallel/concurrent constructs.
  • UPC, Java, Ada
  • Compiler-supported extensions: annotations are
    added to indicate the parallelism in a sequential
    program. The compiler uses the annotations to
    automatically generate parallel code.
  • HPF, Cilk, OpenMP
  • Library: packages outside the language.
  • Pthreads, MPI

5
Common parallel programming models
  • Shared Memory (pthreads)
  • Multiple threads work on shared memory
  • Message Passing (MPI)
  • Multiple processes work on independent memory,
    use explicit communication to obtain information
    about other processes.
  • Data parallel (HPF)
  • Distributed data across different nodes, each
    node works on its own data.
  • In contrast to task parallelism.
  • Shared memory data parallel (OpenMP)
  • Partitioned shared memory (UPC)
  • Hybrid OpenMP/MPI
  • Remote procedure call

6
Programming models
7
Pthreads A shared memory programming model
  • POSIX standard shared memory multithreading
    interface.
  • Not just for parallel programming, but for
    general multithreaded programming
  • Provide primitives for thread management and
    synchronization.
  • Threads are commonly associated with shared
    memory architectures and operating systems.
  • Necessary for unleashing the computing power of
    SMT and CMP processors.
  • Making threaded programming easy and efficient is
    very important at this time.

8
Pthreads execution model
  • A single process can have multiple, concurrent
    execution paths.
  • a.out creates a number of threads that can be
    scheduled and run concurrently.
  • Each thread has local data, but also, shares the
    entire resources (global data) of a.out.
  • Any thread can execute any subroutine at the same
    time as other threads.
  • Threads communicate through global memory.

9
Fork-join model for executing threads in an
application
[Figure: the master thread forks a team of threads for the parallel region, then joins them back into a single thread.]
10
What does the developer have to do?
  • Decide how to decompose the computation into
    parallel parts.
  • Create and destroy threads to support the
    decomposition
  • Add synchronization to make sure dependences are
    respected.

11
Creation
  • Thread equivalent of fork()
  • int pthread_create(
  •     pthread_t *thread,
  •     const pthread_attr_t *attr,
  •     void *(*start_routine)(void *),
  •     void *arg
  • );
  • Returns 0 if OK, and non-zero (> 0) on error.

12
Termination
  • Thread Termination
  • Return from initial function.
  • void pthread_exit(void *status)
  • Process Termination
  • exit() called by any thread
  • main() returns

13
Waiting for child thread
  • int pthread_join(pthread_t tid, void **status)
  • Equivalent of waitpid() for processes

14
Detaching a thread
  • The detached thread can act as daemon thread
  • The parent thread doesn't need to wait
  • int pthread_detach(pthread_t tid)
  • Detaching self
  • pthread_detach(pthread_self())

15
Example of thread creation
16
General pthread structure
  • A thread is a concurrent execution of a function
  • The threaded version of the program must be
    restructured such that the parallel part forms a
    separate function.

17
Matrix Multiply
  • for (i = 0; i < n; i++)
  •   for (j = 0; j < n; j++) {
  •     c[i][j] = 0;
  •     for (k = 0; k < n; k++)
  •       c[i][j] = c[i][j] + a[i][k] * b[k][j];
  •   }

18
Parallel Matrix Multiply
  • All i-iterations (or j-iterations) can be run in
    parallel
  • If we have p processors: n/p rows to each
    processor
  • Corresponds to partitioning the i-loop

19
Matrix Multiply parallel part
  • void *mmult(void *s) {
  •   int slice = *(int *)s;
  •   int from = slice * n / p;
  •   int to = (slice + 1) * n / p;
  •   for (i = from; i < to; i++)
  •     for (j = 0; j < n; j++) {
  •       c[i][j] = 0;
  •       for (k = 0; k < n; k++)
  •         c[i][j] += a[i][k] * b[k][j];
  •     }
  • }

20
Matrix Multiply Main
  • int main() {
  •   pthread_t thrd[p];
  •   int para[p];
  •   for (i = 0; i < p; i++) {
  •     para[i] = i;
  •     pthread_create(&thrd[i], NULL, mmult,
  •       (void *)&para[i]);
  •   }
  •   for (i = 0; i < p; i++)
  •     pthread_join(thrd[i], NULL);
  • }

21
General Program Structure
  • Encapsulate parallel parts in functions.
  • Use function arguments to parametrize what a
    particular thread does.
  • Call pthread_create() with the function and
    arguments, save thread identifier returned.
  • Call pthread_join() with that thread identifier

22
Pthreads synchronization
  • Create/exit/join
  • Provides coarse grain synchronizations
  • Requires thread creation/destruction
  • Need for finer-grain synchronization
  • Mutex locks, condition variables, semaphores

23
Mutex lock for mutual exclusion
  • int counter = 0;
  • void *thread_func(void *arg) {
  •   int val;
  •   /* unprotected code -- why? */
  •   val = counter;
  •   counter = val + 1;
  •   return NULL;
  • }

24
Mutex locks lock
  • pthread_mutex_lock(pthread_mutex_t *mutex)
  • Tries to acquire the lock specified by mutex
  • If mutex is already locked, then the calling
    thread blocks until mutex is unlocked.

25
Mutex locks unlock
  • pthread_mutex_unlock(pthread_mutex_t *mutex)
  • If the calling thread has mutex currently locked,
    this will unlock the mutex.
  • If other threads are blocked waiting on this
    mutex, one will unblock and acquire mutex.
  • Which one is determined by the scheduler.

26
Mutex example
  • int counter = 0;
  • pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
  • void *thread_func(void *arg) {
  •   int val;
  •   /* protected by mutex */
  •   pthread_mutex_lock(&mutex);
  •   val = counter;
  •   counter = val + 1;
  •   pthread_mutex_unlock(&mutex);
  •   return NULL;
  • }

27
Condition Variable for signaling
  • Think of the producer-consumer problem
  • Producers and consumers run in separate threads.
  • Producer produces data and consumer consumes
    data.
  • Producer has to inform the consumer when data is
    available
  • Consumer has to inform producer when buffer space
    is available

28
Condition variables wait
  • pthread_cond_wait(pthread_cond_t *cond,
    pthread_mutex_t *mutex)
  • Blocks the calling thread, waiting on cond.
  • Unlocks the mutex.
  • Re-acquires the mutex when unblocked.

29
Condition variables signal
  • pthread_cond_signal(pthread_cond_t *cond)
  • Unblocks one thread waiting on cond.
  • The scheduler determines which thread to unblock.
  • If no thread waiting, then signal is a no-op

30
Producer consumer program without condition
variables
31
  • /* Globals */
  • int data_avail = 0;
  • pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
  • void *producer(void *arg) {
  •   pthread_mutex_lock(&data_mutex);
  •   /* produce data */
  •   /* insert data into queue */
  •   data_avail = 1;
  •   pthread_mutex_unlock(&data_mutex);
  • }

32
  • void *consumer(void *arg) {
  •   while (!data_avail)
  •     ; /* do nothing -- keep looping!! */
  •   pthread_mutex_lock(&data_mutex);
  •   /* extract data from queue */
  •   if (queue is empty)
  •     data_avail = 0;
  •   pthread_mutex_unlock(&data_mutex);
  •   consume_data();
  • }

33
Producer consumer with condition variables
34
  • int data_avail = 0;
  • pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
  • pthread_cond_t data_cond = PTHREAD_COND_INITIALIZER;
  • void *producer(void *arg) {
  •   pthread_mutex_lock(&data_mutex);
  •   /* produce data */
  •   /* insert data into queue */
  •   data_avail = 1;
  •   pthread_cond_signal(&data_cond);
  •   pthread_mutex_unlock(&data_mutex);
  • }

35
  • void *consumer(void *arg) {
  •   pthread_mutex_lock(&data_mutex);
  •   while (!data_avail)
  •     /* sleep on condition variable */
  •     pthread_cond_wait(&data_cond, &data_mutex);
  •   /* woken up: extract data from queue */
  •   if (queue is empty)
  •     data_avail = 0;
  •   pthread_mutex_unlock(&data_mutex);
  •   consume_data();
  • }

36
A note on condition variables
  • A signal is forgotten if there is no
    corresponding wait that has already occurred.
  • If you want the signal to be remembered, use
    semaphores.

37
Semaphores
  • Counters for resources shared between threads.
  • sem_wait(sem_t *sem)
  • Blocks until the semaphore value is non-zero.
  • Decrements the semaphore value on return.
  • sem_post(sem_t *sem)
  • Increments the semaphore value.
  • If any threads are blocked on the semaphore,
    unblocks one of them.

38
Pipelined task parallelism with semaphore
  • P1: for (i = 0; i < num_pics; read(in_pic), i++) {
  •       int_pic_1[i] = trans1(in_pic);
  •       sem_post(&event_1_2[i]);
  •     }
  • P2: for (i = 0; i < num_pics; i++) {
  •       sem_wait(&event_1_2[i]);
  •       int_pic_2[i] = trans2(int_pic_1[i]);
  •       sem_post(&event_2_3[i]);
  •     }