Parallel Programming with PThreads - PowerPoint PPT Presentation

Provided by: Quinn4
Learn more at: http://dna.cs.byu.edu

1
Parallel Programming with PThreads
2
Threads
  • Sometimes called a lightweight process
  • smaller execution unit than a process
  • Consists of
  • program counter
  • register set
  • stack space
  • Threads share
  • memory space
  • code section
  • OS resources(open files, signals, etc.)

3
Threads
4
POSIX Threads
  • Thread API available on many OSs
  • #include <pthread.h>
  • cc myprog.c -o myprog -lpthread
  • Thread creation
  • int pthread_create(pthread_t *thread,
    const pthread_attr_t *attr, void
    *(*start_routine)(void *), void *arg)
  • Thread termination
  • void pthread_exit(void *retval)
  • Waiting for Threads
  • int pthread_join(pthread_t th, void
    **thread_return)
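
The three calls above can be sketched together as follows. This is only an illustration: the worker `square` and the helper `sum_of_squares` are hypothetical names, not part of the slides, and the limit of 16 threads is an arbitrary assumption.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker: squares the integer smuggled in through arg. */
static void *square(void *arg) {
    intptr_t n = (intptr_t)arg;
    /* The value given to pthread_exit is what pthread_join receives. */
    pthread_exit((void *)(n * n));
    return NULL; /* not reached */
}

/* Hypothetical helper: starts nthreads workers (at most 16) with the
   inputs 1..nthreads and sums their results.
   Returns -1 if a thread cannot be created or joined. */
long sum_of_squares(int nthreads) {
    pthread_t tid[16];
    long total = 0;
    if (nthreads < 0 || nthreads > 16)
        return -1;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[i], NULL, square,
                           (void *)(intptr_t)(i + 1)) != 0)
            return -1;
    for (int i = 0; i < nthreads; i++) {
        void *ret;
        if (pthread_join(tid[i], &ret) != 0)
            return -1;
        total += (long)(intptr_t)ret;
    }
    return total;
}
```

Calling `sum_of_squares(4)` from `main` adds 1 + 4 + 9 + 16. Passing small integers through the `void *` argument avoids sharing any memory between the creator and the worker.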

6
Thread Issues
  • Timing
  • False Sharing
  • Variables are not shared but are so close
    together that they fall within the same cache line.
  • Writes to a shared cache line invalidate the
    copies in other processors' caches.
  • Locking Overhead

7
Thread Timing
  • Scenario 1
  • Thread T1 creates thread T2
  • T2 requires data from T1 that will be placed in
    global memory
  • T1 places the data in global memory after it
    creates T2
  • Assumes T1 will be able to place the data before
    T2 starts or before T2 needs the data
  • Scenario 2
  • T1 creates T2 and needs to pass data to T2
  • T1 gives T2 a pointer to a variable on its stack
  • What happens if / when T1 finishes?

8
Thread Timing
  • Set up all requirements before creating the
    thread
  • It is possible that the created thread will start
    and run to completion before the creating thread
    gets scheduled again
  • Producer Consumer relationships
  • make sure the data is placed before it is needed
    (Don't rely on timing)
  • make sure the data is there before consuming
    (Don't rely on timing)
  • make sure the data lasts until all potential
    consumers have consumed the data (Don't rely
    on timing)
  • Use synchronization primitives to enforce order
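
One way to follow these rules is sketched below, assuming a hypothetical `struct job` payload and helpers `start_job` / `finish_job` (none of which appear in the slides). The data is fully set up before `pthread_create`, and it lives on the heap, so it survives even after the creating function's stack frame is gone.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical payload handed from the creator to the new thread. */
struct job {
    int input;
    int output;
};

static void *run_job(void *arg) {
    struct job *j = arg;      /* heap data: valid even if the creator returned */
    j->output = j->input * 2;
    return j;                 /* hand ownership back through pthread_join */
}

/* Fills in everything the thread needs BEFORE creating it.
   Returns 0 on success, -1 on failure. */
int start_job(pthread_t *tid, int input) {
    struct job *j = malloc(sizeof *j);
    if (j == NULL)
        return -1;
    j->input = input;
    j->output = 0;
    if (pthread_create(tid, NULL, run_job, j) != 0) {
        free(j);
        return -1;
    }
    return 0;  /* this stack frame disappears here; the job does not */
}

/* Waits for the thread, collects the result, and frees the payload. */
int finish_job(pthread_t tid) {
    void *ret;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    struct job *j = ret;
    int out = j->output;
    free(j);
    return out;
}
```

The join is the synchronization point: the creator reads `output` only after `pthread_join` returns, so it never depends on which thread happens to run first.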

9
False Sharing
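Variables are not shared but are so close
together that they fall within the same cache line.
[Figure: processors P1 and P2 each hold a cached copy of one cache
line from memory that contains both location A and location B. P1
reads and writes A; P2 reads and writes B. Each write invalidates
the cache line in the other processor's cache.]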
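
A common remedy is to give each thread's data its own cache line. The sketch below assumes 64-byte cache lines (typical, but hardware dependent) and uses C11 `alignas`; the names `padded_counter` and `run_padded_counters` are hypothetical.

```c
#include <pthread.h>
#include <stdalign.h>

#define CACHE_LINE 64   /* assumption: 64-byte lines; check your hardware */
#define ITERS 100000

/* Aligning the member pads the struct to a full cache line, so
   neighbouring array elements never share a line. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

static struct padded_counter counters[2];

static void *bump(void *arg) {
    struct padded_counter *c = arg;
    for (int i = 0; i < ITERS; i++)
        c->value++;     /* private to this thread: no lock needed */
    return NULL;
}

/* Runs two threads, each incrementing its own counter, and returns
   the combined total. */
long run_padded_counters(void) {
    pthread_t t[2];
    counters[0].value = 0;
    counters[1].value = 0;
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, bump, &counters[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counters[0].value + counters[1].value;
}
```

Without the `alignas`, the two `long` values would sit 8 bytes apart and every increment by one thread could invalidate the line held by the other.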
10
Effects of False Sharing
11
Synchronization Primitives
  • int pthread_mutex_init(pthread_mutex_t
    *mutex_lock, const pthread_mutexattr_t
    *lock_attr)
  • int pthread_mutex_lock(pthread_mutex_t
    *mutex_lock)
  • int pthread_mutex_unlock(pthread_mutex_t
    *mutex_lock)
  • int pthread_mutex_trylock(pthread_mutex_t
    *mutex_lock)

12
#include <pthread.h>
#include <limits.h>

void *find_min(void *list_ptr);
pthread_mutex_t minimum_value_lock;
int minimum_value, partial_list_size;

int main(void) {
    /* start from the largest int so any element can replace it */
    minimum_value = INT_MAX;
    pthread_mutex_init(&minimum_value_lock, NULL);
    /* initialize lists etc, create and join threads */
}

void *find_min(void *list_ptr) {
    int *partial_list_ptr, my_min = INT_MAX, i;
    partial_list_ptr = (int *)list_ptr;
    /* scan the local sublist without holding the lock */
    for (i = 0; i < partial_list_size; i++)
        if (partial_list_ptr[i] < my_min)
            my_min = partial_list_ptr[i];
    /* lock only for the short update of the global minimum */
    pthread_mutex_lock(&minimum_value_lock);
    if (my_min < minimum_value)
        minimum_value = my_min;
    pthread_mutex_unlock(&minimum_value_lock);
    pthread_exit(0);
}
13
Locking Overhead
  • Serialization points
  • Minimize the size of critical sections
  • Be careful
  • Rather than wait, check if lock is available
  • pthread_mutex_trylock
  • If already locked, will return EBUSY
  • Will require restructuring of code
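
The restructuring can be as small as "do the shared update if the lock is free, otherwise remember the work locally". The sketch below assumes hypothetical names (`total_lock`, `shared_total`, `add_or_defer`) and a default, non-recursive mutex.

```c
#include <pthread.h>
#include <errno.h>

pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
int shared_total = 0;                 /* protected by total_lock */

/* Per-thread backlog of work postponed because the lock was busy. */
static _Thread_local int deferred = 0;

/* Adds n to shared_total if the lock is free; otherwise keeps n in the
   calling thread's backlog. Returns 1 if the shared total was updated,
   0 if the work was deferred, -1 on any other error. */
int add_or_defer(int n) {
    int status = pthread_mutex_trylock(&total_lock);
    if (status == EBUSY) {
        deferred += n;                /* no waiting: keep working locally */
        return 0;
    }
    if (status != 0)
        return -1;
    shared_total += n + deferred;     /* flush the backlog in one visit */
    deferred = 0;
    pthread_mutex_unlock(&total_lock);
    return 1;
}
```

A thread that keeps getting `EBUSY` still has to flush its backlog eventually, for example with a blocking `pthread_mutex_lock` before it exits.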

14
/* Finding k matches in a list */
void *find_entries(void *start_pointer) {
    /* This is the thread function */
    struct database_record *next_record;
    int count;
    current_pointer = start_pointer;
    do {
        next_record = find_next_entry(current_pointer);
        count = output_record(next_record);
    } while (count < requested_number_of_records);
}

int output_record(struct database_record *record_ptr) {
    int count;
    pthread_mutex_lock(&output_count_lock);
    output_count++;
    count = output_count;
    pthread_mutex_unlock(&output_count_lock);
    if (count <= requested_number_of_records)
        print_record(record_ptr);
    return (count);
}
15
/* rewritten output_record function */
int output_record(struct database_record *record_ptr) {
    int count;
    int lock_status;
    lock_status = pthread_mutex_trylock(&output_count_lock);
    if (lock_status == EBUSY) {
        /* lock is taken: keep the record locally and move on */
        insert_into_local_list(record_ptr);
        return (0);
    } else {
        count = output_count;
        output_count += number_on_local_list + 1;
        pthread_mutex_unlock(&output_count_lock);
        print_records(record_ptr, local_list,
                      requested_number_of_records - count);
        return (count + number_on_local_list + 1);
    }
}
16
Mutex features/issues
  • Limited to just mutual exclusion
  • Use POSIX semaphores for more control
  • Only one thread can be inside the mutex at a time
  • What if you get in and then realize things aren't
    quite ready yet?
  • Must exit the mutex and start over
  • Can we avoid going to the back of the line?

17
Condition Variables for Synchronization
  • A condition variable allows a thread to block
    itself until specified data reaches a predefined
    state.
  • A condition variable is associated with a
    predicate.
  • When the predicate becomes true, the condition
    variable is used to signal one or more threads
    waiting on the condition.
  • A single condition variable may be associated
    with more than one predicate.
  • A condition variable always has a mutex
    associated with it.
  • A thread locks this mutex and tests the predicate
    defined on the shared variable.
  • If the predicate is not true, the thread waits on
    the condition variable associated with the
    predicate using the function pthread_cond_wait.
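
The lock / test / wait pattern described above can be sketched as follows. The predicate here is a hypothetical flag `ready`, and `publish_and_wait` is an illustrative helper, not something from the slides.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
static int ready = 0;   /* the predicate: "data has been published" */
static int data = 0;    /* both variables are protected by m */

static void *waiter(void *arg) {
    int *out = arg;
    pthread_mutex_lock(&m);
    /* Test the predicate in a loop: a wakeup is only a hint that the
       predicate may have changed. */
    while (!ready)
        pthread_cond_wait(&data_ready, &m);  /* releases m while blocked */
    *out = data;                             /* m is held again here */
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Starts a waiting thread, publishes value, and returns what the
   waiter observed. */
int publish_and_wait(int value) {
    pthread_t t;
    int seen = -1;
    ready = 0;
    data = 0;
    pthread_create(&t, NULL, waiter, &seen);
    pthread_mutex_lock(&m);
    data = value;
    ready = 1;                        /* change the predicate under the lock */
    pthread_cond_signal(&data_ready);
    pthread_mutex_unlock(&m);
    pthread_join(t, NULL);
    return seen;
}
```

The sketch works whichever thread runs first: if the signal is sent before the waiter locks the mutex, the waiter finds `ready` already set and never blocks.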

18
Using Condition Variables
Main Thread
  • Declare and initialize global data/variables
    which require synchronization (such as "count")
  • Declare and initialize a condition variable object
  • Declare and initialize an associated mutex
  • Create threads A and B to do work
Thread A
  • Do work up to the point where a certain condition
    must occur (such as "count" must reach a
    specified value)
  • Lock associated mutex and check value of a
    global variable
  • Call pthread_cond_wait() to perform a blocking
    wait for signal from Thread-B. Note that a call
    to pthread_cond_wait() automatically and
    atomically unlocks the associated mutex variable
    so that it can be used by Thread-B.
  • When signalled, wake up. Mutex is automatically
    and atomically locked.
  • Explicitly unlock mutex
  • Continue
Thread B
  • Do work
  • Lock associated mutex
  • Change the value of the global variable that
    Thread-A is waiting upon.
  • Check value of the global Thread-A wait variable.
    If it fulfills the desired condition, signal
    Thread-A.
  • Unlock mutex.
  • Continue
19
Condition Variables for Synchronization
  • Pthreads provides the following functions for
    condition variables
  • int pthread_cond_wait(pthread_cond_t *cond,
    pthread_mutex_t *mutex)
  • int pthread_cond_signal(pthread_cond_t *cond)
  • int pthread_cond_broadcast(pthread_cond_t *cond)
  • int pthread_cond_init(pthread_cond_t *cond,
    const pthread_condattr_t *attr)
  • int pthread_cond_destroy(pthread_cond_t *cond)

20
Producer-Consumer Using Locks
pthread_mutex_t task_queue_lock;
int task_available;
...
main() {
    ....
    task_available = 0;
    pthread_mutex_init(&task_queue_lock, NULL);
    ....
}
void *producer(void *producer_thread_data) {
    ....
    while (!done()) {
        inserted = 0;
        create_task(&my_task);
        /* poll until the single queue slot is free */
        while (inserted == 0) {
            pthread_mutex_lock(&task_queue_lock);
            if (task_available == 0) {
                insert_into_queue(my_task);
                task_available = 1;
                inserted = 1;
            }
            pthread_mutex_unlock(&task_queue_lock);
        }
    }
}

21
Producer-Consumer Using Locks
void *consumer(void *consumer_thread_data) {
    int extracted;
    struct task my_task;
    /* local data structure declarations */
    while (!done()) {
        extracted = 0;
        /* poll until a task is available */
        while (extracted == 0) {
            pthread_mutex_lock(&task_queue_lock);
            if (task_available == 1) {
                extract_from_queue(&my_task);
                task_available = 0;
                extracted = 1;
            }
            pthread_mutex_unlock(&task_queue_lock);
        }
        process_task(my_task);
    }
}

22
Producer-Consumer Using Condition Variables
pthread_cond_t cond_queue_empty, cond_queue_full;
pthread_mutex_t task_queue_cond_lock;
int task_available;
/* other data structures here */
main() {
    /* declarations and initializations */
    task_available = 0;
    pthread_cond_init(&cond_queue_empty, NULL);
    pthread_cond_init(&cond_queue_full, NULL);
    pthread_mutex_init(&task_queue_cond_lock, NULL);
    /* create and join producer and consumer threads */
}

23
Producer-Consumer Using Condition Variables
void *producer(void *producer_thread_data) {
    int inserted;
    while (!done()) {
        create_task();
        pthread_mutex_lock(&task_queue_cond_lock);
        /* sleep until the consumer has emptied the slot */
        while (task_available == 1)
            pthread_cond_wait(&cond_queue_empty,
                              &task_queue_cond_lock);
        insert_into_queue();
        task_available = 1;
        pthread_cond_signal(&cond_queue_full);
        pthread_mutex_unlock(&task_queue_cond_lock);
    }
}

24
Producer-Consumer Using Condition Variables
void *consumer(void *consumer_thread_data) {
    while (!done()) {
        pthread_mutex_lock(&task_queue_cond_lock);
        /* sleep until the producer has filled the slot */
        while (task_available == 0)
            pthread_cond_wait(&cond_queue_full,
                              &task_queue_cond_lock);
        my_task = extract_from_queue();
        task_available = 0;
        pthread_cond_signal(&cond_queue_empty);
        pthread_mutex_unlock(&task_queue_cond_lock);
        process_task(my_task);
    }
}

25
Condition Variables
  • Rather than just signaling one blocked thread, we
    can signal all
  • int pthread_cond_broadcast(pthread_cond_t *cond)
  • Can also have a timeout
  • int pthread_cond_timedwait(pthread_cond_t *cond,
    pthread_mutex_t *mutex,
    const struct timespec *abstime)
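
Note that `abstime` is an absolute deadline, not a duration. A minimal sketch of a bounded wait, assuming a condition variable with default attributes (which measures time against `CLOCK_REALTIME`) and a hypothetical helper `wait_for_flag`:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <errno.h>
#include <time.h>

/* Waits up to millis milliseconds for *flag to become nonzero.
   *flag must only be modified while holding m.
   Returns 0 if the flag was set, ETIMEDOUT if the deadline passed. */
int wait_for_flag(pthread_mutex_t *m, pthread_cond_t *cv,
                  const int *flag, long millis) {
    struct timespec deadline;
    int rc = 0;

    /* Build the absolute deadline: now + millis. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += millis / 1000;
    deadline.tv_nsec += (millis % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(m);
    /* Keep waiting against the SAME deadline after early wakeups. */
    while (!*flag && rc == 0)
        rc = pthread_cond_timedwait(cv, m, &deadline);
    int set = *flag;
    pthread_mutex_unlock(m);
    return set ? 0 : rc;
}
```

Because the deadline is computed once, repeated early wakeups cannot stretch the total waiting time.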

26
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 5
#define TCOUNT 10
#define COUNT_LIMIT 12

int count = 0;
int thread_ids[NUM_THREADS] = {0, 1, 2, 3, 4};
pthread_mutex_t count_mutex;
pthread_cond_t count_threshold_cv;

void *inc_count(void *idp);
void *watch_count(void *idp);

int main(int argc, char *argv[]) {
    int i, rc;
    pthread_t threads[NUM_THREADS];
    pthread_attr_t attr;

    /* Initialize mutex and condition variable objects */
    pthread_mutex_init(&count_mutex, NULL);
    pthread_cond_init(&count_threshold_cv, NULL);

    /* For portability, explicitly create threads in a joinable state */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_create(&threads[4], &attr, watch_count, (void *)&thread_ids[4]);
    pthread_create(&threads[3], &attr, watch_count, (void *)&thread_ids[3]);
    pthread_create(&threads[2], &attr, watch_count, (void *)&thread_ids[2]);
    pthread_create(&threads[1], &attr, inc_count, (void *)&thread_ids[1]);
    pthread_create(&threads[0], &attr, inc_count, (void *)&thread_ids[0]);

    /* Wait for all threads to complete */
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    printf("Main(): Waited on %d threads. Done.\n", NUM_THREADS);

    /* Clean up and exit */
    pthread_attr_destroy(&attr);
    pthread_mutex_destroy(&count_mutex);
    pthread_cond_destroy(&count_threshold_cv);
    pthread_exit(NULL);
}
27
int count = 0;
int thread_ids[5] = {0, 1, 2, 3, 4};
pthread_mutex_t count_mutex;
pthread_cond_t count_threshold_cv;

void *inc_count(void *idp) {
    int j, i;
    double result = 0.0;
    int *my_id = idp;

    for (i = 0; i < TCOUNT; i++) {
        pthread_mutex_lock(&count_mutex);
        count++;
        /* Check the value of count and signal waiting threads when
           the condition is reached. Note that this occurs while the
           mutex is locked. */
        if (count == COUNT_LIMIT) {
            pthread_cond_broadcast(&count_threshold_cv);
            printf("inc_count(): thread %d, count = %d  Threshold reached.\n",
                   *my_id, count);
        }
        printf("inc_count(): thread %d, count = %d, unlocking mutex\n",
               *my_id, count);
        pthread_mutex_unlock(&count_mutex);

        /* Do some work so threads can alternate on mutex lock */
        for (j = 0; j < 1000; j++)
            result = result + (double)random();
    }
    pthread_exit(NULL);
}

void *watch_count(void *idp) {
    int *my_id = idp;
    printf("Starting watch_count(): thread %d\n", *my_id);

    /* Lock mutex and wait for signal. Note that the pthread_cond_wait
       routine will automatically and atomically unlock mutex while it
       waits. Also, note that if COUNT_LIMIT is reached before this
       routine is run by the waiting thread, the loop will be skipped
       to prevent pthread_cond_wait from never returning. A while loop
       (not an if) also re-checks the count after a spurious wakeup. */
    pthread_mutex_lock(&count_mutex);
    while (count < COUNT_LIMIT) {
        pthread_cond_wait(&count_threshold_cv, &count_mutex);
        printf("watch_count(): thread %d Condition signal received.\n",
               *my_id);
    }
    pthread_mutex_unlock(&count_mutex);
    pthread_exit(NULL);
}
28
Composite Synchronization Constructs
  • By design, Pthreads provide support for a basic
    set of operations.
  • Higher level constructs can be built using basic
    synchronization constructs.
  • Consider Read-Write Locks and Barriers

29
Barriers
  • A barrier holds a thread until all threads
    participating in the barrier have reached it.
  • Some versions of the Pthreads library support
    barriers (not required)
  • pthread_barrier_t barr
  • attributes = NULL
  • pthread_barrier_init(&barr, attributes, nthreads)
  • pthread_barrier_wait(&barr)
  • pthread_barrier_destroy(&barr)
  • pthread barriers are available on most Linux
    implementations.

30
Barrier Implementation
  • Barriers can be implemented using a counter, a
    mutex and a condition variable.
  • A single integer is used to keep track of the
    number of threads that have reached the barrier.
  • If the count is less than the total number of
    threads, the threads execute a condition wait.
  • The last thread entering (and setting the count
    to the number of threads) wakes up all the
    threads using a condition broadcast.

31
Barriers
typedef struct {
    pthread_mutex_t count_lock;
    pthread_cond_t ok_to_proceed;
    int count;
} mylib_barrier_t;

void mylib_init_barrier(mylib_barrier_t *b) {
    b->count = 0;
    pthread_mutex_init(&(b->count_lock), NULL);
    pthread_cond_init(&(b->ok_to_proceed), NULL);
}

32
Barriers
void mylib_barrier(mylib_barrier_t *b, int num_threads) {
    pthread_mutex_lock(&(b->count_lock));
    b->count++;
    if (b->count == num_threads) {
        /* last thread to arrive: reset and release everyone */
        b->count = 0;
        pthread_cond_broadcast(&(b->ok_to_proceed));
    } else
        while (pthread_cond_wait(&(b->ok_to_proceed),
                                 &(b->count_lock)) != 0)
            ;
    pthread_mutex_unlock(&(b->count_lock));
}
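
The wait loop on this slide only retries when `pthread_cond_wait` reports an error; it has no predicate to re-test after a wakeup. One common variation, sketched here under hypothetical names (`gen_barrier_t`, `run_gen_barrier_demo`), adds a generation counter so that waiters can tell "my round has finished" apart from an unrelated wakeup, which also makes the barrier safe to reuse in a loop.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t all_arrived;
    int count;        /* threads that have arrived in this round */
    int generation;   /* incremented each time the barrier opens */
    int num_threads;
} gen_barrier_t;

void gen_barrier_init(gen_barrier_t *b, int num_threads) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_arrived, NULL);
    b->count = 0;
    b->generation = 0;
    b->num_threads = num_threads;
}

void gen_barrier_wait(gen_barrier_t *b) {
    pthread_mutex_lock(&b->lock);
    int my_generation = b->generation;
    if (++b->count == b->num_threads) {
        b->count = 0;                 /* last arrival opens the barrier */
        b->generation++;
        pthread_cond_broadcast(&b->all_arrived);
    } else {
        while (my_generation == b->generation)   /* the predicate */
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

#define GB_THREADS 4
#define GB_ROUNDS 50

static gen_barrier_t demo_barrier;
static pthread_mutex_t tally_lock = PTHREAD_MUTEX_INITIALIZER;
static int arrivals;   /* total check-ins, protected by tally_lock */
static int errors;     /* rounds where a thread got through too early */

static void *gb_worker(void *arg) {
    (void)arg;
    for (int r = 1; r <= GB_ROUNDS; r++) {
        pthread_mutex_lock(&tally_lock);
        arrivals++;
        pthread_mutex_unlock(&tally_lock);
        gen_barrier_wait(&demo_barrier);
        /* Everyone must have checked in for round r by now. */
        pthread_mutex_lock(&tally_lock);
        if (arrivals < r * GB_THREADS)
            errors++;
        pthread_mutex_unlock(&tally_lock);
    }
    return NULL;
}

/* Returns the number of ordering violations observed (expected: 0). */
int run_gen_barrier_demo(void) {
    pthread_t t[GB_THREADS];
    arrivals = 0;
    errors = 0;
    gen_barrier_init(&demo_barrier, GB_THREADS);
    for (int i = 0; i < GB_THREADS; i++)
        pthread_create(&t[i], NULL, gb_worker, NULL);
    for (int i = 0; i < GB_THREADS; i++)
        pthread_join(t[i], NULL);
    return errors;
}
```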

33
Barriers
  • Linear barrier.
  • The trivial lower bound on execution time of this
    function is O(n) for n threads.
  • This implementation of a barrier can be sped
    up using multiple barrier variables organized in
    a tree.
  • Use n/2 condition variable-mutex pairs for
    implementing a barrier for n threads.
  • At the lowest level, threads are paired up and
    each pair of threads shares a single condition
    variable-mutex pair.
  • Once both threads arrive, one of the two moves
    on, the other one waits.
  • This process repeats up the tree.
  • This is called a log barrier and its runtime
    grows as O(log n).

34
Barrier
  • Execution time of 1000 sequential and logarithmic
    barriers as a function of number of threads on a
    32 processor SGI Origin 2000.

35
Barriers
  • Use pthread condition variables and mutexes.
  • Is this the best way?
  • Forces a thread to sleep and give up the
    processor
  • Rather than give up the processor, just wait on a
    variable.
  • busy wait
  • Will it be faster?

36
QbusyBarrier
void qbusy_init_barrier(qbusy_barrier_t *b, int nthreads) {
    b->wait = (int *)malloc(nthreads * sizeof(int));
    b->count = 0;
    pthread_mutex_init(&(b->count_lock), NULL);
}

void qbusy_barrier(qbusy_barrier_t *b, int iproc, int num_threads) {
    int i;
    float tmp;
    if (num_threads == 1) return;
    b->wait[iproc] = 1;
    pthread_mutex_lock(&(b->count_lock));
    b->count++;
    if (b->count == num_threads) {
        b->count = 0;   /* reset so the barrier can be used again */
        pthread_mutex_unlock(&(b->count_lock));
        /* Now release the hounds */
        for (i = 0; i < num_threads; i++)
            b->wait[i] = 0;
    } else {
        pthread_mutex_unlock(&(b->count_lock));
        /* busy wait: wait[] must be volatile (or atomic) so the
           compiler re-reads it on every iteration */
        while (b->wait[iproc])
            ;
    }
}
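
A related busy-wait design that avoids the mutex altogether is the sense-reversing barrier. The sketch below is not the slide's implementation; it uses C11 atomics, hypothetical names (`spin_barrier_t`, `run_spin_demo`), and calls `sched_yield()` inside the spin loop so that it still makes progress when there are more threads than cores.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

typedef struct {
    atomic_int count;   /* threads still expected in this round */
    atomic_int sense;   /* flips each time the barrier opens */
    int num_threads;
} spin_barrier_t;

void spin_barrier_init(spin_barrier_t *b, int n) {
    atomic_init(&b->count, n);
    atomic_init(&b->sense, 0);
    b->num_threads = n;
}

/* local_sense is per-thread state and must start at 0. */
void spin_barrier_wait(spin_barrier_t *b, int *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last arrival: re-arm the counter, then release the others. */
        atomic_store(&b->count, b->num_threads);
        atomic_store(&b->sense, *local_sense);
    } else {
        while (atomic_load(&b->sense) != *local_sense)
            sched_yield();   /* spin, but let other threads run */
    }
}

#define SPIN_THREADS 4
#define SPIN_ROUNDS 20

static spin_barrier_t sb;
static atomic_int checked_in;
static atomic_int spin_errors;

static void *spin_worker(void *arg) {
    (void)arg;
    int local_sense = 0;
    for (int r = 1; r <= SPIN_ROUNDS; r++) {
        atomic_fetch_add(&checked_in, 1);
        spin_barrier_wait(&sb, &local_sense);
        /* All threads must have checked in for round r by now. */
        if (atomic_load(&checked_in) < r * SPIN_THREADS)
            atomic_fetch_add(&spin_errors, 1);
    }
    return NULL;
}

/* Returns the number of ordering violations observed (expected: 0). */
int run_spin_demo(void) {
    pthread_t t[SPIN_THREADS];
    atomic_store(&checked_in, 0);
    atomic_store(&spin_errors, 0);
    spin_barrier_init(&sb, SPIN_THREADS);
    for (int i = 0; i < SPIN_THREADS; i++)
        pthread_create(&t[i], NULL, spin_worker, NULL);
    for (int i = 0; i < SPIN_THREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&spin_errors);
}
```

Whether spinning beats sleeping depends on how long threads wait and on whether each thread has a processor to itself; with more threads than cores a pure spin can be much slower.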
37
Read-Write Locks
  • In many applications, a data structure is read
    frequently but written infrequently.
  • For such applications, we should use read-write
    locks.
  • A read lock is granted even when other threads
    already hold read locks.
  • If there is a write lock on the data (or if there
    are queued write locks), the thread performs a
    condition wait.
  • If there are multiple threads requesting a write
    lock, they must perform a condition wait.

38
Read-Write Locks
  • The lock data type mylib_rwlock_t holds the
    following
  • a count of the number of readers,
  • the writer (a 0/1 integer specifying whether a
    writer is present),
  • a condition variable readers_proceed that is
    signaled when readers can proceed,
  • a condition variable writer_proceed that is
    signaled when one of the writers can proceed,
  • a count pending_writers of pending writers, and
  • a mutex read_write_lock associated with the
    shared data structure

39
Read-Write Locks
typedef struct {
    int readers;
    int writer;
    pthread_cond_t readers_proceed;
    pthread_cond_t writer_proceed;
    int pending_writers;
    pthread_mutex_t read_write_lock;
} mylib_rwlock_t;

void mylib_rwlock_init(mylib_rwlock_t *l) {
    l->readers = l->writer = l->pending_writers = 0;
    pthread_mutex_init(&(l->read_write_lock), NULL);
    pthread_cond_init(&(l->readers_proceed), NULL);
    pthread_cond_init(&(l->writer_proceed), NULL);
}

40
Read-Write Locks
void mylib_rwlock_rlock(mylib_rwlock_t *l) {
    /* if there is a write lock or pending writers,
       perform condition wait.. else increment count of
       readers and grant read lock */
    pthread_mutex_lock(&(l->read_write_lock));
    while ((l->pending_writers > 0) || (l->writer > 0))
        pthread_cond_wait(&(l->readers_proceed),
                          &(l->read_write_lock));
    l->readers++;
    pthread_mutex_unlock(&(l->read_write_lock));
}

41
Read-Write Locks
void mylib_rwlock_wlock(mylib_rwlock_t *l) {
    /* if there are readers or writers, increment
       pending writers count and wait. On being woken,
       decrement pending writers count and increment
       writer count */
    pthread_mutex_lock(&(l->read_write_lock));
    while ((l->writer > 0) || (l->readers > 0)) {
        l->pending_writers++;
        pthread_cond_wait(&(l->writer_proceed),
                          &(l->read_write_lock));
        /* undo the increment only if we actually waited */
        l->pending_writers--;
    }
    l->writer++;
    pthread_mutex_unlock(&(l->read_write_lock));
}

42
Read-Write Locks
void mylib_rwlock_unlock(mylib_rwlock_t *l) {
    /* if there is a write lock then unlock, else if
       there are read locks, decrement count of read
       locks. If the count is 0 and there is a pending
       writer, let it through, else if there are pending
       readers, let them all go through */
    pthread_mutex_lock(&(l->read_write_lock));
    if (l->writer > 0)
        l->writer = 0;
    else if (l->readers > 0)
        l->readers--;
    /* decide whom to wake while the state is still protected */
    if ((l->readers == 0) && (l->pending_writers > 0))
        pthread_cond_signal(&(l->writer_proceed));
    else
        pthread_cond_broadcast(&(l->readers_proceed));
    pthread_mutex_unlock(&(l->read_write_lock));
}

43
Semaphores
  • Synchronization tool provided by the OS
  • Integer variable and 2 operations
  • Wait(s): while (s <= 0) do noop; /* sleep */
    s = s - 1
  • Signal(s): s = s + 1
  • All modifications to s are atomic

44
The critical section problem
Shared semaphore mutex = 1

repeat
    wait(mutex)
    critical section
    signal(mutex)
    remainder section
until false
45
Using semaphores
  • Two processes P1 and P2
  • Statements S1 and S2
  • S2 must execute only after S1
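
The usual solution is a semaphore initialized to 0 that P1 signals after S1 and P2 waits on before S2. A sketch using POSIX unnamed semaphores with two threads standing in for the two processes; it assumes a platform where `sem_init` is supported (Linux, for example), and all names are hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t s1_done;      /* counts completed executions of S1 */
static int order[2];       /* records which statement ran first */
static int n_logged;

static void *p1(void *arg) {
    (void)arg;
    order[n_logged++] = 1;  /* S1 */
    sem_post(&s1_done);     /* signal: S1 has happened */
    return NULL;
}

static void *p2(void *arg) {
    (void)arg;
    sem_wait(&s1_done);     /* wait: blocks until S1 has happened */
    order[n_logged++] = 2;  /* S2 */
    return NULL;
}

/* Returns 12 if S1 ran before S2, 21 otherwise. */
int run_ordered(void) {
    pthread_t t1, t2;
    n_logged = 0;
    sem_init(&s1_done, 0, 0);              /* shared between threads, value 0 */
    pthread_create(&t2, NULL, p2, NULL);   /* start P2 first on purpose */
    pthread_create(&t1, NULL, p1, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&s1_done);
    return order[0] * 10 + order[1];
}
```

Starting P2 first shows that the order comes from the semaphore and not from the order of thread creation.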

46
Bounded Buffer Solution
Shared semaphore empty = n, full = 0, mutex = 1

Producer:
repeat
    produce an item in nextp
    wait(empty)
    wait(mutex)
    add nextp to the buffer
    signal(mutex)
    signal(full)
until false

Consumer:
repeat
    wait(full)
    wait(mutex)
    remove an item from buffer
    place it in nextc
    signal(mutex)
    signal(empty)
    consume the item in nextc
until false
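
The same scheme can be written with POSIX semaphores. This is a sketch with one producer and one consumer; the buffer size, the item count, and every name are assumptions made for the example, and it presumes a platform that supports `sem_init`.

```c
#include <pthread.h>
#include <semaphore.h>

#define BUF_SLOTS 4      /* n in the pseudocode */
#define ITEMS 100

static int buffer[BUF_SLOTS];
static int in_pos, out_pos;
static sem_t empty_slots;   /* "empty": free slots, starts at n */
static sem_t full_slots;    /* "full": filled slots, starts at 0 */
static sem_t buf_mutex;     /* "mutex": binary semaphore, starts at 1 */

static void *bb_producer(void *arg) {
    (void)arg;
    for (int item = 1; item <= ITEMS; item++) {
        sem_wait(&empty_slots);        /* wait(empty) */
        sem_wait(&buf_mutex);          /* wait(mutex) */
        buffer[in_pos] = item;
        in_pos = (in_pos + 1) % BUF_SLOTS;
        sem_post(&buf_mutex);          /* signal(mutex) */
        sem_post(&full_slots);         /* signal(full) */
    }
    return NULL;
}

static void *bb_consumer(void *arg) {
    long *sum = arg;
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&full_slots);         /* wait(full) */
        sem_wait(&buf_mutex);          /* wait(mutex) */
        int item = buffer[out_pos];
        out_pos = (out_pos + 1) % BUF_SLOTS;
        sem_post(&buf_mutex);          /* signal(mutex) */
        sem_post(&empty_slots);        /* signal(empty) */
        *sum += item;                  /* consume outside the critical section */
    }
    return NULL;
}

/* Returns the sum of all consumed items. */
long run_bounded_buffer(void) {
    pthread_t p, c;
    long sum = 0;
    in_pos = out_pos = 0;
    sem_init(&empty_slots, 0, BUF_SLOTS);
    sem_init(&full_slots, 0, 0);
    sem_init(&buf_mutex, 0, 1);
    pthread_create(&p, NULL, bb_producer, NULL);
    pthread_create(&c, NULL, bb_consumer, &sum);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&empty_slots);
    sem_destroy(&full_slots);
    sem_destroy(&buf_mutex);
    return sum;
}
```

The order of the two waits matters: taking `buf_mutex` before `empty_slots` (or before `full_slots`) could leave a thread blocked while holding the mutex, which deadlocks both sides.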
47
Readers - Writers (priority?)
Shared semaphore mutex = 1, wrt = 1
Shared integer readcount = 0

Reader:
wait(mutex)
readcount = readcount + 1
if (readcount == 1) wait(wrt)
signal(mutex)
read the data
wait(mutex)
readcount = readcount - 1
if (readcount == 0) signal(wrt)
signal(mutex)

Writer:
wait(wrt)
write to the data object
signal(wrt)
48
Readers Writers (priority?)
outerQ, rsem, rmutex, wmutex, wsem = 1

Reader:
wait(outerQ)
wait(rsem)
wait(rmutex)
readcnt++
if (readcnt == 1) wait(wsem)
signal(rmutex)
signal(rsem)
signal(outerQ)
READ
wait(rmutex)
readcnt--
if (readcnt == 0) signal(wsem)
signal(rmutex)

Writer:
wait(wsem)
writecnt++
if (writecnt == 1) wait(rsem)
signal(wsem)
wait(wmutex)
WRITE
signal(wmutex)
wait(wsem)
writecnt--
if (writecnt == 0) signal(rsem)
signal(wsem)
49
Unix Semaphores
  • Are a generalization of the counting semaphores
    (more operations are permitted).
  • A semaphore includes
  • the current value S of the semaphore
  • number of processes waiting for S to increase
  • number of processes waiting for S to be 0
  • System calls
  • semget creates an array of semaphores
  • semctl allows for the initialization of
    semaphores
  • semop performs a list of operations one on each
    semaphore (atomically)

50
Unix Semaphores
  • Each operation to be done is specified by a value
    sop.
  • Let S be the semaphore value
  • if sop > 0 (signal operation)
  • S is incremented by sop and processes waiting for
    S to increase are awakened
  • if sop = 0
  • if S = 0, do nothing
  • if S != 0, block the current process on the event
    that S = 0
  • if sop < 0 (wait operation)
  • if S >= |sop| then S = S - |sop|; otherwise
    wait until S >= |sop|
