Title: Pthreads examples
1 Pthreads examples
- Traveling Salesman (TSP) (task parallelism)
- SOR (data parallelism)
2 TSP (Traveling Salesman)
- Goal:
  - Given a list of cities, a matrix of distances between them, and a starting city
  - Find the shortest tour in which all cities are visited exactly once
- Example of an NP-hard search problem
- Algorithm: branch-and-bound
3 Branching
- Initialization:
  - Go from the starting city to each of the remaining cities
  - Put each resulting partial path into a priority queue, ordered by the current length of the tour
- Further (repeatedly):
  - Take the head element out of the priority queue
  - Expand it by each of the remaining cities
  - Put each resulting partial path into the priority queue
4 Finding the solution
- Eventually, a complete path will be found.
- Remember its length as the current shortest path.
- Every time a complete path is found, check whether the current best path needs to be updated.
- When the priority queue is empty, the best path has been found.
5 Using a simple bound
- Once a complete path is found, we have an upper bound on the length of the shortest path.
- There is no use in exploring a partial path that is already longer than the current bound.
- Such partial paths can be removed from the queue.
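The pruning test described above can be sketched as a single comparison. The function name `worth_exploring` and its parameters are hypothetical, and the sketch assumes nonnegative distances, so extending a partial path never makes it shorter.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if a partial path of length partial_len could still beat the
 * best complete path found so far (best_len), and 0 if it can be dropped.
 * Assumption: distances are nonnegative. Use INT_MAX for best_len while
 * no complete path has been found yet. */
int worth_exploring(int partial_len, int best_len) {
    assert(partial_len >= 0);
    return partial_len < best_len;
}
```

A path whose length already equals the bound is dropped as well, since it cannot produce a strictly shorter complete path.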
6 Sequential TSP: data structures
- Priority queue of partial paths.
- Current best solution and its length.
- For simplicity, we will ignore bounding.
7 Sequential TSP
- init_q(); init_best();
- while ((p = dequeue()) != NULL) {
-   for each expansion by one city {
-     q = addcity(p);
-     if (complete(q)) update_best(q);
-     else enqueue(q);
-   }
- }
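As an illustration, here is a minimal C sketch of the sequential loop without bounding. Several things are assumptions made for the example rather than part of the slides: the 4-city distance matrix, the fixed-size array used as a priority queue (with a linear scan for the minimum), the `path_t` layout, and the choice to treat a tour as an open path that does not return to the starting city.

```c
#include <assert.h>
#include <limits.h>

#define NCITIES 4    /* size of the toy instance (assumption) */
#define MAXQ    256  /* queue capacity, ample for 4 cities */

typedef struct {
    int city[NCITIES]; /* cities visited so far, in order */
    int count;         /* number of cities on the path */
    int length;        /* length of the partial path */
} path_t;

/* Hypothetical symmetric distance matrix. */
static const int dist[NCITIES][NCITIES] = {
    { 0, 10, 15, 20},
    {10,  0, 35, 25},
    {15, 35,  0, 30},
    {20, 25, 30,  0},
};

static path_t queue[MAXQ];
static int qsize;
static int best_len;

static void init_q(void)    { qsize = 0; }
static void init_best(void) { best_len = INT_MAX; }

static void enqueue(const path_t *p) {
    assert(qsize < MAXQ);
    queue[qsize++] = *p;
}

/* Remove the shortest partial path; returns 0 when the queue is empty. */
static int dequeue(path_t *out) {
    if (qsize == 0) return 0;
    int min = 0;
    for (int i = 1; i < qsize; i++)
        if (queue[i].length < queue[min].length) min = i;
    *out = queue[min];
    queue[min] = queue[--qsize];
    return 1;
}

static int visited(const path_t *p, int c) {
    for (int i = 0; i < p->count; i++)
        if (p->city[i] == c) return 1;
    return 0;
}

/* Return a copy of p extended by city c. */
static path_t addcity(const path_t *p, int c) {
    path_t q = *p;
    q.length += dist[p->city[p->count - 1]][c];
    q.city[q.count++] = c;
    return q;
}

static int complete(const path_t *q) { return q->count == NCITIES; }

static void update_best(const path_t *q) {
    if (q->length < best_len) best_len = q->length;
}

/* Length of the shortest path from start that visits every city once. */
int tsp(int start) {
    init_q(); init_best();
    path_t p = { .city = {start}, .count = 1, .length = 0 };
    enqueue(&p);
    while (dequeue(&p)) {
        for (int c = 0; c < NCITIES; c++) {
            if (visited(&p, c)) continue;
            path_t q = addcity(&p, c);
            if (complete(&q)) update_best(&q);
            else enqueue(&q);
        }
    }
    return best_len;
}
```

For this matrix, `tsp(0)` returns 65, the length of the path 0, 1, 3, 2. A real implementation would likely use a heap instead of the linear scan.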
8 Parallel TSP: possibilities
- Have each process do one expansion.
- Have each process do the expansion of one partial path.
- Have each process do the expansion of multiple partial paths.
- This is an issue of granularity/performance, not of correctness.
- Assumption: a thread expands one partial path at a time.
9 Parallel TSP: synchronization
- True dependence between the process that puts a partial path in the queue and the one that takes it out.
- Dependences arise dynamically.
- Required synchronization: need to make a process wait if q is empty.
10 Parallel TSP
Dequeue: wait if q is empty. Enqueue: signal that q is no longer empty.
- Thread i:
-   while ((p = dequeue()) != NULL) {
-     for each expansion by one city {
-       q = addcity(p);
-       if (complete(q)) update_best(q);
-       else enqueue(q);
-     }
-   }
11 Parallel TSP: more synchronization
- All threads operate, potentially at the same time, on q and best.
- This must not be allowed to happen.
- Critical section: only one process can execute in a critical section at once.
- Enqueue/dequeue must be protected.
- Update of best must be protected.
12 Parallel TSP: synchronization summary
- Need critical section:
  - In update_best
  - In enqueue/dequeue
- In dequeue:
  - Wait if q is empty
  - Terminate if all processes are waiting
- In enqueue:
  - Signal q is no longer empty
13 Parallel TSP: mutual exclusion
- enqueue() / dequeue() {
-   pthread_mutex_lock(&queue);
-   ...
-   pthread_mutex_unlock(&queue);
- }
- update_best() {
-   pthread_mutex_lock(&best);
-   ...
-   pthread_mutex_unlock(&best);
- }
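A small sketch of the protected update, assuming the best solution is tracked only by its length. The names `best_mutex`, `worker`, and `run_update_best_demo`, the thread count, and the candidate lengths the workers report are all made up for the illustration.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>

#define NTHREADS 4

static pthread_mutex_t best_mutex = PTHREAD_MUTEX_INITIALIZER;
static int best_len = INT_MAX;

/* Record len if it improves on the best length so far. The comparison and
 * the store form one critical section, so two threads cannot interleave
 * between the test and the update. */
void update_best(int len) {
    pthread_mutex_lock(&best_mutex);
    if (len < best_len) best_len = len;
    pthread_mutex_unlock(&best_mutex);
}

/* Hypothetical worker: reports a series of made-up candidate lengths. */
static void *worker(void *arg) {
    int id = *(int *)arg;
    for (int k = 0; k < 1000; k++)
        update_best(100 + id * 1000 + k);
    return NULL;
}

/* Start the workers, wait for them, and return the final best length. */
int run_update_best_demo(void) {
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        int rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return best_len;
}
```

The smallest candidate any worker reports is 100, so that is the value the demo returns regardless of how the threads interleave.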
14 Parallel TSP: signal/wait
- dequeue() {   (here p is the number of threads)
-   pthread_mutex_lock(&queue);
-   while ((q is empty) and (not done)) {
-     waiting++;
-     if (waiting == p) {
-       done = true;
-       pthread_cond_broadcast(&empty);
-     } else {
-       pthread_cond_wait(&empty, &queue);
-       waiting--;
-     }
-   }
-   if (done) result = NULL;
-   else result = head of the queue, removed from the queue;
-   pthread_mutex_unlock(&queue);
-   return result;
- }
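The wait/terminate logic can be sketched in runnable form as follows. To keep the example short, the queue holds plain integers in a fixed-size array used as a stack rather than partial paths in a priority queue, and each work item of depth d spawns two items of depth d-1. These choices, along with the names `queue_mutex`, `worker`, and `run_queue_demo`, are assumptions for illustration only.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4
#define QCAP     64

static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  empty = PTHREAD_COND_INITIALIZER;
static int items[QCAP];
static int qsize = 0;
static int waiting = 0;   /* threads currently blocked in dequeue */
static int done = 0;      /* set once every thread is idle */
static int processed = 0; /* demo statistic, guarded by queue_mutex */

void enqueue(int item) {
    pthread_mutex_lock(&queue_mutex);
    assert(qsize < QCAP);
    items[qsize++] = item;
    pthread_cond_signal(&empty);   /* q is no longer empty */
    pthread_mutex_unlock(&queue_mutex);
}

/* Returns 1 and stores an item, or 0 once all threads are waiting. */
int dequeue(int *item) {
    int got = 0;
    pthread_mutex_lock(&queue_mutex);
    while (qsize == 0 && !done) {
        waiting++;
        if (waiting == NTHREADS) {     /* last thread to go idle */
            done = 1;
            pthread_cond_broadcast(&empty);
        } else {
            pthread_cond_wait(&empty, &queue_mutex);
            waiting--;
        }
    }
    if (!done) { *item = items[--qsize]; got = 1; }
    pthread_mutex_unlock(&queue_mutex);  /* unlock on every path */
    return got;
}

/* Hypothetical work: an item of depth d > 0 spawns two items of depth d-1. */
static void *worker(void *arg) {
    (void)arg;
    int d;
    while (dequeue(&d)) {
        pthread_mutex_lock(&queue_mutex);
        processed++;
        pthread_mutex_unlock(&queue_mutex);
        if (d > 0) { enqueue(d - 1); enqueue(d - 1); }
    }
    return NULL;
}

/* Call once per process: seeds the queue and returns the number of items
 * handled. */
int run_queue_demo(int depth) {
    pthread_t tid[NTHREADS];
    enqueue(depth);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return processed;
}
```

The mutex is released before returning on both the "done" and the "item found" paths, and the `while` loop rechecks the condition after every wakeup, which also covers spurious wakeups.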
15 Second application: SOR (sequential version)
16 Parallel SOR
- First (i, j) loop nest can be parallelized.
- Second (i, j) loop nest can be parallelized.
- Must wait until all processors have finished the first loop nest before starting the second.
- Must wait until all processors have finished the second loop nest of the previous iteration before starting the first loop nest of the next iteration.
- Give n/p rows to each processor.
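One possible way to hand out rows is a block partition, sketched below. The function `row_range` and its parameters are hypothetical. When the row count is not a multiple of p, this version gives the first few threads one extra row, a detail the slides do not specify.

```c
#include <assert.h>

/* Compute the half-open row range [*lo, *hi) owned by thread id when
 * nrows rows are split among p threads. Each thread gets nrows / p rows;
 * the first nrows % p threads get one extra row. */
void row_range(int id, int p, int nrows, int *lo, int *hi) {
    assert(p > 0 && id >= 0 && id < p && nrows >= 0);
    int chunk = nrows / p;
    int rem = nrows % p;
    *lo = id * chunk + (id < rem ? id : rem);
    *hi = *lo + chunk + (id < rem ? 1 : 0);
}
```

For example, 10 rows over 4 threads yields the ranges [0,3), [3,6), [6,8), and [8,10).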
17 Pthreads SOR: first loop
18 Pthreads SOR: second loop
19 Pthreads SOR: main program
20 Barrier synchronization
- A barrier operation causes a thread to wait until all threads have reached the barrier operation.
- At that point, all proceed.
- Can be used to replace creating and destroying threads multiple times, with lower overhead.
21 Barrier implementation in pthreads
- Count the number of arrivals at the barrier.
- Wait if this is not the last arrival.
- Make everyone unblock if this is the last arrival.
- Use a mutex to protect the count.
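The steps above can be sketched with a mutex and a condition variable. The `barrier_t` type, the `generation` counter (added so the barrier can be reused and so spurious wakeups are harmless), and the small demo around it are assumptions for this example, not necessarily the implementation shown in the lecture.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4

typedef struct {
    pthread_mutex_t lock;      /* protects the fields below */
    pthread_cond_t  all_here;  /* signaled when the last thread arrives */
    int nthreads;              /* number of participating threads */
    int arrived;               /* arrivals in the current round */
    unsigned generation;       /* round number, bumped by the last arrival */
} barrier_t;

void barrier_init(barrier_t *b, int nthreads) {
    assert(nthreads > 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_here, NULL);
    b->nthreads = nthreads;
    b->arrived = 0;
    b->generation = 0;
}

void barrier_wait(barrier_t *b) {
    pthread_mutex_lock(&b->lock);
    unsigned gen = b->generation;
    if (++b->arrived == b->nthreads) {   /* last arrival: release everyone */
        b->arrived = 0;
        b->generation++;
        pthread_cond_broadcast(&b->all_here);
    } else {                             /* not last: wait for this round */
        while (gen == b->generation)
            pthread_cond_wait(&b->all_here, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

static barrier_t bar;
static int slot[NTHREADS];
static int seen[NTHREADS];

/* Demo worker: write own slot, wait at the barrier, then read all slots. */
static void *worker(void *arg) {
    int id = *(int *)arg;
    slot[id] = id + 1;
    barrier_wait(&bar);
    int sum = 0;
    for (int i = 0; i < NTHREADS; i++) sum += slot[i];
    seen[id] = sum;
    return NULL;
}

/* Returns 1 if every thread saw all writes made before the barrier. */
int run_barrier_demo(void) {
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    barrier_init(&bar, NTHREADS);
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        int rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++) pthread_join(tid[i], NULL);
    for (int i = 0; i < NTHREADS; i++)
        if (seen[i] != 1 + 2 + 3 + 4) return 0;
    return 1;
}
```

Without the generation counter, a fast thread could re-enter the barrier for the next round and be counted before slower threads have left the previous one.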
22 Barrier implementation in pthreads
23 Parallel SOR with barriers
- void *sor(void *arg) {
-   initialization
-   for some number of iterations {
-     for (i) for (j) temp[i][j] = ...;
-     barrier();
-     for (i) for (j) grid[i][j] = ...;
-     barrier();
-   }
- }
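A runnable sketch of this structure follows. It uses the POSIX `pthread_barrier_t` in place of a hand-written barrier, which is a substitution made here and an optional POSIX feature that not every platform provides. The four-neighbor averaging formula, the grid size, the boundary values, and the name `run_sor_demo` are likewise assumptions, since the slides do not show the loop bodies.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for pthread_barrier_t under strict -std */
#endif
#include <assert.h>
#include <pthread.h>

#define N        6    /* grid is N x N, boundary rows/columns stay fixed */
#define NTHREADS 2    /* assumed to divide the N - 2 interior rows evenly */
#define ITERS    200

static double grid[N][N], temp[N][N];
static pthread_barrier_t bar;

static void *sor(void *arg) {
    int id = *(int *)arg;
    int rows = (N - 2) / NTHREADS;       /* interior rows per thread */
    int lo = 1 + id * rows, hi = lo + rows;
    for (int it = 0; it < ITERS; it++) {
        /* First loop nest: compute new values into temp (assumed formula). */
        for (int i = lo; i < hi; i++)
            for (int j = 1; j < N - 1; j++)
                temp[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                     grid[i][j - 1] + grid[i][j + 1]);
        pthread_barrier_wait(&bar);  /* all reads of grid are finished */
        /* Second loop nest: copy the new values back into grid. */
        for (int i = lo; i < hi; i++)
            for (int j = 1; j < N - 1; j++)
                grid[i][j] = temp[i][j];
        pthread_barrier_wait(&bar);  /* all writes to grid are finished */
    }
    return NULL;
}

/* Boundary held at 1.0, interior starts at 0.0; returns a center value. */
double run_sor_demo(void) {
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[i][j] = (i == 0 || i == N - 1 || j == 0 || j == N - 1)
                             ? 1.0 : 0.0;
    int rc = pthread_barrier_init(&bar, NULL, NTHREADS);
    assert(rc == 0);
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        rc = pthread_create(&tid[i], NULL, sor, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++) pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&bar);
    return grid[N / 2][N / 2];
}
```

With the boundary held at 1.0, the interior values rise toward 1.0 as the iterations proceed, which gives a simple sanity check on the result. The threads are created once and reused for every iteration, which is the overhead saving the barrier is meant to provide.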
24 Parallel SOR with barriers: main
25 Pthreads discussion
- Expressiveness:
  - Task parallelism
  - Data parallelism
- Ease of use:
  - Explicit thread maintenance
  - Explicit synchronization among threads
  - Locks, condition variables, semaphores: where do we normally implement these?
  - These are the mechanisms to get the job done.
26 Pthreads discussion
- Exposing architecture features:
  - Same approach as running sequential programs: architecture-oblivious
  - What does it take to run threads efficiently?
  - Is it the same when multiple threads run on SMP, CMP, and SMT processors?
  - Independent and co-operative threads
  - Resource (cache, functional unit, load/store unit) contention issues
  - Are resources exposed?
  - Solution: more OS/architecture support.
27 Pthreads conclusion
- In the software layers, pthreads provides the functionality of what layer?
- Good expressiveness: supports all types of parallelism.
- Bad usability: similar to directly using the UNIX system call interface.
- How many people program at this level today?
- What are people using now?
- Need to do the same thing for pthreads:
  - Assembly → C, pthreads → ???