Title: Shared Memory Parallel Programming
1. Shared Memory Parallel Programming
2. OpenMP Overview

- OpenMP: An API for Writing Multithreaded Applications
- A set of compiler directives and library routines for parallel application programmers
- Greatly simplifies writing multi-threaded (MT) programs in Fortran, C and C++
- Standardizes the last 20 years of SMP practice

A sampler of OpenMP syntax:

    C$OMP FLUSH
    #pragma omp critical
    C$OMP THREADPRIVATE(/ABC/)
    CALL OMP_SET_NUM_THREADS(10)
    C$OMP parallel do shared(a, b, c)
    call omp_test_lock(jlok)
    call OMP_INIT_LOCK (ilok)
    C$OMP MASTER
    C$OMP ATOMIC
    C$OMP SINGLE PRIVATE(X)
    setenv OMP_SCHEDULE "dynamic"
    C$OMP PARALLEL DO ORDERED PRIVATE (A, B, C)
    C$OMP ORDERED
    C$OMP PARALLEL REDUCTION (+: A, B)
    C$OMP SECTIONS
    #pragma omp parallel for private(A, B)
    !$OMP BARRIER
    C$OMP PARALLEL COPYIN(/blk/)
    C$OMP DO lastprivate(XX)
    Nthrds = OMP_GET_NUM_PROCS()
    omp_set_lock(lck)

The name OpenMP is the property of the OpenMP Architecture Review Board.
3. OpenMP Programming Model

- Master thread spawns a team of threads as needed
- Parallelism is added incrementally until the desired performance is achieved, i.e. the sequential program evolves into a parallel program

(Figure: fork-join execution - the master thread forks a team of threads at each parallel region and joins them when the region ends.)
4. Life is Short, Remember?

It's official: OpenMP is easier to use than MPI!
5. How Mainstream Can You Be?

- Based firmly upon prior experience (PCF)
- Simplified and streamlined existing APIs
- High-level programming model
  - Programmer makes strategic decisions
  - Compiler figures out details
- Generally available in standard commercial compilers
  - Including Microsoft, and now GNU
- Research compilers: Omni, OpenUH, PCOMP, etc.
6. The OpenMP ARB

- OpenMP is maintained by the OpenMP Architecture Review Board (the ARB), which
  - Interprets OpenMP
  - Writes new specifications - keeps OpenMP relevant
  - Works to increase the impact of OpenMP
- Members are organizations - not individuals
- Current members
  - Permanent: Cray, Fujitsu, HP, IBM, Intel, MS, NEC, PGI, SGI, Sun
  - Auxiliary: ASCI, cOMPunity, EPCC, KSL, NASA, RWTH Aachen

www.compunity.org
7. OpenMP Release History

- 1997: OpenMP Fortran 1.0
- 1998: OpenMP C/C++ 1.0
- 1999: OpenMP Fortran 1.1
- 2005: OpenMP 2.5 - a single specification for Fortran, C and C++
8. OpenMP 2.5
- Merged language-specific APIs
- Fixed minor problems
- Reorganized material
- Improved specification of nested parallelism
- Internal control variables
- Fixed the flush (memory model)
9. Where will OpenMP be Relevant in Future?

It's either multithreading, or a real heat wave: simultaneous multithreading, hyperthreading, chip multithreading, streaming.
10. OpenMP Definitions: Constructs vs. Regions in OpenMP

OpenMP constructs occupy a single compilation unit, while a region can span multiple source files.

poo.f - a parallel construct:

    C$OMP PARALLEL
          call whoami
    C$OMP END PARALLEL

bar.f - the code it calls:

          subroutine whoami
          external omp_get_thread_num
          integer iam, omp_get_thread_num
          iam = omp_get_thread_num()
    C$OMP CRITICAL
          print *, 'Hello from ', iam
    C$OMP END CRITICAL
          return
          end

The parallel region is the text of the construct plus any code called from inside the construct. Orphan constructs, such as the CRITICAL in bar.f, can execute outside the lexical extent of a parallel construct.
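For readers following in C, here is a minimal single-file sketch of the same idea (do_work and the comment marking a notional file split are our illustration, not from the slides):

    #include <stdio.h>
    #include <omp.h>

    /* In the slides this lives in a separate file. The critical
       construct below is "orphaned": it appears outside the lexical
       extent of any parallel construct, yet executes inside the
       parallel region created in main(). */
    void do_work(void) {
        int iam = omp_get_thread_num();
        #pragma omp critical
        printf("Hello from %d\n", iam);
    }

    int main(void) {
        #pragma omp parallel   /* the region extends into do_work() */
        do_work();
        return 0;
    }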
11. Parallel Regions

- You create threads in OpenMP with the omp parallel pragma.
- For example, to create a 4-thread parallel region:

    double A[1000];
    omp_set_num_threads(4);              /* runtime function to request a certain number of threads */
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();   /* runtime function returning a thread ID */
        pooh(ID, A);                     /* each thread executes a copy of the code within the structured block */
    }

- Each thread calls pooh(ID,A) for ID = 0 to 3
12. Parallel Regions

- You create threads in OpenMP with the omp parallel pragma.
- For example, to create a 4-thread parallel region, this time using a clause to request a certain number of threads:

    double A[1000];
    #pragma omp parallel num_threads(4)
    {
        int ID = omp_get_thread_num();   /* runtime function returning a thread ID */
        pooh(ID, A);                     /* each thread executes a copy of the code within the structured block */
    }

- Each thread calls pooh(ID,A) for ID = 0 to 3
13. Parallel Regions

    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        pooh(ID, A);
    }
    printf("all done\n");

- Each thread executes the same code redundantly: pooh(0,A), pooh(1,A), pooh(2,A) and pooh(3,A) run concurrently.
- A single copy of A is shared between all threads.
- Threads wait at the end of the parallel region for all threads to finish before proceeding (i.e. a barrier), so printf("all done\n") runs only after every thread is done.
14. Exercise: A multi-threaded "Hello world" program

- Write a multithreaded program where each thread prints "hello world". Start from this sequential version:

    void main() {
        int ID = 0;
        printf(" hello(%d) ", ID);
        printf(" world(%d) \n", ID);
    }
15. A multi-threaded "Hello world" program

- Write a multithreaded program where each thread prints "hello world".

    #include <omp.h>                         /* OpenMP include file */
    void main() {
        #pragma omp parallel                 /* parallel region with default number of threads */
        {
            int ID = omp_get_thread_num();   /* runtime library function to return a thread ID */
            printf(" hello(%d) ", ID);
            printf(" world(%d) \n", ID);
        }                                    /* end of the parallel region */
    }

Sample output: hello(1) hello(0) world(1) world(0) hello(3) hello(2) world(3) world(2)
16. Parallel Regions and the if clause: Active vs. inactive parallel regions

- An optional if clause causes the parallel region to be active only if the logical expression within the clause evaluates to true.
- An if clause that evaluates to false causes the parallel region to be inactive (i.e. executed by a team of size one).

    double A[N];
    #pragma omp parallel if (N > 1000)
    {
        int ID = omp_get_thread_num();
        pooh(ID, A);
    }
17. OpenMP Work-Sharing Constructs

- The for work-sharing construct splits up loop iterations among the threads in a team:

    #pragma omp parallel
    #pragma omp for
    for (I = 0; I < N; I++) {
        NEAT_STUFF(I);
    }

By default, there is a barrier at the end of the omp for. Use the nowait clause to turn off the barrier:

    #pragma omp for nowait

nowait is useful between two consecutive, independent omp for loops, as in the sketch below.
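A minimal sketch of that pattern (the arrays and loop bodies are our illustration): threads that finish the first loop move straight into the second without waiting for stragglers, which is safe only because the two loops touch independent data.

    #include <omp.h>

    #define N 1000
    float a[N], b[N], c[N], d[N];

    void two_independent_loops(void) {
        #pragma omp parallel
        {
            /* No barrier after this loop: finished threads fall through. */
            #pragma omp for nowait
            for (int i = 0; i < N; i++)
                a[i] = 2.0f * b[i];

            /* Safe only because this loop never reads a[] or b[]. */
            #pragma omp for
            for (int i = 0; i < N; i++)
                c[i] = d[i] + 1.0f;
        }
    }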
18. Work-Sharing Constructs: A motivating example

Sequential code:

    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

OpenMP parallel region (work divided by hand):

    #pragma omp parallel
    {
        int id, i, Nthrds, istart, iend;
        id = omp_get_thread_num();
        Nthrds = omp_get_num_threads();
        istart = id * N / Nthrds;
        iend = (id + 1) * N / Nthrds;
        for (i = istart; i < iend; i++) { a[i] = a[i] + b[i]; }
    }

OpenMP parallel region and a work-sharing for-construct:

    #pragma omp parallel
    #pragma omp for schedule(static)
    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }
19. OpenMP For/Do construct: The schedule clause

- The schedule clause affects how loop iterations are mapped onto threads (see the sketch after this list):
- schedule(static [,chunk])
  - Deal out blocks of iterations of size chunk to each thread.
- schedule(dynamic [,chunk])
  - Each thread grabs chunk iterations off a queue until all iterations have been handled.
- schedule(guided [,chunk])
  - Threads dynamically grab blocks of iterations. The size of the block starts large and shrinks down to size chunk as the calculation proceeds.
- schedule(runtime)
  - Schedule and chunk size taken from the OMP_SCHEDULE environment variable.
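A minimal sketch of how these clauses appear in code (the loop body and the work() helper are our illustration):

    #include <omp.h>

    #define N 10000
    double x[N];
    double work(int i) { return 0.5 * i; }   /* stand-in computation */

    void schedule_examples(void) {
        /* static: blocks of 100 iterations dealt out to the threads. */
        #pragma omp parallel for schedule(static, 100)
        for (int i = 0; i < N; i++) x[i] = work(i);

        /* dynamic: threads pull chunks of 10 off a shared queue -
           useful when iteration costs vary widely. */
        #pragma omp parallel for schedule(dynamic, 10)
        for (int i = 0; i < N; i++) x[i] = work(i);

        /* runtime: schedule chosen via the environment, e.g.
           setenv OMP_SCHEDULE "dynamic,10" */
        #pragma omp parallel for schedule(runtime)
        for (int i = 0; i < N; i++) x[i] = work(i);
    }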
20. The schedule clause

- static: least work at runtime - scheduling is done at compile-time
- dynamic and guided: most work at runtime - complex scheduling logic is used at run-time
21. The schedule clause

Example: 20 iterations on 6 threads with a static schedule.
- With 3 iterations per thread, one thread is left holding the remaining 5 iterations.
- With 4 iterations per thread, the last thread gets 0 iterations!
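A quick way to see this imbalance is to count iterations per thread; a minimal sketch (chunk size 4 matches the second case above):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int count[6] = {0};
        omp_set_num_threads(6);

        /* Chunks of 4: five threads get one chunk each and
           the sixth thread gets no iterations at all. */
        #pragma omp parallel for schedule(static, 4)
        for (int i = 0; i < 20; i++) {
            #pragma omp atomic
            count[omp_get_thread_num()]++;
        }

        for (int t = 0; t < 6; t++)
            printf("thread %d ran %d iterations\n", t, count[t]);
        return 0;
    }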
22. OpenMP Work-Sharing Constructs

- The sections work-sharing construct gives a different structured block to each thread:

    #pragma omp parallel
    #pragma omp sections
    {
        #pragma omp section
            X_calculation();
        #pragma omp section
            y_calculation();
        #pragma omp section
            z_calculation();
    }

By default, there is a barrier at the end of the omp sections. Use the nowait clause to turn off the barrier.
23. OpenMP Work-Sharing Constructs

- The master construct denotes a structured block that is only executed by the master thread. The other threads just skip it (no synchronization is implied).

    #pragma omp parallel private (tmp)
    {
        do_many_things();
        #pragma omp master
        { exchange_boundaries(); }
        #pragma omp barrier
        do_many_other_things();
    }
24. OpenMP Work-Sharing Constructs

- The single construct denotes a block of code that is executed by only one thread.
- A barrier is implied at the end of the single block.

    #pragma omp parallel private (tmp)
    {
        do_many_things();
        #pragma omp single
        { exchange_boundaries(); }
        do_many_other_things();
    }
25. Combined parallel/work-share

- OpenMP shortcut: put the parallel and the work-share on the same line. These are equivalent:

    double res[MAX]; int i;
    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < MAX; i++) {
            res[i] = huge();
        }
    }

    double res[MAX]; int i;
    #pragma omp parallel for
    for (i = 0; i < MAX; i++) {
        res[i] = huge();
    }

- There's also a parallel sections construct, sketched below.
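A minimal sketch of that combined form (the function names are our illustration):

    #include <omp.h>

    void x_calc(void) { /* hypothetical independent task */ }
    void y_calc(void) { /* hypothetical independent task */ }

    void combined_sections(void) {
        /* parallel + sections fused into a single directive */
        #pragma omp parallel sections
        {
            #pragma omp section
                x_calc();
            #pragma omp section
                y_calc();
        }
    }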
26. Some Examples

- Three examples of application parallelization under OpenMP
- Remember: the application developer gives the parallelization strategy
- The implementation figures out the details of the work to be performed by each thread
- It also maps threads to hardware resources at run time
27. Matrix Multiply

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            c[i][j] = 0.0;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                c[i][j] += a[i][k] * b[k][j];
28. Parallel Matrix Multiply

- No loop-carried dependences in the i- or j-loops, in either loop nest
- Loop-carried dependence on the k-loop
- All i- and j-iterations can be run in parallel
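To see the k-loop dependence concretely: every k-iteration reads and writes the same c[i][j], i.e. it is an accumulation. A hedged sketch of one way around it, using OpenMP's reduction clause on a scalar temporary (our illustration, not the strategy the slides adopt):

    /* Compute one output element with the k-loop parallelized.
       Each thread keeps a private partial sum; OpenMP combines them. */
    double dot_for_element(int n, double a[n][n], double b[n][n],
                           int i, int j) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+: sum)
        for (int k = 0; k < n; k++)
            sum += a[i][k] * b[k][j];   /* no shared read-modify-write */
        return sum;
    }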
29. Problem Statement

(Figure: matrix multiply - row i of A combines with column j of B to produce element (i, j) of C.)
30. Matrix Multiply

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            c[i][j] = 0.0;

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                c[i][j] += a[i][k] * b[k][j];
31. Parallel Matrix Multiply (contd.)

- OpenMP permits parallelization of only one loop in a loop nest
- We have chosen an approach with coarse granularity
- We could have parallelized the j-loops instead (see the sketch below)
- Performance is influenced by the cost of memory accesses
- May require some experimentation to choose the best strategy

Homework: experiment with OpenMP matrix multiplication.
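For reference, a hedged sketch of the finer-grained alternative mentioned above, parallelizing the j-loop (the function wrapper is our addition; array names follow the earlier slides):

    void matmul_j_parallel(int n, double a[n][n], double b[n][n],
                           double c[n][n]) {
        for (int i = 0; i < n; i++) {
            /* A team divides row i's columns; the fork/join happens
               once per i-iteration, hence the finer granularity. */
            #pragma omp parallel for
            for (int j = 0; j < n; j++) {
                c[i][j] = 0.0;
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
            }
        }
    }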