Title: OpenMP
OpenMP

Outline
- OpenMP Overview
- Goals
- OpenMP constructs
- Parallel Regions
- Work Sharing Directives
- Data Scope Attribute Clauses
- Synchronization Directives
- Runtime Library Routines
- Environment Variables
- Major Errors
- Future of OpenMP 3.0
Overview
- Stands for Open specifications for Multi Processing
- A set of APIs for writing multithreaded applications
- OpenMP's constructs are made up of compiler directives
- Used with the C/C++ and Fortran languages
Overview
- Thread-based parallelism: multiple threads are created and run on the same shared memory
- Uses the Fork-Join model: a master thread forks slave threads, which are later joined again
OpenMP Release History
- 1997 OpenMP Fortran 1.0
- 1998 OpenMP C/C++ 1.0
- 1999 OpenMP Fortran 1.1
- 2000 OpenMP Fortran 2.0
- 2002 OpenMP C/C++ 2.0
- ?? OpenMP 3.0 ??
Goals
- Standardization
  - Provide a standard among a variety of shared memory architectures/platforms
- Lean and Mean
  - Establish a simple and limited set of directives for programming shared memory machines. Significant parallelism can be implemented by using just 3 or 4 directives.
Goals
- Ease of Use
  - Provide the capability to incrementally parallelize a serial program
  - Provide the capability to implement both coarse-grain and fine-grain parallelism
- Portability
  - Supports Fortran (77, 90, and 95), C, and C++
OpenMP Constructs
- The OpenMP constructs fall into these main categories:
  - Parallel Region Directive
  - Work-sharing Directives
  - Data Scope Attribute Clauses
  - Synchronization Directives
  - Runtime Library Routines
  - Environment Variables
OpenMP Constructs
- Format of any OpenMP construct in C/C++:

  #pragma omp directive-name [clause, ...]
  {
      parallel_code();
  }
Parallel Regions Directive
- Indicates a block of code that will be executed by multiple threads.
- This is the fundamental OpenMP parallel construct.
Parallel Regions Directive

  #include <omp.h>

  int main()
  {
      int x;
      set_of_sequential_code();
      #pragma omp parallel
      {
          set_of_parallel_code();
      }
      another_set_of_sequential_code();
  }
Work-sharing Directives
- Include:
  - for construct
  - sections construct
  - single construct
- for construct
  - Used for splitting loop iterations among all available threads.
  - Splitting of iterations depends on the SCHEDULE clause used.
For Directive

  #pragma omp parallel
  {
      #pragma omp for
      for(int i = 0; i < n; i++)
          a[i] = b[i] + a[i];
  }

- The same program could be written without the for construct, but with more programming steps.
For Directive

  #pragma omp parallel
  {
      int id, i, Nthrds, istart, iend;
      id = omp_get_thread_num();
      Nthrds = omp_get_num_threads();
      istart = id * N / Nthrds;
      iend = (id + 1) * N / Nthrds;
      for(i = istart; i < iend; i++) a[i] = a[i] + b[i];
  }
For Directive

  #pragma omp parallel
  {
      #pragma omp for schedule(static)
      for(i = 0; i < N; i++) a[i] = a[i] + b[i];
  }
Schedule Clause
- The schedule clause affects how loop iterations are mapped onto threads
- schedule(static, chunk)
  - Deals out blocks of iterations of size "chunk" to each thread.
- schedule(dynamic, chunk)
  - Each thread grabs "chunk" iterations off a queue until all iterations have been handled.
Schedule Clause
- schedule(guided, chunk)
  - Threads dynamically grab blocks of iterations. The size of the block starts large and shrinks down to size "chunk" as the calculation proceeds.
- schedule(runtime)
  - Schedule and chunk size are taken from the OMP_SCHEDULE environment variable.
Section Directive
- The SECTIONS directive is a non-iterative work-sharing construct.
- It specifies that the enclosed section(s) of code are to be divided among the threads in the team.
- Independent SECTION directives are nested within a SECTIONS directive.
Section Directive
- What if the number of threads is greater than the number of sections?
- What if the number of sections is greater than the number of threads?
Section Directive

  #pragma omp parallel
  {
      #pragma omp sections
      {
          #pragma omp section
          steps_executed_by_one();
          #pragma omp section
          steps_executed_by_another_one();
      }
  }
Single Directive
- Only one thread will execute the single section, while the others will do nothing.
- #pragma omp single
Parallel Regions and Work-sharing Directives
- A parallel region directive can be combined with a work-sharing construct:
  - #pragma omp parallel for [schedule clause]
  - #pragma omp parallel sections
Data Scope Attribute Clauses
- An important consideration for OpenMP programming is the understanding and use of data scoping.
- Because OpenMP is based upon the shared memory programming model, most variables are shared by default.
Data Scope Attribute Clauses
- The OpenMP Data Scope Attribute Clauses are used to explicitly define how variables should be scoped. They include:
  - PRIVATE
  - FIRSTPRIVATE
  - LASTPRIVATE
  - THREADPRIVATE
  - COPYIN
  - SHARED
  - DEFAULT
  - REDUCTION
Data Scope Attribute Clauses
- The SHARED clause declares variables in its list to be shared among all threads in the team.
- The PRIVATE clause declares variables in its list to be private to each thread.
- The FIRSTPRIVATE clause initializes each thread's private copy of the variables in its list to the value the variable had just before the parallel region was entered.
Data Scope Attribute Clauses
- The LASTPRIVATE clause copies the value from the last loop iteration or section to the original variable object.
- Without the LASTPRIVATE clause, the value of the original object at the end of the execution is UNDEFINED.
- The THREADPRIVATE directive is used to make global file scope variables local to a thread, persisting through the execution of multiple parallel regions.
Data Scope Attribute Clauses
- The COPYIN clause is used for initialization of THREADPRIVATE variables.
- THREADPRIVATE differs from PRIVATE.
- The DEFAULT clause allows the user to specify a default PRIVATE, SHARED, or NONE scope.
Data Scope Attribute Clauses
- The REDUCTION clause performs a reduction on the variables that appear in its list.
- A private copy of each list variable is created for each thread.
- At the end of the reduction, the reduction operation is applied to all private copies of the shared variable, and the final result is written to the global shared variable.
- Reduction variables must also be shared in the enclosing context.
  #include <omp.h>
  #include <stdio.h>

  int main()
  {
      int i, n, chunk;
      float a[100], b[100], result;
      n = 100; chunk = 10; result = 0.0;
      for (i = 0; i < n; i++) { a[i] = i * 1.0; b[i] = i * 2.0; }
      #pragma omp parallel for \
          default(shared) private(i) \
          schedule(static,chunk) \
          reduction(+:result)
      for (i = 0; i < n; i++)
          result = result + (a[i] * b[i]);
      printf("Final result = %f\n", result);
  }
Synchronization Directives
- To avoid inconsistency between shared variables, access to them must be synchronized between the threads to ensure that the correct result is always produced.
- OpenMP provides a variety of synchronization constructs that control how the execution of each thread proceeds relative to other team threads.
Synchronization Constructs
- Barrier
- NoWait
- Critical
- Atomic
- Ordered
- Master
- Flush
Synchronization Directives
- When a BARRIER directive is reached, a thread will wait at that point until all other threads have reached that barrier.
- Implicit barriers are applied at:
  - End of parallel regions
  - End of work-sharing constructs (for, sections, single)
  - End of critical sections
Synchronization Directives
- NOWAIT is a clause that overcomes the implicit barriers.
- It is used with:
  - Work-sharing Directives (for, sections, single)
Synchronization Directives
- The CRITICAL directive specifies a region of code that must be executed by only one thread at a time.
- It blocks all other threads until the current thread exits that CRITICAL region.
- #pragma omp critical [name]
- The optional name enables multiple different CRITICAL regions to exist.
- Different CRITICAL regions with the same name are treated as the same region.
Synchronization Directives
- The ATOMIC directive specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it.
- Provides a mini-CRITICAL section.
- The MASTER directive specifies a region that is to be executed only by the master thread of the team. All other threads in the team skip this section of code.
Synchronization Directives
- The FLUSH directive identifies a synchronization point at which the implementation must provide a consistent view of memory. Thread-visible variables are written back to memory at this point.
- FLUSH is implied implicitly by these directives:
  - critical (upon entry and exit)
  - barrier
  - ordered (upon entry and exit)
  - parallel (upon exit)
  - for (upon exit)
  - sections (upon exit)
  - single (upon exit)
Synchronization Directives
- The ORDERED directive specifies that iterations of the enclosed loop will be executed in the same order as if they were executed on a serial processor.
- A loop that contains an ORDERED directive must be a loop with an ORDERED clause.
Runtime Library Routines
- The OpenMP standard defines an API for library calls that perform a variety of functions:
  - Query the number of threads/processors, set the number of threads to use
  - General purpose locking routines (semaphores)
  - Set execution environment functions: nested parallelism, dynamic adjustment of threads
Runtime Library Routines
- void omp_set_num_threads(int num_threads)
  - Sets the number of threads that will be used in the next parallel region.
- int omp_get_num_threads(void)
  - Returns the number of threads that are currently in the team executing the parallel region from which it is called.
- int omp_get_max_threads(void)
  - Returns the maximum value that can be returned by a call to omp_get_num_threads.
- int omp_get_thread_num(void)
  - Returns the thread number of the thread, within the team, making this call. This number will be between 0 and omp_get_num_threads()-1. The master thread of the team is thread 0.
- int omp_get_num_procs(void)
  - Returns the number of processors that are available to the program.
- int omp_in_parallel(void)
  - Used to determine if the section of code which is executing is parallel or not.
Runtime Library Routines
- By default, a program with multiple parallel regions will use the same number of threads to execute each region.
- This behavior can be changed to allow the run-time system to dynamically adjust the number of threads that are created for a given parallel section.
- void omp_set_dynamic(int dynamic_threads)
  - Enables or disables dynamic adjustment (by the run-time system) of the number of threads available for execution of parallel regions.
Runtime Library Routines
- int omp_get_dynamic(void)
  - Used to determine if dynamic thread adjustment is enabled or not.
- A parallel region nested within another parallel region results in the creation of a new team, consisting of one thread, by default.
- void omp_set_nested(int nested)
  - Used to enable or disable nested parallelism.
Runtime Library Routines
- int omp_get_nested(void)
  - Used to determine if nested parallelism is enabled or not.
Runtime Library Routines
- For the lock routines/functions:
  - The lock variable must be accessed only through the locking routines.
  - The lock variable must have type omp_lock_t or type omp_nest_lock_t, depending on the function being used.
- void omp_init_lock(omp_lock_t *lock)
- void omp_init_nest_lock(omp_nest_lock_t *lock)
  - Initializes a lock associated with the lock variable.
Runtime Library Routines
- void omp_destroy_lock(omp_lock_t *lock)
- void omp_destroy_nest_lock(omp_nest_lock_t *lock)
  - Disassociates the given lock variable from any locks.
- void omp_set_lock(omp_lock_t *lock)
- void omp_set_nest_lock(omp_nest_lock_t *lock)
  - Forces the executing thread to wait until the specified lock is available.
Environment Variables
- OMP_SCHEDULE
  - Applies only to for and parallel for directives which have their schedule clause set to RUNTIME.
  - The value of this variable determines how iterations of the loop are scheduled on processors.
  - For example:
    setenv OMP_SCHEDULE "guided, 4"
    setenv OMP_SCHEDULE "dynamic"
- OMP_NUM_THREADS
  - Sets the maximum number of threads to use during execution.
  - For example:
    setenv OMP_NUM_THREADS 8
- OMP_DYNAMIC
  - Enables or disables dynamic adjustment of the number of threads available for execution of parallel regions.
  - Valid values are TRUE or FALSE.
  - For example:
    setenv OMP_DYNAMIC TRUE
- OMP_NESTED
  - Enables or disables nested parallelism.
  - Valid values are TRUE or FALSE.
  - For example:
    setenv OMP_NESTED TRUE
Major Errors
- Race Conditions
  - The outcome of the program depends on the detailed timing of the threads in the team.
- Deadlock
  - Threads lock up waiting on a locked resource that will never become free.
Race Conditions

  #pragma omp parallel sections
  {
      #pragma omp section
      B = A + C;
      #pragma omp section
      C = B + A;
  }
Deadlock

  omp_init_lock(&lcka);
  omp_init_lock(&lckb);
  #pragma omp parallel sections
  {
      #pragma omp section
      {
          omp_set_lock(&lcka);
          omp_set_lock(&lckb);
          use_a_and_b();
          omp_unset_lock(&lckb);
          omp_unset_lock(&lcka);
      }
      #pragma omp section
      {
          omp_set_lock(&lckb);
          omp_set_lock(&lcka);
          use_b_and_a();
          omp_unset_lock(&lcka);
          omp_unset_lock(&lckb);
      }
  }
OpenMP 3.0
- Collapsing of multiple parallel loops
- Automatic data scoping
Collapsing
- Reduces overhead relative to nested parallelization
- Nested parallelization produces more mistakes
- Nested parallelization:

  #pragma omp parallel for
  for(int i = 0; i < n; i++)
  {
      #pragma omp parallel for
      for(int j = 0; j < m; j++)
          functions();
  }
Collapsing
- Collapsing:

  #pragma omp parallel for collapse(2)
  for(int i = 0; i < n; i++)
      for(int j = 0; j < m; j++)
          functions();
Automatic Data Scoping
- Create a standard way to ask the compiler to figure out data scoping automatically.

  #pragma omp parallel for default(autoscope)
  for(j = 0; j < COUNT; j++)
      calculation();
END