1
Introduction to OpenMP
  • Eric Aubanel
  • Advanced Computational Research Laboratory
  • Faculty of Computer Science, UNB
  • Fredericton, New Brunswick

2
Shared Memory
3
Shared Memory Multiprocessor
4
Distributed vs. DSM
[Diagram: distributed memory vs. distributed shared memory (DSM). Several
processes, each with its own memory, are connected by a network; in the DSM
case the network presents the combined memories as one global address space.]
5
Parallel Programming Alternatives
  • Use a new programming language
  • Use an existing sequential language modified to
    handle parallelism
  • Use a parallelizing compiler
  • Use library routines/compiler directives with an
    existing sequential language
  • Shared memory (OpenMP) vs. distributed memory
    (MPI)

6
What is Shared Memory Parallelization?
  • All processors can access all the memory in the
    parallel system (one address space).
  • The time to access the memory may not be equal
    for all processors
  • not necessarily a flat memory
  • Parallelizing on an SMP does not reduce CPU time
  • it reduces wall-clock time
  • Parallel execution is achieved by generating
    multiple threads which execute in parallel
  • Number of threads (in principle) is independent
    of the number of processors

7
Threads: The Basis of SMP Parallelization
  • Threads are not full UNIX processes. They are
    lightweight, independent "collections of
    instructions" that execute within a UNIX process.
  • All threads created by the same process share the
    same address space.
  • a blessing and a curse: "inter-thread"
    communication is efficient, but it is easy to
    stomp on memory and create race conditions (see
    the sketch after this list).
  • Because they are lightweight, they are
    (relatively) inexpensive to create and destroy.
  • Creation of a thread can take three orders of
    magnitude less time than process creation!
  • Threads can be created and assigned to multiple
    processors: this is the basis of SMP parallelism!
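
A minimal sketch (not from the slides), in C with Pthreads, of the race
condition risk described above: two threads share the process's memory and
both increment the same counter with no synchronization, so updates are lost
and the final value is usually less than the expected 2000000.

#include <stdio.h>
#include <pthread.h>

static long counter = 0;                     /* shared by every thread in the process */

static void *work(void *arg)
{
    (void)arg;
    for (long i = 0; i < 1000000; i++)
        counter++;                           /* unsynchronized read-modify-write: a race */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work, NULL);   /* creating a thread is far cheaper   */
    pthread_create(&t2, NULL, work, NULL);   /* than creating a whole new process  */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}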

8
Processes vs. Threads
[Diagram: a process has its own code, heap, stack, and instruction pointer
(IP); the threads within a process share the code and heap, but each thread
has its own stack and instruction pointer.]
9
Methods of SMP Parallelism
  • 1. Explicit use of threads
  • Pthreads: see "Pthreads Programming" from
    O'Reilly & Associates, Inc.
  • 2. Using a parallelizing compiler and its
    directives, you can generate pthreads "under the
    covers."
  • can use vendor-specific directives (e.g. !SMP$)
  • can use industry-standard directives (e.g. !$OMP,
    as defined by OpenMP)

10
OpenMP
  • In 1997, a group of hardware and software vendors
    announced their support for OpenMP, a new API for
    multi-platform shared-memory programming (SMP) on
    UNIX and Microsoft Windows NT platforms.
  • www.openmp.org
  • OpenMP provides compiler directives (comment lines
    in Fortran, #pragma lines in C/C++), embedded in
    the source code, for
  • scoping data
  • specifying work load
  • synchronization of threads
  • OpenMP provides function calls for obtaining
    information about threads (see the sketch after
    this list).
  • e.g., omp_get_num_threads(), omp_get_thread_num()
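
A minimal C sketch (assumed, not from the slides) of a parallel region that
calls the two runtime routines named above:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel                       /* FORK: a team of threads starts here */
    {
        int tid = omp_get_thread_num();        /* this thread's id, 0 .. nthreads-1 */
        int nthreads = omp_get_num_threads();  /* size of the current team */
        printf("hello from thread %d of %d\n", tid, nthreads);
    }                                          /* JOIN: only the master thread continues */
    return 0;
}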

11
OpenMP example
subroutine saxpy(z, a, x, y, n)
integer i, n
real z(n), a, x(n), y
!$omp parallel do
do i = 1, n
   z(i) = a * x(i) + y
end do
return
end
12
OpenMP Threads
  • 1. All OpenMP programs begin as a single process:
    the master thread
  • 2. FORK: the master thread creates a team of
    parallel threads
  • 3. Parallel region: statements are executed in
    parallel among the various team threads
  • 4. JOIN: the threads synchronize and terminate,
    leaving only the master thread

13
Private vs Shared Variables
[Diagram, serial execution: all data references (z, a, x, y, n) go to global
shared memory.]
[Diagram, parallel execution: references to z, a, x, y and n still go to
global shared memory, but each thread has a private copy of the loop index
i, and references to i go to that private copy.]
14
Division of Work
n = 40, 4 threads

subroutine saxpy(z, a, x, y, n)
integer i, n
real z(n), a, x(n), y
!$omp parallel do
do i = 1, n
   z(i) = a * x(i) + y
end do
return
end

[Diagram: z, a, x, y and n live in global shared memory; each thread keeps
its loop index i in local (private) memory and works on its own block of
iterations: i = 1-10, 11-20, 21-30, and 31-40.]
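
The same division can be observed directly by printing which thread executes
each iteration. A C sketch (not from the slides; the loop runs over 0..39
here, and the exact block sizes depend on the implementation's default
schedule, though a contiguous split of 10 iterations per thread is typical):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 40;
    #pragma omp parallel for num_threads(4)      /* 4 threads share the 40 iterations */
    for (int i = 0; i < n; i++)
        printf("iteration %2d handled by thread %d\n", i, omp_get_thread_num());
    return 0;
}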
15
Variable Scoping
  • The most difficult part of shared memory
    parallelization.
  • What memory is shared
  • What memory is private (i.e. each processor has
    its own copy)
  • How private memory is treated vis-à-vis the
    global address space.
  • Variables are shared by default, except for loop
    index in parallel do
  • This must mesh with the Fortran view of memory
  • Global: shared by all routines
  • Local: local to a given routine
  • saved vs. non-saved variables (through the SAVE
    statement or -save option)

16
Static vs. Automatic Variables
  • Fortran 77 standard allows subprogram local
    variables to become undefined between calls,
    unless saved with a SAVE statement
                STATIC       AUTOMATIC
    AIX         (default)    -qnosave
    IRIX        -static      -automatic (default)
    SunOS       (default)    -stackvar

17
OpenMP Directives in Fortran
  • Line continuation
  • Fixed form
  • !$OMP PARALLEL DO
  • !$OMP$PRIVATE (JMAX)
  • !$OMP$SHARED(A, B)
  • Free form
  • !$OMP PARALLEL DO &
  • !$OMP PRIVATE (JMAX) &
  • !$OMP SHARED(A, B)

18
OpenMP in C
  • Same functionality as OpenMP for FORTRAN
  • Differences in syntax
  • #pragma omp for
  • Differences in variable scoping (see the sketch
    after this list)
  • variables "visible" when #pragma omp parallel is
    encountered are shared by default
  • static variables declared within a parallel
    region are also shared
  • heap allocated memory (malloc) is shared (but
    pointer can be private)
  • automatic storage declared within a parallel
    region is private (i.e., on the stack)
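
A small C sketch (variable names and sizes are illustrative, not from the
slides) exercising these rules: a file-scope variable visible at the pragma
is shared, heap memory is shared while the pointer to it can be private, and
automatic variables declared inside the region are private:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int nwritten = 0;                            /* visible when the pragma is encountered: shared */

int main(void)
{
    double *buf = calloc(8, sizeof *buf);    /* heap memory: shared by all threads */

    #pragma omp parallel num_threads(8)
    {
        int tid = omp_get_thread_num();      /* automatic (stack) variable: private */
        double *p = buf;                     /* the pointer is private ...          */
        p[tid] = tid;                        /* ... but it points at shared memory  */

        #pragma omp atomic
        nwritten++;                          /* shared, so the update must be synchronized */
    }

    printf("nwritten = %d, buf[3] = %.0f\n", nwritten, buf[3]);
    free(buf);
    return 0;
}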

19
OpenMP Overhead
  • The overhead of parallelization is large (e.g.,
    about 8000 cycles for a parallel do across 16
    processors of an SGI Origin 2000)
  • the parallel work construct must be large enough
    to overcome this overhead
  • rule of thumb: it takes about 10 kFLOPs of work to
    amortize the overhead (one way to enforce this is
    shown in the sketch after this list)
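
One way to apply that rule of thumb is OpenMP's if clause, which runs a
construct in parallel only when a run-time condition holds. A C sketch (the
threshold of 10000 is an arbitrary illustrative value, not from the slides):

#include <stdlib.h>

void scale(double *z, const double *x, double a, int n)
{
    #pragma omp parallel for if(n > 10000)   /* fork threads only when there is enough work */
    for (int i = 0; i < n; i++)
        z[i] = a * x[i];
}

int main(void)
{
    int n = 100000;
    double *x = calloc(n, sizeof *x), *z = calloc(n, sizeof *z);
    scale(z, x, 2.0, n);       /* large n: the loop runs in parallel */
    scale(z, x, 2.0, 100);     /* small n: the if clause keeps it serial */
    free(x); free(z);
    return 0;
}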

20
OpenMP Use
  • How is OpenMP typically used?
  • OpenMP is usually used to parallelize loops
  • Find your most time-consuming loops.
  • Split them up between threads.
  • Better scaling can be obtained using OpenMP
    parallel regions, but this can be tricky (see the
    sketch after this list)
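
A C sketch (array names and sizes are illustrative, not from the slides)
contrasting the two styles: loop-level parallelism forks and joins a thread
team at every loop, while a single parallel region creates the team once and
work-shares each loop inside it, which is what can give better scaling:

#include <omp.h>
#define N 1000000
static double a[N], b[N];

void loop_level(void)
{
    #pragma omp parallel for                /* fork/join around this loop */
    for (int i = 0; i < N; i++) a[i] = i;
    #pragma omp parallel for                /* and another fork/join here */
    for (int i = 0; i < N; i++) b[i] = 2.0 * a[i];
}

void region_level(void)
{
    #pragma omp parallel                    /* one fork/join around both loops */
    {
        #pragma omp for                     /* work-shared among the existing team */
        for (int i = 0; i < N; i++) a[i] = i;
        #pragma omp for                     /* the implicit barrier after the loop above
                                               guarantees a[] is complete before b[] is built */
        for (int i = 0; i < N; i++) b[i] = 2.0 * a[i];
    }
}

int main(void) { loop_level(); region_level(); return 0; }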

21
OpenMP vs. MPI
  • OpenMP:
  • Only for shared memory computers
  • Easy to incrementally parallelize
  • More difficult to write highly scalable programs
  • Small API based on compiler directives and
    limited library routines
  • Same program can be used for sequential and
    parallel execution
  • Shared vs. private variables can cause confusion
  • MPI:
  • Portable to all platforms
  • Parallelize all or nothing
  • Vast collection of library routines
  • Possible but difficult to use same program for
    serial and parallel execution
  • Variables are local to each processor

22
References
  • Parallel Programming in OpenMP, by Chandra et al.
    (Morgan Kaufmann)
  • www.openmp.org
  • Multimedia tutorial at Boston University
  • scv.bu.edu/SCV/Tutorials/OpenMP/
  • Lawrence Livermore online tutorial
  • www.llnl.gov/computing/training/
  • European workshop on OpenMP (EWOMP)
  • www.epcc.ed.ac.uk/ewomp2000/