Title: Parallel Programming Models
1. Parallel Programming Models
2. Parallel Programming Models
- Historically, programming models were designed for a given class of architectures
  - vector computers and vector code
  - SIMD computers and array operations
  - distributed memory computers and message passing
  - shared memory and threads
3. Parallel Programming Models
- The idea is to make it easy for the programmer to get performance on that architecture
  - Include primitives that are natural for the architecture
  - The programmer should consider how best to use the primitives in code
4. Parallel Programming Models
- But in order to execute code on a new architecture, it must be rewritten
- Some programming models were written for just one vendor's machines!
- Fortunately, industry standards are now widely available
  - Pthreads, OpenMP (shared memory)
  - MPI (message passing)
5. Parallel Programming Models
- Standards were developed for programming a class of machines
- Some have been adapted for more than one class of machines
  - MPI has been the most successful in this respect
  - OpenMP is now also available on different kinds of platforms
6. Threads
- A process has its own address space
- A process may be executed by a team of threads
- A thread shares its address space with the other threads in the same team
  - But the thread's stack provides space for data local (private) to the thread
- Threads are used for shared memory parallel programming and for multitasking
7. Threads
- One thread per processor for shared memory parallel programming (this is our focus)
- One thread per task for time slicing (possibly on a single processor)
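To make the shared/private distinction concrete, here is a minimal pthreads sketch (not from the slides; all names are illustrative): the global array shared_data is visible to every thread in the team, while id and local live on each thread's private stack.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

int shared_data[NTHREADS];      /* global: one copy, visible to all threads */

void *worker(void *arg)
{
    int id = *(int *)arg;       /* on this thread's stack: private */
    int local = id * id;        /* private scratch value */
    shared_data[id] = local;    /* write into the shared address space */
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    int ids[NTHREADS];

    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    for (int i = 0; i < NTHREADS; i++)   /* main sees the threads' writes */
        printf("shared_data[%d] = %d\n", i, shared_data[i]);
    return 0;
}
```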
8. How Can We Exploit Threads?
- A thread programming model must provide (at least) the means to
  - create and destroy threads
  - distribute the computation among threads
  - coordinate the actions of threads on shared data
  - name threads
  - (usually) specify which data is shared and which is private to a thread
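A pthreads sketch of these primitives (a sketch only; the names and the cyclic work distribution are illustrative): pthread_create and pthread_join create and reclaim the threads, the integer passed to each thread names it and determines its share of the iterations, and a mutex coordinates the updates to the shared sum.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000

double sum = 0.0;                               /* shared */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *partial_sum(void *arg)
{
    int id = *(int *)arg;                       /* "names" the thread */
    double my_sum = 0.0;                        /* private partial result */
    for (int i = id; i < N; i += NTHREADS)      /* distribute the computation */
        my_sum += (double)i;
    pthread_mutex_lock(&lock);                  /* coordinate actions on shared data */
    sum += my_sum;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    int ids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, partial_sum, &ids[i]);  /* create */
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);               /* wait for / reclaim threads */
    printf("sum = %.0f\n", sum);                /* 0+1+...+999 = 499500 */
    return 0;
}
```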
9. Parallel Programming Models
- Low-level parallel programming
  - The programmer must describe the parallelism explicitly
  - The data and computation to be performed on each processor are specified exactly by the coder
  - For shared memory: create the threads and specify all details of their work and their interactions
10. Parallel Programming Models
- High-level parallel programming
  - The programmer describes the parallelism implicitly
  - The details of the data and computation to be performed on each processor are determined by the compiler
  - The compiler creates the threads and determines the details of their work and interactions
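For contrast, a high-level sketch of a loop computation (assuming an OpenMP compiler, e.g. gcc -fopenmp): a single directive marks the loop as parallel, and the compiler and runtime create the threads and divide the iterations among them.

```c
#include <stdio.h>

#define N 1000

int main(void)
{
    double a[N];

    /* one directive expresses the parallelism; thread creation and
     * work assignment are left to the compiler and runtime */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[%d] = %.1f\n", N - 1, a[N - 1]);
    return 0;
}
```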
11. Parallel Programming Models
- Low-level parallel programming models
  - Pthreads for shared memory, MPI for distributed memory
- High-level parallel programming models
  - OpenMP for shared memory, HPF for distributed memory
12. How Does OpenMP Enable Us to Exploit Threads?
- OpenMP provides a thread programming model at a high level
- The user does not need to specify all the details
  - especially with respect to the assignment of work to threads
  - or the creation of threads
- The user makes the strategic decisions; the compiler figures out the details
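A small sketch of this division of labor (illustrative): the strategic decisions are which loop to parallelize and that sum requires a reduction; everything else, from creating the threads to combining the partial sums, is figured out by the compiler and runtime.

```c
#include <stdio.h>

#define N 1000

int main(void)
{
    double sum = 0.0;

    /* strategic decision: parallelize this loop and reduce into sum;
     * the compiler handles threads, work assignment, and the final combine */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += (double)i;

    printf("sum = %.0f\n", sum);   /* 499500 */
    return 0;
}
```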
13. Automatic Parallelization
- Many compilers now have an automatic parallelization option for shared memory platforms
- The idea is that the compiler detects dependences and constructs parallel threads that respect them
- This works well on very simple programs
- But it is very hard to do on real programs
- Dynamic optimization (run-time compilation) may help
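Two illustrative loops show what such a compiler looks for: the first has no cross-iteration dependence and can safely be split among threads, while the second carries a dependence from one iteration to the next and cannot be parallelized as written.

```c
/* No loop-carried dependence: iterations are independent,
 * so an auto-parallelizer can run them on separate threads. */
void independent(double *a, const double *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * b[i];
}

/* Loop-carried dependence: iteration i reads the value written
 * in iteration i-1, so the loop must stay sequential as written. */
void dependent(double *a, int n)
{
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + 1.0;
}
```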
14. Memory Models
- The main difference in modern SMP architectures:
  - Uniform memory access (UMA): same cost of access from any processor
    - realized in physically (true) shared memory
  - Non-uniform memory access (NUMA): different cost of access from different processors
    - true for physically distributed memory, including distributed shared memory, and also for some SMPs
15. Models of Memory Management
- Symmetric shared memory: memory is shared, same cost of access from any processor/core
- Non-symmetric shared memory: memory is shared, different cost of access from different processors/cores
- Distributed shared memory: memory is shared, different cost of access from different processors
- Distributed memory: memory is distributed, different cost of access from different processors
16. Shared Memory
- This means that a variable x, a pointer p, or an array a refers to the same object, no matter what processor the reference originates from
- Each processor can access a variable in the same amount of time as any other
  - Actually, this second statement is not true on some major platforms today, even for some with just a few processors
- We will later discuss programming techniques that take access time differences into account
17. Shared Memory
- All threads access the same data space
[Diagram: processors proc1 through procN all reference the same variable a in a single shared memory space]
18. More Realistic View of Shared Memory Architecture
[Diagram: shared memory holding a variable a; each processor proc1 through procN has its own cache (cache1 through cacheN), and a copy of a may also sit in a local cache]
19. Cache in Shared Memory
- Copies of shared data are held in the local cache
  - or even in registers
- Without extra effort, there may be inconsistencies
  - Thread 1 updates variable a
  - Thread 2 needs to use it
  - If thread 1 has not written a back to main memory, thread 2 will use a stale value
- This is the memory consistency problem
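A sketch of the hazard and one cure (illustrative; it uses C11 atomics, though mutexes or OpenMP's flush achieve the same effect): if ready and a were plain variables this would be a data race, and thread 2 could read a stale value of a; the release/acquire pair forces thread 1's update to become visible before thread 2 uses it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

int a = 0;                 /* shared data */
atomic_int ready = 0;      /* atomic flag that publishes the update */

void *thread1(void *unused)
{
    a = 42;                /* update the shared variable ... */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* ... then publish it */
    return NULL;
}

void *thread2(void *unused)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                  /* wait until the update is visible */
    printf("a = %d\n", a); /* guaranteed to print 42, not a stale 0 */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```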
20. Distributed Memory
- It is no longer the case that a variable a, a pointer p, or an array a refers to the same location, independent of the processor a process executes on
- It can be slow for a process on one processor to access data stored in memory associated with a different processor
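An MPI sketch of this situation (assumes the program is launched with at least two processes, e.g. mpirun -np 2): each rank has its own copy of a, so rank 0's update is invisible to rank 1 until a message explicitly carries the value across the network.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, a = 0;                     /* every process has its own a */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        a = 42;                          /* only rank 0's copy changes */
        MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* rank 1's a is still 0 until the message arrives */
        MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 now has a = %d\n", a);
    }

    MPI_Finalize();
    return 0;
}
```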
21. Distributed Memory
[Diagram: processors proc1 through procN, each with its own local copy of a in its own memory, connected by a network]
22. Distributed Shared Memory
- A variable a, a pointer p, or an array a refers to the same location, independent of the processor a process executes on
- It can be slow for a process on one processor to access data stored in memory associated with a different processor
23. Distributed Shared Memory
[Diagram: processors proc1 through procN, each with a local cache and a local memory (mem1 through memN); together the local memories form a single shared address space]
24. Important Note
- Software Distributed Shared Memory can provide the illusion of shared memory on a distributed memory machine
- Whatever the implementation, it conceptually looks like shared memory
- There may be some very large performance differences
25. Programming vs. Hardware
- One can implement a shared memory programming model
  - on shared or distributed memory hardware
  - (in software or in hardware)
- One can implement a message passing programming model
  - on shared or distributed memory hardware
- There may be large performance differences
26. Portability of programming models
[Diagram: both shared memory programming and distributed memory programming can be mapped onto either a shared memory machine or a distributed memory machine]
27. Programming Models
- We look at several programming models
  - we stick to standards
  - high-level, implicit parallel programming
  - low-level, explicit parallel programming
- Goal: understand each model and get experience in its usage
28. Summary
- Different kinds of architectural parallelism and memory organization have led to different programming models
- We stick to standards for modern architectures
- There are several ways to find parallelism in a code
  - The programmer has to decide which way is best for a given program
- We discuss this soon, but first we get started with the API