Title: Parallel Architecture Models
1 Parallel Architecture Models
- Shared Memory
  - Dual/Quad Pentium, Cray T90, IBM Power3 Node
- Distributed Memory
  - Cray T3E, IBM SP2, Network of Workstations
- Distributed-Shared Memory
  - SGI Origin 2000, Convex Exemplar
2 Shared Memory Systems (SMP)
- Any processor can access any memory location at equal cost (Symmetric Multi-Processor)
- Tasks communicate by writing/reading common locations
- Easier to program
- Cannot scale beyond around 30 PEs (bus bottleneck)
- Most workstation vendors make SMPs today (SGI, Sun, HP, Digital, Pentium-based)
- Cray Y-MP, C90, T90 (cross-bar between PEs and memory)
3 Cache Coherence in SMPs
- Each processor's cache holds the most recently accessed values
- If a word cached in several caches is modified, all copies must be made consistent
- Bus-based SMPs use an efficient mechanism: a snoopy bus
- The snoopy bus monitors all writes and marks the other cached copies invalid
- When a processor finds a cache word marked invalid, it fetches a fresh copy from shared memory (see the sketch after this list)
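To make the write-invalidate idea concrete, the following small Fortran sketch models one word per cache line. It is illustrative only and not part of the original slides; the module and routine names (snoopy_sketch, write_word, read_word) and the arrays cache_state, cache_data and shared_mem are invented for the example. On a write, every other cache "snoops" the bus and marks its copy invalid; a later read of an invalidated word refetches it from shared memory.

module snoopy_sketch
  implicit none
  integer, parameter :: NPROC = 4, NLINE = 8
  integer, parameter :: INVALID = 0, VALID = 1
  integer :: cache_state(NPROC, NLINE) = VALID   ! per-processor state of each cached word
  real    :: cache_data(NPROC, NLINE)  = 0.0     ! per-processor cached copies
  real    :: shared_mem(NLINE)         = 0.0     ! the shared memory itself
contains

  subroutine write_word(p, l, val)       ! processor p writes word l
    integer, intent(in) :: p, l
    real,    intent(in) :: val
    integer :: q
    cache_data(p, l) = val
    shared_mem(l)    = val               ! write-through, to keep the sketch simple
    do q = 1, NPROC                      ! every other cache snoops the bus write ...
      if (q /= p) cache_state(q, l) = INVALID   ! ... and marks its copy invalid
    end do
    cache_state(p, l) = VALID
  end subroutine write_word

  real function read_word(p, l)          ! processor p reads word l
    integer, intent(in) :: p, l
    if (cache_state(p, l) == INVALID) then
      cache_data(p, l)  = shared_mem(l)  ! invalid copy: fetch a fresh one from shared memory
      cache_state(p, l) = VALID
    end if
    read_word = cache_data(p, l)
  end function read_word

end module snoopy_sketch

Real protocols (e.g. MESI) track more states and invalidate whole cache lines rather than single words, but the invalidate-on-write behaviour is the one described in the bullets above.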
4 Distributed Memory Systems
(Figure: each node contains a Processor (P), Cache (C), Memory (M) and Network Interface Card (NIC); the nodes are connected by an interconnection network)
- Each processor can only access its own memory
- Explicit communication by sending and receiving messages
- More tedious to program
- Can scale to hundreds/thousands of processors
- Cache coherence is not needed
- Examples: IBM SP-2, Cray T3E, workstation clusters
5 Distributed Shared Memory
- Each processor can directly access any memory location
- Physically distributed memory permits many simultaneous accesses
- Non-uniform memory access costs
- Examples: Convex Exemplar, SGI Origin 2000
- Complex hardware and high cost for cache coherence
- Software DSM systems (e.g. TreadMarks) implement the shared-memory abstraction on top of distributed-memory systems
6 Parallel Programming Models
- Shared-Address Space Models
  - BSP (Bulk Synchronous Parallel model)
  - HPF (High Performance Fortran)
  - OpenMP
- Message Passing
  - Partitioned address space: PVM, MPI (see Ch. 8 of I. Foster's book Designing and Building Parallel Programs, available online)
- Higher-Level Programming Environments
  - PETSc (Portable, Extensible Toolkit for Scientific Computation)
  - POOMA (Parallel Object-Oriented Methods and Applications)
7 OpenMP
- Standard sequential Fortran/C model
- Single global view of data
- Automatic parallelization by compiler
- User can provide loop-level directives (see the sketch after this list)
- Easy to program
- Only available on Shared-Memory Machines
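As a concrete illustration of a loop-level directive (a made-up example, not from the slides), a single !$omp parallel do with a reduction clause is enough to parallelize an array sum on a shared-memory machine:

program omp_sum_sketch
  implicit none
  integer, parameter :: n = 100000
  real    :: x(n), s
  integer :: i

  x = 1.0
  s = 0.0
  ! the directive splits the loop iterations across the threads and the
  ! reduction clause combines the per-thread partial sums into s
!$omp parallel do reduction(+:s)
  do i = 1, n
    s = s + x(i)
  end do
!$omp end parallel do

  print *, 'sum =', s
end program omp_sum_sketch

Compiled without OpenMP support the directives are treated as comments and the same code runs sequentially, which is part of what makes the model easy to program.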
8 High Performance Fortran
- Global shared address space, similar to the sequential programming model
- User provides data mapping directives
- User can provide information on loop-level parallelism
- Portable: available on all three types of architectures
- Compiler automatically synthesizes message-passing code if needed
- Restricted to dense arrays and regular distributions
- Performance is not consistently good
9 Message Passing
- Program is a collection of tasks
- Each task can only read/write its own data
- Tasks communicate data by explicitly sending/receiving messages
- Need to translate from the global shared view to the local partitioned view when porting a sequential program
- Tedious to program/debug
- Very good performance
10 Illustrative Example
  Real a(n,n), b(n,n)
  Do k = 1, NumIter
    Do i = 2, n-1
      Do j = 2, n-1
        a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
      End Do
    End Do
    Do i = 2, n-1
      Do j = 2, n-1
        b(i,j) = a(i,j)
      End Do
    End Do
  End Do
(Figure: the arrays a(20,20) and b(20,20) drawn as 20x20 grids)
11 Example: OpenMP
  Real a(n,n), b(n,n)
!$omp parallel shared(a,b) private(i,j,k)
  Do k = 1, NumIter
!$omp do
    Do i = 2, n-1
      Do j = 2, n-1
        a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
      End Do
    End Do
!$omp do
    Do i = 2, n-1
      Do j = 2, n-1
        b(i,j) = a(i,j)
      End Do
    End Do
  End Do
!$omp end parallel
Global shared view of data
(Figure: a(20,20) and b(20,20) as 20x20 grids, visible in full to every thread)
12 Example: HPF (1D partition)
  Real a(n,n), b(n,n)
!hpf$ distribute a(block,*)
!hpf$ distribute b(block,*)
  Do k = 1, NumIter
!hpf$ independent, new(j)
    Do i = 2, n-1
      Do j = 2, n-1
        a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
      End Do
    End Do
!hpf$ independent, new(j)
    Do i = 2, n-1
      Do j = 2, n-1
        b(i,j) = a(i,j)
      End Do
    End Do
  End Do
Global shared view of data
(Figure: a(20,20) and b(20,20) partitioned by block rows across processors P0, P1, P2, P3)
13 Example: HPF (2D partition)
  Real a(n,n), b(n,n)
!hpf$ distribute a(block,block)
!hpf$ distribute b(block,block)
  Do k = 1, NumIter
!hpf$ independent, new(j)
    Do i = 2, n-1
      Do j = 2, n-1
        a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
      End Do
    End Do
!hpf$ independent, new(j)
    Do i = 2, n-1
      Do j = 2, n-1
        b(i,j) = a(i,j)
      End Do
    End Do
  End Do
Global shared view of data
(Figure: a(20,20) and b(20,20) partitioned into 2D blocks across the processors)
14 Message Passing: Local View
(Figure: the global shared view of the array versus the local partitioned view, one bl(5,20) block per processor; values along the partition boundaries require communication)
15 Example: Message Passing
  Real al(NdivP,n), bl(0:NdivP+1,n)
  me = get_my_procnum()
  Do k = 1, NumIter
    if (me < P-1) send(me+1, bl(NdivP,1:n))
    if (me > 0)   recv(me-1, bl(0,1:n))
    if (me > 0)   send(me-1, bl(1,1:n))
    if (me < P-1) recv(me+1, bl(NdivP+1,1:n))
    if (me == 0)   then; i1 = 2;         else; i1 = 1;     end if
    if (me == P-1) then; i2 = NdivP - 1; else; i2 = NdivP; end if
    Do i = i1, i2
      Do j = 2, n-1
        al(i,j) = (bl(i-1,j) + bl(i,j-1) + bl(i+1,j) + bl(i,j+1))/4
      End Do
    End Do
    ...
(Figure: the local partitioned view with ghost cells around each processor's al(5,20) block; the ghost cells are communicated by message passing)
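The send/recv calls above are pseudocode. The sketch below shows how the same ghost-cell (halo) exchange could be written with actual MPI calls in Fortran; it is illustrative only, assumes the same row-block decomposition with NdivP rows per process, and the program name, tags and placeholder data are invented for the example.

program halo_exchange_sketch
  use mpi
  implicit none
  integer, parameter :: n = 20, NdivP = 5
  real    :: bl(0:NdivP+1,n)            ! locally owned rows plus two ghost rows
  real    :: sendbuf(n), recvbuf(n)     ! contiguous buffers for one (strided) row
  integer :: me, P, up, down, ierr
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, me, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, P, ierr)

  bl = real(me)                         ! placeholder local data
  up   = me - 1                         ! rank owning the block of rows above mine
  down = me + 1                         ! rank owning the block of rows below mine
  if (up < 0)     up   = MPI_PROC_NULL  ! no neighbour: the transfer becomes a no-op
  if (down > P-1) down = MPI_PROC_NULL

  ! send my last owned row down, receive the upper ghost row bl(0,:)
  sendbuf = bl(NdivP,1:n)
  call MPI_Sendrecv(sendbuf, n, MPI_REAL, down, 0, &
                    recvbuf, n, MPI_REAL, up,   0, &
                    MPI_COMM_WORLD, status, ierr)
  if (up /= MPI_PROC_NULL) bl(0,1:n) = recvbuf

  ! send my first owned row up, receive the lower ghost row bl(NdivP+1,:)
  sendbuf = bl(1,1:n)
  call MPI_Sendrecv(sendbuf, n, MPI_REAL, up,   1, &
                    recvbuf, n, MPI_REAL, down, 1, &
                    MPI_COMM_WORLD, status, ierr)
  if (down /= MPI_PROC_NULL) bl(NdivP+1,1:n) = recvbuf

  ! ... Jacobi update of the locally owned rows, as on the slide ...

  call MPI_Finalize(ierr)
end program halo_exchange_sketch

Using MPI_Sendrecv with MPI_PROC_NULL neighbours avoids the explicit first/last-process special cases of the pseudocode; an MPI derived datatype (MPI_Type_vector) could replace the explicit packing of the strided row into sendbuf.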
16 Comparison of Models
- Program Porting/Development Effort
  - OpenMP, HPF << MPI
- Portability across systems
  - HPF, MPI >> OpenMP (only shared-memory)
- Applicability
  - MPI, OpenMP >> HPF (only dense arrays)
- Performance
  - MPI > OpenMP >> HPF
17 PETSc
- Higher-level parallel programming model
- Aims to provide both ease of use and high performance for numerical PDE solution
- Uses an efficient message-passing implementation underneath, but provides a global view of data arrays
- The system takes care of the needed message-passing
- Portable across shared- and distributed-memory systems
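To give a flavour of the model, the sketch below creates a globally sized, automatically partitioned PETSc vector from Fortran and sets it to a constant; the message-passing work happens inside the library. It is a minimal sketch only: the include path and calling conventions vary with the PETSc version, the source must be preprocessed (e.g. a .F90 file), and the global size of 100 is arbitrary.

program petsc_vec_sketch
#include <petsc/finclude/petscvec.h>
  use petscvec
  implicit none
  Vec            x
  PetscErrorCode ierr
  PetscInt       nglobal
  PetscScalar    one

  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)   ! also starts up MPI
  nglobal = 100
  one     = 1.0

  ! the user gives only the global size; PETSc decides how the vector is
  ! distributed across the processes and performs the message-passing
  ! needed by later operations on it
  call VecCreate(PETSC_COMM_WORLD, x, ierr)
  call VecSetSizes(x, PETSC_DECIDE, nglobal, ierr)
  call VecSetFromOptions(x, ierr)
  call VecSet(x, one, ierr)

  call VecDestroy(x, ierr)
  call PetscFinalize(ierr)
end program petsc_vec_sketch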