Title: Study of OpenMP with MPI for IFS
Slide 1: Study of OpenMP with MPI for IFS, ECMWF's production weather model, on IBM NightHawk2
John Hague (IBM UK), Deborah Salmond (ECMWF)
Slide 2: IFS
- IFS (Integrated Forecast System) contains:
  - Global weather forecast model
  - 4D-Var data assimilation
  - Wave model
  - Ocean model
- These components are run operationally by ECMWF
- IFS has been parallelised to run on distributed-memory systems using MPI
- OpenMP directives have been introduced at a high level to enhance the possibility for parallelisation on shared-memory systems
Slide 3: Forecast Days/Day for T511
[Chart: Forecast Days/Day achieved at resolution T511]

Slide 4: OpenMP + MPI
[Chart: Forecast Days/Day (256 to 1024) vs. number of processors (0 to 2000), comparing an MPI-only run with a combined OpenMP + MPI run]
Slide 5: IFS MPI + OpenMP
- IFS on IBM NightHawk2
- With MPI alone, good speedups are obtained with hundreds of MPI tasks
- With OpenMP + MPI, improved speedups are obtained beyond 500 processors by using a few OpenMP threads per MPI task
Slide 6: IFS MPI + OpenMP

The grid-point computations are parallelised with OpenMP over NPROMA-sized blocks:

    !$OMP PARALLEL DO
    DO J = 1, NGPTOT, NPROMA
       CALL CPG
    ENDDO
    !$OMP END PARALLEL DO

[Diagram: one IFS timestep. Grid-point space: DYNAMICS, RADIATION, SL-TRAJ, slcomm, SL-INTERP, PHYSICS. Transforms between spaces: transpose, FTINV/FTDIR, transpose, LTINV/LTDIR, with a buffer copy and MPI communication. Spectral space: SPECTRAL CALCS.]
Slide 7: IFS, details of the MPI and OpenMP run
- MPI environment variables:
  - MP_SHARED_MEMORY=yes
  - MP_WAIT_MODE=poll
  - MP_EUILIB=us
- OpenMP environment variables:
  - XLSMPOPTS=parthds=2:stack=50000000:spins=500000:yields=50000
Slide 8: IFS MPI + OpenMP
- Large numbers of threads do not give good speedups
- Typically 2 to 4 threads give the best performance
- Partly because less than 100% of the code runs in parallel regions
- Speedup within the parallel regions is also less than expected
Slide 9: IFS parallel regions (T159)
[Chart: time per parallel region over 24 timesteps: PHYSICS, SL-INTERP, SL-TRAJ, DYNAMICS, SPECTRAL]
Slide 10: MPI speedup vs. OpenMP speedup for IFS parallel regions (T159)
[Chart: MPI speedup vs. OpenMP speedup for SL-TRAJ, SL-INTERP, DYNAMICS, SLCOMM, and total time]
Slide 11: Factors which could affect OpenMP speedup
1) Thread dispatching overhead
   - Loop startup cost
2) Memory bandwidth limitations
   - Caused by stores/loads missing L2 and competing for memory access
3) L2 cache differences
   - Caused by the master thread having more data in L2 than the other threads
4) Cache interference
   - Caused by different threads storing to the same 128-byte cache line
5) Load imbalance
Slide 12: 1) Thread Dispatching Overhead

Slide 13: [chart only, no transcript]

Slide 14: Thread startup times
[Chart: measured thread startup times]
Slide 15: IFS T159 with 8 MPI tasks and OpenMP
[Chart: speedups of the IFS parallel regions with 1 and 2 threads, plotted against msec/call; regions include DYNAMICS and SLCOMM]
Slide 16: 2) Memory Bandwidth Limitations
Slide 17: Store kernels: s(i) = zero + s(i) and s(i) = zero
[Charts: total GB/s vs. total number of processors for the store kernels]

Slide 18: Load kernels: ss = ss + a(i) + b(i) + c(i) + d(i) and ss = ss + a(i)
[Charts: total GB/s vs. total number of processors for the load kernels]
Slide 19: IFS T159 with 8 MPI tasks and OpenMP
- Speedups of the IFS parallel regions with 1 and 2 threads
- Compared with L2 misses from the hardware performance monitor
- Shows the effect of memory bandwidth
Slide 20: 3) L2 Cache Differences
- Cause thread imbalance
- Example:
  - The master thread stores an array
  - A parallel loop then loads the array
  - L2 misses differ between threads
Slide 21: IFS T159 with 8 MPI tasks and OpenMP
[Chart: speedups of the IFS parallel regions with 1 and 2 threads, compared with the difference in L2 misses between threads]
Slide 22: 4) Cache Interference

Slide 23: [chart only, no transcript]
Slide 24: 5) Load Imbalance

Slide 25: [chart only, no transcript]
Slide 26: Conclusions
Did each factor limit OpenMP speedup?
1) Thread dispatching overhead: NO
2) Memory bandwidth limitations: YES
3) L2 cache differences: YES
4) Cache interference: Not much
5) Load imbalance: YES