Transcript and Presenter's Notes

Title: Study of OpenMP with MPI for IFS


1
Study of OpenMP with MPI for IFS
ECMWF's production weather model on the IBM NightHawk2
John Hague (IBM UK) and Deborah Salmond (ECMWF)
2
IFS
  • IFS (Integrated Forecast System) contains
  • a Global Weather Forecast model
  • 4D-Var data assimilation
  • a Wave model
  • an Ocean model
  • all of which are run operationally by ECMWF
  • IFS has been parallelised to run on distributed-memory systems using MPI
  • OpenMP directives have been introduced at a high level to allow additional parallelisation on shared-memory systems

3
Forecast Days/Day for T511
4
[Graph: Forecast Days/Day for T511 (256, 512, 1024) versus number of processors (0 to 2000), comparing MPI only with OpenMP + MPI]
5
IFS: MPI + OpenMP
  • IFS on the IBM NightHawk2
  • With MPI alone, good speedups are obtained with 100s of MPI tasks
  • With OpenMP + MPI, improved speedups are obtained on more than 500 processors using a few OpenMP threads

6
IFS: MPI + OpenMP
!$OMP PARALLEL DO
DO J = 1, NGPTOT, NPROMA
   CALL CPG
ENDDO
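As an illustration only (not code from the slides), a minimal self-contained Fortran sketch of this high-level, NPROMA-blocked grid-point loop is given below; the subroutine name GP_LOOP, the FIELD array and the work inside the loop are hypothetical stand-ins for the CALL CPG above.

! Minimal sketch of the high-level OpenMP parallelisation over
! NPROMA-sized blocks of the NGPTOT grid points (hypothetical names).
SUBROUTINE GP_LOOP(NGPTOT, NPROMA, FIELD)
  IMPLICIT NONE
  INTEGER, INTENT(IN)    :: NGPTOT, NPROMA
  REAL,    INTENT(INOUT) :: FIELD(NGPTOT)
  INTEGER :: JSTART, JEND
!$OMP PARALLEL DO PRIVATE(JEND) SCHEDULE(DYNAMIC,1)
  DO JSTART = 1, NGPTOT, NPROMA
     JEND = MIN(JSTART + NPROMA - 1, NGPTOT)
     ! Each thread processes one block of at most NPROMA grid points,
     ! analogous to the CALL CPG on the slide.
     FIELD(JSTART:JEND) = FIELD(JSTART:JEND) + 1.0
  ENDDO
!$OMP END PARALLEL DO
END SUBROUTINE GP_LOOP

With DYNAMIC scheduling each thread picks up the next block as it finishes, which also helps with the load imbalance discussed later.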
IFS Timestep
[Diagram: structure of one IFS timestep
  Spectral space: SPECTRAL CALCS
  LTINV - transpose - FTINV - transpose (BUFFER COPY, MPI)
  Grid-Point space: DYNAMICS, RADIATION, SL-TRAJ, slcomm, SL-INTERP, PHYSICS
  transpose - FTDIR - transpose - LTDIR, back to Spectral space]
7
IFS Details of MPI and OpenMP run
  • MPI environment variables
  • MP_SHARED_MEMORY=yes
  • MP_WAIT_MODE=poll
  • MP_EUILIB=us
  • OpenMP environment variables
  • XLSMPOPTS=parthds=2:stack=50000000:spins=500000:yields=50000

8
IFS: MPI + OpenMP
  • Large numbers of threads do not give good speedups
  • Typically 2 to 4 threads give the best performance
  • This is partly because less than 100% of the code is in parallel regions
  • and partly because the speedup within the parallel regions is less than expected
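For context (this is not on the slides), the first point is just Amdahl's law: with a fraction p of the runtime inside parallel regions and n threads, the thread speedup is bounded by

  speedup <= 1 / ((1 - p) + p/n)

so, for example, p = 0.9 limits the speedup to about 3.1 on 4 threads and about 4.7 on 8 threads, which is why only a few threads per MPI task pay off.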

9
IFS Parallel regions ( T159 )
[Chart: time spent in the IFS parallel regions over 24 timesteps: PHYSICS, SL-INTERP, SL-TRAJ, DYNAMICS, SPECTRAL]
10
MPI speedup v OpenMP speedup for IFS parallel
regions (T159)
[Chart: MPI speedup versus OpenMP speedup for SL-TRAJ, SL-INTERP, DYNAMICS, SLCOMM and total time]
11
Factors which could affect OpenMP speedup
  • 1) Thread dispatching overhead
  • Loop startup
  • 2) Memory bandwidth limitations
  • Caused by stores/loads missing L2 and competing for memory access
  • 3) L2 cache differences
  • Caused by master threads having more data in L2 than other threads
  • 4) Cache interference
  • Caused by different threads storing to the same 128-byte cache line (see the sketch after this list)
  • 5) Load imbalance
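As an illustration of factor 4 (this is not IFS code), here is a minimal Fortran sketch of cache-line interference, assuming a small hypothetical accumulator array S: each thread repeatedly stores to its own element, but neighbouring elements share a 128-byte cache line, so the line bounces between the threads' caches.

! Minimal false-sharing sketch (hypothetical, not IFS code).
! Each thread updates S(ITID); adjacent elements of S share a
! 128-byte cache line, so the repeated stores interfere.
PROGRAM CACHE_INTERFERENCE
  USE OMP_LIB
  IMPLICIT NONE
  INTEGER, PARAMETER :: NITER = 10000000
  REAL    :: S(0:7)             ! one element per thread (up to 8 threads)
  INTEGER :: ITID, I
  S = 0.0
!$OMP PARALLEL PRIVATE(ITID, I) SHARED(S)
  ITID = OMP_GET_THREAD_NUM()
  DO I = 1, NITER
     S(ITID) = S(ITID) + 1.0    ! every iteration stores to the shared line
  ENDDO
!$OMP END PARALLEL
  PRINT *, 'S =', S
END PROGRAM CACHE_INTERFERENCE

Padding S so that each thread's element sits in its own 128-byte cache line removes the interference, which is the usual fix.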

12
1) Thread dispatching overhead
13
(No Transcript)
14
Thread Startup times
15
IFS T159 with 8 MPI tasks and OpenMP
Speedups of IFS parallel regions with 1 and 2
threads
[Chart: speedup (1 to 2 threads) plotted against msec/call for each parallel region, with DYNAMICS and SLCOMM labelled]
16
2) Memory bandwidth limitations
17
Store
[Graphs: total GB/s versus total number of processors for the store kernels s(i) = zero*s(i) and s(i) = zero]
18
Load
[Graphs: total GB/s versus total number of processors for the load kernels s = s + a(i)*b(i)*c(i)*d(i) and s = s + a(i)]
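For reference (this is not the benchmark code from the slides), a minimal OpenMP sketch of such a bandwidth test is shown below; the array size, the timing and the program name STORE_BW are illustrative, and the kernel mirrors the store-only test s(i) = zero above.

! Minimal memory-bandwidth sketch (hypothetical, not the slides' benchmark).
! Times a simple store kernel across the active OpenMP threads.
PROGRAM STORE_BW
  USE OMP_LIB
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 50000000
  REAL, ALLOCATABLE :: S(:)
  REAL :: ZERO
  DOUBLE PRECISION :: T0, T1, GBS
  INTEGER :: I
  ALLOCATE(S(N))
  ZERO = 0.0
  S = 1.0                       ! touch the array once before timing
  T0 = OMP_GET_WTIME()
!$OMP PARALLEL DO
  DO I = 1, N
     S(I) = ZERO                ! store-only kernel, as on the Store slide
  ENDDO
!$OMP END PARALLEL DO
  T1 = OMP_GET_WTIME()
  GBS = 4.0D0 * N / (T1 - T0) / 1.0D9   ! 4 bytes stored per element
  PRINT *, 'Store bandwidth (GB/s):', GBS
END PROGRAM STORE_BW

As more threads and more processors on the node run this loop at once, the aggregate GB/s flattens out, which is the memory bandwidth limit the graphs illustrate.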
19
IFS T159 with 8 MPI tasks and OpenMP
  • Speedups of IFS parallel regions with 1 and 2 threads
  • compared with L2 misses from the hardware performance monitor
  • showing the effect of memory bandwidth

20
3) L2 Cache Differences
  • Thread imbalance
  • Example (sketched below)
  • The master thread stores an array
  • A parallel loop then loads the array
  • L2 misses are different for different threads
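A minimal sketch of that pattern (hypothetical, not IFS code): the master thread initialises the whole array serially, so part of it is still resident in the master's L2; when the parallel loop then reads the array, the other threads take more L2 misses than the master.

! Sketch of the L2-cache-difference pattern (hypothetical, not IFS code).
PROGRAM L2_DIFF
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 1000000
  REAL :: A(N), SSUM
  INTEGER :: I

  ! The master thread alone stores the array, pulling (part of) it into its L2.
  DO I = 1, N
     A(I) = REAL(I)
  ENDDO

  ! The parallel loop loads the array: the master re-reads data it may still
  ! hold in L2, while the other threads must fetch their portions from memory.
  SSUM = 0.0
!$OMP PARALLEL DO REDUCTION(+:SSUM)
  DO I = 1, N
     SSUM = SSUM + A(I)
  ENDDO
!$OMP END PARALLEL DO
  PRINT *, 'sum =', SSUM
END PROGRAM L2_DIFF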

21
IFS T159 with 8 MPI tasks and OpenMP
Speedups of IFS parallel regions with 1 and 2 threads compared with the difference in L2 misses between threads
22
4) Cache interference
23
(No Transcript)
24
5) Load imbalance
25
(No Transcript)
26
Conclusions
  • 1) Thread dispatching overhead: NO
  • 2) Memory bandwidth limitations: YES
  • 3) L2 cache differences: YES
  • 4) Cache interference: Not much
  • 5) Load imbalance: YES