Title: Hybrid Programming with OpenMP and MPI
1. Hybrid Programming with OpenMP and MPI
- Kushal Kedia
- Reacting Gas Dynamics Laboratory
- 18.337J Final Project Presentation, May 13th, 2009
- kushal_at_mit.edu
2. Motivation: Flame Dynamics
- 8 processors on a single shared-memory machine, pure OpenMP
- Two cycles of a 200 Hz flame oscillation (0.01 seconds of real time) take approximately 5 days!
3. MPI? OpenMP?
- MPI: Message Passing Interface, for clusters of computers
- OpenMP: Open Multi-Processing, for desktops and other symmetric multiprocessing (SMP) machines
4. Modern Clusters (multiple SMP nodes)
5. MPI vs. OpenMP

MPI
- Pros
  - Portable to distributed- and shared-memory machines
  - Scales beyond one node
  - No data placement problem
- Cons
  - Difficult to develop and debug
  - High latency, low bandwidth
  - Explicit communication
  - Difficult load balancing

OpenMP
- Pros
  - Easy to implement parallelism
  - Low latency, high bandwidth
  - Implicit communication
  - Dynamic load balancing
- Cons
  - Runs only on shared-memory machines
  - Scales only within one node
  - Possible data placement problem
  - No specific thread order
6. Why Hybridization? The best of both paradigms
- Introducing MPI into OpenMP applications can help them scale across multiple SMP nodes
- Introducing OpenMP into MPI applications can make more efficient use of the shared memory on SMP nodes, mitigating the need for explicit intra-node communication
- Introducing MPI and OpenMP together during the design/coding of a new application can help maximize efficiency, performance, and scaling (a minimal skeleton follows this list)
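None of the following code appears on the original slides; it is a minimal sketch of the hybrid pattern being described, assuming one MPI process per SMP node and using only standard MPI and OpenMP calls:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank, nranks;

        /* Request funneled threading: only the master thread makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Each MPI process (one per SMP node) spawns a team of OpenMP threads. */
        #pragma omp parallel
        printf("rank %d of %d, thread %d of %d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }

Such a program is typically built with an MPI compiler wrapper plus the OpenMP flag (e.g. mpicc -fopenmp), with the thread count per process set via OMP_NUM_THREADS.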
7. Problem Statement
A steady-state, heat-transfer-like problem.
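The governing equation is not reproduced on the slides; the standard formulation of a steady-state, heat-transfer-like problem on a 2-D domain is Laplace's equation for the temperature field T(x, y):

    \nabla^2 T \;=\; \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} \;=\; 0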
8. Solution
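The slides do not name the solver; assuming a Jacobi-style iteration (consistent with the per-iteration communication described on the following slides), each sweep updates every interior grid point from its four neighbors, repeating until convergence:

    T^{(k+1)}_{i,j} \;=\; \frac{1}{4}\left( T^{(k)}_{i-1,j} + T^{(k)}_{i+1,j} + T^{(k)}_{i,j-1} + T^{(k)}_{i,j+1} \right)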
9. Grid and Parallel Decomposition
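The decomposition appears only as a figure in the original deck; as one plausible reading, a 1-D row-block split of an N x N grid over nranks processes could be computed as follows (block_range is a hypothetical helper, not from the slides):

    /* Assign rank its contiguous block of rows [row_start, row_end),
     * giving the N % nranks leftover rows to the lowest ranks. */
    void block_range(int N, int nranks, int rank, int *row_start, int *row_end) {
        int base = N / nranks;   /* rows every rank receives        */
        int rem  = N % nranks;   /* leftover rows for the low ranks */
        *row_start = rank * base + (rank < rem ? rank : rem);
        *row_end   = *row_start + base + (rank < rem ? 1 : 0);
    }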
10. MPI Scenario
- Each chunk of the grid goes to a separate processor
- Explicit communication calls are made after every iteration, regardless of whether the processors are on the same SMP node or on different nodes (see the sketch below)
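A hedged sketch of the per-iteration exchange such a pure-MPI code performs, assuming the 1-D row-block decomposition above (exchange_ghost_rows is a hypothetical helper; up and down are the neighboring ranks, or MPI_PROC_NULL at the domain boundary, which turns the calls into no-ops):

    #include <mpi.h>

    /* T holds nrows real rows plus one ghost row at index 0 and one at
     * index nrows+1, each of length ncols, stored row-major. */
    void exchange_ghost_rows(double *T, int nrows, int ncols, int up, int down) {
        /* Send the first real row up; receive the lower ghost row from below. */
        MPI_Sendrecv(&T[1 * ncols], ncols, MPI_DOUBLE, up, 0,
                     &T[(nrows + 1) * ncols], ncols, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Send the last real row down; receive the upper ghost row from above. */
        MPI_Sendrecv(&T[nrows * ncols], ncols, MPI_DOUBLE, down, 1,
                     &T[0], ncols, MPI_DOUBLE, up, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

In the pure-MPI scenario this exchange happens between all neighboring ranks, even when they share an SMP node.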
11. Hybrid Scenario
- A single MPI process on each SMP node
- Each process spawns 4 threads, which carry out the OpenMP iterations
- The master thread on each SMP node communicates after every iteration (see the sketch below)
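A hedged sketch of one hybrid sweep under this scheme, assuming MPI_THREAD_FUNNELED (so only the master thread touches MPI) and reusing the hypothetical exchange_ghost_rows helper sketched earlier:

    #include <omp.h>

    void exchange_ghost_rows(double *T, int nrows, int ncols, int up, int down);

    /* One hybrid Jacobi sweep: the OpenMP threads share the node-local block,
     * and only the master thread performs the inter-node ghost exchange.
     * The caller swaps T and Tnew between sweeps. */
    void hybrid_iteration(double *T, double *Tnew, int nrows, int ncols,
                          int up, int down) {
        #pragma omp parallel for
        for (int i = 1; i <= nrows; i++)
            for (int j = 1; j < ncols - 1; j++)
                Tnew[i * ncols + j] = 0.25 * (T[(i - 1) * ncols + j] +
                                              T[(i + 1) * ncols + j] +
                                              T[i * ncols + j - 1] +
                                              T[i * ncols + j + 1]);

        /* The parallel region has ended; the master thread alone runs here. */
        exchange_ghost_rows(Tnew, nrows, ncols, up, down);
    }

Only the ghost rows now cross node boundaries; intra-node data sharing is implicit through shared memory.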
12. Schematic of the Hybrid Scenario
13. Computational Resources
- The Pharos cluster (a combined shared- and distributed-memory architecture), used by the Reacting Gas Dynamics Laboratory (Dept. of Mechanical Engineering, MIT), is used for the parallel simulations.
- Pharos consists of about 60 Intel Xeon Harpertown nodes, each with dual quad-core CPUs (8 processors per node) running at 2.66 GHz.
14. Fixed grid size of 2500 x 2500
15. Constant number of processors: 20
16. Hybrid program slower than pure MPI!
This is seen with many problems. Cases where a hybrid program beats pure MPI are few, but when the problem suits it, hybridization works very well.
17. Why? Possible Arguments
- Lower scalability of OpenMP, due to its implicit parallelism
- All threads except one sit idle while inter-node communication takes place
- High cache-miss rate, due to the larger per-process dataset and the data placement problem
- Lack of optimized OpenMP libraries compared to MPI libraries
- The communication-to-computation ratio is low, so the hybrid scheme's reduction in explicit communication buys little