1
Performance Oriented MPI
  • Jeffrey M. Squyres
  • Andrew Lumsdaine
  • NERSC/LBNL and U. Notre Dame

2
Overview
  • Overview and History of MPI
  • Performance Oriented Point to Point
  • Collectives, Data Types
  • Diagnostics and Tuning
  • Rules of Thumb and Gotchas

3
Scope of This Talk
  • Beginning to intermediate user
  • General principles and rules of thumb
  • When and where performance might be available
  • Omit (advanced) low-level issues

4
Overview and History of MPI
  • Library (not language) specification
  • Goals
  • Portability
  • Efficiency
  • Functionality (small and large)
  • Safety (communicators)
  • Conservative (current best practices)

5
Performance in MPI
  • MPI includes many performance-oriented features
  • These features are only potentially
    high-performance
  • The standard seeks not to preclude performance,
    but it does not mandate it
  • Progress might only be made during MPI function
    calls

6
(Potential) Performance Features
  • Non-blocking operations
  • Persistent operations
  • Collective operations
  • MPI Datatypes

7
Basic Point to Point
  • The "six-function MPI" subset includes
  • MPI_Send()
  • MPI_Recv()
  • These are useful, but there is more

8
Basic Point to Point
  • MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  • if (rank == 0)
  •     MPI_Send(&work, 1, MPI_INT, dest, TAG, MPI_COMM_WORLD);
  • else
  •     MPI_Recv(&result, 1, MPI_INT, src, TAG, MPI_COMM_WORLD, &status);

9
Non-Blocking Operations
  • MPI_Isend()
  • MPI_Irecv()
  • The "I" is for "immediate"
  • Paired with MPI_Test()/MPI_Wait()

10
Non-Blocking Operations
  • MPI_Comm_rank(comm, &rank);
  • if (rank == 0) {
  •     MPI_Isend(sendbuf, count, MPI_REAL, 1, tag, comm, &request);
  •     /* Do some computation */
  •     MPI_Wait(&request, &status);
  • } else {
  •     MPI_Irecv(recvbuf, count, MPI_REAL, 0, tag, comm, &request);
  •     /* Do some computation */
  •     MPI_Wait(&request, &status);
  • }

11
Persistent Operations
  • MPI_Send_init()
  • MPI_Recv_init()
  • Creates a request but does not start it
  • MPI_Start() begins the communication
  • A single request can be re-used with multiple
    calls to MPI_Start()

12
Persistent Operations
  • MPI_Comm_rank(comm, &rank);
  • if (rank == 0)
  •     MPI_Send_init(sndbuf, count, MPI_REAL, 1, tag, comm, &request);
  • else
  •     MPI_Recv_init(rcvbuf, count, MPI_REAL, 0, tag, comm, &request);
  • /* ... */
  • for (i = 0; i < n; i++) {
  •     MPI_Start(&request);
  •     /* Do some work */
  •     MPI_Wait(&request, &status);
  • }

13
Collective Operations
  • May be layered on point to point
  • May use tree communication patterns for
    efficiency
  • Synchronization! (No non-blocking collectives)

14
Collective Operations
  • MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, comm);

(Figure: a linear fan-in reduction takes O(P) steps; a tree-based reduction takes O(log P))
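
For context, the reduce call above typically sits at the end of a computation like the classic pi example. The sketch below is a complete, assumed program in that spirit, not the presenters' code:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, i, n = 1000000;
        double h, x, sum = 0.0, mypi, pi;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank integrates its share of 4/(1+x^2) over [0,1] */
        h = 1.0 / (double) n;
        for (i = rank; i < n; i += size) {
            x = h * ((double) i + 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        /* Combine partial sums on rank 0; a good MPI uses a tree, O(log P) */
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi is approximately %.16f\n", pi);

        MPI_Finalize();
        return 0;
    }
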
15
MPI Datatypes
  • May allow MPI to send a message directly from
    memory
  • May avoid copying/packing
  • (General) high performance implementations not
    widely available

(Figure: a datatype can let a message go straight from user memory to the network instead of through an intermediate copy)
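
As an illustration of avoiding a manual pack, the fragment below sends one column of a row-major matrix with MPI_Type_vector; whether this beats packing by hand depends on the implementation. N, TAG, col, and rank are assumed, and the fragment would live inside a function after MPI_Init():

    #define N   100                    /* assumed matrix dimension */
    #define TAG 42

    double A[N][N];                    /* row-major matrix */
    MPI_Datatype column;

    /* One column = N blocks of 1 double, each N doubles apart */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (rank == 0)
        MPI_Send(&A[0][col], 1, column, 1, TAG, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&A[0][col], 1, column, 0, TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Type_free(&column);
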
16
Quiz: MPI_Send()
  • After I call MPI_Send()
  • The recipient has received the message
  • I have sent the message
  • I can write to the message buffer without
    corrupting the message
  • I can write to the message buffer

17
Sidenote: MPI_Ssend()
  • MPI_Ssend() has the (perhaps) expected semantics
  • When MPI_Ssend() returns, the matching receive has
    started on the recipient
  • Useful for debugging (replace MPI_Send() with
    MPI_Ssend())
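
One quick, assumed way to make that swap for a debugging build is a macro placed after mpi.h; if the program then hangs, it was relying on MPI_Send() buffering:

    #include <mpi.h>
    /* Debugging aid (assumption, not part of MPI): force every standard
       send to behave synchronously to expose buffering-dependent code. */
    #define MPI_Send MPI_Ssend
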

18
Quiz: MPI_Isend()
  • After I call MPI_Isend()
  • The recipient has started to receive the message
  • I have started to send the message
  • I can write to the message buffer without
    corrupting the message
  • None of the above (I must call MPI_Test() or
    MPI_Wait())

19
Quiz: MPI_Isend()
  • True or False:
  • I can overlap communication and computation by
    putting some computation between MPI_Isend() and
    MPI_Test()/MPI_Wait()
  • False (in many/most cases)

20
Communication is Still Computation
  • A CPU, usually the main one, must do the
    communication work
  • Part of your process (inside MPI calls)
  • Another process on main CPU
  • Another thread on main CPU
  • Another processor

21
No Free Lunch
  • Part of your process (most common)
  • Fast but no overlap
  • Another process (daemons)
  • Overlap, but slow (extra copies)
  • Another thread (rare)
  • Overlap and fast, but difficult
  • Another processor (emerging)
  • Overlap and fast, but more hardware
  • E.g., Myrinet/GM, VIA

22
How Do I Get Performance?
  • Minimize time spent communicating
  • Minimize data copies
  • Minimize synchronization
  • I.e., time waiting for communication

23
Minimizing Communication Time
  • Bandwidth
  • Latency

24
Minimizing Latency
  • Collect small messages together (if you can)
  • One 1024-byte message instead of 1024 one-byte
    messages
  • Minimize other overhead (e.g., copying)
  • Overlap with computation (if you can)
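
A minimal sketch of collecting small messages, assuming a sender that would otherwise issue 1024 one-byte sends (the buffer, dest, and TAG are illustrative):

    unsigned char bytes[1024];
    int dest = 1, TAG = 7;             /* assumed */

    /* Costly: 1024 separate sends, each paying the full per-message latency
       for (i = 0; i < 1024; i++)
           MPI_Send(&bytes[i], 1, MPI_BYTE, dest, TAG, MPI_COMM_WORLD);    */

    /* Better: one message carries the same data, one latency is paid */
    MPI_Send(bytes, 1024, MPI_BYTE, dest, TAG, MPI_COMM_WORLD);
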

25
Example: Domain Decomposition
26
Naïve Approach
  • while (!done) {
  •     exchange(D, neighbors, myrank);
  •     dored(D);
  •     exchange(D, neighbors, myrank);
  •     doblack(D);
  • }
  • void exchange(Array D, int *neighbors, int myrank) {
  •     for (i = 0; i < 4; i++)
  •         MPI_Send();
  •     for (i = 0; i < 4; i++)
  •         MPI_Recv();
  • }

27
Naïve Approach
  • Deadlock! (Maybe)
  • Can fix with careful coordination of receiving
    versus sending on alternate processes
  • But this can still serialize

28
MPI_Sendrecv()
  • while (!done) {
  •     exchange(D, neighbors, myrank);
  •     dored(D);
  •     exchange(D, neighbors, myrank);
  •     doblack(D);
  • }
  • void exchange(Array D, int *neighbors, int myrank) {
  •     for (i = 0; i < 4; i++)
  •         MPI_Sendrecv();
  • }
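
One way the empty MPI_Sendrecv() call might be filled in for a four-neighbor halo exchange; HALO, TAG, and the sendbuf/recvbuf arrays are assumptions, not the presenters' code:

    #define HALO 128                   /* halo size per neighbor, assumed */
    #define TAG  0

    double sendbuf[4][HALO], recvbuf[4][HALO];

    void exchange(int *neighbors, int myrank)
    {
        int i;
        for (i = 0; i < 4; i++) {
            /* Send my halo to neighbor i and receive its halo in one call;
               MPI pairs the two internally, so neighbors cannot deadlock. */
            MPI_Sendrecv(sendbuf[i], HALO, MPI_DOUBLE, neighbors[i], TAG,
                         recvbuf[i], HALO, MPI_DOUBLE, neighbors[i], TAG,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }
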

29
Immediate Operations
  • while (!done) {
  •     exchange(D, neighbors, myrank);
  •     dored(D);
  •     exchange(D, neighbors, myrank);
  •     doblack(D);
  • }
  • void exchange(Array D, int *neighbors, int myrank) {
  •     for (i = 0; i < 4; i++) {
  •         MPI_Isend();
  •         MPI_Irecv();
  •     }
  •     MPI_Waitall();
  • }
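
A sketch of the immediate version filled in, reusing the assumed HALO buffers from the MPI_Sendrecv() sketch above; eight requests cover the four sends and four receives:

    void exchange(int *neighbors, int myrank)
    {
        MPI_Request reqs[8];
        int i;

        for (i = 0; i < 4; i++) {
            MPI_Isend(sendbuf[i], HALO, MPI_DOUBLE, neighbors[i], TAG,
                      MPI_COMM_WORLD, &reqs[i]);
            MPI_Irecv(recvbuf[i], HALO, MPI_DOUBLE, neighbors[i], TAG,
                      MPI_COMM_WORLD, &reqs[4 + i]);
        }
        /* Nothing may be reused or read until the waits complete */
        MPI_Waitall(8, reqs, MPI_STATUSES_IGNORE);
    }
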

30
Receive Before Sending
  • while (!done) {
  •     exchange(D, neighbors, myrank);
  •     dored(D);
  •     exchange(D, neighbors, myrank);
  •     doblack(D);
  • }
  • void exchange(Array D, int *neighbors, int myrank) {
  •     for (i = 0; i < 4; i++)
  •         MPI_Irecv();
  •     for (i = 0; i < 4; i++)
  •         MPI_Isend();
  •     MPI_Waitall();
  • }

31
Persistent Operations
  • for (i = 0; i < 4; i++) {
  •     MPI_Recv_init();
  •     MPI_Send_init();
  • }
  • while (!done) {
  •     exchange(D, neighbors, myrank);
  •     dored(D);
  •     exchange(D, neighbors, myrank);
  •     doblack(D);
  • }
  • void exchange(Array D, int *neighbors, int myrank) {
  •     MPI_Startall();
  •     MPI_Waitall();
  • }
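
A sketch of a persistent variant under the same assumed buffers: the requests are created once, then restarted every iteration with MPI_Startall():

    MPI_Request reqs[8];               /* created once, reused every iteration */

    void setup_exchange(int *neighbors)
    {
        int i;
        for (i = 0; i < 4; i++) {
            MPI_Send_init(sendbuf[i], HALO, MPI_DOUBLE, neighbors[i], TAG,
                          MPI_COMM_WORLD, &reqs[i]);
            MPI_Recv_init(recvbuf[i], HALO, MPI_DOUBLE, neighbors[i], TAG,
                          MPI_COMM_WORLD, &reqs[4 + i]);
        }
    }

    void exchange(void)
    {
        /* Restart all eight communications, then wait for completion */
        MPI_Startall(8, reqs);
        MPI_Waitall(8, reqs, MPI_STATUSES_IGNORE);
    }
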

32
Overlapping
  • while (!done) {
  •     MPI_Startall();            /* Start exchanges */
  •     do_inner_red(D);           /* Internal computation */
  •     for (i = 0; i < 4; i++) {
  •         MPI_Waitany();         /* As information arrives */
  •         do_received_red(D);    /* Process */
  •     }
  •     MPI_Startall();
  •     do_inner_black(D);
  •     for (i = 0; i < 4; i++) {
  •         MPI_Waitany();
  •         do_received_black(D);
  •     }
  • }
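
One possible shape for the overlapped red sweep, assuming the persistent requests above with the receives stored in reqs[4..7]; D and the do_* routines are the slide's placeholders:

    int i, idx;

    MPI_Startall(8, reqs);             /* start all sends and receives */
    do_inner_red(D);                   /* interior points need no halo */

    for (i = 0; i < 4; i++) {
        /* Process each boundary as soon as its halo has arrived */
        MPI_Waitany(4, &reqs[4], &idx, MPI_STATUS_IGNORE);
        do_received_red(D);            /* idx identifies which neighbor finished */
    }
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);   /* sends must also complete */
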

33
Advanced Overlap
  • MPI_Startall();                /* Start all receives */
  • /* ... */
  • while (!done) {
  •     MPI_Startall();            /* Start sends */
  •     do_inner_red(D);           /* Internal computation */
  •     for (i = 0; i < 4; i++) {
  •         MPI_Waitany();         /* Wait on receives */
  •         if (received) {
  •             do_received_red(D);   /* Process */
  •             MPI_Start();       /* Restart receive */
  •         }
  •     }
  •     /* Repeat for black */
  • }

34
MPI Data Types
  • MPI_Type_vector
  • MPI_Type_struct
  • Etc.
  • MPI_Pack might be better

(Figure: a datatype can let a message go straight from user memory to the network instead of through an intermediate copy)
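
A minimal MPI_Pack() sketch for comparison, packing an int and a double into one message by hand; the names n, eps, dest, src, and TAG are illustrative assumptions:

    char buf[64];
    int  pos = 0;
    MPI_Status status;

    /* Sender: pack heterogeneous data into one contiguous buffer */
    MPI_Pack(&n,   1, MPI_INT,    buf, sizeof(buf), &pos, MPI_COMM_WORLD);
    MPI_Pack(&eps, 1, MPI_DOUBLE, buf, sizeof(buf), &pos, MPI_COMM_WORLD);
    MPI_Send(buf, pos, MPI_PACKED, dest, TAG, MPI_COMM_WORLD);

    /* Receiver: unpack in the same order */
    MPI_Recv(buf, sizeof(buf), MPI_PACKED, src, TAG, MPI_COMM_WORLD, &status);
    pos = 0;
    MPI_Unpack(buf, sizeof(buf), &pos, &n,   1, MPI_INT,    MPI_COMM_WORLD);
    MPI_Unpack(buf, sizeof(buf), &pos, &eps, 1, MPI_DOUBLE, MPI_COMM_WORLD);
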
35
Minimizing Synchronization
  • At a synchronization point (e.g., a collective
    communication), all processes must arrive at the
    collective call
  • Can spend lots of time waiting
  • This is often an algorithmic issue
  • E.g., check for convergence every 5 iterations
    instead of every iteration
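
A sketch of checking convergence only every few iterations; CHECK_EVERY, TOL, and relax() are assumptions standing in for the application's own sweep and tolerance:

    #define CHECK_EVERY 5              /* iterations between global checks */
    #define TOL 1.0e-6

    double local_err, global_err;
    int iter = 0, done = 0;

    while (!done) {
        local_err = relax(D);          /* one local sweep, returns residual */
        if (++iter % CHECK_EVERY == 0) {
            /* Only this branch synchronizes all processes */
            MPI_Allreduce(&local_err, &global_err, 1, MPI_DOUBLE, MPI_MAX,
                          MPI_COMM_WORLD);
            done = (global_err < TOL);
        }
    }
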

36
Gotchas
  • MPI_Probe()
  • Guarantees an extra memory copy
  • MPI_ANY_SOURCE
  • Can cause additional (internal) looping
  • MPI_Alltoall()
  • All pairs must communicate
  • Synchronization (avoid in general)

37
Diagnostic Tools
  • TotalView
  • Prism
  • Upshot
  • XMPI

38
Summary
  • Receive before sending
  • Collect small messages together
  • Overlap (if possible)
  • Use immediate operations
  • Use persistent operations
  • Use diagnostic tools