1
CSE 160 Lecture 16
  • MPI Concepts, Topology and Synchronization

2
So Far
  • We have concentrated on generic message passing
    algorithms
  • We have used PVM to implement codes (simpler on
    workstations)
  • MPI is the de facto standard for message passing and
    introduces some important concepts

3
Message Addressing
  • Identify an endpoint
  • Use a tag to distinguish a particular message
  • pvm_send(dest, tag)
  • MPI_SEND(buf, count, datatype, dest, tag, comm)
  • Receiving
  • recv(src, tag), recv(*, tag), recv(src, *),
    recv(*, *)
  • What if you want to build a library that uses
    message passing? Is (src, tag) safe in all
    instances?
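To make the receive variants above concrete, here is a minimal C sketch (not from the original slides; the buffer, count, and partner names are illustrative) showing the explicit and wildcarded forms of a receive in MPI:

    #include "mpi.h"

    /* Illustrative only: "partner" and "buf" are hypothetical names. */
    void recv_examples(int partner, MPI_Comm comm)
    {
        double buf[16];
        MPI_Status status;

        /* Exact match: only messages from 'partner' with tag 7 are accepted. */
        MPI_Recv(buf, 16, MPI_DOUBLE, partner, 7, comm, &status);

        /* Wildcard source: any sender, but the tag must still be 7. */
        MPI_Recv(buf, 16, MPI_DOUBLE, MPI_ANY_SOURCE, 7, comm, &status);

        /* Wildcard source and tag: take the next message in 'comm';
           status.MPI_SOURCE and status.MPI_TAG report what actually arrived. */
        MPI_Recv(buf, 16, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &status);
    }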

4
Level 0 Issues
  • Basic Pt-2-Pt Message Passing is straightforward,
    but how does one
  • Make it go fast
  • Eliminate extra memory copies
  • Take advantage of specialized hardware
  • Move complex data structures (packing)
  • Receive from one-of-many (wildcarding)
  • Synchronize a group of tasks
  • Recover from errors
  • Start tasks
  • Build safe libraries
  • Monitor tasks

5
MPI-1 addresses many of the level 0 issues
  • (but not all)

6
A long history of research efforts in message
passing
  • P4
  • Chameleon
  • Parmacs
  • TCGMSG
  • CHIMP
  • NX (Intel i860, Paragon)
  • PVM
  • And these begot MPI

7
So What is MPI?
  • It is a standard message passing API
  • Specifies many variants of send/recv
  • 9 send interface calls
  • E.g., synchronous send, asynchronous send, ready
    send, asynchronous ready send
  • Plus other defined APIs
  • Process topologies
  • Group operations
  • Derived Data types
  • Profiling API (standard way to instrument MPI
    code)
  • Implemented and optimized by machine vendors
  • Should you use it? Absolutely!

8
So What's Missing in MPI-1?
  • Process control
  • How do you start 100 tasks?
  • How do you kill/signal/monitor remote tasks?
  • I/O
  • Addressed in MPI-2
  • Fault-tolerance
  • One MPI process dies, the rest eventually hang
  • Interoperability
  • No standard set for running a single parallel job
    across architectures (e.g., you cannot split a
    computation between x86 Linux and Alpha)

9
Communication Envelope
  • (src, tag) is enough to distinguish messages;
    however, there is a problem
  • How does one write safe libraries that also use
    message passing?
  • Tags used in libraries can be the same as tags
    used in calling code, so messages could get
    confused
  • Message envelopes add another, system-assigned
    tag to a message
  • This label defines a message envelope

10
Envelopes
[Diagram: messages labeled (Tag 100, Src A), (Tag 100, Src B), and
(Tag 200, Src A) sorted into two different envelopes]
Context + tag: messages are received relative to a context
11
Communicators
  • Messages ride inside of message envelopes and
    receives can only match candidate messages
    relative to an envelope.
  • MPI implements these envelopes as communicators.
  • A communicator is more than just an envelope; it
    also encompasses a group of processes
  • Communicator = context + group of tasks

12
MPI_COMM_WORLD
  • MPI programs are static; they can never grow or
    shrink
  • There is one default communicator called
    MPI_COMM_WORLD
  • All processes are members of COMM_WORLD
  • Each process is labeled 0 to N, for a communicator
    of size N+1
  • Similar to a PVM group.

13
How do you build new groups?
  • All new communicators are derived from existing
    communicators
  • All communicators have MPI_COMM_WORLD as an
    ancestor
  • New communicators are formed synchronously
  • All members of a new communicator call the
    construction routine at the same time.
  • MPI_COMM_DUP(), MPI_COMM_SPLIT()
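As a hedged illustration of these construction routines (the function and variable names below are invented for the example), a library might duplicate MPI_COMM_WORLD to get a private context, or split it into subgroups:

    #include "mpi.h"

    /* Sketch only: all members of the parent communicator must make
       these calls collectively. */
    void make_groups(void)
    {
        int rank;
        MPI_Comm library_comm, half_comm;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Duplicate: same processes, fresh context, so a library can
           exchange messages without colliding with user tags. */
        MPI_Comm_dup(MPI_COMM_WORLD, &library_comm);

        /* Split: processes passing the same "color" land in the same new
           communicator; here even and odd ranks form two groups. */
        MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &half_comm);

        MPI_Comm_free(&half_comm);
        MPI_Comm_free(&library_comm);
    }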

14
Communicators as groups
[Diagram: six processes, ranks 0 through 5, shown as members of a
communicator group]
15
Running MPI Programs
  • The MPI-1 Standard does not specify how to run an
    MPI program, just as the Fortran standard does
    not specify how to run a Fortran program.
  • In general, starting an MPI program is dependent
    on the implementation of MPI you are using, and
    might require various scripts, program arguments,
    and/or environment variables.
  • mpiexec <args> is part of MPI-2, as a
    recommendation, but not a requirement
  • You can use mpiexec for MPICH and mpirun for
    other systems
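As an illustration only (the program name and process count are placeholders, and the exact launcher and flags vary by installation), a four-process run might be started with:

    mpiexec -n 4 ./a.out      (MPI-2 recommended form, e.g. MPICH)
    mpirun -np 4 ./a.out      (common form on other systems)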

16
Finding Out About the Environment
  • Two important questions that arise early in a
    parallel program are
  • How many processes are participating in this
    computation?
  • Which one am I?
  • MPI provides functions to answer these questions
  • MPI_Comm_size reports the number of processes.
  • MPI_Comm_rank reports the rank, a number between
    0 and size-1, identifying the calling process
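A minimal program using these two calls might look like this (a sketch assuming the standard C bindings; the output format is arbitrary):

    #include "mpi.h"
    #include <stdio.h>

    /* Every process asks "how many of us are there?" and "which one am I?" */
    int main(int argc, char *argv[])
    {
        int size, rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my rank: 0 .. size-1 */
        printf("Process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }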

17
MPI Datatypes
  • The data in a message to be sent or received is
    described by a triple (address, count, datatype),
    where
  • An MPI datatype is recursively defined as
  • predefined, corresponding to a data type from the
    language (e.g., MPI_INT, MPI_DOUBLE_PRECISION)
  • a contiguous array of MPI datatypes
  • a strided block of datatypes
  • an indexed array of blocks of datatypes
  • an arbitrary structure of datatypes
  • There are MPI functions to construct custom
    datatypes, such as an array of (int, float) pairs,
    or a row of a matrix stored columnwise.
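As one hedged example of a derived datatype (the names N, A, col, and dest are invented for the sketch), MPI_Type_vector can describe the strided "row of a matrix stored columnwise" case; in C, where arrays are row-major, the equivalent strided object is a column:

    #include "mpi.h"

    #define N 10   /* illustrative matrix size */

    /* Describe one column of a row-major N x N matrix: N doubles,
       each N elements apart in memory. */
    void send_column(double A[N][N], int col, int dest)
    {
        MPI_Datatype col_type;

        /* N blocks of 1 double each, separated by a stride of N elements */
        MPI_Type_vector(N, 1, N, MPI_DOUBLE, &col_type);
        MPI_Type_commit(&col_type);

        /* The datatype supplies the strided layout, so the column can be
           sent without first packing it into a contiguous buffer. */
        MPI_Send(&A[0][col], 1, col_type, dest, 0, MPI_COMM_WORLD);

        MPI_Type_free(&col_type);
    }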

18
MPI Tags
  • Messages are sent with an accompanying
    user-defined integer tag, to assist the receiving
    process in identifying the message.
  • Messages can be screened at the receiving end by
    specifying a specific tag, or not screened by
    specifying MPI_ANY_TAG as the tag in a receive.
  • Some non-MPI message-passing systems have called
    tags "message types". MPI calls them tags to
    avoid confusion with datatypes.

19
Why Datatypes?
  • Since all data is labeled by type, an MPI
    implementation can support communication between
    processes on machines with very different memory
    representations and lengths of elementary
    datatypes (heterogeneous communication).
  • Specifying application-oriented layout of data in
    memory
  • reduces memory-to-memory copies in the
    implementation
  • allows the use of special hardware
    (scatter/gather) when available
  • Theoretically simplifies the programmer's life
  • How would you build datatypes in PVM?
    Sequences of pvm_pack() calls

20
Introduction to Collective Operations in MPI
  • Collective operations are called by all processes
    in a communicator.
  • MPI_BCAST distributes data from one process (the
    root) to all others in a communicator.
  • How does this differ from pvm_mcast() and
    pvm_bcast()?
  • MPI_REDUCE combines data from all processes in
    communicator and returns it to one process.
  • In many numerical algorithms, SEND/RECEIVE can be
    replaced by BCAST/REDUCE, improving both
    simplicity and efficiency.

21
Example PI in C -1
  • include "mpi.h"
  • include ltmath.hgt
  • int main(int argc, char argv)
  • int done 0, n, myid, numprocs, i, rcdouble
    PI25DT 3.141592653589793238462643double mypi,
    pi, h, sum, x, aMPI_Init(argc,argv)MPI_Comm_
    size(MPI_COMM_WORLD,numprocs)MPI_Comm_rank(MPI_
    COMM_WORLD,myid)while (!done) if (myid
    0) printf("Enter the number of intervals
    (0 quits) ") scanf("d",n)
    MPI_Bcast(n, 1, MPI_INT, 0, MPI_COMM_WORLD)
    if (n 0) break

22
Example PI in C - 2
            h   = 1.0 / (double) n;
            sum = 0.0;
            for (i = myid + 1; i <= n; i += numprocs) {
                x = h * ((double)i - 0.5);
                sum += 4.0 / (1.0 + x * x);
            }
            mypi = h * sum;

            MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,
                       MPI_COMM_WORLD);
            if (myid == 0)
                printf("pi is approximately %.16f, Error is %.16f\n",
                       pi, fabs(pi - PI25DT));
        }
        MPI_Finalize();
        return 0;
    }

23
Sources of Deadlocks
  • Send a large message from process 0 to process 1
  • If there is insufficient storage at the
    destination, the send must wait for the user to
    provide the memory space (through a receive)
  • What happens if both processes send first and then
    receive?
  • This is called "unsafe" because it depends on the
    availability of system buffers (PVM does
    buffering)

24
Some Solutions to the "unsafe" Problem
  • Order the operations more carefully
  • Use non-blocking operations
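A minimal sketch of the second solution (not from the slides; the buffer names, count, and partner rank are placeholders): posting the receive as a non-blocking MPI_Irecv before the blocking send removes the dependence on system buffering:

    #include "mpi.h"

    /* Sketch of the non-blocking fix for a pairwise exchange. */
    void safe_exchange(double *sendbuf, double *recvbuf, int n, int partner)
    {
        MPI_Request request;
        MPI_Status  status;

        /* Post the receive before sending; it returns immediately. */
        MPI_Irecv(recvbuf, n, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &request);

        /* The blocking send can now always be matched by the posted receive. */
        MPI_Send(sendbuf, n, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD);

        /* Wait until the incoming message has actually arrived. */
        MPI_Wait(&request, &status);
    }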

25
Extending the Message-Passing Interface
  • Dynamic Process Management
  • Dynamic process startup
  • Dynamic establishment of connections
  • One-sided communication
  • Put/get
  • Other operations
  • Parallel I/O
  • Other MPI-2 features
  • Generalized requests
  • Bindings for C++/Fortran-90; interlanguage issues

26
When to use MPI
  • Portability and Performance
  • Irregular Data Structures
  • Building Tools for Others
  • Libraries
  • Need to manage memory on a per-processor basis

27
When not to use MPI
  • Regular computation matches HPF
  • But see PETSc/HPF comparison (ICASE 97-72)
  • Solution (e.g., library) already exists
  • http://www.mcs.anl.gov/mpi/libraries.html
  • Require Fault Tolerance
  • Sockets
  • Distributed Computing
  • CORBA, DCOM, etc.

28
Synchronization
  • Messages serve two purposes
  • Moving data
  • Synchronizing processes
  • When does synchronization occur for
  • Blocking send?
  • Non-blocking send?
  • Dependencies on implementation

29
Common Synchronization Constructs
  • Barrier
  • All processes stop; no information is passed
  • Broadcast
  • All processes get information from the root
  • How does this differ from a barrier?
  • Ideas on how to implement an MPI_BCAST?
  • Reduction
  • Data is reduced (summed, multiplied, counted, ...)
  • Ideas on how to implement MPI_REDUCE?
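As a thought experiment for the questions above (this is only a sketch, not how any particular MPI implementation does it), a broadcast can be built from point-to-point sends arranged as a binomial tree, so the number of processes holding the data doubles each round:

    #include "mpi.h"

    /* Sketch of a binomial-tree broadcast built from MPI_Send/MPI_Recv:
       finishes in about log2(size) rounds instead of size-1 sends. */
    static void tree_bcast(void *buf, int count, MPI_Datatype type,
                           int root, MPI_Comm comm)
    {
        int rank, size;
        MPI_Status status;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int rel  = (rank - root + size) % size;   /* position relative to root */
        int mask = 1;

        /* Wait for the data from a parent (the root already has it). */
        while (mask < size) {
            if (rel & mask) {
                int parent = (rel - mask + root) % size;
                MPI_Recv(buf, count, type, parent, 0, comm, &status);
                break;
            }
            mask <<= 1;
        }
        /* Forward the data to children in the remaining rounds. */
        mask >>= 1;
        while (mask > 0) {
            if (rel + mask < size) {
                int child = (rel + mask + root) % size;
                MPI_Send(buf, count, type, child, 0, comm);
            }
            mask >>= 1;
        }
    }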

30
Synchronization Trees
  • The topology of the messages that implement a
    reduction is important.
  • The LogP lecture illustrated a tree that wasn't
    linear and wasn't binary
  • The major idea was to maintain concurrency
  • MPI defines many operations as synchronizing so that
    implementers have a choice in optimization.