Title: Standard
1
Standard
  • Description
  • Performance

2
Contents
  • Introduction to MPI
  • Message passing
  • Different types of communication
  • MPI functionalities
  • MPI structures
  • Basic functions
  • Data types
  • Contexts and tags
  • Groups and communication domains
  • Communication functions
  • Point to point communications
  • Asynchronous communications
  • Global communications
  • MPI-2
  • One-sided communications
  • I/O

3
Message passing (1)
  • Problem
  • We have N nodes
  • All nodes connected by network
  • How can we use the N nodes together as one
    global computer?

4
Message passing (2)
  • One answer: message passing
  • Execute one process per processor
  • Exchange data explicitly between processors
  • Synchronize the different processes explicitly
  • Two types of data transfer
  • Only one process initiates the communication:
    one-sided
  • The two processes cooperate for the
    communication: cooperative

5
Two types of data transfer
  • One-sided communications
  • No rendez-vous protocol
  • No warning about reads or writes in the local
    memory of a process
  • Costly synchronization
  • Function prototypes
  • put(remote_process, data)
  • get(remote_process, data)
  • Cooperative communications
  • The communication involves both processes
  • Implicit synchronization in the simple case
  • Function prototypes
  • send(destination, data)
  • recv(source, data)

6
MPI (Message Passing Interface)
  • Standard developed by academic and industrial
    partners
  • Objective: to specify a portable message-passing
    library
  • Implies an execution environment for launching and
    connecting together all the processes
  • Allows
  • Synchronous and asynchronous communications
  • Global communications
  • Separate communication domains

7
Contents
  • Introduction to MPI
  • Message passing
  • Different types of communication
  • MPI functionalities
  • MPI structures
  • Basic functions (example: HelloWorld_MPI.c)
  • Data types
  • Contexts and tags
  • Groups and communication domains
  • Communication functions
  • Point to point communications
  • Asynchronous communications
  • Global communications
  • MPI-2
  • One-sided communications
  • I/O

8
MPI Programming Structure
  • Follows the SPMD programming model
  • All processes are launched at the same time
  • Same program on every processor
  • Processor roles can be differentiated by their rank
    number

Program structure (diagram): non-parallel section; parallel section
initialization; multi-node parallel section (MPI); parallel section
termination.
Remark: most implementations advise limiting the code after the
parallel section termination to the exit call.
9
Basic functions
  • MPI environment initialization
  • C: MPI_Init(&argc, &argv)
  • Fortran: call MPI_Init(ierror)
  • MPI environment termination (programs are
    recommended to exit right after this call)
  • C: MPI_Finalize()
  • Fortran: call MPI_Finalize(ierror)
  • Getting the process rank
  • C: MPI_Comm_rank(MPI_COMM_WORLD, &rank)
  • Fortran: call MPI_Comm_rank(MPI_COMM_WORLD,
    rank, ierror)
  • Getting the total number of processes
  • C: MPI_Comm_size(MPI_COMM_WORLD, &size)
  • Fortran: call MPI_Comm_size(MPI_COMM_WORLD,
    size, ierror)

10
HelloWorld_MPI.c
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
    int rang, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rang);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    printf("hello, I am %d (of %d processes)\n", rang, nprocs);
    MPI_Finalize();
    return 0;
  }

11
MPI data types
12
User data types
  • By default, MPI exchanges data as vectors of
    basic MPI datatypes
  • It is possible to create user datatypes to simplify
    communication operations (avoiding explicit buffering and
    linearization operations)
  • User datatypes replace the obsolete MPI_PACK
    type
  • A user type consists of a sequence of basic types
    and a sequence of offsets describing the memory layout
    (see the sketch below)
  • Creation (after construction): MPI_Type_commit(&type)
  • Destruction: MPI_Type_free(&type)
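
A minimal sketch (not from the original slides) of how such a user
datatype can be built for a simple C struct with
MPI_Type_create_struct(), committed and later freed; the struct
particle_t and its field layout are assumptions made for this example.

  typedef struct { int id; double x, y, z; } particle_t;

  MPI_Datatype make_particle_type(void)
  {
    particle_t   p;
    int          blocklens[2] = { 1, 3 };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };
    MPI_Aint     base, displs[2];
    MPI_Datatype particle_type;

    /* compute the field offsets relative to the start of the struct */
    MPI_Get_address(&p,    &base);
    MPI_Get_address(&p.id, &displs[0]);
    MPI_Get_address(&p.x,  &displs[1]);
    displs[0] -= base;
    displs[1] -= base;

    MPI_Type_create_struct(2, blocklens, displs, types, &particle_type);
    MPI_Type_commit(&particle_type);   /* the type can now be used in communications */
    return particle_type;              /* later: MPI_Type_free(&particle_type) */
  }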

13
Contexts and tags
  • Need to distinguish different messages at
    reception
  • The context allows MPI to distinguish between a
    point-to-point communication and a global
    communication
  • Every message is sent within a context, and
    must be received in the same context
  • The context is automatically managed by MPI
  • Communication tags allow one communication to be
    identified among multiple ones
  • When communications are made asynchronously, these
    tags allow them to be sorted out
  • For reception operations, the next message can be
    received regardless of its tag by specifying the
    MPI_ANY_TAG keyword
  • Tag management is up to the MPI programmer

14
Communication domains
  • Nodes can be grouped into a communication domain
    called a communicator
  • Every process has a rank number in each group it is
    involved in
  • MPI_COMM_WORLD is the default communication
    domain; it gathers all processes and is created at
    initialization
  • More generally, every operation applies only to
    the set of processes specified by its
    communicator
  • Each domain constitutes a distinct
    context for communications

15
Split a communicator (1/2): groups
  • To create a new domain, you first have to create
    a new group of processes
  • int MPI_Comm_group(MPI_Comm comm, MPI_Group *group)
  • int MPI_Group_incl(MPI_Group group, int rsize,
    int *ranks, MPI_Group *newgroup)
  • int MPI_Group_excl(MPI_Group group, int rsize,
    int *ranks, MPI_Group *newgroup)
  • Set operations on groups
  • int MPI_Group_union(MPI_Group g1, MPI_Group g2,
    MPI_Group *gr)
  • int MPI_Group_intersection(MPI_Group g1,
    MPI_Group g2, MPI_Group *gr)
  • int MPI_Group_difference(MPI_Group g1, MPI_Group
    g2, MPI_Group *gr)
  • Destruction of a group
  • int MPI_Group_free(MPI_Group *group)

16
Split a communicator (2/2): communicators
  • Associating a communicator to a group
  • int MPI_Comm_create(MPI_Comm comm, MPI_Group
    group, MPI_Comm *newcomm)
  • Dividing a domain into sub-domains
  • int MPI_Comm_split(MPI_Comm comm, int color, int
    key, MPI_Comm *newcomm)
  • MPI_Comm_split is a collective operation on the
    initial communicator comm
  • Every process gives a color; all processes with the
    same color end up in the same newcomm (see the sketch below)
  • The MPI_UNDEFINED color allows a process not to
    be part of any new communicator
  • Every process gives a key; processes with the
    same color are ranked by their keys
  • A group is implicitly created for each new
    communicator created this way
  • Communicator destruction
  • int MPI_Comm_free(MPI_Comm *comm)
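
As a hedged illustration (not from the original slides), splitting
MPI_COMM_WORLD into one communicator for even ranks and one for odd
ranks with MPI_Comm_split() can look like this:

  int rank;
  MPI_Comm subcomm;

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  /* color 0 for even ranks, 1 for odd ranks; key = rank keeps the original order */
  MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &subcomm);
  /* ... collective operations restricted to subcomm ... */
  MPI_Comm_free(&subcomm);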

17
Contents
  • Introduction to MPI
  • Message passing
  • Different types of communication
  • MPI functionalities
  • MPI structures
  • Basic functions
  • Data types
  • Contexts and tags
  • Groups and communication domains
  • Communication functions
  • Point to point communications (example: Jeton.c)
  • Asynchronous communications
  • Global communications (example: trace.c)
  • MPI-2
  • One-sided communications
  • I/O

18
Point-to-point communications
  • Send and receive data between a pair of processes
  • Both processes take part in the communication:
    one sends the data, the other posts the
    reception
  • Communications are identified by tags
  • The type and the size of the data must be
    specified

19
Basic communication functions
  • Synchronous send (synchronization between the
    computing process and the send action)
  • int MPI_Send(void *buf, int count, MPI_Datatype
    datatype, int dest, int tag, MPI_Comm comm)
  • The tag allows messages to be identified uniquely
  • Synchronous data reception
  • int MPI_Recv(void *buf, int count, MPI_Datatype
    datatype, int source, int tag, MPI_Comm comm,
    MPI_Status *status)
  • The tag must be identical to the tag of the send
  • MPI_ANY_SOURCE can be specified to receive from
    anyone

20
Jeton.c
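The original Jeton.c listing is not reproduced in this transcript;
below is a minimal token-passing sketch of what such an example
typically looks like (variable names and message layout are
assumptions), using the blocking MPI_Send()/MPI_Recv() calls of the
previous slide. It must be run with at least two processes.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
    int rank, size, jeton = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
      jeton = 1;                       /* process 0 creates the token */
      MPI_Send(&jeton, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      MPI_Recv(&jeton, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, &status);
    } else {
      MPI_Recv(&jeton, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
      MPI_Send(&jeton, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }
    printf("process %d passed the token\n", rank);

    MPI_Finalize();
    return 0;
  }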
21
Synchronism and asynchronism (1)
  • To solve some deadlocks, and to allow
    communications to be overlapped with computation,
    one can use non-blocking functions
  • In this case, the communication scheme is the
    following
  • Initialization of the non-blocking communication
    (by one or both of the processes)
  • The matching communication (non-blocking or blocking)
    is posted by the other process
  • Computation
  • Termination of the communication (blocking
    operation until the communication is performed)

22
Synchronism and asynchronism (2)
  • Non-blocking functions
  • int MPI_Isend(void *buf, int count, MPI_Datatype
    datatype, int dest, int tag, MPI_Comm comm,
    MPI_Request *request)
  • int MPI_Irecv(void *buf, int count, MPI_Datatype
    datatype, int source, int tag, MPI_Comm comm,
    MPI_Request *request)
  • The request field is used to track the state of a
    non-blocking communication. To wait for its
    termination, one can call the following function
    (see the sketch below)
  • int MPI_Wait(MPI_Request *request, MPI_Status
    *status)
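
A minimal sketch (not from the slides) of the scheme above: each
process posts a non-blocking reception from its left neighbour, sends
to its right neighbour, computes, then waits for the reception to
finish.

  int rank, size, left, right;
  double in = 0.0, out;
  MPI_Request req;
  MPI_Status  status;

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  left  = (rank - 1 + size) % size;
  right = (rank + 1) % size;
  out   = (double) rank;

  MPI_Irecv(&in, 1, MPI_DOUBLE, left, 42, MPI_COMM_WORLD, &req);  /* initialization */
  MPI_Send(&out, 1, MPI_DOUBLE, right, 42, MPI_COMM_WORLD);       /* matching send */
  /* ... computation overlapping the communication ... */
  MPI_Wait(&req, &status);                                        /* termination */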

23
Synchronism and asynchronism (3)
  • Data can be exchanged by blocking or non-blocking
    functions. There are several variants that control
    how the send and the receive operations are
    coupled
  • The communication mode is selected by a prefix
    letter added to MPI_Send (MPI_?send)
  • Synchronous send (S): finishes when the
    corresponding receive is posted (tightly coupled to
    the reception, no buffering)
  • Buffered send (B): a buffer is created; the
    send operation ends when the user buffer has been
    copied to the system buffer (not coupled to the
    reception)
  • Standard send (no prefix): the send ends when the
    send buffer can be reused (the MPI implementation
    decides between buffering and coupling to the reception)
  • Ready send (R): the user guarantees that the reception
    request is already posted when calling this
    function (coupled to the reception, no
    buffer)

24
Collective or global operations
  • To simplify communication operations involving
    multiple processes, one can use collective
    operations on a communicator
  • Typical operations
  • reductions
  • Data exchange
  • Broadcast
  • Scatter
  • Gather
  • All-to-All
  • Explicit synchronization

25
Reductions (1)
  • A reduction is an arithmetic operation performed on
    data distributed over a set of processors
  • Prototype
  • C: int MPI_Reduce(void *sendbuf, void *recvbuf,
    int count, MPI_Datatype datatype, MPI_Op op, int
    root, MPI_Comm communicator)
  • Fortran: MPI_Reduce(sendbuf, recvbuf, count,
    datatype, op, root, communicator, ierror)
  • With MPI_Reduce(), only the root process gets
    the result
  • With MPI_Allreduce(), all processes get the result

26
Reductions (2)
  • Available operations (e.g. MPI_SUM, MPI_PROD,
    MPI_MAX, MPI_MIN, MPI_LAND, MPI_LOR)

27
Broadcast
  • A broadcast operation distributes the
    same data to all processes
  • One-to-all communication, from a specified
    root process to all processes of a communicator
  • Prototypes
  • C: int MPI_Bcast(void *buffer, int count,
    MPI_Datatype datatype, int root, MPI_Comm comm)
  • Fortran: MPI_Bcast(buffer, count, datatype,
    root, communicator, ierror)

(Diagram: the buffer of the root process, here rank 1, is copied to
the buffer of every process 0 .. np-1.)
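
A hedged usage sketch (not from the slides): rank 0 sets a parameter
and broadcasts it to all the processes of MPI_COMM_WORLD.

  int n = 0;
  if (rank == 0)
    n = 100;                 /* e.g. read from a file or the command line */
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
  /* after the call, n == 100 on every process */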
28
Scatter
  • One-to-all operation; a different piece of data is
    sent to each receiver process according to its rank
  • Prototypes
  • C: int MPI_Scatter(void *sendbuf, int
    sendcount, MPI_Datatype sendtype, void *recvbuf,
    int recvcount, MPI_Datatype recvtype, int root,
    MPI_Comm communicator)
  • Fortran: MPI_Scatter(sendbuf, sendcount,
    sendtype, recvbuf, recvcount, recvtype, root,
    communicator, ierror)
  • The send parameters are used only by the root
    (sender) process

(Diagram: the sendbuf of the root process, here rank 2, is split into
np pieces; piece i is copied to the recvbuf of process i.)
29
Gather
  • All-to-one operation; different pieces of data are
    collected by a receiver process
  • Prototypes
  • C: int MPI_Gather(void *sendbuf, int sendcount,
    MPI_Datatype sendtype, void *recvbuf, int
    recvcount, MPI_Datatype recvtype, int root,
    MPI_Comm communicator)
  • Fortran: MPI_Gather(sendbuf, sendcount,
    sendtype, recvbuf, recvcount, recvtype, root,
    communicator, ierror)
  • The receive parameters are used only by the
    receiver (root) process

(Diagram: the sendbuf of every process 0 .. np-1 is gathered, ordered
by rank, into the recvbuf of the root process, here rank 3.)
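
A hedged sketch (not from the slides) of the usual scatter/gather
pattern: the root distributes one chunk per process, each process
works on its chunk, and the results are gathered back on the root
(chunk size and buffer names are assumptions; <stdlib.h> is assumed
for malloc).

  int rank, size;
  const int chunk = 4;                     /* elements per process */
  double local[4];
  double *send = NULL, *result = NULL;

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  if (rank == 0) {                         /* only the root needs the full buffers */
    send   = malloc(size * chunk * sizeof(double));
    result = malloc(size * chunk * sizeof(double));
    for (int i = 0; i < size * chunk; i++) send[i] = i;
  }

  MPI_Scatter(send, chunk, MPI_DOUBLE, local, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  for (int i = 0; i < chunk; i++) local[i] *= 2.0;   /* local work on the chunk */
  MPI_Gather(local, chunk, MPI_DOUBLE, result, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);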
30
All-to-All
  • All-to-all operation; each process sends a different
    piece of data to every other process, according to its rank
  • Prototypes
  • C: int MPI_Alltoall(void *sendbuf, int
    sendcount, MPI_Datatype sendtype, void *recvbuf,
    int recvcount, MPI_Datatype recvtype,
    MPI_Comm communicator)
  • Fortran: MPI_Alltoall(sendbuf, sendcount,
    sendtype, recvbuf, recvcount, recvtype,
    communicator, ierror)

(Diagram: each process sends a distinct piece of its sendbuf to every
process; process i's recvbuf gathers, ordered by rank, the pieces
destined to it.)
31
Explicit Synchronization
  • Synchronization barrier: all processes of a
    communicator wait for the last process to enter
    the barrier before continuing their execution
  • On computers with a hardware barrier available
    (such as SGI machines and the Cray T3E), the MPI
    barrier is slower than the hardware barrier
  • Prototype
  • C: int MPI_Barrier(MPI_Comm communicator)
  • Fortran: MPI_Barrier(communicator, ierror)

32
Matrix trace (1)
  • Computing the trace of an n x n matrix A
  • The trace of a (square) matrix is the sum of its
    diagonal elements
  • The partial sums can easily be computed on
    multiple processors, and a final reduction
    gives the complete trace

33
Matrix trace (2.1)
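The original trace.c listing is not reproduced in this transcript; a
minimal sketch of the idea (assuming the n x n matrix is available on
every process) could be:

  #include <stdio.h>
  #include <mpi.h>

  #define N 8                              /* matrix size (assumption) */

  int main(int argc, char **argv)
  {
    double A[N][N], local = 0.0, trace = 0.0;
    int rank, size, i, j;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < N; i++)                /* same matrix on every process */
      for (j = 0; j < N; j++)
        A[i][j] = (i == j) ? 1.0 : 0.0;

    for (i = rank; i < N; i += size)       /* each process sums part of the diagonal */
      local += A[i][i];

    /* final reduction: only rank 0 gets the complete trace */
    MPI_Reduce(&local, &trace, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
      printf("trace = %g\n", trace);

    MPI_Finalize();
    return 0;
  }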
34
Matrix trace (2.2)
35
Contents
  • Introduction to MPI
  • Message passing
  • Different types of communication
  • MPI functionalities
  • MPI structures
  • Basic functions
  • Data types
  • Contexts and tags
  • Groups and communication domains
  • Communication functions
  • Point to point communications
  • Asynchronous communications
  • Global communications
  • MPI-2
  • One-sided communications
  • I/O

36
One-sided communications (1/2)
  • No synchronization during communications
  • Allows a simulated shared-memory model
    (Remote Memory Access)
  • Defining the part of memory that other processes can
    access
  • MPI_Win_create()
  • MPI_Win_free()
  • One-sided communication functions
  • MPI_Put()
  • MPI_Get()
  • MPI_Accumulate()
  • Operations: MPI_SUM, MPI_LAND, MPI_REPLACE

37
One-sided communications (2/2)
  • Active synchronization function
  • MPI_Win_fence()
  • Takes a memory window win as parameter
  • Collective operation (barrier) on all processes
    of the group MPI_Win_group(win)
  • Acts as a synchronization barrier which completes
    every RMA transfer using the window win (see the
    sketch below)
  • Passive synchronization functions
  • MPI_Win_lock() and MPI_Win_unlock()
  • Classical mutex functions
  • The initiator of the communications is solely
    responsible for the synchronization
  • When MPI_Win_unlock() returns, every transfer
    operation is finished
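
A hedged sketch (not from the slides) of active-target RMA: each
process exposes one double in a window; process 0 writes into the
window of process 1 between two fences.

  int rank, size;
  double window_buf = 0.0, value = 3.14;
  MPI_Win win;

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* expose one double of local memory to the other processes */
  MPI_Win_create(&window_buf, sizeof(double), sizeof(double),
                 MPI_INFO_NULL, MPI_COMM_WORLD, &win);

  MPI_Win_fence(0, win);                   /* open the RMA access epoch */
  if (rank == 0 && size > 1)
    /* one-sided: write value into the window of process 1 at displacement 0 */
    MPI_Put(&value, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
  MPI_Win_fence(0, win);                   /* close the epoch: all transfers are done */

  MPI_Win_free(&win);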

38
Parallel Input/Output
  • Intelligent management of I/O is
    mandatory for parallel applications
  • MPI-IO is a set of functions for optimised I/O
  • Extends the classical file access functions
  • Collective synchronization for accessing a file
  • File offsets shared or individual
  • Blocking or non-blocking reads
  • Views (for accessing non-sequential memory zones)
  • Syntax similar to the MPI communication functions
    (see the sketch below)
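
A hedged MPI-IO sketch (not from the slides): each process writes its
own block of integers at a rank-dependent offset in a shared file
(the file name is an assumption).

  MPI_File fh;
  int i, rank, buf[4];

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  for (i = 0; i < 4; i++) buf[i] = rank;

  /* collective open of a shared file */
  MPI_File_open(MPI_COMM_WORLD, "out.dat",
                MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

  /* each process writes 4 ints at its own offset (individual access) */
  MPI_File_write_at(fh, (MPI_Offset) rank * 4 * sizeof(int),
                    buf, 4, MPI_INT, MPI_STATUS_IGNORE);

  MPI_File_close(&fh);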

39
Dynamic allocation of processes
  • Dynamic change of the number of processes
  • Spawning new processes during execution
  • The MPI_Comm_spawn() function allows a
    new set of processes to be created on other
    processors (see the sketch below)
  • An inter-communicator links the domain of the
    parent to the new domain gathering the new
    processes
  • The MPI_Intercomm_merge() function allows a
    single communicator to be built from an
    inter-communicator
  • MPI-2 allows a dynamic MPMD style using the
    function MPI_Comm_spawn_multiple()
  • MPI_Comm_get_attr(MPI_UNIVERSE_SIZE) is used to
    know the maximum possible number of MPI processes
  • Process destruction
  • There is no explicit exit() function for an MPI process
  • For an MPI process to exit, its communicator
    MPI_COMM_WORLD must contain only finalizing
    processes
  • All inter-communicators must be closed before
    finalization
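
A hedged sketch (not from the slides) of dynamic process creation:
the parent spawns 4 copies of a (hypothetical) program "worker" and
merges the resulting inter-communicator into a single communicator.

  MPI_Comm children, merged;
  int errcodes[4];

  MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                 0, MPI_COMM_WORLD, &children, errcodes);

  /* merge parent and children domains into one intra-communicator */
  MPI_Intercomm_merge(children, 0, &merged);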

40
Remarks and conclusion
  • MPI has become, thanks to the distributed
    computing community, a standard library for
    message passing
  • MPI-2 breaks the classic message-passing SPMD
    model of MPI-1
  • Numerous implementations exist, on most
    architectures
  • A lot of documentation and many publications are
    available

41
Some pointers
  • MPI standard official site
  • http://www-unix.mcs.anl.gov/mpi/
  • The MPI forum
  • http://www.mpi-forum.org/
  • Book: MPI, The Complete Reference (Marc Snir et
    al.)
  • http://www.netlib.org/utk/papers/mpi-book/mpi-book.html

42
Standard
  • Description
  • Performance

43
Contents
  • MPI implementation
  • Performance metrics
  • High performance networks
  • Communication type / 0-copy

44
MPI implementations
  • LAM-MPI
  • Optimised for collective operations
  • MPICH
  • Easy writing of new low-level drivers
  • Open-MPI
  • Tries to combine the performance and ease of use
    of the two previous ones
  • Conforms to MPI-2
  • IBM / NEC / FUJITSU
  • Complete, high-performance implementations of MPI-2
  • Target specific architectures

45
Performance metrics
  • Comparison criteria
  • Latency
  • Bandwidth
  • Collective operations
  • Overlapping capabilities
  • Real applications
  • Measuring tools
  • Round-trip time (ping-pong), see the sketch below
  • NetPipe
  • NAS benchmarks
  • CG
  • LU
  • BT
  • FT
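
A hedged sketch (not from the slides) of the round-trip-time
measurement: ranks 0 and 1 exchange a one-byte message many times and
the average round trip gives a latency estimate.

  int i, rank;
  char byte = 0;
  MPI_Status st;
  double t0, rtt;

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  t0 = MPI_Wtime();
  for (i = 0; i < 1000; i++) {
    if (rank == 0) {
      MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
      MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
    } else if (rank == 1) {
      MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
      MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
  }
  rtt = (MPI_Wtime() - t0) / 1000.0;       /* average round-trip time in seconds */
  if (rank == 0)
    printf("one-way latency estimate: %g us\n", rtt / 2.0 * 1e6);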

46
High performance networks (1/3) Technologies
  • Myrinet
  • Connectionless reliable API
  • Registered buffers
  • Fully programmable DMA NIC processor
  • Up to full-duplex 2 Gb/s bandwidth with Myrinet
    2000
  • SCINet
  • Torus-topology network with static routing
  • No need to register buffers
  • Very small latency (suitable for RMA)
  • Up to 2 Gb/s
  • Gigabit Ethernet
  • No need to register buffers
  • DMA operations
  • High latency
  • Up to 1 Gb/s and 10 Gb/s bandwidth
  • Infiniband
  • Reliable Connection mode and Unreliable Datagram
    mode
  • Registered buffers
  • Queued DMA operations

47
High performance networks (2/3) Technologies
  • Myrinet
  • Socket-GM
  • MPICH-GM
  • SCINet
  • No functional socket API
  • SCI-MPICH
  • Gigabit Ethernet
  • Has to use the socket interface
  • Infiniband
  • IPoIB
  • LAM-MPI, MPICH, MPI/Pro, etc.

48
High performance networks (3/3) Technologies
49
Eager vs Rendez-vous (1/2)
  • Eager protocol
  • The message is sent without any control
  • Better latency
  • Copied into a buffer if the receiver has not posted
    the reception yet
  • Memory consuming for long messages
  • Used only for short messages (< 64 KB)
  • Rendez-vous protocol
  • Sender and receiver are synchronized
  • High latency
  • 0-copy
  • Better bandwidth
  • Reduces memory consumption

50
Eager vs Rendez-vous (2/2)
51
Communication types
52
High performance networks and 0-copy
Latency: Myrinet 8 µs; MPICH-GM 33 µs; MPICH-Vdummy 94 µs
53
Conclusion
  • Many MPI implementations with similar performance
  • Multiple measurement criteria and multiple tools
  • Latency, bandwidth
  • Benchmarks and micro-benchmarks
  • Real applications
  • High-performance networks force us to consider small
    performance details
  • Network bandwidth equals the memory bandwidth
  • Latency smaller than some OS operations
  • Performance relies on good programming
  • Performance results can vary a lot according to
    the type of communication employed
  • Asynchronism is mandatory
  • Bad programming results in bad performance
  • 0-copy can be mandatory