Title: Grouping Data For Communication
1. Grouping Data for Communication
- Chapter 6, Parallel Programming with MPI, Peter S. Pacheco
2. Introduction
- We know that communication (sending messages) is the most costly operation
- We wish to minimize the number of messages sent/received
- The Trapezoid program requires three parameters to be sent
- We know that a single MPI_Bcast() is usually better than a loop of MPI_Send() calls
3. The Count Parameter
- MPI_Send, MPI_Recv, MPI_Bcast, and MPI_Reduce all have a count parameter and a datatype
- Use of count requires contiguous memory
- Given: int myArray[100];
4. The Count Parameter
- We can send half of the vector to a second process using:

    if (my_rank == 0)
        MPI_Send(&myArray[50], 50, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else
        MPI_Recv(&myArray[50], 50, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
5. The Count Parameter
- What if we have: int a; int b; int c;
- Can we use? MPI_Send(&a, 3, MPI_INT, ...)
6. The Count Parameter
- Will not work!!!
- WHY??????
- The C language specification does not guarantee that a, b, and c will be stored in contiguous memory
- This may work on one system and not work on another
- POOR PROGRAMMING STYLE TO USE!
7. Derived Types and MPI_Type_struct
- How about we pack them into a struct and send the struct as follows:
- typedef struct { int a; int b; int c; } INDATA_T;
- With variable: INDATA_T indata;
8. Derived Types and MPI_Type_struct
- Now we can try to use:
- MPI_Bcast(&indata, 1, INDATA_T, 0, MPI_COMM_WORLD);
- Nice try, won't work!
- We must use one of the MPI_XXX types with all send routines
- All is not lost: we can build our own MPI_Datatype
9. Derived Types and MPI_Type_struct
- Let us assume we have: float a; float b; int n; /* from Trap.c */
- And the following addresses and values (in the slide's table, a is at address 24, b at 40, and n at 48)
10. Derived Types and MPI_Type_struct
- In order to send a, b, and n in the same message we need:
- There are three elements to send:
  - A. The first element is a float
  - B. The second is a float
  - C. The third is an int
- The address of each element:
  - The first has address &a
  - The second has address &b
  - The last has address &n
11. Derived Types and MPI_Type_struct
- We now need the relative addresses, or displacements, of b and n from a, and then only provide the address of a
- According to the table, a has address 24 (&a == 24)
- This means b is 40 - 24 = 16 bytes beyond a
- n is 48 - 24 = 24 bytes past a
12. Derived Types and MPI_Type_struct
- This means process 0 can send by specifying:
  1. There are three elements to send
  2. The type of each element:
     a. The first element is a float
     b. The second is a float
     c. The third is an int
  3. The offset of each element:
     a. The first element has offset 0
     b. The second has offset 16
     c. The last has offset 24
  4. The beginning of the message is the address of a
13. Derived Types and MPI_Type_struct
- Notice that once we know the actual address of a, we can calculate all the others
- This is the underlying principle of all MPI derived data types: give all the information except the first address
- A general MPI datatype is a sequence
- (t0,d0), (t1,d1), ..., (tn-1,dn-1)
14. Derived Types and MPI_Type_struct
- Each t is an MPI_XXX data type
- Each d is a displacement in bytes
- So we would need:
- (MPI_FLOAT,0), (MPI_FLOAT,16), (MPI_INT,24)
- The following code shows this in practice:
- http://www.cs.sdstate.edu/hamerg/csc750/ppmpi_c/chap06/get_data3.c
15. Derived Types and MPI_Type_struct
- Notice the call to MPI_Type_struct()
- This is where we pass all the info to create the derived type
- This sure is a lot of work!
- Must be a better way!!!
16. Other Derived Datatype Constructors
- There are three additional constructors that we can use in many cases to make life easier
- MPI_Type_contiguous, MPI_Type_vector, and MPI_Type_indexed
- The first is used with contiguous elements in an array
17. Other Derived Datatype Constructors
- The second is used for equally spaced elements in an array
- The last builds a type whose entries are arbitrary entries of an array
- An example using MPI_Type_vector: float A[10][10];
- We know that C uses row-major storage
18. Other Derived Datatype Constructors
- This means A[2][3] is preceded by A[2][2] and followed by A[2][4]
- So to send the third row of the array:

    if (my_rank == 0)
        MPI_Send(&A[2][0], 10, MPI_FLOAT, ...);

- This works because the row is contiguous
- If we wish to send the third column, it no longer works
19. Other Derived Datatype Constructors
- MPI_Type_vector to the rescue
- The displacement of successive elements is constant
- A[1][2] is 10 floats past A[0][2], and A[2][2] will be another 10 floats beyond A[1][2]
- This stride remains constant, so we can use MPI_Type_vector
20. Other Derived Datatype Constructors
- Syntax:

    int MPI_Type_vector(
        int           count,        /* in  */
        int           block_length, /* in  */
        int           stride,       /* in  */
        MPI_Datatype  elem_type,    /* in  */
        MPI_Datatype* new_mpi_t     /* out */)
21. Other Derived Datatype Constructors
- The parameter count is the total number of elements in the type
- block_length is the number of entries in each element
- stride is the number of elements of type elem_type between successive elements of new_mpi_t
- elem_type is the type of the elements composing the derived type
- new_mpi_t is the new derived type
22. Other Derived Datatype Constructors
- http://www.cs.sdstate.edu/hamerg/csc750/ppmpi_c/chap06/send_col.c
- Note that column_mpi_t can be used to send any column of A
- Just use &A[0][j] to send the appropriate column
- Notice finally that this can be used for any 10x10 array of floats
23. Other Derived Datatype Constructors
- The other two constructors have the following syntax:

    int MPI_Type_contiguous(
        int           count,     /* in  */
        MPI_Datatype  old_type,  /* in  */
        MPI_Datatype* new_mpi_t  /* out */)
24. Other Derived Datatype Constructors

    int MPI_Type_indexed(
        int           count,
        int           block_lengths[],
        int           displacements[],
        MPI_Datatype  old_type,
        MPI_Datatype* new_mpi_t)

- In MPI_Type_contiguous, one simply specifies that the derived type will consist of count elements of type old_type
25. Other Derived Datatype Constructors
- In MPI_Type_indexed, the derived type consists of count elements of type old_type
- The ith element consists of block_lengths[i] entries, and it is displaced displacements[i] units of old_type from the beginning of the type
- Displacements are not measured in bytes
26. Other Derived Datatype Constructors
- As an example, let's send the upper triangular portion of a square matrix on process 0 to process 1
- http://www.cs.sdstate.edu/hamerg/csc750/ppmpi_c/chap06/send_triangle.c
- Notice that we could not use MPI_Type_vector because each row has a different length
27. Type Matching
- Rules for type matching
- Does send_mpi_t have to be the same as recv_mpi_t?
- We must compute the type signature
- The type signature is the sequence of types in (t0,d0), (t1,d1), ..., (tn-1,dn-1)
- t0, t1, ..., tn-1
28. Type Matching
- The basic rule is that the type signatures of sender and receiver must be compatible
- Given that MPI_Send's signature is t0, t1, ..., tn-1
- And MPI_Recv's is u0, u1, ..., um-1
- Then n must be less than or equal to m, and ti must equal ui for i = 0, 1, ..., n-1
29. Type Matching
- We can use this to send a row of a matrix to a column on another process
- http://www.cs.sdstate.edu/hamerg/csc750/ppmpi_c/chap06/send_col_to_row.c
30. Pack/Unpack
- An alternative approach to grouping data is provided by the functions MPI_Pack and MPI_Unpack
- Let's look at the following example: http://www.cs.sdstate.edu/hamerg/csc750/ppmpi_c/chap06/get_data4.c
- In this version of Get_data, process 0 uses MPI_Pack to copy a into the buffer, then append b, and then n
31. Pack/Unpack
- The receiving side now unpacks the data from the buffer
- Note the use of the MPI_PACKED datatype in the MPI_Bcast call
32. Pack/Unpack
- Syntax of MPI_Pack:

    int MPI_Pack(
        void*         pack_data,   /* in     */
        int           in_count,    /* in     */
        MPI_Datatype  datatype,    /* in     */
        void*         buffer,      /* out    */
        int           buffer_size, /* in     */
        int*          position,    /* in/out */
        MPI_Comm      comm         /* in     */)
33. Pack/Unpack
- The parameter pack_data references the data to be buffered
- It should consist of in_count elements
- Each should have type datatype
- The parameter position is an in/out parameter
- On input, the data referenced by pack_data is copied into memory starting at address buffer + *position
34. Pack/Unpack
- On return, *position references the first location in buffer after the data that was copied
- The parameter buffer_size contains the size in bytes of the memory referenced by buffer
- comm is the communicator that will be using buffer
35. Pack/Unpack
- The syntax of MPI_Unpack is:

    int MPI_Unpack(
        void*         buffer,      /* in     */
        int           size,        /* in     */
        int*          position,    /* in/out */
        void*         unpack_data, /* out    */
        int           count,       /* in     */
        MPI_Datatype  datatype,    /* in     */
        MPI_Comm      comm         /* in     */)
36. Pack/Unpack
- The parameter buffer references the data to be unpacked
- It consists of size bytes
- The parameter position is again an in/out parameter
- When MPI_Unpack is called, the data starting at address buffer + *position is copied into the memory referenced by unpack_data
37. Pack/Unpack
- On return, *position references the first location in buffer after the data that was just copied
- MPI_Unpack will copy count elements of type datatype into unpack_data
- The communicator used is comm
38. Pack in a Picture
- [Diagram: the data at pack_data is copied into buffer starting at offset position; position then advances to just past the copied data]

39. Unpack in a Picture
- [Diagram: the data in buffer starting at offset position is copied out to the user's variable; position then advances to just past the copied data]
40. Deciding Which Method to Use
- If the data to be sent is stored in consecutive entries of an array, then one should use the count and datatype parameters of the communication function(s)
- No additional overhead
- If there are a large number of noncontiguous items, then building a derived type is preferable
41. Deciding Which Method to Use
- If the data are all of the same type and stored at regular intervals in memory (e.g., one column of a matrix), then a derived type will again be most efficient
- MPI_Type_indexed should be used when the data are irregularly spaced but all of the same type
42. Deciding Which Method to Use
- Finally, if the data are heterogeneous, then we will use MPI_Pack/MPI_Unpack
- If this must be done numerous times, it is better to build a derived type
- A derived type incurs its overhead once, whereas MPI_Pack incurs overhead every time it is used
43. Deciding Which Method to Use
- On a parallel machine (nCube) running mpich, it takes 12 msec to create the derived type in Get_data3
- Using MPI_Pack/Unpack in Get_data4 only requires 2 msec
- Remember that while process 0 is packing, all the other processes are idle!
- The actual cost ratio is about 3:1
44. Deciding Which Method to Use
- There are a couple of situations where Pack/Unpack is preferable
- You may be able to avoid system buffering with pack, since the data is explicitly stored in a user-defined buffer
- Each time we copy data, we incur costs
- User memory to system memory to network card memory, etc.
45. Deciding Which Method to Use
- We can send variable-length messages when using Pack/Unpack
- Send the number of items as the first element of the buffer, then pack the items themselves
- Think of a sparse matrix
- Send two arrays: one with the column subscripts and the other with the data values
46. Deciding Which Method to Use
- http://www.cs.sdstate.edu/hamerg/csc750/ppmpi_c/chap06/sparse_row.c