CSCI-4320/6360: Parallel Programming

CSCI-4320/6360: Parallel Programming & Computing. Tues./Fri. 12-1:20 p.m. MPI File I/O. Prof. Chris Carothers, Computer Science Department, MRC 309a, chrisc_at_cs.rpi.edu

Slides: 41
Provided by: DaveH166
Learn more at: http://cmes.colorado.edu
1
CSCI-4320/6360: Parallel Programming & Computing
Tues./Fri. 12-1:20 p.m.
MPI File I/O
  • Prof. Chris Carothers
  • Computer Science Department
  • MRC 309a
  • chrisc_at_cs.rpi.edu
  • www.cs.rpi.edu/chrisc/COURSES/PARALLEL/SPRING-2010
  • Adapted from
    people.cs.uchicago.edu/asiegel/courses/cspp51085/.../mpi-io.ppt

2
Common Ways of Doing I/O in Parallel Programs
  • Sequential I/O
  • All processes send data to rank 0, and 0 writes
    it to the file

3
Pros and Cons of Sequential I/O
  • Pros
    • A parallel machine may support I/O from only one
      process (e.g., no common file system)
    • Some I/O libraries (e.g., HDF-4, NetCDF, PMPIO)
      are not parallel
    • The resulting single file is handy for ftp, mv
    • Big blocks improve performance
    • Short distance from the original, serial code
  • Cons
    • Lack of parallelism limits scalability and
      performance (single-node bottleneck)

4
Another Way
  • Each process writes to a separate file
  • Pros
    • parallelism, high performance
  • Cons
    • lots of small files to manage
    • lots of metadata, which stresses the parallel
      file system
    • difficult to read the data back with a different
      number of processes

5
What is Parallel I/O?
  • Multiple processes of a parallel program
    accessing data (reading or writing) from a common
    file

[Figure: processes P0, P1, P2, ..., P(n-1) all accessing a single common FILE]

6
Why Parallel I/O?
  • Non-parallel I/O is simple but
    • Poor performance (single process writes to one
      file) or
    • Awkward and not interoperable with other tools
      (each process writes a separate file)
  • Parallel I/O
    • Provides high performance
    • Can provide a single file that can be used with
      other tools (such as visualization programs)

7
Why is MPI a Good Setting for Parallel I/O?
  • Writing is like sending a message and reading is
    like receiving.
  • Any parallel I/O system will need a mechanism to
    • define collective operations (MPI communicators)
    • define noncontiguous data layout in memory and
      file (MPI datatypes)
    • test completion of nonblocking operations (MPI
      request objects)
  • i.e., lots of MPI-like machinery

8
MPI-IO Background
  • Marc Snir et al (IBM Watson) paper exploring MPI
    as context for parallel I/O (1994)
  • MPI-IO email discussion group led by J.-P. Prost
    (IBM) and Bill Nitzberg (NASA), 1994
  • MPI-IO group joins MPI Forum in June 1996
  • MPI-2 standard released in July 1997
  • MPI-IO is Chapter 9 of MPI-2

9
Using MPI for Simple I/O
Each process needs to read a chunk of data from a
common file
10
Using Individual File Pointers
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define FILESIZE 1000   /* file size in bytes */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    bufsize = FILESIZE / nprocs;      /* bytes per process */
    nints = bufsize / sizeof(int);    /* ints per process */
    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    /* move this process's individual file pointer to its chunk */
    MPI_File_seek(fh, rank * bufsize, MPI_SEEK_SET);
    MPI_File_read(fh, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
11
Using Explicit Offsets
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define FILESIZE 1000   /* file size in bytes */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    bufsize = FILESIZE / nprocs;      /* bytes per process */
    nints = bufsize / sizeof(int);    /* ints per process */
    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    /* the offset is passed directly; no separate seek is needed */
    MPI_File_read_at(fh, rank * bufsize, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
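
Both read examples divide FILESIZE evenly among the processes, so when the size is not a multiple of the process count some trailing data is left unread. The following is a minimal sketch of one way to compute per-rank offsets and counts in plain C, without any MPI calls. The helper chunk_for_rank and the type chunk_t are hypothetical names introduced here for illustration; giving the remainder to the last rank is an assumption, not something the slides prescribe.

```c
/* Hypothetical helper (not part of MPI): splits total_ints integers
   across nprocs ranks. Every rank gets total_ints / nprocs elements;
   the last rank also takes the remainder. */
typedef struct {
    long long offset_bytes;  /* where this rank starts in the file */
    int count;               /* how many ints this rank reads */
} chunk_t;

chunk_t chunk_for_rank(int total_ints, int nprocs, int rank)
{
    chunk_t c;
    int base = total_ints / nprocs;

    c.offset_bytes = (long long)rank * base * (long long)sizeof(int);
    c.count = base;
    if (rank == nprocs - 1)
        c.count += total_ints % nprocs;  /* leftover elements */
    return c;
}
```

The two fields would be passed as the offset and count arguments of MPI_File_read_at, or the offset to MPI_File_seek followed by MPI_File_read.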
12
Function Details
MPI_File_open(MPI_Comm comm, char *file, int mode,
              MPI_Info info, MPI_File *fh)
  (note: mode flags are MPI_MODE_RDONLY, MPI_MODE_RDWR,
   MPI_MODE_WRONLY, MPI_MODE_CREATE, MPI_MODE_EXCL,
   MPI_MODE_DELETE_ON_CLOSE, MPI_MODE_UNIQUE_OPEN,
   MPI_MODE_SEQUENTIAL, MPI_MODE_APPEND)
MPI_File_close(MPI_File *fh)
MPI_File_read(MPI_File fh, void *buf, int count,
              MPI_Datatype type, MPI_Status *status)
MPI_File_read_at(MPI_File fh, MPI_Offset offset, void *buf,
                 int count, MPI_Datatype type, MPI_Status *status)
MPI_File_seek(MPI_File fh, MPI_Offset offset, int whence)
  (note: whence is MPI_SEEK_SET, MPI_SEEK_CUR, or MPI_SEEK_END)
MPI_File_write(MPI_File fh, void *buf, int count,
               MPI_Datatype datatype, MPI_Status *status)
MPI_File_write_at( ...same as read_at... )
(Note: many other functions exist to get/set properties; see Gropp et al.)
13
Writing to a File
  • Use MPI_File_write or MPI_File_write_at
  • Use MPI_MODE_WRONLY or MPI_MODE_RDWR as the flags
    to MPI_File_open
  • If the file does not already exist, the flag
    MPI_MODE_CREATE must also be passed to
    MPI_File_open
  • We can pass multiple flags by using bitwise-or
    in C, or addition in Fortran
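
Bitwise-or and addition give the same result only when every flag occupies its own bit and appears once. The sketch below illustrates that with made-up constants. The DEMO_MODE_* values and the three helper functions are hypothetical; the real MPI_MODE_* values come from the implementation's mpi.h and are not assumed here.

```c
/* Hypothetical flag values for illustration only; the real MPI_MODE_*
   constants are defined by the MPI implementation's mpi.h. */
enum {
    DEMO_MODE_CREATE = 1 << 0,
    DEMO_MODE_WRONLY = 1 << 1,
    DEMO_MODE_RDWR   = 1 << 2
};

/* Combine flags the C way: bitwise-or. */
int combine_or(int a, int b)  { return a | b; }

/* Combine flags the Fortran way: integer addition. */
int combine_add(int a, int b) { return a + b; }

/* Returns nonzero if `flag` is set in `mode`. */
int has_flag(int mode, int flag) { return (mode & flag) != 0; }
```

With distinct flags the two combinations agree. Repeating a flag is harmless under bitwise-or, but under addition it carries into a neighboring bit and silently selects a different mode, which is why each constant should appear at most once when adding.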

14
MPI Datatype Interlude
  • Datatypes in MPI
    • Elementary: MPI_INT, MPI_DOUBLE, etc.
      • everything we've used to this point
    • Contiguous
      • Next easiest: sequences of elementary types
    • Vector
      • Sequences separated by a constant stride
15
MPI Datatypes, cont
  • Indexed: more general
    • does not assume a constant stride
  • Struct
    • General mixed types (like C structs)

16
Creating simple datatypes
  • Let's just look at the simplest types: contiguous
    and vector datatypes.
  • Contiguous example
    • Let's create a new datatype which is two ints
      side by side. The calling sequence is
    • MPI_Type_contiguous(int count, MPI_Datatype
      oldtype, MPI_Datatype *newtype);
    • MPI_Datatype newtype;
    • MPI_Type_contiguous(2, MPI_INT, &newtype);
    • MPI_Type_commit(&newtype); /* required */

17
Using File Views
  • Processes write to shared file
  • MPI_File_set_view assigns regions of the file to
    separate processes

18
File Views
  • Specified by a triplet (displacement, etype, and
    filetype) passed to MPI_File_set_view
    • displacement: number of bytes to be skipped from
      the start of the file
    • etype: basic unit of data access (can be any
      basic or derived datatype)
    • filetype: specifies which portion of the file is
      visible to the process
  • This is a collective operation, so all
    processes/ranks in the group determined when the
    file was opened must call it, using the same data
    representation and etypes of the same extent

19
File Interoperability
  • Users can optionally create files with a portable
    binary data representation
  • datarep parameter to MPI_File_set_view
    • native: default, same as in memory, not portable
    • internal: implementation-defined representation
      providing an implementation-defined level of
      portability
    • external32: a specific representation defined in
      MPI (basically 32-bit big-endian IEEE format),
      portable across machines and MPI implementations
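
To make the idea of a byte-order-independent representation concrete, here is a small sketch that stores a 32-bit integer in big-endian order regardless of the host's native layout. It only illustrates the big-endian part of the description above and is not how any MPI implementation encodes external32. The function names encode_be32 and decode_be32 are hypothetical.

```c
#include <stdint.h>

/* Writes a 32-bit integer into out[0..3] in big-endian byte order,
   independent of the host's native byte order. */
void encode_be32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)(value);
}

/* Reads a big-endian 32-bit integer back from in[0..3]. */
uint32_t decode_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the shifts operate on values rather than on memory, the same bytes come out on little-endian and big-endian hosts, which is the property a portable file format needs.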

20
File View Example
MPI_File thefile;

for (i = 0; i < BUFSIZE; i++)
    buf[i] = myrank * BUFSIZE + i;
MPI_File_open(MPI_COMM_WORLD, "testfile",
              MPI_MODE_CREATE | MPI_MODE_WRONLY,
              MPI_INFO_NULL, &thefile);
/* the displacement is in bytes: each rank's view starts at its own block */
MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int),
                  MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
MPI_File_write(thefile, buf, BUFSIZE, MPI_INT,
               MPI_STATUS_IGNORE);
MPI_File_close(&thefile);
21
Ways to Write to a Shared File
  • MPI_File_seek
    • like Unix seek
  • MPI_File_read_at
  • MPI_File_write_at
    • combine seek and I/O for thread safety
  • MPI_File_read_shared
  • MPI_File_write_shared
    • use the shared file pointer; good when order
      doesn't matter
  • Collective operations

22
Collective I/O in MPI
  • A critical optimization in parallel I/O
  • Allows communication of the "big picture" to the
    file system
  • Framework for 2-phase I/O, in which communication
    precedes I/O (can use MPI machinery)
  • Basic idea: build large blocks, so that
    reads/writes in the I/O system will be large

[Figure: small individual requests vs. one large collective access]

23
Collective I/O
  • MPI_File_read_all, MPI_File_read_at_all, etc
  • _all indicates that all processes in the group
    specified by the communicator passed to
    MPI_File_open will call this function
  • Each process specifies only its own access
    information -- the argument list is the same as
    for the non-collective functions

24
Collective I/O
  • By calling the collective I/O functions, the user
    allows an implementation to optimize the request
    based on the combined request of all processes
  • The implementation can merge the requests of
    different processes and service the merged
    request efficiently
  • Particularly effective when the accesses of
    different processes are noncontiguous and
    interleaved
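
The merging step can be modeled in a few lines of plain C. This is a simplified sketch of the idea only, not the algorithm used by ROMIO or any other implementation: it sorts a set of (offset, length) requests gathered from all processes and coalesces those that touch or overlap. The type io_req and the function merge_requests are hypothetical names.

```c
#include <stdlib.h>

/* One contiguous file access: `len` bytes starting at byte `offset`. */
typedef struct {
    long offset;
    long len;
} io_req;

static int cmp_req(const void *a, const void *b)
{
    const io_req *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sorts reqs[0..n-1] by offset and coalesces requests that touch or
   overlap. The merged list is written back into reqs; the return
   value is the number of merged requests. */
int merge_requests(io_req *reqs, int n)
{
    int out = 0;

    if (n <= 0)
        return 0;
    qsort(reqs, (size_t)n, sizeof(io_req), cmp_req);
    for (int i = 1; i < n; i++) {
        long end = reqs[out].offset + reqs[out].len;
        if (reqs[i].offset <= end) {
            long new_end = reqs[i].offset + reqs[i].len;
            if (new_end > end)
                reqs[out].len = new_end - reqs[out].offset;
        } else {
            reqs[++out] = reqs[i];
        }
    }
    return out + 1;
}
```

Four interleaved 64-byte requests from four processes collapse into one 256-byte request, which is the kind of large access the file system handles well.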

25
Collective non-contiguous MPI-IO example
#include <stdlib.h>
#include "mpi.h"
#define FILESIZE 1048576
#define INTS_PER_BLK 16

int main(int argc, char **argv)
{
    int *buf, rank, nprocs, nints, bufsize;
    MPI_File fh;
    MPI_Datatype filetype;
    /* assumed file name; the slide leaves filename undefined */
    char *filename = "datafile";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    bufsize = FILESIZE / nprocs;
    buf = (int *) malloc(bufsize);
    nints = bufsize / sizeof(int);
    MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    /* blocks of INTS_PER_BLK ints, one per stride of INTS_PER_BLK * nprocs */
    MPI_Type_vector(nints / INTS_PER_BLK, INTS_PER_BLK,
                    INTS_PER_BLK * nprocs, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);
    MPI_File_set_view(fh, INTS_PER_BLK * sizeof(int) * rank, MPI_INT,
                      filetype, "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, buf, nints, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}
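
To see which bytes of the file each rank ends up reading with this view, the layout can be worked out by hand. The sketch below is plain C with no MPI calls. The function view_offset_bytes is a hypothetical helper, and it assumes the same layout as the example: a displacement of one block per rank and a vector filetype whose stride is the block size times the number of processes. It is only meant for k within the first instance of the filetype.

```c
/* Maps the k-th int visible through a rank's file view to its byte
   offset in the file, for a displacement of blk * sizeof(int) * rank
   and a vector filetype made of blocks of `blk` ints with a stride
   of blk * nprocs ints. */
long view_offset_bytes(int rank, int nprocs, int blk, long k)
{
    long disp   = (long)blk * (long)sizeof(int) * rank;  /* bytes skipped */
    long block  = k / blk;             /* which block of this rank */
    long within = k % blk;             /* position inside that block */
    long stride = (long)blk * nprocs;  /* ints between block starts */

    return disp + (block * stride + within) * (long)sizeof(int);
}
```

With 16-int blocks and 4 processes, rank 0 sees ints 0 to 15, then 64 to 79, and so on, while rank 1 sees ints 16 to 31, then 80 to 95: the accesses of different processes are interleaved block by block.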
26
More on MPI_File_read_all
  • Note that the _all version has the same argument
    list
  • The difference is that all processes involved in
    MPI_File_open must call the read
  • Contrast with the non-all version, where any
    subset may or may not call it
  • Allows for many optimizations

27
Split Collective I/O
  • A restricted form of nonblocking collective I/O
  • Only one active nonblocking collective operation
    allowed at a time on a file handle
  • Therefore, no request object necessary

MPI_File_write_all_begin(fh, buf, count, datatype);
/* available on Blue Gene/L, but may not improve performance */
for (i = 0; i < 1000; i++) {
    /* perform computation */
}
MPI_File_write_all_end(fh, buf, &status);
28
Passing Hints to the Implementation
MPI_Info info;

MPI_Info_create(&info);
/* no. of I/O devices to be used for file striping */
MPI_Info_set(info, "striping_factor", "4");
/* the striping unit in bytes */
MPI_Info_set(info, "striping_unit", "65536");
MPI_File_open(MPI_COMM_WORLD, "/pfs/datafile",
              MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
MPI_Info_free(&info);
29
Examples of Hints (used in ROMIO)
  • MPI-2 predefined hints
    • striping_unit
    • striping_factor
    • cb_buffer_size
    • cb_nodes
  • New algorithm parameters
    • ind_rd_buffer_size
    • ind_wr_buffer_size
  • Platform-specific hints
    • start_iodevice
    • pfs_svr_buf
    • direct_read
    • direct_write

30
I/O Consistency Semantics
  • The consistency semantics specify the results
    when multiple processes access a common file and
    one or more processes write to the file
  • MPI guarantees stronger consistency semantics if
    the communicator used to open the file accurately
    specifies all the processes that are accessing
    the file, and weaker semantics if not
  • The user can take steps to ensure consistency
    when MPI does not automatically do so

31
Example 1
  • File opened with MPI_COMM_WORLD. Each process
    writes to a separate region of the file and reads
    back only what it wrote.
  • MPI guarantees that the data will be read
    correctly

32
Example 2
  • Same as example 1, except that each process wants
    to read what the other process wrote (overlapping
    accesses)
  • In this case, MPI does not guarantee that the
    data will automatically be read correctly

Process 0
/* incorrect program */
MPI_File_open(MPI_COMM_WORLD, ...)
MPI_File_write_at(off=0, cnt=100)
MPI_Barrier
MPI_File_read_at(off=100, cnt=100)

Process 1
/* incorrect program */
MPI_File_open(MPI_COMM_WORLD, ...)
MPI_File_write_at(off=100, cnt=100)
MPI_Barrier
MPI_File_read_at(off=0, cnt=100)

  • In the above program, the read on each process is
    not guaranteed to get the data written by the
    other process!

33
Example 2 contd.
  • The user must take extra steps to ensure
    correctness
  • There are three choices
    • set atomicity to true
    • close the file and reopen it
    • ensure that no write sequence on any process is
      concurrent with any sequence (read or write) on
      another process/MPI rank
  • Can hurt performance.

34
Example 2, Option 1: Set atomicity to true
35
Example 2, Option 2: Close and reopen file
Process 0
MPI_File_open(MPI_COMM_WORLD, ...)
MPI_File_write_at(off=0, cnt=100)
MPI_File_close
MPI_Barrier
MPI_File_open(MPI_COMM_WORLD, ...)
MPI_File_read_at(off=100, cnt=100)

Process 1
MPI_File_open(MPI_COMM_WORLD, ...)
MPI_File_write_at(off=100, cnt=100)
MPI_File_close
MPI_Barrier
MPI_File_open(MPI_COMM_WORLD, ...)
MPI_File_read_at(off=0, cnt=100)

36
Example 2, Option 3
  • Ensure that no write sequence on any process is
    concurrent with any sequence (read or write) on
    another process
    • a sequence is a set of operations between any
      pair of open, close, or file_sync functions
    • a write sequence is a sequence in which any of
      the functions is a write operation

37
Example 2, Option 3
38
General Guidelines for Achieving High I/O Performance
  • Buy sufficient I/O hardware for the machine
  • Use fast file systems, not NFS-mounted home
    directories
  • Do not perform I/O from one process only
  • Make large requests wherever possible
  • For noncontiguous requests, use derived datatypes
    and a single collective I/O call

39
Optimizations
  • Given complete access information, an
    implementation can perform optimizations such as
    • Data sieving: read large chunks and extract what
      is really needed
    • Collective I/O: merge requests of different
      processes into larger requests
    • Improved prefetching and caching
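
Data sieving can be illustrated with a toy model in which the "file" is just an array in memory. This sketch is not taken from any MPI implementation. The function sieve_read and its parameters are hypothetical, and it assumes equally sized pieces whose offsets are given in ascending order.

```c
#include <string.h>

/* Simplified model of data sieving on an in-memory "file".
   Wanted: n pieces of piece_len bytes, the i-th starting at
   offsets[i] (ascending). Instead of n small reads, do one large
   read covering [offsets[0], offsets[n-1] + piece_len) into
   `scratch`, then copy only the wanted bytes into `out`.
   Returns the number of bytes in the single large read. */
long sieve_read(const unsigned char *file, const long *offsets, int n,
                long piece_len, unsigned char *scratch,
                unsigned char *out)
{
    if (n <= 0)
        return 0;

    long start = offsets[0];
    long span  = offsets[n - 1] + piece_len - start;

    /* stands in for one large contiguous read from the file system */
    memcpy(scratch, file + start, (size_t)span);

    /* extract what is really needed */
    for (int i = 0; i < n; i++)
        memcpy(out + (long)i * piece_len,
               scratch + (offsets[i] - start), (size_t)piece_len);
    return span;
}
```

The trade-off is visible in the return value: the single read moves more bytes than were requested, in exchange for issuing one request instead of many small ones.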

40
Summary
  • MPI-IO has many features that can help users
    achieve high performance
  • The most important of these features are the
    ability to specify noncontiguous accesses, the
    collective I/O functions, and the ability to pass
    hints to the implementation
  • Users must use the above features!
  • In particular, when accesses are noncontiguous,
    users must create derived datatypes, define file
    views, and use the collective I/O functions