Title: MPI-I/O for EQM APPLICATIONS
1. MPI-I/O for EQM APPLICATIONS
- David Cronk
- Innovative Computing Lab
- University of Tennessee
- June 20, 2001
2. Outline
- Introduction
  - What is parallel I/O?
  - Why do we need parallel I/O?
  - What is MPI-I/O?
- MPI-I/O
  - Derived datatypes and file views
3. OUTLINE (cont)
- MPI-I/O (cont)
  - Data access
    - Non-collective access
    - Collective access
    - Split collective access
- Examples
  - LBMPI - Bob Maier (ARC)
  - CE-QUAL-ICM - Victor Parr (UT-Austin)
4. INTRODUCTION
- What is parallel I/O?
  - Multiple processes accessing a single file
5. INTRODUCTION
- What is parallel I/O?
  - Multiple processes accessing a single file
  - Often, both data and file access are non-contiguous
    - Ghost cells cause non-contiguous data access
    - Block or cyclic distributions cause non-contiguous file access
6. Non-Contiguous Access
[Figure: non-contiguous file layout mapped to contiguous local memory]
7. INTRODUCTION
- What is parallel I/O?
  - Multiple processes accessing a single file
  - Often, both data and file access are non-contiguous
    - Ghost cells cause non-contiguous data access
    - Block or cyclic distributions cause non-contiguous file access
  - Want to access data and files with as few I/O calls as possible
8. INTRODUCTION (cont)
- Why use parallel I/O?
  - Many users do not have time to learn the complexities of I/O optimization
9. INTRODUCTION (cont)

      ! Traditional Fortran I/O: sequential, unformatted write
      Integer dim
      parameter (dim=10000)
      Integer*4 out_array(dim)
      OPEN (fh, filename, UNFORMATTED)
      WRITE (fh) (out_array(I), I=1,dim)

      ! Traditional Fortran I/O: direct-access write
      rl = 4*dim
      OPEN (fh, filename, DIRECT, RECL=rl)
      WRITE (fh, REC=1) out_array
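By contrast, a minimal MPI-I/O sketch of the same dump, where every process deposits its block of out_array into one shared file with a single call. The rank-based offset, file name, and mode flags are our illustration, not from the original slide; it assumes USE MPI and the declarations above.

      ! Sketch: one MPI-I/O call per process, all ranks sharing one file.
      integer :: fh, rank, ierr
      integer(kind=MPI_OFFSET_KIND) :: offset
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_FILE_OPEN(MPI_COMM_WORLD, 'out.dat', &
           MPI_MODE_CREATE + MPI_MODE_WRONLY, MPI_INFO_NULL, fh, ierr)
      ! Each rank's byte offset: dim 4-byte integers per rank (illustrative).
      offset = int(rank, MPI_OFFSET_KIND) * dim * 4
      call MPI_FILE_WRITE_AT(fh, offset, out_array, dim, MPI_INTEGER, &
           MPI_STATUS_IGNORE, ierr)
      call MPI_FILE_CLOSE(fh, ierr)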
10. INTRODUCTION (cont)
- Why use parallel I/O?
  - Many users do not have time to learn the complexities of I/O optimization
  - Use of parallel I/O can simplify coding
    - Single read/write operation vs. multiple read/write operations
11. INTRODUCTION (cont)
- Why use parallel I/O?
  - Many users do not have time to learn the complexities of I/O optimization
  - Use of parallel I/O can simplify coding
    - Single read/write operation vs. multiple read/write operations
  - Parallel I/O potentially offers significant performance improvement over traditional approaches
12. INTRODUCTION (cont)
- Traditional approaches
  - Each process writes to a separate file
    - Often requires an additional post-processing step
    - Without post-processing, restarts must use the same number of processors
  - Results are sent to a master processor, which collects them and writes out to disk
  - Each processor calculates its position in the file and writes individually
13. INTRODUCTION (cont)
- What is MPI-I/O?
  - MPI-I/O is a set of extensions to the original MPI standard
  - It is an interface specification: it does NOT give implementation specifics
  - It provides routines for file manipulation and data access
  - Calls to MPI-I/O routines are portable across a large number of architectures
14. DERIVED DATATYPES & VIEWS
- Derived datatypes are not part of MPI-I/O
  - They are used extensively in conjunction with MPI-I/O
- A filetype is really a datatype expressing the access pattern of a file
  - Filetypes are used to set file views
15. DERIVED DATATYPES & VIEWS
- Non-contiguous memory access
- MPI_TYPE_CREATE_SUBARRAY
  - NDIMS - number of dimensions
  - ARRAY_OF_SIZES - number of elements in each dimension of the full array
  - ARRAY_OF_SUBSIZES - number of elements in each dimension of the sub-array
  - ARRAY_OF_STARTS - starting position of the sub-array in each dimension of the full array
  - ORDER - MPI_ORDER_C or MPI_ORDER_FORTRAN
  - OLDTYPE - datatype stored in the full array
  - NEWTYPE - handle to the new datatype
16. NONCONTIGUOUS MEMORY ACCESS
[Figure: a 102 x 102 local array indexed (0,0)-(101,101); the interior region (1,1)-(100,100) holds the real data, and the outer layer holds ghost cells]
17. NONCONTIGUOUS MEMORY ACCESS

      INTEGER sizes(2), subsizes(2), starts(2), dtype, ierr
      sizes(1)    = 102
      sizes(2)    = 102
      subsizes(1) = 100
      subsizes(2) = 100
      starts(1)   = 1
      starts(2)   = 1
      CALL MPI_TYPE_CREATE_SUBARRAY(2, sizes, subsizes, starts, &
           MPI_ORDER_FORTRAN, MPI_REAL8, dtype, ierr)
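Before use, the new type must be committed; the ghost-free interior can then be written with a single collective call. A minimal sketch, where the file handle fh and array local_array are our illustrative names:

      CALL MPI_TYPE_COMMIT(dtype, ierr)
      ! Writes only the 100 x 100 interior of the 102 x 102 local_array;
      ! the ghost layer is skipped by the subarray type.
      CALL MPI_FILE_WRITE_ALL(fh, local_array, 1, dtype, MPI_STATUS_IGNORE, ierr)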
18NONCONTIGUOUS FILE ACCESS
- MPI_FILE_SET_VIEW(
- FH,
- DISP,
- ETYPE,
- FILETYPE,
- DATAREP,
- INFO,
- IERROR)
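In Fortran, DISP must be an integer of kind MPI_OFFSET_KIND. A minimal sketch of setting a view, assuming a committed filetype (variable names are ours):

      INTEGER fh, filetype, ierr
      INTEGER(KIND=MPI_OFFSET_KIND) disp
      disp = 0
      CALL MPI_FILE_SET_VIEW(fh, disp, MPI_REAL8, filetype, &
           'native', MPI_INFO_NULL, ierr)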
19. NONCONTIGUOUS FILE ACCESS
- The file has holes in it from the processor's perspective
  - Multi-dimensional array access
20. NONCONTIGUOUS FILE ACCESS
- The file has holes in it from the processor's perspective
  - Multi-dimensional array access
    - MPI_TYPE_CREATE_SUBARRAY()
21. Distributed array access
[Figure: a 200 x 200 global array, indexed (0,0)-(199,199), divided into four 100 x 100 blocks]
22. Distributed array access

      INTEGER(KIND=MPI_OFFSET_KIND) DISP
      SIZES(1)    = 200
      SIZES(2)    = 200
      SUBSIZES(1) = 100
      SUBSIZES(2) = 100
      STARTS(1)   = 0          ! this process owns the (0,0) block
      STARTS(2)   = 0
      CALL MPI_TYPE_CREATE_SUBARRAY(2, SIZES, SUBSIZES, STARTS, &
           MPI_ORDER_FORTRAN, MPI_INTEGER, FILETYPE, IERR)
      CALL MPI_TYPE_COMMIT(FILETYPE, IERR)
      DISP = 0
      CALL MPI_FILE_SET_VIEW(FH, DISP, MPI_INTEGER, FILETYPE, &
           'native', MPI_INFO_NULL, IERR)
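Once the view is in place, each process writes its block with a single collective call. A sketch, where LOCAL_ARRAY is our illustrative name for this process's 100 x 100 block:

      CALL MPI_FILE_WRITE_ALL(FH, LOCAL_ARRAY, 100*100, MPI_INTEGER, &
           MPI_STATUS_IGNORE, IERR)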
23. NONCONTIGUOUS FILE ACCESS
- The file has holes in it from the processor's perspective
  - Multi-dimensional array distributed with a block distribution
  - Irregularly distributed arrays
24. Irregularly distributed arrays
- MPI_TYPE_CREATE_INDEXED_BLOCK
  - COUNT - number of blocks
  - LENGTH - number of elements per block
  - MAP - array of displacements
  - OLD - old datatype
  - NEW - new datatype
25. Irregularly distributed arrays
[Figure: example file map - this process's 10 elements occupy file displacements 0 1 2 4 7 11 12 15 20 22]
26. Irregularly distributed arrays

      CALL MPI_TYPE_CREATE_INDEXED_BLOCK(10, 1, FILE_MAP, MPI_INTEGER, &
           FILETYPE, IERR)
      CALL MPI_TYPE_COMMIT(FILETYPE, IERR)
      DISP = 0
      CALL MPI_FILE_SET_VIEW(FH, DISP, MPI_INTEGER, FILETYPE, &
           'native', MPI_INFO_NULL, IERR)
27. DATA ACCESS
- Non-collective access
- Collective access
- Split collective access
(write variants for each are sketched below)
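The detail of this slide is not recoverable from the transcript, but the outline names three access flavors. A hedged sketch of the corresponding write calls (BUF and COUNT are illustrative names):

      ! Non-collective: each process reads/writes independently.
      CALL MPI_FILE_WRITE(FH, BUF, COUNT, MPI_REAL8, STATUS, IERR)
      ! Collective: all processes that opened the file call together,
      ! letting the library merge their requests into large operations.
      CALL MPI_FILE_WRITE_ALL(FH, BUF, COUNT, MPI_REAL8, STATUS, IERR)
      ! Split collective: a begin/end pair that lets computation overlap
      ! the I/O (BUF must not be modified in between).
      CALL MPI_FILE_WRITE_ALL_BEGIN(FH, BUF, COUNT, MPI_REAL8, IERR)
      CALL MPI_FILE_WRITE_ALL_END(FH, BUF, STATUS, IERR)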
28. COLLECTIVE I/O
[Figure: memory layout on 4 processors]
29. EXAMPLE 1
- Bob Maier - ARC
- Production-level Fortran code
- Challenge problem
- Every X iterations, write a restart file
- At conclusion, write an output file
- On an SP with 512 processors: 12 hrs computation, 12 hrs I/O
30. EXAMPLE 1 (cont)
- Conceptually, four 3-dimensional arrays
- Implemented with a single 4-dimensional array (see the sketch below)
  - Improved cache-hit ratio
- Uses ghost cells
- Writes out to 4 separate files
- Block-block data distribution
- Memory access is completely non-contiguous
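A hedged sketch of what that layout implies for the subarray arguments on the code slide below. The array name Z and the mem_* names appear there, but the extents nx, ny, nz and the exact dimension order are our assumptions:

      ! Assumed shape: Z(4, 0:nx+1, 0:ny+1, 0:nz+1) - four variables,
      ! each with a one-cell ghost layer in every spatial dimension.
      mem_sizes    = (/ 4, nx+2, ny+2, nz+2 /)
      mem_subsizes = (/ 1, nx,   ny,   nz   /)  ! one variable, no ghosts
      mem_starts   = (/ 0, 1, 1, 1 /)           ! starts are 0-based;
                                                ! mem_starts(1) picks the variable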
31. EXAMPLE 1 (cont)
32. EXAMPLE 1 - Solution

      ! set up array with size of file
      ! set up array with subsize of file
      ! set up array with size of local arrays
      ! set up array with subsize of memory
      ! set up array with starting positions in file
      ! set up array with starting positions in memory
      disp = 0
      call mpi_type_create_subarray(3, file_sizes, file_subsizes, &
           file_starts, MPI_ORDER_FORTRAN, MPI_REAL8, file_type, ierr)
      call mpi_type_commit(file_type, ierr)
      do vars = 1, 4
         mem_starts(1) = vars - 1    ! select variable vars in the 4-D array
         call mpi_type_create_subarray(4, mem_sizes, mem_subsizes, &
              mem_starts, MPI_ORDER_FORTRAN, MPI_REAL8, mem_type, ierr)
         call mpi_type_commit(mem_type, ierr)
         call mpi_file_open(...)
         call mpi_file_set_view(fh, disp, MPI_REAL8, file_type, &
              'native', ...)
         call mpi_file_write_all(fh, Z, 1, mem_type, ...)
         call mpi_file_close(fh, ierr)
      enddo
33-34. LBMPI - PERFORMANCE
[Charts: I/O performance of LBMPI; the original version took 5204 seconds]
35. EXAMPLE 2
- Victor Parr - UT-Austin
- Production-level Fortran code performing EPA simulations (CE-QUAL-ICM, a message-passing code)
- A typical production run performs a 10-year simulation, dumping output for every simulation month
- Irregular grid and irregular data distribution
- High ratio of ghost cells
36. EXAMPLE 2 (cont)
[Figure: layout of the global output file, which begins with a header]
37. EXAMPLE 2 - CURRENT
- Each processor writes all output (including ghost cells) to a process-specific file
- A post-processor reads in the process-specific files
  - Determines whether a value is from a resident cell
  - Places resident values in the appropriate position in a global output array
  - Writes out the global array to a global output file
38. EXAMPLE 2 - SOLUTION
[Figure: example file map 1 2 4 7 9 10 11 14 20 24 and the corresponding local values 32 63 7 21 44 2 77 31 55 19]
39EXAMPLE 2 - SOLUTION
DONE FOR EACH OUTPUT call mpi_file_set_view (fh,
disp, memtype, filetype, native, MPI_INFO_NULL,
ierr) call mpi_file_write_all (fh, buf, 1,
memtype, status, ierr) disp disp total
number of bytes written by all processes
DONE ONCE create mem_map create
file_map sort file_map permute mem_map to
match file_map call mpi_type_create_indexed_block
(num, 1, mem_map, MPI_DOUBLE_PRECISION, memtype,
ierr) call mpi_type_commit (memtype, ierr) call
mpi_type_create_indexed_block (num, 1, file_map,
MPI_DOUBLE_PRECISION, filetype, ierr) call
mpi_type_commit (filetype, ierr) disp size of
initial header in bytes
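The "sort file_map, permute mem_map" step matters because MPI requires the displacements of a filetype used in a file view to be monotonically nondecreasing. A minimal sketch of that step over num paired entries; the insertion sort is our own choice of method, not necessarily the code's:

      ! Sort file_map ascending and apply the same permutation to
      ! mem_map, so each file slot still pairs with its memory slot.
      integer :: i, j, tf, tm
      do i = 2, num
         tf = file_map(i)
         tm = mem_map(i)
         j = i - 1
         do while (j >= 1)
            if (file_map(j) <= tf) exit
            file_map(j+1) = file_map(j)
            mem_map(j+1)  = mem_map(j)
            j = j - 1
         end do
         file_map(j+1) = tf
         mem_map(j+1)  = tm
      end do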
40. CONCLUSIONS
- MPI-I/O potentially offers significant improvement in I/O performance
- This improvement can be attained with minimal effort on the part of the user
- Simpler programming with fewer calls to I/O routines
- Easier program maintenance due to the simple API